Source Job

20 jobs similar to Senior AI Infrastructure Engineer (Europe based - Remote)

Jobs ranked by similarity.

US

  • Build and maintain infrastructure-as-code for our AWS EKS and GCP GKE clusters, plus on-premises deployments.
  • Own CI/CD pipelines and drive GitOps adoption.
  • Deploy, scale, and optimize ML/NLP inference workloads.

Vectara is the Enterprise Agent Platform that enables businesses to build and deploy governed, grounded, auditable AI agents across SaaS, VPC, and on-prem. We’re a passionate team that’s hyper-focused on solving enterprise-level technology and business problems with AI.

Europe

  • Build and operate production-grade model serving infrastructure using frameworks such as vLLM, TGI, Triton, or equivalent.
  • Design and implement robust deployment pipelines with blue/green and canary rollout strategies for ML models.
  • Develop and maintain auto-scaling systems, multi-model serving architectures, and intelligent request routing layers.

Pragmatike is recruiting on behalf of a fast-scaling, well-funded distributed cloud infrastructure startup building next-generation AI-native cloud services. The company is redefining how compute is delivered by providing GPU-powered infrastructure for AI/ML workloads, secure storage, and high-speed data transfer through a decentralized architecture that significantly reduces environmental impact compared to traditional cloud providers.

Europe

  • Maintain and scale Kubernetes clusters, managing workloads and monitoring at production scale.
  • Manage and evolve our AWS and GCP cloud environments, balancing reliability, cost, and velocity.
  • Own and improve our CI/CD systems using GitHub Actions on our self-hosted AWS runners.

Synthesia is the world’s leading AI video platform for business, used by over 90% of the Fortune 100. Founded in 2017, the company develops products to enhance visual communication and enterprise skill development, helping people work better. Our valuation stands at $4 billion and our culture values building and hiring smart, kind, unrelenting people.

Europe 5w PTO

  • Define and drive the roadmap for deployment, configuration, infrastructure, and operational tooling across cloud and on-premise environments.
  • Work closely with engineering, design, customer-facing teams, and customers to identify and resolve deployment friction.
  • Improve how enterprise customers install, configure, upgrade, secure, and operate Rasa in production.

Rasa is a leader in generative conversational AI, enabling enterprises to build and deliver next-level AI assistants. The company was founded in 2016 and is remote-first with a global presence.

Global

  • Design and evolve multi-provider, multi-region GPU compute clusters optimized for large-scale training.
  • Serve as the primary technical point of contact for customers running large-scale training workloads.
  • Build production-grade automation for cluster provisioning, GPU health checks, job scheduling, self-healing, and firmware/driver lifecycle management.

Andromeda Cluster gives early-stage startups access to scaled AI infrastructure. They work with leading AI labs, data centers, and cloud providers to deliver compute when and where it’s needed most and are expanding to find the brightest in AI infrastructure, research and engineering.

Europe

  • Collaborate within a multi-disciplinary team of product managers, designers, software engineers, machine learning and biomedical scientists.
  • Design, build, and maintain scalable, reliable AI systems.
  • Drive technical decisions and provide context-aware solutions for AI systems in biological research.

Owkin is an AI company with a mission to solve the complexity of biology. They are building the first Biology Super Intelligence (BASI) by combining powerful biological large language models, multimodal patient data, and agentic software.

Turkey

  • Lead the strategy and architecture for a scalable AI platform that integrates model orchestration, tool integration, and real-time decision systems.
  • Design, develop, and maintain the platform with full ownership from ideation to deployment, ensuring reliability, observability, and security.
  • Mentor engineers and collaborate across teams to evangelize AI best practices and drive the integration of AI throughout the product development lifecycle.

JumpCloud is an AI-powered unified IT management platform designed to secure the modern workforce by consolidating identity, device, and access management. The company is remote-first with teams in over 15 countries, fostering a culture that values building connections, out-of-the-box thinking, and passionate collaboration on challenging technical problems.

Global

  • Design and implement infrastructure and tools that empower our product teams to rapidly and securely iterate, emphasizing reliability and automation.
  • Influence the strategic direction of our infrastructure and operational practices, ensuring that we are well-positioned to scale and support our growing organization.
  • Take a proactive role in the resolution of production issues, ensuring that we are well-prepared to handle incidents and that we learn from them in a blameless manner.

SSV Labs is the core team behind the SSV Network - pioneering decentralized infrastructure for Ethereum staking. They are building tools, protocols, and standards to make staking more secure, scalable, and trustless.

$200,000–$250,000/yr
US Canada Unlimited PTO

  • Design the BYOC deployment model for Archie across customer environments.
  • Build and own Kubernetes-based infrastructure that runs reliably across multiple clouds and customer setups.
  • Create deployment tooling using Helm, GitOps, or similar approaches to make installation and operations repeatable.

P-1 AI is building an engineering AGI with their first product, Archie, an AI engineer. They closed a $23 million seed round and aim to put an Archie on every engineering team at every industrial company on earth.

$130,000–$160,000/yr
US

  • Design, build, and optimize cloud platform capabilities.
  • Tackle complex infrastructure challenges and raise engineering quality.
  • Apply AI and AIOps to make the platform smarter and more resilient.

PerfectServe offers Best in KLAS clinical communication and physician scheduling solutions and is a Leader in the Gartner Magic Quadrant for Clinical Communication and Collaboration. We focus on optimizing provider schedules and dynamically routing messages to advance patient care and clinical workflows, valuing growth, transparency, and innovation.

Europe

  • Build scalable Edge infrastructure, designing and maintaining delivery systems for model deployment.
  • Work with cross-functional teams to integrate complex features, translating research into hardware realities.
  • Drive automation and reliability by implementing infrastructure to test models and monitor performance.

Hudl builds great teams and hires the best to ensure employees are working with people they can constantly learn from. They provide a culture where everyone feels supported and have been named one of Newsweek's Top 100 Global Most Loved Workplaces.

Europe 5w PTO

  • Work with other Engineering teams to design sustainable infrastructure and microservice solutions.
  • Automate tools and infrastructure to reduce manual work.
  • Monitor applications and participate in an on-call rotation as required.

Bloomreach is building the world’s premier agentic platform for personalization, revolutionizing how businesses connect with their customers by building and deploying AI agents to personalize the entire customer journey. They power personalization for more than 1,400 global brands.

Europe

  • Develop and deploy LLM-based solutions and RAG architectures.
  • Contribute to the end-to-end lifecycle of AI features.
  • Integrate AI solutions into the company's cloud infrastructure.

Remote People is building the infrastructure to power borderless teams. Their technology handles global payroll, benefits, taxes, and compliance, enabling businesses to hire anyone anywhere compliantly. They are committed to building a global, diverse team representing different backgrounds, perspectives, and experiences.

Europe

  • Design, implement, and maintain cloud-based infrastructure and services at the intersection of agentic AI and biomedical data.
  • Collaborate with software engineers, data engineers, researchers and data scientists to understand their needs and implement solutions that enhance their productivity.
  • Build and lead a high-performing platform engineering team, setting a high bar for technical excellence, ownership, and accountability in the organization.

Owkin is an AI company on a mission to solve the complexity of biology. They are building the first Biology Super Intelligence (BASI) by combining powerful biological large language models, multimodal patient data, and agentic software.

Spain 6w PTO

  • Operating and evolving 100+ multi-cloud streaming clusters and related database infrastructure.
  • Diagnosing and eliminating cross-layer failure modes.
  • Designing safe upgrade and rollout strategies at scale.

Grafana Labs is a remote-first, open-source powerhouse with over 20M users of Grafana, its open source visualization tool. Grafana Labs helps more than 3,000 companies manage their observability strategies with the Grafana LGTM Stack, and its team thrives in an innovation-driven environment.

$174,000–$233,000/yr
US 4w PTO

  • Design and implement evaluation systems and tooling to validate Oura’s custom AI models and Advisor.
  • Develop novel evaluation methods to measure grounding, reliability, and actionability of LLM and agentic systems.
  • Build and optimize custom AI models through fine-tuning, knowledge distillation, and quantization.

Oura's mission is to empower every person to own their inner potential. Their award-winning products help their global community gain a deeper knowledge of their readiness, activity, and sleep quality by using their Oura Ring and its connected app. They are focused on helping people live healthier and happier lives, and on ensuring that their team members have what they need to do their best work, both in and out of the office.

5w PTO

  • Own the design, implementation, and evolution of core MLOps systems across Hyperstack.
  • Build and improve systems that orchestrate model training, fine-tuning, evaluation, and deployment.
  • Define and embed strong MLOps practices across teams.

NexGen Cloud is the company behind Hyperstack, a full-stack AI cloud serving tens of thousands of customers from AI researchers to enterprises running the world's most compute-intensive workloads. They deliver on-demand and private GPU infrastructure to teams who treat performance as a requirement, not a feature.

Europe

  • Own the architecture and delivery of production-grade LLM systems and classical ML solutions.
  • Design, evaluate, and optimize RAG pipelines (retrieval strategy, chunking, indexing, monitoring).
  • Build scalable, production-grade LLM services and agentic workflows, alongside traditional ML systems where appropriate.

Hiflylabs is a team of 250+ data and tech enthusiasts based in Budapest. They focus on data engineering, data science, artificial intelligence and application development, working on a wide range of projects around the world. Hiflylabs values its people and is committed to nurturing their personal and professional development through a mentoring system.

$100,000–$130,000/yr
Canada

  • Design, build, and maintain Kubernetes-based infrastructure and cloud environments.
  • Build and optimize CI/CD pipelines that enable fast, safe, and repeatable deployments.
  • Leverage AI coding tools and agentic workflows as a core part of your work.

Intrahealth, a subsidiary of HEALWELL AI Inc., is an enterprise class EMR provider supporting approximately 20,000 providers and the care delivery of tens of millions of patients and clients across Canada, Australia and New Zealand. Intrahealth provides a suite of flexible software solutions to a wide variety of customers including health authorities, public health, community health, home care, and primary care professionals.

$134,000–$149,000/yr
US

  • Design, implement, and operate cloud-native infrastructure for production workloads.

PointClickCare's mission is to help providers deliver exceptional care. They are a leading health tech company, founder-led and privately held, that empowers their employees to push boundaries, innovate, and shape the future of healthcare. They have the largest long-term and post-acute care dataset and a Marketplace of 400+ integrated partners, and their platform serves over 30,000 provider organizations.