Source Job

Europe 5w PTO

  • Design, implement, and manage AI Platform architecture.
  • Control AI-related costs, including models, GPUs, and other resources.
  • Collaborate with ML teams to operationalize AI models and integrate them into systems.

Kubernetes Terraform Python

20 jobs similar to Senior Platform Engineer - AI Team

Jobs ranked by similarity.

India

  • Design and manage AWS infrastructure for AI services.
  • Implement Infrastructure as Code using Terraform.
  • Collaborate with cross-functional teams to enhance performance.

Jobgether uses an AI-powered matching process to ensure applications are reviewed quickly, objectively, and fairly against the role's core requirements. Their system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company.

US Unlimited PTO

  • Influence the technical direction for infrastructure and platform capabilities that support our rapidly growing AI product suite.
  • Architect and evolve our cloud infrastructure (primarily on AWS) to support current and future products.
  • Mentor and level up engineers across Platform and product teams; review design docs, guide architecture decisions, and model high standards.

Rad AI is on a mission to transform healthcare with artificial intelligence. Our AI-driven solutions are revolutionizing radiology—saving time, reducing burnout, and improving patient care. Rad AI has secured over $140M in funding and our valuation is at $528M.

North America Unlimited PTO

  • Build and operate scalable backend services and internal APIs for the AI platform.
  • Integrate LLMs and AI tool execution into reliable, production-ready workflows.
  • Own production reliability for AI platform infrastructure through observability, alerting, and incident response.

MaintainX is the world's leading Asset and Work Intelligence platform for industrial and frontline environments. They are a modern IoT-enabled cloud-based tool for reliability, safety, and operations on physical equipment and facilities, powering operational excellence for 13,000+ businesses. MaintainX recently completed a $150 million Series D round, at a valuation of $2.5 billion.

Australia

  • Support and evolve the reliability of platforms used by the AI Research team.
  • Ensure production services meet expectations for availability, latency, and operational readiness.
  • Build and maintain Kubernetes-based services on GCP using infrastructure-as-code and GitOps.

Algolia is a pioneer and market leader in AI Search, empowering 17,000+ businesses to deliver blazing-fast, predictive search and browse experiences. They have raised $150 million in Series D funding, quadrupling their valuation to $2.25 billion, investing in their market-leading platform.

Global

  • Design, implement, and maintain high-performance ML training and inference platforms.
  • Ship tools that allow any ML engineer to deploy a model in minutes, not days.
  • Improve scalability, reliability, and cost efficiency of model training and serving systems.

Speechify's mission is to make sure that reading is never a barrier to learning. With nearly 200 people around the globe working in a 100% distributed setting, Speechify's team includes frontend and backend engineers, AI research scientists, and others.

US Unlimited PTO

  • Build Enterprise-Scale Infrastructure, leveraging infrastructure-as-code to manage complex cloud environments.
  • Sustain Platform Health and Performance, owning critical systems in production, including reliability and security.
  • Enable Teams and Customers to Move Faster, creating abstractions and tooling that deploy, run, and scale AI/ML workloads.

Cake is on a mission to make cutting-edge AI accessible to enterprise teams. Backed by top investors, Cake is seeing strong adoption and is positioned for rapid growth in the next 12 months, emphasizing ownership, clear communication, and collaboration.

Global

  • Deploy and manage AI agents and multi-agent workflows
  • Configure and enforce access control, permissions, and knowledge boundaries
  • Maintain governance standards and audit trails

SPACE44 builds and operates software systems for companies that need technology to work reliably in real, day-to-day operations. They work as long-term engineering partners, embedding experienced engineers into client environments and taking responsibility for execution, stability, and ongoing improvement of production systems.

US

  • Architect and deploy secure, scalable infrastructure using Terraform, CloudFormation, or similar tools.
  • Ensure the platform meets strict SLA requirements for enterprise clients, minimizing downtime.
  • Implement comprehensive monitoring, logging, and alerting to provide deep visibility into system health.

Filevine provides cloud-based workflow tools for legal professionals, helping them manage organizations and serve clients. They are recognized as a fast-growing and innovative technology company with a team of passionate professionals.

$100,000–$185,000/yr
US Unlimited PTO

  • Work hands-on with the infrastructure that supports our distributed & highly scalable services.
  • Gather requirements from customers and adapt manifests and software to support new environments.
  • Automate and optimize the release pipeline to make it as frictionless as possible.

Arize AI is transforming the world by providing a leading AI observability and evaluation platform. They empower AI engineers to ship high-performing, reliable agents and applications, unifying build, test, and run in a single workspace, with over 150 leading enterprises as customers.

US

  • Make deployments boring (in the best way possible)
  • Own CI/CD pipelines: optimize build times, improve caching, reduce flakiness
  • Evolve our Kubernetes (EKS) deployment strategy for reliability and speed

Obvious is building an AI-native workspace, an operating system for work that puts co-intelligence at the center. They are a small and talent-dense team with world-class builders, former founders, and leaders from companies like Netflix, Google, and Meta.

US

  • Develop and manage strategic technical partnerships across the AI infrastructure ecosystem.
  • Support Business Development leadership as the primary technical liaison between Mirantis and strategic technology partners.
  • Collaborate with product management, engineering, and sales to drive joint solution development, technical validation, and technical go-to-market alignment.

Mirantis is a Kubernetes-native AI infrastructure company, enabling organizations to build and operate scalable, secure, and sovereign infrastructure for modern AI, machine learning, and data-intensive applications. They are committed to open standards and freedom from lock-in, ensuring that customers retain full control of their infrastructure strategy.

$250,000–$325,000/yr
US

  • Design cloud-native architectures for agentic AI workloads using Kubernetes/EKS, Terraform, Docker, serverless APIs, AWS Batch, and async orchestration frameworks.
  • Define agentic system patterns using LangChain, LangGraph, Autogen, LlamaIndex, Pinecone, and other multi-agent frameworks; ensure consistency of prompt/tool design.
  • Architect vector database, RAG, embeddings pipelines, and model-serving endpoints (LLM/SLM) with strong emphasis on scalability and latency management.

AHEAD builds platforms for digital business by weaving together advances in cloud infrastructure, automation and analytics, and software delivery, helping enterprises deliver on the promise of digital transformation. They prioritize creating a culture of belonging, where all perspectives and voices are represented, valued, respected, and heard.

  • Helping improve the infrastructure and data platform using a lean approach.
  • Creating a data platform and infrastructure optimized for developments using Machine Learning and massive data processing.
  • Improving the development experience and spreading the DevOps culture in the company.

Clarity AI is a global tech company founded in 2017 with a mission to bring societal impact to markets. They leverage AI and machine learning to provide data, methodologies, and tools to investors, governments, companies, and consumers for informed decisions; they are a team of over 300 individuals with offices in New York, Madrid, London, Paris, and Abu Dhabi, backed by investors like BlackRock and SoftBank.

Global 6w PTO 26w maternity

  • Build self-service systems that automate managing, deploying and operating services.
  • Automate environment observability and resilience. Enable all developers to troubleshoot and resolve problems.
  • Ensure we hit defined SLOs, including participation in an on-call rotation.

Cohere is focused on scaling intelligence to serve humanity by training and deploying frontier models for developers and enterprises. They are a team of researchers, engineers, and designers. They value diversity and strive to create an inclusive work environment.

$125,600–$157,000/yr
US

  • Design, build, and scale enterprise-grade AI/ML systems that power internal workflows and external-facing AI/ML platforms.
  • Develop a production-ready Generative AI and MLOps platform with reusable components used to deploy multiple AI solutions across Natera’s business units.
  • Implement cloud-native infrastructure for large-scale model training and serving using Kubernetes, MLflow, Terraform, and AWS-native services

Natera is a global leader in cell-free DNA (cfDNA) testing. They are dedicated to oncology, women’s health, and organ health, aiming to make personalized genetic testing and diagnostics part of the standard of care. The Natera team consists of highly dedicated statisticians, geneticists, doctors, laboratory scientists, business professionals, software engineers and many other professionals from world-class institutions.

Europe 6w PTO

  • Own deployment engineering projects, leading the technical execution of Parloa’s deployments inside large, complex enterprise environments.
  • Design for scale and resilience, architecting deployment solutions that meet enterprise-grade requirements for performance, reliability, and security.
  • Engineer solutions where none exist, building custom extensions, integrations, and configurations to close product gaps and meet enterprise requirements.

Parloa is a fast-growing startup in the world of Generative AI and customer service. Their voice-first GenAI platform automates customer service with natural-sounding conversations, and the company has more than 400 employees in Berlin, Munich, and New York.

US

  • Ensure the smooth operation and high availability of Clarifai's core services
  • Monitor system performance, identify bottlenecks, and implement optimizations to enhance reliability and efficiency
  • Design and implement scalable, secure, and cost-effective infrastructure solutions

Clarifai is a leading AI platform specializing in computer vision and generative AI, empowering organizations to transform unstructured data into actionable insights. Founded in 2013, they have a diverse, globally distributed team with $100M in funding and are committed to building a diverse and inclusive team.

Europe 5w PTO

  • Guide the technical direction of Bondora’s ML engineering stack by selecting, evaluating, and implementing technologies to improve scalability and reliability.
  • Lead complex, high-risk, or cross-departmental projects that directly influence Data Science delivery, risk model performance, and production stability.
  • Act as the bridge between Data Science, Data Engineering, and Development to identify and solve systemic technical challenges.

Bondora's mission is to empower people to enjoy life more while alleviating the stress of managing finances. Founded in 2008, Bondora has served over 1 million customers in 16 years and is a rapidly growing fintech company, set to acquire a banking license and expand investment and loan products across Europe.

Global

  • Build and scale ML-optimized HPC infrastructure by deploying and managing Kubernetes-based GPU/TPU superclusters across multiple clouds.
  • Optimize for AI/ML training by collaborating with cloud providers to fine-tune infrastructure for cost efficiency, reliability, and performance.
  • Troubleshoot and resolve complex issues and proactively identify infrastructure bottlenecks, performance degradation, and system failures.

Cohere's mission is to scale intelligence to serve humanity by training and deploying frontier models for developers and enterprises who are building AI systems. The company is composed of researchers, engineers, and designers passionate about their craft and believes that a diverse range of perspectives is a requirement for building great products.

$133,109–$239,596/yr
US 4w PTO

  • Develop scalable MLOps pipelines for model training, validation, deployment, and monitoring using AWS services
  • Implement infrastructure as code and CI/CD workflows to support rapid experimentation and reliable production releases
  • Collaborate with data scientists to productionize ML models and ensure reproducibility, versioning, and traceability

Experian is a global data and technology company, powering opportunities for people and businesses around the world. A FTSE 100 Index company listed on the London Stock Exchange (EXPN), they have a team of 23,300 people across 32 countries and corporate headquarters are in Dublin, Ireland.