Design, build, and own AWS-based MLOps infrastructure, defining standards for security, automation, cost-efficiency, and governance. Architect and operate production Kubernetes clusters, including containerizing and deploying ML models using Docker and Helm. Build and maintain CI/CD pipelines for training, validation, and deployment of ML workloads, implementing canary, blue-green, and rollback strategies.
Source Job
20 jobs similar to Senior MLOps Platform Architect
Jobs ranked by similarity.
- Operationalize data science solutions for risk-prediction products.
- Design and build ML pipelines using AWS services and tools like MLflow and Snowflake.
- Implement testing strategies within CI/CD pipelines to maintain high platform quality.
Quanata is on a mission to help ensure a better world through context-based insurance solutions.
- Design and manage AWS infrastructure for AI services.
- Implement Infrastructure as Code using Terraform.
- Collaborate with cross-functional teams to enhance performance.
Jobgether uses an AI-powered matching process to ensure applications are reviewed quickly, objectively, and fairly against the role's core requirements. Their system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company.
- Help define the direction for the team.
- Define and prioritize ML Platform initiatives.
- Enable teams to build features at scale by providing a foundation of reusable software components and infrastructure.
Motive empowers the people who run physical operations with tools to make their work safer, more productive, and more profitable. Motive serves nearly 100,000 customers – from Fortune 500 enterprises to small businesses – across a wide range of industries.
- Design, build, and maintain our petabyte-scale data and ML platform.
- Ensure reliability, security, scalability, and performance across our internal systems.
- Automate deployment pipelines, monitoring, and alerting for ML and data services.
Serve Robotics is reimagining how things move in cities with its personable sidewalk robot designed to take deliveries away from congested streets.
- Strong computer science or engineering background with 3+ years of coding experience with Python.
- Advanced knowledge of AWS services including but not limited to their ML services (AWS SageMaker and AWS Step Functions).
- Experience with ML monitoring and automation tools (MLflow, SageMaker Pipelines).
Bluelight is a leading software consultancy dedicated to designing and developing innovative technology that enhances users' lives. With a presence across the United States and Central/South America, Bluelight is in an exciting phase of expansion, continually seeking exceptional talent to join its dynamic and diverse community.
- Foster a culture of collaboration, shared ownership, and excellence.
Octopus Deploy sets the standard for Continuous Delivery, empowering software teams to deliver value in an agile way. Founded in Australia in 2012, their team of over 300 Octonauts now spans the globe and they combine high growth and big ambitions with a sustainable, balanced working environment.
- Design and manage infrastructure-as-code with Terraform and GitOps.
- Build and maintain secure CI/CD pipelines with integrated security automation.
- Deploy and operate Kubernetes/K3s clusters in AWS GovCloud (IL5/IL6).
Rackner is a cloud-native software consultancy delivering solutions for startups, enterprises, and the public sector. They enable digital transformation through DevSecOps, AI/ML, and cloud-first innovation, solving high-impact problems and delivering secure, scalable solutions for the Department of Defense and federal health programs.
- Design and plan cloud-native systems aligned with business goals and security best practices.
- Implement and support AI-based automation tools and services.
- Continuously tune cloud and automation workloads to improve reliability and performance.
PerfectServe offers unified healthcare communication solutions to help physicians, nurses, and care team members provide exceptional patient care.
- Deploy and manage cloud infrastructure across all three clouds using Terraform IaC.
- Architect, build, and maintain reliable CI/CD pipelines in GitHub Actions and ArgoCD.
- Contribute to decisions around our departmental roadmap and project priorities.
Coalesce is the only data transformation and governance platform designed for the AI era, improving data professionals' lives since its founding in 2020.
- Design, implement, and manage infrastructure for our cloud-based platforms (AWS).
- Create and automate deployment pipelines using CI/CD tools (GitLab / GitHub Actions).
- Ensure system scalability, availability, and reliability through proactive monitoring and automation.
Prompt is revolutionizing healthcare by delivering highly automated and modern B2B enterprise software to rehab therapy businesses, the teams within, and the patients they serve.
Deploy and monitor machine learning models in production using tools like Docker, Kubernetes, and MLflow to ensure scalability and reliability. Build and maintain data pipelines using Airflow, Spark, or Kafka to support model training and inference. Integrate ML models into business applications, collaborating with software engineers to operationalize solutions.
Jobgether is a Talent Matching Platform that partners with companies worldwide to efficiently connect top talent with the right opportunities through AI-driven job matching.
Design, implement, and maintain cloud infrastructure and deployment pipelines across AWS environments. Ensure efficient CI/CD operations and infrastructure automation. Uphold high platform reliability and security standards.
Software Mind develops solutions that make an impact for companies around the globe.
- Design, build, and maintain cloud infrastructure primarily on AWS, with exposure to GCP and Azure.
- Support developers and product teams by troubleshooting infrastructure and deployment issues.
- Enforce and promote security best practices, including least-privilege access and monitoring.
EX Squared LATAM works with international clients to build scalable, data-driven platforms that support complex digital ecosystems. They have a multicultural, LATAM-based engineering team with a culture focused on collaboration, ownership, and continuous improvement.
- Oversee the reliability, scalability, performance, and security of key production services.
- Collaborate with cross-functional teams to develop and maintain resilient infrastructure.
- Provide expert mentorship and guidance on best practices to engineers throughout the organization.
Cision is a global leader in PR, marketing and social media management technology and intelligence, helping brands and organizations connect with customers and stakeholders to drive business results. The company has offices in 24 countries throughout the Americas, EMEA and APAC.
- Building world-class AI infrastructure to support a 100+ person research team.
- Designing and scaling multi-cloud systems that support high-performance model training and inference.
- Improving monitoring, alerting and system observability for AI workloads.
Canva is redefining how the world experiences design. They have campuses in Sydney and Melbourne, co-working spaces in Brisbane, Perth, Adelaide and Auckland, and trust their employees to choose the balance that empowers them and their team to achieve their goals.
Mirantis is looking for a talented Systems/DevOps engineer to join its product team and design, implement, deploy, and test cloud infrastructure products built on open-source components. Deploy, test, and evaluate Mirantis K8s-related products. Manage and maintain bare-metal test environments, ensuring optimal performance, security, and availability.
Mirantis is the Kubernetes-native AI infrastructure company, enabling organizations to build and operate scalable, secure, and sovereign infrastructure.
- Design and implement the "Golden Paths"—standardized, automated templates for microservices and infrastructure.
- Develop the CLI tools, portals, or API interfaces that abstract the complexity of our cloud infrastructure.
- Develop and maintain a library of modular, testable, and versioned Terraform modules.
SEON is a command center for fraud prevention and AML compliance, helping companies stop fraud, reduce risk, and protect revenue. Powered by real-time, first-party data signals, it enriches customer profiles, flags suspicious behavior, and streamlines compliance workflows.
- Lead and support the platform team through coaching and clear expectations.
- Own the platform strategy and roadmap, prioritizing initiatives and managing team capacity.
- Provide technical direction for the AWS- and Kubernetes-based platform.
bunch is building the backbone of private markets, combining exceptional expertise, operational excellence, and frictionless technology.
- Design, build, and scale systems, APIs, and tools for efficient software deployment and management.
- Contribute to creating secure, reliable, and scalable software that enhances developer workflows and automates infrastructure capabilities.
- Improve the overall efficiency and effectiveness of the development process.
Jobgether uses an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Their system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company.
Lead AI and ML initiatives to design and implement production-grade machine learning systems and pipelines. Develop scalable infrastructure for model training, evaluation, and deployment, ensuring reliability and observability. Collaborate with cross-functional teams to drive innovation and efficiency.
Jobgether is a Talent Matching Platform that partners with companies worldwide to efficiently connect top talent with the right opportunities through AI-driven job matching.