Evolve and maintain our Kubeflow, Feast, and Spark-on-Kubernetes ML infrastructure.
Design tools and APIs empowering teams to transition from centralized bottlenecks to self-service excellence.
Collaborate with Data Science teams to apply software engineering best practices to ML workflows.
Wellhub revolutionizes workplace wellness by connecting employees to partners for fitness, mindfulness, therapy, nutrition, and sleep in one subscription. Headquartered in NYC with team members across the globe, we value wellbeing, collaboration, and different perspectives.
Build and maintain CI/CD pipelines and deployment infrastructure.
Leverage AI to automate analysis and resolution of production issues.
Fal is the generative media ecosystem powering the next generation of AI products. They build the infrastructure, tools, and model access that teams need to move from idea to production.
Define, drive, design, and build/ship end-to-end solutions that solve real customer problems.
Contribute to the end-to-end AI/ML software development lifecycle, ensuring reproducible research.
Drive architecture, design, and delivery of advanced ML systems in the Product R&D team.
Kinaxis is a global leader in modern supply chain orchestration. Known for its AI-infused platform and transparency across end-to-end supply chains, Kinaxis helps customers make faster, better decisions. The company has over 2000 employees worldwide and is recognized with Top Employer awards.
Design, build, and maintain the core infrastructure layer supporting GenAI products.
Implement secure access controls and authentication mechanisms integrated by default into the AI platform components.
Develop and manage observability, monitoring, and logging solutions for GenAI workloads and infrastructure.
PointClickCare is a healthcare technology company. This team will serve as the product owner for GenAI capabilities, closely integrated with key horizontal partners to ensure delivery of safe, scalable and high-impact AI Products.
Develop, deploy, maintain, operate, and support an Agentic AI Developer Platform.
Focus on hands-on technical implementation and operation of the platform.
Collaborate with and mentor less experienced team members as needed.
We build modern Machine Learning systems for demand planning and budget forecasting, offering custom AI solutions to optimize cloud-based systems. We are a remote startup with a culture that values being data nerds, open team players, ownership, and a positive mindset.
Maintain, improve, and extend an AI platform already running in production.
Handle a mix of backend development, data pipelines, DevOps, and infrastructure work.
Translate business and product requirements into technical decisions independently.
Provectus is an AI consultancy and solutions provider. They help businesses adopt AI technologies, offering development and integration services. The job posting does not state the company's size; the culture appears to be flexible, autonomous, and tech-forward.
Contribute to the development of the Everywhere Inference platform, a Kubernetes-based solution.
Design and implement APIs and developer tools to simplify deployment, management, and monitoring of AI applications.
Optimize serverless container workflows for AI workloads, ensuring performance, scalability, and seamless autoscaling.
Gcore provides infrastructure and software solutions for AI, cloud, network, and security. They have 550+ professionals globally and power everything from real-time communication and streaming to enterprise AI and secure web applications.
Own the technical direction of Remote's SRE/Platform domain.
Define and drive the reliability strategy across the platform.
Identify and lead AI enablement initiatives across the engineering organisation.
Remote is solving modern organizations’ biggest challenge – navigating global employment compliantly with ease. With our core values at heart and a future-focused work culture, our team works tirelessly on ambitious problems, asynchronously, around the world.
Define and evolve the architecture and roadmap for enterprise‑scale Data and AI platforms.
Design and build multi‑tenant, multi‑region, highly available AI platforms with governance.
Lead capacity planning and cost optimization strategies for GPU and CPU workloads.
NEORIS accelerates growth in Ibero‑America, combining global engineering with regional expertise. With over 60,000 professionals across 55+ countries, they offer technical specialization career paths and value responsibility, collaboration, creativity, and commitment.
Deploy and maintain infrastructure using Terraform on AWS.
Operate and govern production-grade platforms running on Kubernetes / EKS.
Build and maintain CI/CD pipelines using GitHub Actions.
Muttdata is a dynamic startup committed to crafting innovative systems using cutting-edge Big Data and Machine Learning technologies. They are looking for a hands-on DevOps engineer to join a strategic initiative focused on deploying and operating Data & AI platforms.
Design and develop CI/CD systems for websites, services, and release workflows, and operate an EKS-based Kubernetes platform.
Diagnose and debug production incidents, drive root-cause analysis, and implement improvements to enhance system reliability.
Write and maintain infrastructure as code using Pulumi or Terraform/OpenTofu across multiple AWS accounts with security-conscious practices.
Thunderbird is one of the world’s most trusted open-source email applications, empowering more than 20 million people globally. Our small but growing distributed team includes 65+ people across seven countries, and we build privacy-respecting communication tools with a collaborative, inclusive, and user-first spirit.
Build and operate the self-service infrastructure platform for developers and AI agents.
Own core platform layers including CI/CD, GitOps, IaC module catalog, and golden-path scaffolding.
Build internal tooling, observability, and metrics to make pipelines observable and improvable.
Luxury Presence is building the AI growth platform for real estate. Backed by top investors like Bessemer Venture Partners, we're a Series C company with over $100M in ARR and more than 90,000 real estate professionals using our platform.
Monitor, operate, and support production AI infrastructure platforms.
Investigate and resolve infrastructure, networking, hardware, and platform-related incidents.
Collaborate with engineering teams, hardware vendors, and datacenter personnel to resolve technical issues.
Mirantis is the Kubernetes-native AI infrastructure company, enabling organizations to build and operate scalable, secure infrastructure for AI and data-intensive applications. The company is growing and invests heavily in AI infrastructure and platform services.
Identify systemic engineering challenges across our platforms and drive their resolution.
Write code, review PRs, debug production issues, and optimize system performance.
Partner with engineering teams as a technical point of contact on complex projects.
Zeta Global is an AI-Powered Marketing Cloud that leverages advanced artificial intelligence (AI) and trillions of consumer signals to help marketers acquire, grow, and retain customers more efficiently. They were founded in 2007 and are headquartered in New York City with offices around the world.
Lead the design, implementation, management, support, and operation of cloud-native infrastructure and container orchestration platforms.
Drive platform reliability, scalability, automation, and operational excellence across critical SaaS and cloud-based workloads.
Contribute to architectural decisions, mentor engineers, and ensure alignment with security, compliance, and operational standards.
Availity delivers revenue cycle and related business solutions for health care professionals who want to build healthy, thriving organizations. They are a global team with headquarters in Jacksonville, FL, and an office in Bangalore, India, united by a mission to bring the focus back to patient care.
Design, train, evaluate, and ship ML systems for governance and security, starting with prompt injection detection and behavioral anomaly detection.
Build supporting infrastructure including data pipelines, feature stores, model serving, and evaluation harnesses.
Set technical direction for ML work, own architecture, evaluation methodology, and model lifecycle.
Docker provides developer tools for building, sharing, and running applications across Docker Desktop, Docker Hub, and Docker Scout. With over 20 million monthly users and a globally distributed remote-first team, Docker is trusted by everyone from solo founders to the world's largest companies.
Own and evolve the cloud platform including compute layer, EKS fleet, serverless infrastructure, networking, and cloud operations across AWS and GCP.
Design and maintain infrastructure-as-code foundation and networking layer for reliability, security, and scalability.
Build AI-powered automation for cloud infrastructure management, including policy-as-code, drift detection, and LLM-assisted runbook generation.
Webflow builds the world's leading AI-native Digital Experience Platform, empowering teams to design, launch, and optimize for the web without barriers. As a remote-first company with over 2 million users across 190 countries, it fosters a culture of trust, transparency, and creativity.
Act as pre-sales technical lead for federal pursuits, leading discovery workshops and architecting AI security solutions in SaaS and airgapped environments.
Build mission-focused demonstrations and proof-of-concept AI applications, integrating SDKs and APIs to protect computer vision, LLM, and agentic workloads.
Advise customers on securing AI infrastructure aligned to MITRE ATLAS, OWASP Top 10 for LLMs, and NIST AI Risk Management Framework.
HiddenLayer protects the world’s most valuable technologies from adversarial AI attacks. Founded by AI professionals and security specialists, the company has been recognized with awards such as RSA Innovation Sandbox Winner and CB Insights AI 100, and has a venture-backed team focused on accelerating secure AI adoption.
Lead the investigation and resolution of complex infrastructure, networking, and platform-related incidents.
Provide technical leadership for Kubernetes platform operations and supporting infrastructure services.
Mentor and support AI Infrastructure & Platform Operations Engineers, sharing technical knowledge through documentation and training.
Mirantis helps organizations ship code faster on public and private clouds, providing a public cloud experience on any infrastructure from the data center to the edge. The company serves many of the world's leading enterprises, including Adobe, DocuSign, Liberty Mutual, and PayPal, and is a leader in container management.