Design and implement advanced GPU virtualization solutions.
Manage and optimize large-scale GPU and HPC clusters.
Collaborate with data science and engineering teams to optimize AI models.
Jobgether is a company that connects job seekers with potential employers. They use AI-powered matching to ensure applications are reviewed quickly and fairly, and their system identifies top-fitting candidates for hiring companies.
Build enterprise-scale infrastructure, leveraging infrastructure-as-code to manage complex cloud environments.
Sustain platform health and performance, owning critical systems in production, including reliability and security.
Enable teams and customers to move faster, creating abstractions and tooling that deploy, run, and scale AI/ML workloads.
Cake is on a mission to make cutting-edge AI accessible to enterprise teams. Backed by top investors, Cake is seeing strong adoption and is positioned for rapid growth in the next 12 months. The team emphasizes ownership, clear communication, and collaboration.
Build self-service systems that automate managing, deploying and operating services.
Automate environment observability and resilience. Enable all developers to troubleshoot and resolve problems.
Ensure we hit defined SLOs, including participation in an on-call rotation.
Cohere is focused on scaling intelligence to serve humanity by training and deploying frontier models for developers and enterprises. They are a team of researchers, engineers, and designers. They value diversity and strive to create an inclusive work environment.
Work with research teams to design and build our training infrastructure.
Prototype new training frameworks and productionize solutions at scale.
Design, optimize and test model integration infrastructure.
Clarifai is a leading AI platform specializing in computer vision, NLP, LLMs, and audio recognition, helping organizations transform unstructured data into structured data. Founded in 2013, they operate remotely across multiple countries with backing from industry leaders, and they foster a diverse, equal-opportunity workplace.
Building world-class AI infrastructure to support a 100+ person research team.
Designing and scaling multi-cloud systems that support high-performance model training and inference.
Improving monitoring, alerting and system observability for AI workloads.
Canva is redefining how the world experiences design. They have campuses in Sydney and Melbourne, co-working spaces in Brisbane, Perth, Adelaide and Auckland, and trust their employees to choose the balance that empowers them and their team to achieve their goals.
Own the end-to-end lifecycle of ML model deployment—from training artifacts to production inference services.
Design, build, and maintain scalable inference pipelines using modern orchestration frameworks (e.g., Kubeflow, Airflow, Ray, MLflow).
Implement and optimize model serving infrastructure for latency, throughput, and cost efficiency across GPU and CPU clusters.
MARA is building a modular platform that unifies IaaS, PaaS, and SaaS, enabling governments, enterprises, and AI innovators to deploy, scale, and govern workloads across data centers, edge environments, and sovereign clouds. They are redefining the future of sovereign, energy-aware AI infrastructure.
Prototype new training frameworks and productionize solutions at scale.
Design, optimize and test model integration infrastructure.
Clarifai is a leading, full-lifecycle deep learning AI platform for computer vision, natural language processing, LLMs, and audio recognition. Clarifai was founded in 2013 and has remote employees throughout the United States, Canada, Argentina, India, and Estonia.
Develop and manage strategic technical partnerships across the AI infrastructure ecosystem.
Support Business Development leadership as the primary technical liaison between Mirantis and strategic technology partners.
Collaborate with product management, engineering, and sales to drive joint solution development, technical validation, and technical go-to-market alignment.
Mirantis is a Kubernetes-native AI infrastructure company, enabling organizations to build and operate scalable, secure, and sovereign infrastructure for modern AI, machine learning, and data-intensive applications. They are committed to open standards and freedom from lock-in, ensuring that customers retain full control of their infrastructure strategy.
Own the reliability, performance, and operational health of production AI systems.
Lead efforts to refactor and harden the AI codebase.
Design and build monitoring, alerting, and debugging tools.
MixMode is a leading provider of AI-powered cybersecurity solutions at scale, pioneering a patented third-wave, context-aware AI approach. Large organizations with big data workloads trust MixMode to defend their most important assets.
Design and manage AWS infrastructure for AI services.
Implement Infrastructure as Code using Terraform.
Collaborate with cross-functional teams to enhance performance.
Jobgether uses an AI-powered matching process to ensure applications are reviewed quickly, objectively, and fairly against the role's core requirements. Their system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company.
Engineer features for Kubernetes and Azure Red Hat OpenShift deployment and lifecycle management.
Define deployment infrastructure architecture and drive offerings from inception to delivery.
Play an active role in container and virtualization-related projects and communities.
Jobgether uses an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. Their system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company.
Design and implement highly scalable infrastructure for GitLab.com to support current and future growth.
Collaborate with cross-functional teams across the Infrastructure organization to plan and deliver projects that shape GitLab’s platform direction.
Operate and improve edge services and Kubernetes workloads, acting as a subject matter expert within the infrastructure department.
GitLab is an open-core software company that develops the most comprehensive AI-powered DevSecOps Platform, used by more than 100,000 organizations. They aim to enable everyone to contribute to and co-create the software that powers our world.
Ensure the smooth operation and high availability of Clarifai's core services
Monitor system performance, identify bottlenecks, and implement optimizations to enhance reliability and efficiency
Design and implement scalable, secure, and cost-effective infrastructure solutions
Clarifai is a leading AI platform specializing in computer vision and generative AI, empowering organizations to transform unstructured data into actionable insights. Founded in 2013, they have a globally distributed team and $100M in funding, and they are committed to building a diverse and inclusive team.
Enable teams to build features at scale by providing a foundation of reusable software components and infrastructure.
Motive empowers the people who run physical operations with tools to make their work safer, more productive, and more profitable. Motive serves nearly 100,000 customers – from Fortune 500 enterprises to small businesses – across a wide range of industries.
Challenge advanced language models on realistic infrastructure and platform scenarios.
Verify architectural soundness and logical correctness, assess code quality and testing strategies.
Analyze performance bottlenecks and deployment risks, capture reproducible failure cases, and suggest improvements.
The company is hiring a SWE Infrastructure Specialist. This is a contractor role: the contractor must supply a secure computer and high-speed internet, and company-sponsored benefits such as health insurance and PTO do not apply.
Act as a solution expert across ML domains including evaluations, training, inference, data pipelines, quality, and optimisation.
Work directly alongside product teams as a trusted partner, helping them navigate technical challenges and arrive at effective solutions.
Develop blueprints, patterns, and paved roads that allow other teams to follow proven approaches and accelerate their own implementations.
Canva is a design platform that enables users to create professional designs. They have a flagship campus in Sydney, a second campus in Melbourne, and co-working spaces in other locations, with a flexible work environment.
Design, build, and optimize high-performance systems in Python supporting AI data pipelines and evaluation workflows
Develop full-stack tooling and backend services for large-scale data annotation, validation, and quality control
Improve reliability, performance, and safety across existing Python codebases
Alignerr connects top technical experts with leading AI labs to build, evaluate, and improve next-generation models. They work on real production systems and high-impact research workflows across data, tooling, and infrastructure.
Design scalable, future-proof data platforms optimized for AI research workloads.
Build efficient self-serve data processing pipelines leveraging GCP's advanced services.
Implement guardrails for cost, quality, and performance.
AssemblyAI is at the forefront of Speech AI, creating powerful models for speech-to-text and speech understanding via an API. They're a remote team of startup veterans and AI researchers looking to build one of the next great AI companies.
Own deployment engineering projects, leading the technical execution of Parloa’s deployments inside large, complex enterprise environments.
Design for scale and resilience, architecting deployment solutions that meet enterprise-grade requirements for performance, reliability, and security.
Engineer solutions where none exist, building custom extensions, integrations, and configurations to close product gaps and meet enterprise requirements.
Parloa is a fast-growing startup in the world of Generative AI and customer service. Their voice-first GenAI platform automates customer service with natural-sounding conversations, and the company has more than 400 employees in Berlin, Munich, and New York.
Build Containerized Agent Systems: Design and implement systems that leverage Docker containers as the ideal runtime for AI agents, ensuring isolation, scalability, and portability
Expand cagent: Maintain and evolve the open-source cagent project, adding new capabilities for containerized agent deployment and orchestration
Agent Runtime Development: Build robust infrastructure for packaging, deploying, and managing agents in containers
Docker makes app development easier so developers can focus on what matters. They are a remote-first team that spans the globe, united by a passion for innovation and great developer experiences. Docker has over 20 million monthly users and 20 billion image pulls.