Own performance optimization and reliability of large-scale GPU clusters and InfiniBand networking for HPC workloads.
Diagnose and resolve complex system-level issues across GPU, network, and compute layers, and integrate new hardware components.
Develop automation for monitoring, fault detection, and proactive remediation in distributed compute environments.
Our partner is building next-generation AI cloud infrastructure, focusing on large-scale high-performance computing systems. They foster a highly technical engineering culture with experts across systems, networking, and virtualization, offering career development and continuous learning opportunities.
Design and build the orchestration layer using Kubernetes, Slurm, or comparable technologies.
Build customer-facing platform APIs, CLIs, web portals, and SDKs.
Drive infrastructure-as-code, multi-tenant isolation, and platform reliability.
GPU One provides GPU-as-a-Service (GPUaaS), turning raw GPU infrastructure into a usable cloud platform. The company is building a multi-tenant orchestration layer to serve customers at scale, with a focus on platform engineering and AI infrastructure.
Lead investigation and resolution of complex infrastructure, networking, and platform incidents.
Provide technical leadership for Kubernetes platform operations and drive automation initiatives.
Mentor engineers and develop operational standards, runbooks, and best practices.
Mirantis is the Kubernetes-native AI infrastructure company, enabling organizations to build and operate scalable, secure, and sovereign infrastructure for modern AI, machine learning, and data-intensive applications. Serving enterprises like Adobe, PayPal, and Volkswagen, Mirantis is committed to open standards and freedom from lock-in.
Operate and maintain Linux-based infrastructure, deploy and scale Kubernetes clusters, and implement automation with Ansible and GitOps.
Design networking architecture, build observability stacks, and lead incident response across the platform.
Manage virtualization layers and collaborate with development teams to optimize resource utilization and system availability.
Pragmatike develops cutting-edge cloud computing solutions, focusing on ambitious projects with a culture of collaboration and innovation. The team works in a dynamic and flexible environment to shape tomorrow's technologies.
Own core compute infrastructure across multiple cloud providers and regions.
Design capabilities for greater performance and flexibility in service deployment.
Investigate and resolve challenging cloud and compute issues across the stack.
Render is a cloud platform for developers building AI-native, full-stack, multi-service applications. Trusted by over 6 million developers, the company has raised $257M in funding and values craft, velocity, and user experience.
Diagnose and resolve complex production issues across Linux, Kubernetes, networking, storage, and GPU systems.
Act as a senior escalation point for critical incidents, collaborating with engineering teams on root cause analysis.
Develop tools and automation in Python, Bash, or Go to improve troubleshooting efficiency and observability.
The partner company provides advanced AI and cloud infrastructure solutions, supporting large-scale distributed computing and AI workloads. They operate in a fast-moving, collaborative environment with highly skilled engineering teams focused on cutting-edge technology and operational excellence.
Lead the investigation and resolution of complex infrastructure, networking, and platform-related incidents.
Provide technical leadership for Kubernetes platform operations and supporting infrastructure services.
Mentor and support AI Infrastructure & Platform Operations Engineers, sharing technical knowledge through documentation and training.
Mirantis helps organizations ship code faster on public and private clouds, providing a public cloud experience on any infrastructure from the data center to the edge. The company serves many of the world's leading enterprises, including Adobe, DocuSign, Liberty Mutual, and PayPal, and is a leader in container management.
Monitor, operate, and support production AI infrastructure platforms.
Investigate and resolve infrastructure, networking, hardware, and platform-related incidents.
Collaborate with engineering teams, hardware vendors, and datacenter personnel to resolve technical issues.
Mirantis is the Kubernetes-native AI infrastructure company, enabling organizations to build and operate scalable, secure infrastructure for AI and data-intensive applications. The company is growing and invests heavily in AI infrastructure and platform services.
Troubleshoot and resolve issues in customer environments based on Linux, OpenStack, Kubernetes, and networking technologies, owning escalations end-to-end.
Reproduce customer issues in labs, confirm bug reports, and collaborate with the development team to improve product stability.
Communicate with customers during incidents via email and remote sessions, guiding them through troubleshooting and resolution processes.
Mirantis is the Kubernetes-native AI infrastructure company, enabling organizations to build and operate scalable, secure, and sovereign infrastructure for modern AI and data-intensive applications. With deep expertise in open source and Kubernetes, Mirantis empowers platform engineering teams across enterprises worldwide.
Operate and maintain large-scale Linux environments (bare metal, clusters, cloud) and monitor system health to ensure high availability.
Help scale clusters to hundreds or thousands of nodes, improving performance, reliability, and resource utilization.
Automate operational tasks using Python, Bash, Ansible, or Terraform and contribute to system design and architecture decisions.
Mistral AI builds high-performance, open, and efficient AI systems to power next-generation applications. We are a collaborative, low-ego, and highly technical team operating across Europe, the US, and beyond, scaling rapidly to support thousands of nodes.
Write and maintain test suites that give the team confidence to ship.
Package and deploy open source software components in Kubernetes environments.
Contribute to internal tooling, dashboards, and documentation that make complex systems understandable.
Defense Unicorns delivers mission value by streamlining software delivery for defense and government customers. Their team of innovators, software engineers, and veterans focuses on security, speed, and user experience in a remote-first culture.