Play a key role in building the next-generation AI cloud platform: a highly available, global, blazing-fast cloud infrastructure that virtualizes cutting-edge ML hardware (GB200s/GB300s, BlueField DPUs) and equips state-of-the-art ML practitioners with self-serve AI cloud services, such as on-demand and managed Kubernetes and Slurm clusters.
Job listings
Lead and mentor a team of site reliability engineers (SREs), fostering a culture of continuous improvement. Oversee system reliability, ensuring that Camunda's SaaS offering is highly available and performant. Provide project management support for reliability engineering initiatives, ensuring projects are delivered on time, within scope, and meet quality standards by coordinating cross-functional teams, managing timelines, and mitigating risks.
This position is in the Data Intelligence BU, focusing on the development of the Telekom Data Intelligence Hub (DIH). Main tasks include setting up CI/CD pipelines, creating deployment scripts for Kubernetes, maintaining cloud-based infrastructures, and ensuring compliance with security and data privacy requirements. Act as third-level support in case of incidents, working in an agile environment with cross-functional teams in a DevOps working mode.
Ensure the reliability, scalability, and performance of Groq's observability tools and services for provisioning and managing the full lifecycle of Groq hardware, software, and networking systems at massive scale. The observability team builds the monitoring and observability infrastructure and tooling that supports Groq's inferencing hardware at massive scale, both in the cloud and our own datacenters.
The Hardware Infrastructure team at Groq is responsible for architecting and supporting world-class pre- and post-silicon ASIC development and verification environments that optimize the productivity of our hardware and software engineering teams. Work closely with our systems engineering team to bring up and deploy new hardware product platforms, including integrating those new platforms into established build, delivery, and verification frameworks. Automate and integrate flows into Groq's overall infrastructure to streamline our development cycle.
Embark on a pioneering journey to drive air freight and logistics digital transformation. As a Cloud Infrastructure Engineer, you will engage with diverse stakeholders, contributing to our company cloud infrastructure, and ensuring the quality of the work. You will take charge of the infrastructure that runs our software platform, operating a lean multicloud platform that allows us to run workloads according to their needs.