In this hands-on "player/coach" role, lead the team responsible for the operational reliability of the bare metal infrastructure, networking, and system configuration that power our product offerings. You will help shape a critical function in a growing company, evolving the Network Operations Center (NOC) into a modern, proactive SRE function that leverages automation, data science, and reliability engineering principles.
Job listings
Lead and mentor a team of site reliability engineers (SREs), fostering a culture of continuous improvement. Oversee system reliability, ensuring that Camunda's SaaS offering is highly available and performant. Provide project management support for reliability engineering initiatives, ensuring projects are delivered on time, within scope, and meet quality standards by coordinating cross-functional teams, managing timelines, and mitigating risks.
Responsibilities include using observability tools (New Relic, DataDog) for proactive outage reduction and rapid detection, delivering through CI/CD pipelines (AWS CodeDeploy, GitHub Actions, Jenkins), and managing remote engineering teams across multiple time zones.
The Infrastructure Platform team at Fieldguide is responsible for the stability, resiliency, security, and performance of the underlying infrastructure that powers the multinational Fieldguide cloud platform. You will be a key technical contributor collaborating with development teams and business leaders to ensure world-class scalability and reliability of our products.
Seeking an experienced technology leader to build and lead our global Technology Operations practice at Virtasant. The ideal candidate is passionate about cloud technology, has deep expertise in technology operations, has successfully led teams to deliver robust solutions in a client-facing environment and is passionate about the power of artificial intelligence to transform how organizations operate.
Tackle complex challenges by designing and implementing scalable, reliable infrastructure and services that power the future of customer engagement technology. You'll leverage your extensive expertise in backend systems and infrastructure management to enhance the performance and reliability of our platforms. Your contributions will directly shape the architecture and operational excellence our product needs to thrive.
We're looking for a Principal Engineer with deep expertise in Site Reliability, Backend, or Platform Engineering to lead high-impact platform initiatives for our multi-tenant SaaS infrastructure. This role blends technical vision, architectural leadership, and organizational influence, shaping how GitLab's infrastructure scales to meet both business demands and customer expectations.
We are seeking a DevOps & Site Reliability Engineer to join a growing AI-focused SaaS startup. In this role, you'll be responsible for maintaining, optimizing, and scaling the infrastructure that supports our platform, ensuring high availability, performance, and reliability. You'll work closely with development and product teams to improve deployment processes, monitor systems, and respond to incidents proactively.
As a Senior Site Reliability Engineer, you will help enhance the stability, performance, and observability of platforms, focusing on maintaining and optimizing the current infrastructure and ensuring strong monitoring coverage. You will also support compliance and security practices and collaborate closely with development teams to supervise the platforms, optimize system behavior, and drive improvements in security and documentation practices.