As a Site Reliability Engineer (SRE), you are the bridge between software development and operations, helping to deliver reliable speed to clients. You'll work with multidisciplinary teams in a DevOps way, maintaining focus on production and creating necessary facilities. You'll provide guidance on production environment robustness, deployment procedures, and failure analysis.
Job listings
Work on core internal systems, tooling and automation around logging, monitoring and traceability; develop modules with scalability and availability at their core. You'll contribute to efforts to revolutionize how cloud-based products and services are built, tested, operated and monitored. You will also maintain an understanding of system functionality and architecture, focusing on the operational aspects of the service within a team environment.
This is an opportunity to join a mission-critical engineering team that is driving the productivity and reliability of Temporal's developers and core platforms. You will build and lead the end-to-end Software Development Lifecycle, formulate feature designs, and document design choices. You will design and build multi-component, distributed systems that operate at scale and investigate issues with a methodical approach to identify a root cause.
We're building systems to collect massive datasets, train models, and host them for inference. As a Platform Engineer, you'll develop infrastructure that supports millions of files, billions of vectors, and countless log messages left behind in code review.
This role involves owning the reliability, performance, and cost optimization of Acquisition.com's platforms: designing, implementing, and maintaining infrastructure and automation to ensure systems are secure, scalable, and efficient. The role focuses on accelerating delivery while safeguarding uptime, performance, and spend efficiency, and on partnering with engineering and leadership to shape infrastructure and anticipate future growth.
As the SRE Manager at Shippo, you will lead a team of engineers responsible for building platforms, tooling, and infrastructure that enable product teams to operate reliable, performant, and scalable services. You will establish frameworks for observability, deployment automation, and infrastructure management that allow product teams to own their service reliability. You will maintain a strong, support-oriented team while building automation and enabling engineering productivity and operational excellence across the organization.
We are seeking a Cloud Engineer who loves solving complex problems and making a difference. You will drive the architecture, optimization, and governance of our mission-critical cloud infrastructure. You will be responsible for designing, implementing, and managing scalable and secure cloud infrastructure solutions across platforms such as AWS and Google Cloud.
We're looking for an experienced Infrastructure Engineer to join our Core Infra team and work on securing and scaling the foundation of our global platform. This role combines hands-on infrastructure engineering with a focus on security compliance, networking, and systems reliability. You'll be deeply involved in auditing, configuring, and improving how our distributed systems run across AWS (currently) and GCP (planned).
Provide Tier 3 technical support and implement infrastructure automation, monitoring, and continuous integration/continuous deployment (CI/CD) processes to ensure secure and reliable delivery across multiple cloud environments. Collaborate with development, cybersecurity, and operations teams to maintain high system availability, integrate automated testing, and ensure compliance with DoD cybersecurity and RMF requirements.
As a Site Reliability Engineer, you will play a crucial role in the development of solutions for our Enterprise platform, developing applications that provide self-service and increased efficiency to internal customers across Cloud Operations, Engineering, Customer Success & Support, and Customer Value Management. The SRE team is looking for an engineer who is ready to constantly question the status quo with a mixture of system design, code development, deployment, automation, networking, and more.