Source Job

US

  • Leverage infrastructure as code (Terraform) to build and maintain complex production and analytics workflows including networking and containerized services.
  • Rapidly diagnose and resolve faults in system services as part of a 24/7 on-call rotation focused on actionable alerting and eliminating toil.
  • Improve speed of delivery by developing and maintaining CI/CD pipelines.

Terraform Python TypeScript AWS CI/CD

20 jobs similar to Site Reliability Engineer

Jobs ranked by similarity.

US Canada Europe

  • Design, build, and maintain highly available, scalable infrastructure.
  • Manage and optimize infrastructure across GCP, AWS, Azure, and other cloud providers.
  • Develop comprehensive monitoring, logging, and alerting systems.

Bobsled is seeking a Site Reliability Engineer to enhance its data-sharing platform's reliability and scalability. We're a company that values growth, offering flexible work hours in a fully remote environment and fully sponsored individual coaching for all employees.

Europe South America

  • Design, build, and maintain efficient and reliable software and infrastructure delivery pipelines on AWS
  • Recommend upgrades to services as new features on the underlying platform (AWS) are built and functioning
  • Implement and maintain infrastructure as code (IaC) using tools like Terraform

They build and deploy software and infrastructure delivery pipelines. They optimize and maintain production systems and services, set up, monitor and observe key alerts, and balance service reliability with delivery speed.

US

  • Architect and deploy secure, scalable infrastructure using Terraform, CloudFormation, or similar tools.
  • Ensure the platform meets strict SLA requirements for enterprise clients, minimizing downtime.
  • Implement comprehensive monitoring, logging, and alerting to provide deep visibility into system health.

Filevine provides cloud-based workflow tools for legal professionals, helping them manage organizations and serve clients. They are recognized as a fast-growing and innovative technology company with a team of passionate professionals.

  • Designing, building, and maintaining infrastructure that enables fast, reliable, and secure product delivery.
  • Improving and maintaining CI/CD pipelines to streamline deployments and increase reliability.
  • Contributing to infrastructure reliability and ensuring systems are designed for resilience and growth.

Incident.io is the leading AI incident response platform, built to help teams dramatically reduce incident response time and improve reliability. They have raised $100M from Index Ventures, Insight Partners, and Point Nine, alongside founders and executives from world-class technology companies.

US Unlimited PTO

  • Contribute to high-impact AWS cloud infrastructure initiatives.
  • Participate in operability and production readiness reviews.
  • Advocate and implement Site Reliability Engineering practices.

Patreon is a media and community platform where creators give fans access to exclusive work. They have generated over $10 billion for creators and have 25 million+ paid memberships, with a hybrid work model and offices in New York and San Francisco.

$120,000–$145,000/yr
Global

  • Automate and scale infrastructure provisioning using Infrastructure-as-Code to support self-service for engineering teams
  • Maintain and improve CI/CD pipelines, tooling, and deployment workflows across multiple services
  • Monitor and troubleshoot systems to ensure high availability, performance, and reliability

H1's mission is to provide a platform that can optimally inform every doctor interaction globally in order to promote health equity and build needed trust in healthcare systems. They harness the power of data and AI technology to unlock groundbreaking medical insights and convert those insights into actions that result in optimal patient outcomes and accelerate an equitable and inclusive drug development lifecycle.

Latin America

  • Design, implement, and manage cloud infrastructure using Infrastructure as Code (IaC) tools.
  • Design, build, and maintain scalable CI/CD pipelines using tools like CircleCI or GitHub Actions.
  • Implement and maintain observability tooling (Prometheus, Grafana, Datadog), and lead incident response to ensure system reliability.

Engine is transforming business travel into something personalized, rewarding, and simple. More than 20,000 companies already rely on Engine to support over 1 million travelers and billions in annual bookings.

$219,000–$245,000/yr
US Unlimited PTO

  • Architect, operate, improve and secure the platform the Garner Health app runs on
  • Boost development velocity and productivity
  • Build systems to a high engineering standard and hold others to the same high standard

Garner has developed a revolutionary approach to evaluating doctor performance and a unique incentive model that's reshaping the healthcare economy to ensure everyone can afford high-quality care. They have more than doubled their revenue annually over the last 5 years. Garner's award-winning culture is designed to cultivate teamwork, trust, autonomy, exceptional results, and individual growth.

$29,905–$49,842/yr
Europe

  • Collaborate with Engineering, QA, & Support teams.
  • Break down projects into manageable tasks.
  • Maintain infrastructure in on-premises & AWS environments.

Turnitin partners with educational institutions to promote honesty, consistency, and fairness across all subject areas and assessment types. It is a global organization with team members in over 35 countries and offers a remote-centric culture.

$98,583–$138,016/yr
US Unlimited PTO

  • Respond to production incidents and contribute to post-incident analysis.
  • Identify and automate manual processes to improve efficiency and reduce risk.
  • Enhance monitoring tools and platforms to improve observability.

Restaurant365 is a SaaS company that provides a unique, centralized solution for accounting and back-office operations for restaurants. They focus on empowering team members to produce top-notch results while elevating their skills.

US

  • Ensure the smooth operation and high availability of Clarifai's core services
  • Monitor system performance, identify bottlenecks, and implement optimizations to enhance reliability and efficiency
  • Design and implement scalable, secure, and cost-effective infrastructure solutions

Clarifai is a leading AI platform specializing in computer vision and generative AI, empowering organizations to transform unstructured data into actionable insights. Founded in 2013, they have a diverse, globally distributed team with $100M in funding and are committed to building a diverse and inclusive team.

Global

  • Automate deployments utilizing custom templates for customer environments on AWS.
  • Architect AWS environment best practices and deployment methodologies.
  • Create automation tools and processes to improve day-to-day functions.

Rackspace is a technology services company. They specialize in helping businesses manage their cloud infrastructure.

$140,200–$175,200/yr
US

  • Own the entire Laboratory Operations Software release process execution, ensuring smooth and timely software releases with minimal downtime.
  • Act as an internal consultant and subject matter expert, coaching individual product teams on best-in-class DevOps practices.
  • Continuously improve and automate infrastructure provisioning, configuration management, application deployment, and testing using tools like Terraform, Kubernetes and CI/CD.

Natera is a global leader in cell-free DNA (cfDNA) testing, dedicated to oncology, women’s health, and organ health, aiming to make personalized genetic testing standard. The Natera team consists of statisticians, geneticists, doctors, laboratory scientists, business professionals, software engineers, and many other professionals from world-class institutions, who care deeply for the work and each other.

$100,000–$165,000/yr
Europe Latin America 3w PTO

  • You’ll lead the initial setup of our DevOps and platform engineering practices
  • You’ll design and deliver an internal platform for personal or feature environments to boost developer velocity
  • You’ll build and maintain AWS-based infrastructure for performance, scale, and security

DualEntry, founded in 2024, is a rapidly growing AI startup focused on revolutionizing the finance industry. Our AI-native ERP platform helps accounting teams achieve more with less effort, automating manual data entry using AI for businesses ranging from $5M-ARR to NYSE-listed companies.

US

  • Own developer operations and platform reliability across Introzy’s product stack.
  • Lead how we run infrastructure on Render, design and evolve our observability and alerting, and shape our CI/CD and release practices.
  • Continuously improve internal developer experience so the engineering team can ship quickly and safely.

Introzy is a multi-app platform designed to unify networking, workflow, and productivity. As a subsidiary of Sanguine Technology Solutions, they are an early-stage company moving fast to deliver value, with a lean engineering team and a culture that embraces AI.

  • Contribute to the design, development, and implementation of platform components and services using infrastructure-as-code principles.
  • Identify opportunities for automation and develop solutions to streamline operational tasks, improve efficiency, and reduce manual intervention.
  • Participate in the provisioning, configuration, and management of AWS resources, ensuring adherence to best practices and security standards.

Mambu is a leading SaaS cloud banking platform, aiming to improve banking for a billion people. Mambu offers exciting career opportunities and helps shape the future of financial services; their culture is vibrant, and they value diversity.

Turkey

  • Is responsible for Insider One's technological well-being and influences the development lifecycle.
  • Develops internal solutions and improves site reliability through continuous delivery and integration.
  • Creates analytical tools for application performance insights and ensures projects are completed on time.

Insider One is a platform that provides marketing and customer engagement tools, enabling teams to reach their full potential. They are a B2B SaaS unicorn with 1,500+ team members representing 50+ nationalities across 30+ offices and are dedicated to social responsibility.

India

  • Design and manage AWS infrastructure for AI services.
  • Implement Infrastructure as Code using Terraform.
  • Collaborate with cross-functional teams to enhance performance.

Jobgether uses an AI-powered matching process to ensure applications are reviewed quickly, objectively, and fairly against the role's core requirements. Their system identifies the top-fitting candidates, and this shortlist is then shared directly with the hiring company.

US

  • Make deployments boring (in the best way possible)
  • Own CI/CD pipelines: optimize build times, improve caching, reduce flakiness
  • Evolve our Kubernetes (EKS) deployment strategy for reliability and speed

Obvious is building an AI-native workspace, an operating system for work that puts co-intelligence at the center. They are a small and talent-dense team with world-class builders, former founders, and leaders from companies like Netflix, Google, and Meta.

  • Helping improve the infrastructure and data platform using a lean approach.
  • Creating a data platform and infrastructure optimized for developments using Machine Learning and massive data processing.
  • Improving the development experience and spreading the DevOps culture in the company.

Clarity AI is a global tech company founded in 2017 with a mission to bring societal impact to markets. They leverage AI and machine learning to provide data, methodologies, and tools to investors, governments, companies, and consumers for informed decisions. They are a team of over 300 individuals with offices in New York, Madrid, London, Paris, and Abu Dhabi, backed by investors like BlackRock and SoftBank.