Source Job

$98,583–$138,016/yr
US Unlimited PTO

  • Respond to production incidents and contribute to post-incident analysis.
  • Identify and automate manual processes to improve efficiency and reduce risk.
  • Enhance monitoring tools and platforms to improve observability.

Terraform Ansible Python Bash PowerShell

20 jobs similar to Site Reliability Engineer II

Jobs ranked by similarity.

US

  • Architect and deploy secure, scalable infrastructure using Terraform, CloudFormation, or similar tools.
  • Ensure the platform meets strict SLA requirements for enterprise clients, minimizing downtime.
  • Implement comprehensive monitoring, logging, and alerting to provide deep visibility into system health.

Filevine provides cloud-based workflow tools for legal professionals, helping them manage organizations and serve clients. They are recognized as a fast-growing and innovative technology company with a team of passionate professionals.

US Unlimited PTO

  • Contribute to high-impact AWS cloud infrastructure initiatives.
  • Participate in operability and production readiness reviews.
  • Advocate and implement Site Reliability Engineering practices.

Patreon is a media and community platform where creators give fans access to exclusive work. They have generated over $10 billion for creators and have 25 million+ paid memberships, with a hybrid work model and offices in New York and San Francisco.

US Canada Europe

  • Design, build, and maintain highly available, scalable infrastructure.
  • Manage and optimize infrastructure across GCP, AWS, Azure, and other cloud providers.
  • Develop comprehensive monitoring, logging, and alerting systems.

Bobsled is seeking a Site Reliability Engineer to enhance its data-sharing platform's reliability and scalability. The company values growth, offering flexible work hours in a fully remote environment and fully sponsored individual coaching for all employees.

$126,000–$184,000/yr
US

  • Own the operational stability and performance of Juul’s hybrid cloud infrastructure.
  • Lead automation efforts and architect for reliability.
  • Act as the final escalation point for critical incidents.

Juul Labs aims to transition the world’s billion adult smokers away from combustible cigarettes and eliminate their use, while also combating underage usage of their products. They are backed by leading technology investors and are committed to hiring great talent and building a diverse team.

$109,800–$252,500/yr
US Unlimited PTO 16w maternity 8w paternity

  • Design, implement, and maintain scalable and reliable infrastructure solutions.
  • Automate deployments and maintain a resilient, secure SaaS application platform.
  • Develop comprehensive monitoring and alerting solutions, and respond to incidents.

Veeam is the #1 global market leader in data resilience. The company believes businesses should control all their data whenever and wherever they need it, and it provides data resilience through data backup, data recovery, data portability, data security, and data intelligence. Based in Seattle, Veeam protects over 550,000 customers worldwide who trust it to keep their businesses running.

US

  • Design, build, and maintain secure, scalable cloud infrastructure.
  • Own CI/CD pipelines and deployment workflows across services and environments.
  • Improve reliability, availability, and performance through monitoring, alerting, and incident response practices.

Jobgether is a company that uses an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. They identify the top-fitting candidates and share this short list directly with the hiring company.

US

  • Leverage infrastructure as code (Terraform) to build and maintain complex production and analytics workflows including networking and containerized services.
  • Rapidly diagnose and resolve faults in system services as part of a 24/7 on-call rotation focused on actionable alerting and eliminating toil.
  • Improve speed of delivery by developing and maintaining CI/CD pipelines.

Linus Health is a Boston-based digital health company transforming brain health worldwide. They combine cutting-edge neuroscience, clinical expertise, and AI to advance early detection and intervention for cognitive and brain disorders, empowering people to live longer, healthier lives. With 100+ team members and growing, they’re entering a phase of accelerated growth and looking for top talent to help shape their future.

US

  • Ensure near-zero downtime with monitoring and alerting, self-healing automation, and continuous improvement
  • Create highly automated, available and scalable systems by applying software and infrastructure principles
  • Employ and advise clients on DevOps and SRE principles and practices, covering deployment pipelines, HA, service reliability, technical debt, and operational toil for live services running at scale

66degrees is an AI transformation partner. They guide enterprises from business challenges to quantifiable outcomes, helping businesses reach their inflection point where chaotic data becomes a strategic asset, complexity becomes clarity, and AI becomes an engine for growth. They believe in thriving through challenges and winning together.

US Unlimited PTO

  • Be a key contributor on an Agile development team, collaboratively realizing business value through an iterative software development lifecycle.
  • Build and execute the monitoring strategy for ScienceLogic SaaS infrastructure.
  • Define, deploy, and maintain system and service monitors.

ScienceLogic is a leader in IT Operations Management, giving modern IT operations actionable insights for faster problem resolution and prediction. They see everything across cloud and distributed architectures, contextualizing data through relationship mapping, and acting on this insight through integration and automation.

US

  • Ensure the smooth operation and high availability of Clarifai's core services
  • Monitor system performance, identify bottlenecks, and implement optimizations to enhance reliability and efficiency
  • Design and implement scalable, secure, and cost-effective infrastructure solutions

Clarifai is a leading AI platform specializing in computer vision and generative AI, empowering organizations to transform unstructured data into actionable insights. Founded in 2013, they have a diverse, globally distributed team with $100M in funding and are committed to building a diverse and inclusive team.

  • Contribute to the design, development, and implementation of platform components and services using infrastructure-as-code principles.
  • Identify opportunities for automation and develop solutions to streamline operational tasks, improve efficiency, and reduce manual intervention.
  • Participate in the provisioning, configuration, and management of AWS resources, ensuring adherence to best practices and security standards.

Mambu is a leading SaaS cloud banking platform, aiming to improve banking for a billion people. Mambu offers exciting career opportunities and helps shape the future of financial services; their culture is vibrant, and they value diversity.

$120,000–$160,000/yr
US 3w PTO

  • Design, build, and maintain shared platform services that support secure and scalable infrastructure across client and internal environments.
  • Develop and maintain infrastructure-as-code (IaC) using tools such as Terraform, ARM/Bicep, or similar frameworks.
  • Build automation for system provisioning, configuration management, patching, and lifecycle operations.

Sentinel Blue is bringing enterprise-class cybersecurity to small and medium-sized businesses. They are pushing the envelope of how things are done and constantly seeking innovative ways to meet that mission.

  • Designing, building, and maintaining infrastructure that enables fast, reliable, and secure product delivery.
  • Improving and maintaining CI/CD pipelines to streamline deployments and increase reliability.
  • Contributing to infrastructure reliability and ensuring systems are designed for resilience and growth.

Incident.io is the leading AI incident response platform, built to help teams dramatically reduce incident response time and improve reliability. They have raised $100M from Index Ventures, Insight Partners, and Point Nine, alongside founders and executives from world-class technology companies.

US

  • Designs and maintains CI/CD pipelines using GitLab CI/CD.
  • Implements Infrastructure as Code (IaC) with tools like Terraform.
  • Automates complex workflows and enhances infrastructure scalability.

Everseen is a vision AI solutions provider for global retailers. They have over 900 employees globally, with global and European headquarters in Cork, Ireland, a U.S. headquarters in Miami, and hubs in Romania, Serbia, India, Australia, and Spain.

$219,000–$245,000/yr
US Unlimited PTO

  • Architect, operate, improve and secure the platform the Garner Health app runs on
  • Boost development velocity and productivity
  • Build systems to a high engineering standard and hold others to the same high standard

Garner has developed a revolutionary approach to evaluating doctor performance and a unique incentive model that's reshaping the healthcare economy to ensure everyone can afford high-quality care. They have more than doubled their revenue annually over the last 5 years. Garner's award-winning culture is designed to cultivate teamwork, trust, autonomy, exceptional results, and individual growth.

Global

  • Automate infrastructure provisioning, configuration management, monitoring, and operational workflows using IaC and scripting languages.
  • Own the deployment, maintenance, and lifecycle management of systems supporting engineering, leveraging deep expertise in Kubernetes.
  • Troubleshoot complex infrastructure and application issues, driving root-cause analysis and developing long-term remediation solutions.

SingleStore delivers the cloud-native database with the speed and scale to power the world’s data-intensive applications. They are venture-backed and headquartered in San Francisco with offices in Sunnyvale, Raleigh, Seattle, Boston, London, Lisbon, Bangalore, Dublin and Kyiv.

Global

  • Own and operate core platform systems across AWS, GCP, Vercel, GitHub, and Cloudflare.
  • Improve reliability, scalability, and security of production and non-production environments.
  • Improve local development environments and onboarding experience for engineers.

Moxie empowers ambitious aesthetic entrepreneurs to build profitable, independent practices. A global, remote-first team of more than 140 people supports hundreds of practices nationwide as they unlock sustainable success for aesthetic entrepreneurs.

Latin America

  • Design, implement, and manage cloud infrastructure using Infrastructure as Code (IaC) tools.
  • Design, build, and maintain scalable CI/CD pipelines using tools like CircleCI or GitHub Actions.
  • Implement and maintain observability tooling (Prometheus, Grafana, Datadog), and lead incident response to ensure system reliability.

Engine is transforming business travel into something personalized, rewarding, and simple. More than 20,000 companies already rely on Engine to support over 1 million travelers and billions in annual bookings.

Latin America Unlimited PTO

  • Audit and optimize cloud usage, capacity, and spend.
  • Improve reliability through better automation, monitoring, and alerting.
  • Partner with engineers to upgrade infrastructure components and roll out changes safely.

Our client builds a high-scale data and analytics platform used by sophisticated teams to make critical business decisions. They are trusted by 800+ companies and value collaboration, high ownership, and long-term system reliability.

Canada 4w PTO

  • Build and operate the systems that power Vanta’s FedRAMP environments.
  • Design and maintain Vanta’s vulnerability management platform, automating detection, remediation, and compliance reporting.
  • Define and evolve Vanta’s production reliability framework, including SLOs, incident response patterns, and observability standards.

Vanta helps businesses earn and prove trust by continuously monitoring and verifying their security. They empower companies to practice better security and prove it with ease. Vanta has a kind and talented team, and while some have prior security experience, many have been successful without it.