Source Job

Global

  • Lead a team of experienced SRE engineers to raise reliability standards in blockchain infrastructure.
  • Set engineering direction, build conditions for good work, and apply SRE disciplines like SLOs and error budgets.
  • Drive automation and foster people development in a small, broad-scope team.

SRE, Distributed Systems, Automation, People Management, Infrastructure

14 jobs similar to Engineering Manager (SRE)

Jobs ranked by similarity.

Europe

  • Lead, coach, and support a team of around 9 engineers within the Runtime function.
  • Drive planning, prioritisation, and execution across complex technical workstreams.
  • Build enough technical context to understand architecture discussions and translate them into executable plans.

Parity builds core infrastructure for blockchain, enabling secure sharing of information and value without intermediaries. Their remote-first global team develops open-source software like Polkadot and Kusama, focusing on a decentralized web that empowers developers.

Global

  • Collaborate with service teams to define SLIs and SLOs based on customer experience and build error budget policies that influence engineering decisions.
  • Own the Operational Readiness Review process, conducting reviews for new services and major changes across observability, alerting, runbooks, capacity, and graceful degradation.
  • Act as a reliability expert for architecture reviews, failure mode analysis, dependency mapping, and resilience design.

Supabase provides the Postgres development platform with a complete backend solution including Database, Auth, Storage, Edge Functions, Realtime, and Vector Search. With 280+ team members across 55+ countries, they are an open-source-first company that values async work and has raised $500M.

Canada, US; 4w PTO

  • Lead and grow high-performing platform engineering teams that deliver reliable, scalable infrastructure and operational excellence for Vanta’s products and customers.
  • Set technical direction and drive multi-quarter platform initiatives spanning infrastructure reliability, security, scalability, and developer experience across shared systems and services.
  • Partner closely with product engineering, security, and engineering leadership to identify organizational needs and deliver scalable platform solutions.

Vanta helps businesses earn and prove trust by empowering companies to practice better security and prove it with ease. They have a kind and talented team, and while some have prior security experience, many have been successful without it.

UK, Netherlands, Ireland; Unlimited PTO

  • Partner with Ads Engineering teams to improve reliability, scalability, and operational excellence of ad-serving and related systems.
  • Design, build, and maintain infrastructure, tooling, and automation to improve service reliability and engineering productivity.
  • Participate in on-call rotations, lead incident response, and drive root cause analysis and corrective actions.

Reddit is a community of communities built on shared interests, passion, and trust. With 100,000+ active communities and approximately 126 million daily active unique visitors, it is one of the internet's largest sources of information.

US

  • Build and lead a high-performance product engineering team focused on innovation, accountability, and reliability.
  • Develop scalable reliability, risk management, and operational governance capabilities for production systems.
  • Drive alignment across Platform Engineering, SRE, Infrastructure, and product teams to deliver long-term technical roadmap outcomes.

Affirm is reinventing credit to make it more honest and friendly, giving consumers the flexibility to buy now and pay later without hidden fees or compounding interest. It is a publicly traded, remote-first company with competitive benefits and a culture focused on innovation and people.

Europe

  • Lead reliability initiatives across multiple Ads domains including ad serving, auctions, targeting, reporting, measurement, and billing.
  • Partner with engineering leadership to improve reliability, scalability, operational excellence, and engineering efficiency across the Ads organization.
  • Design and build platforms, tooling, and automation that improve reliability and developer productivity at scale.

Reddit is a community of communities, built on shared interests, passion, and trust, home to the most open and authentic conversations on the internet. With 100,000+ active communities and approximately 126 million daily active unique visitors, it is one of the internet's largest sources of information.

US; Unlimited PTO

  • Lead a global SRE team of ~10 engineers, owning day-to-day operations and long-term technical direction.
  • Drive strategic partnerships with product engineering to shift from reactive support to proactive reliability ownership.
  • Scale multi-tenant infrastructure, manage cloud costs, and champion developer self-service.

Counterpart Health is transforming healthcare by providing an AI-enabled primary care tool that supports physicians in early diagnosis and management of chronic conditions. As a subsidiary of Clover Health, it has a remote-first culture that emphasizes collaboration and innovation.

Latin America

  • Design, implement, and improve Site Reliability Engineering practices across production environments with a focus on SLOs, SLIs, and error budgets.
  • Lead incident response processes and build observability strategies including monitoring, logging, alerting, and distributed tracing.
  • Partner with engineering teams to enhance system reliability, availability, scalability, and operational efficiency.

Oowlish is a rapidly expanding software development company in Latin America that collaborates with premier clients from the United States and Europe to create pioneering digital solutions. Certified as a Great Place to Work, it offers a nurturing environment with opportunities for professional growth and international impact.

US

  • Lead design and operation of internal developer platforms and self-service infrastructure.
  • Build and optimize CI/CD pipelines, deployment workflows, and automation across GitHub Actions, Jenkins, ArgoCD.
  • Apply SRE principles to improve developer-facing systems and software delivery performance.

Versant is a media company owning iconic brands in news, sports, and entertainment, including USA Network, Fandango, and Rotten Tomatoes. It is an independent, publicly traded company with a collaborative, inclusive culture and a remote-first work environment.

US

  • Lead the Site Reliability Operations team, overseeing observability, monitoring, incident response, and operational excellence for key enterprise services.
  • Partner with product, engineering, and infrastructure teams to embed CI/CD and release best practices, automating build/test/deploy and release monitoring.
  • Own problem management, driving root cause analysis and corrective actions to improve system resilience and reduce incident impact.

Mercury Insurance helps people reduce risk and overcome unexpected events, and has served customers for over 60 years. Recognized as one of America's Best Midsize Employers for 2026, the company has a collaborative culture focused on growth and inclusion.

US

  • Build the SRE practice from scratch: define SLO frameworks, on-call rotation, and incident command for live bank customers.
  • Define severity tiers, SLA commitments, and escalation paths for production support, acting as the technical owner during incidents.
  • Set engineering operations across sprint discipline, release rituals, code review standards, and compliance artifacts for bank examiners.

Titan builds AI software for banks, specializing in purpose-built small language models and AI bankers that financial institutions trust. The company is an investor-backed fintech startup scaling from a handful to hundreds of customers, with a hands-on, build-first culture under strict compliance standards.

Europe

  • Drive the healthy growth of the engineering organization.
  • Help hire high-caliber engineers and mentor team members.
  • Drive processes that can sustain a high-performance engineering team.

Anchorage Digital is building a platform for institutions to participate in digital assets through custody, staking, trading, governance, settlement, and security infrastructure. The company has over 600 employees and strives to be a welcoming and inclusive workplace where people feel respected, supported, and connected.

United States; 12w maternity, 12w paternity

  • Lead multiple engineering teams to build and maintain the core platform supporting a growing portfolio of products.
  • Define and drive the technical roadmap, architecture strategy, and engineering best practices for scalability and reliability.
  • Recruit, develop, and retain top engineering talent while coaching engineering managers and driving accountability.

Our partner is a fast-growing technology organization dedicated to building scalable platforms that power innovative products used by millions worldwide. They foster an inclusive, collaborative culture focused on growth, innovation, and employee success.

US; Unlimited PTO, 16w maternity

  • Lead and grow high-performing platform engineering teams.
  • Set technical direction and drive multi-quarter platform initiatives.
  • Design and evolve internal platforms for product teams.

Vanta's mission is to help businesses earn and prove trust by making security continuous and verifiable. They empower companies to practice better security, automating security monitoring for compliance standards. Vanta has a kind and talented team.