Jobs Similar to Site Reliability Engineer | TangerineFeed

Site Reliability Engineer

BJAK 15 hours ago

China

Own reliability and operational stability of BJAK’s production systems.
Design and improve monitoring, alerting, logging and observability across services.
Lead incident response, troubleshooting and structured root cause analysis.

Site Reliability Engineering, DevOps, Cloud Infrastructure, CI/CD, Monitoring

20 jobs similar to Site Reliability Engineer

Jobs ranked by similarity.

DevOps Engineer

BJAK 1 day ago

China

Manage cloud infrastructure and deployment pipelines for production systems.
Design and improve CI/CD processes to make deployments safer and faster.
Improve monitoring, alerting, and system observability across services.

BJAK provides AI-powered automation for insurance processes including quote generation, policy issuance, claims, and payments. The company operates with a global engineering team and a modern culture focused on reliability and operational excellence.

DevOps Engineer

BJAK 15 hours ago

China

Build and maintain a highly reliable platform for BJAK's AI automation systems.
Manage cloud infrastructure, deployment pipelines, and CI/CD workflows.
Improve system resilience, monitoring, and incident response.

BJAK’s automation systems support customer journeys across quote generation, policy issuance, claims, payments, renewals and insurer integrations. They are a global engineering team with a focus on reliability, ownership, and modern engineering culture.

Senior SRE, Ads

Reddit 9 days ago

UK, Netherlands, Ireland | Unlimited PTO

Partner with Ads Engineering teams to improve reliability, scalability, and operational excellence of ad-serving and related systems.
Design, build, and maintain infrastructure, tooling, and automation to improve service reliability and engineering productivity.
Participate in on-call rotations, lead incident response, and drive root cause analysis and corrective actions.

Reddit is a community of communities built on shared interests, passion, and trust. With 100,000+ active communities and approximately 126 million daily active unique visitors, it is one of the internet's largest sources of information.

Senior Site Reliability Engineer (SRE)

Oowlish 5 days ago

Latin America

Design, implement, and improve Site Reliability Engineering practices across production environments with a focus on SLOs, SLIs, and error budgets.
Lead incident response processes and build observability strategies including monitoring, logging, alerting, and distributed tracing.
Partner with engineering teams to enhance system reliability, availability, scalability, and operational efficiency.

Oowlish is a rapidly expanding software development company in Latin America that collaborates with premier clients from the United States and Europe to create pioneering digital solutions. Certified as a Great Place to Work, it offers a nurturing environment with opportunities for professional growth and international impact.

Lead Software Engineer

BJAK 15 hours ago

China

Lead design and delivery of core platform systems for AI-driven insurance automation.
Translate complex business needs into scalable backend architecture and APIs.
Mentor engineers and ensure system reliability, maintainability, and observability.

BJAK uses AI, automation, and backend systems to power end-to-end insurance operations. It is a growing global team with a modern engineering culture focused on reliability, scalability, and excellence.

Manager Site Reliability Operations

Mercury Insurance 16 days ago

US

Lead the Site Reliability Operations team, overseeing observability, monitoring, incident response, and operational excellence for key enterprise services.
Partner with product, engineering, and infrastructure teams to embed CI/CD and release best practices, automating build/test/deploy and release monitoring.
Own problem management, driving root cause analysis and corrective actions to improve system resilience and reduce incident impact.

Mercury Insurance helps people reduce risk and overcome unexpected events, serving customers for over 60 years. They are a midsize employer recognized as one of America's Best Midsize Employers for 2026, with a collaborative culture focused on growth and inclusion.

Site Reliability Engineer (SRE)

Supabase 9 days ago

Global

Collaborate with service teams to define SLIs and SLOs based on customer experience and build error budget policies that influence engineering decisions.
Own the Operational Readiness Review process, conducting reviews for new services and major changes across observability, alerting, runbooks, capacity, and graceful degradation.
Act as a reliability expert for architecture reviews, failure mode analysis, dependency mapping, and resilience design.

Supabase provides the Postgres development platform with a complete backend solution including Database, Auth, Storage, Edge Functions, Realtime, and Vector Search. With 280+ team members across 55+ countries, they are an open-source-first company that values async work and has raised $500M.

Sr. Site Reliability Engineer

Filevine 4 days ago

United States

Own and evolve observability strategy including monitoring, alerting, dashboards, logging, and distributed tracing.
Define and manage SLIs, SLOs, and reliability metrics, improving MTTD and MTTR through automation.
Build and maintain reliable cloud infrastructure on AWS and Kubernetes while mentoring engineers on SRE best practices.

Filevine is a Legal AI company delivering Legal Operating Intelligence for legal work. Fueled by a team of exceptional collaborators and innovators, Filevine’s rapid growth has earned AI awards and recognition from Deloitte and Inc. as one of the most innovative and fastest-growing technology companies in the country.

Senior Site Reliability Engineer

Flip 4 days ago

Co-own the architecture of cloud infrastructure on Azure and Kubernetes clusters for high throughput and availability.
Drive resilience strategy for global scaling, zero-downtime deployments, and disaster recovery.
Evolve observability stack with LGTM (Loki, Grafana, Tempo, Mimir) and lead incident response.

Flip is an AI-powered employee experience platform for frontline workers in retail, manufacturing, and logistics. The company is a young, rapidly growing tech company with a remote-first culture and offices in Berlin and Stuttgart.

Senior Site Reliability Engineer

CertifyOS 11 days ago

US | Unlimited PTO

Design and build cloud-native infrastructure for reliability, observability, and automation across GCP, GKE, and Cloud Run.
Own incident response, root cause analysis, escalation workflows, and runbooks to prevent hard problems from recurring.
Develop Infrastructure as Code, CI/CD pipelines, and operational tooling to improve developer velocity and platform efficiency.

CertifyOS is building the data infrastructure that powers modern healthcare, automating provider licensing, enrollment, credentialing, and network monitoring through an API-first platform. The company is backed by leading investors with a team of deep experience in provider data systems, valuing authenticity, accountability, collaboration, results, and openness to feedback.

Platform Engineer

BJAK 15 hours ago

China

Design and build platform components that support AI-driven workflow automation systems.
Develop shared infrastructure and services for workflow orchestration, state management and execution tracking.
Improve APIs, service frameworks and backend standards used across multiple engineering teams.

BJAK’s automation systems power end-to-end insurance journeys across quote generation, policy issuance, renewals, endorsements, claims, payments and insurer integrations. They have a global engineering team working across multiple countries and offer a fully remote, high-ownership environment with a focus on scalability and reliability.

Staff SRE, Ads

Reddit 9 days ago

Europe

Lead reliability initiatives across multiple Ads domains including ad serving, auctions, targeting, reporting, measurement, and billing.
Partner with engineering leadership to improve reliability, scalability, operational excellence, and engineering efficiency across the Ads organization.
Design and build platforms, tooling, and automation that improve reliability and developer productivity at scale.

Reddit is a community of communities, built on shared interests, passion, and trust, home to the most open and authentic conversations on the internet. With 100,000+ active communities and approximately 126 million daily active unique visitors, it is one of the internet's largest sources of information.

Senior Software Engineer- Site Reliability Engineering (SRE)

Noctua Technology, LLC 17 hours ago

US

Drive the definition and adoption of SLIs and SLOs across services, reducing toil through automation and incident response.
Design and architect Infrastructure as Code solutions for large-scale environments using Docker, Kubernetes, and cloud-native services.
Serve as primary SRE liaison for development teams, influencing architecture and conducting training for clients.

Noctua Technology, LLC is a company that drives digital transformation by treating operations as a software engineering challenge, focusing on cloud native systems. They are a dynamic team seeking a Senior SRE to define strategy and bridge development and operations for clients.

Site Reliability Engineer (SRE)

Synthesia 9 days ago

US

Take ownership of incident management and operational excellence across cloud infrastructure.
Automate high-risk manual processes and drive reliability gains through engineering.
Own a platform domain such as Temporal, observability, or Kubernetes operations.

Synthesia is the world’s leading AI video platform for business, used by over 90% of the Fortune 100. Founded in 2017, the company is headquartered in London with offices across Europe and the US, and has over $530 million in funding from premier investors like Accel and Nvidia's VC arm.

Senior Site Reliability Engineer (Remote Build)

Remote 13 days ago

Global | Unlimited PTO, 16w maternity, 16w paternity

Own the operational excellence and infrastructure strategy for Remote Build's platform, ensuring reliability, performance, and security.
Lead incident response, build observability systems, and drive continuous improvement in system reliability.
Embed security into infrastructure, optimize costs, and automate operational toil to scale efficiently.

Remote solves modern organizations' biggest challenge of navigating global employment compliantly. With a fully distributed team across 6 continents, the company fosters a future-focused culture with core values of innovation and async work.

Senior Site Reliability Engineer

Circle 18 days ago

Americas | 7w PTO

Act as a first responder for system incidents and outages, ensuring high availability and performance.
Own and evolve monitoring, alerting, and log management systems while optimizing database infrastructure.
Collaborate with engineering teams to build scalable, resilient systems and contribute to SRE tooling and automation.

Circle is building the world's leading all-in-one platform for online communities. It is a fully remote company of around 200 team members from 30+ countries, with a culture that values autonomy, async collaboration, and high expectations.

Head of Site Reliability Engineering

Titan 7 days ago

US

Build the SRE practice from scratch: define SLO frameworks, on-call rotation, and incident command for live bank customers.
Define severity tiers, SLA commitments, and escalation paths for production support, acting as the technical owner during incidents.
Set engineering operations across sprint discipline, release rituals, code review standards, and compliance artifacts for bank examiners.

Titan builds AI software for banks, specializing in purpose-built small language models and AI bankers that financial institutions trust. The company is a backed fintech startup scaling from a handful to hundreds of customers, with a hands-on, build-first culture under strict compliance standards.

Sr. Production Engineer

Zscaler 12 days ago

US

Implement highly available, scalable infrastructure across AWS, GCP, and bare-metal environments.
Drive an "automation-first" culture by writing code in Python/Go to build self-healing systems.
Act as lead Incident Commander, develop response playbooks, and conduct post-incident analyses.

Zscaler accelerates digital transformation to secure customers with a cloud-native Zero Trust Exchange platform. The company processes over 200 billion transactions daily and fosters a culture of execution, collaboration, and accountability.

Staff Engineer, Site Reliability

Babylist 17 days ago

US, Canada

Own and evolve AWS infrastructure using Terraform, managing EKS clusters, databases, and core services.
Maintain CI/CD reliability and developer tooling across the full engineering org.
Lead incident response, drive post-incident reviews, and improve monitoring and alerting standards.

Babylist is the leading platform for expecting and new families, helping parents feel confident, connected, and cared for at every step. As a modern, AI-forward tech company with over 10 million yearly shoppers, Babylist has expanded into a full ecosystem and generated $750M in revenue in 2025, reshaping the $235B kids and baby market.

Field Reliability Engineer- LATAM

Honeycomb 4 days ago

Unlimited PTO, 16w maternity, 16w paternity

Own and operate customer-facing managed infrastructure across multiple AWS accounts and regions.
Serve as the senior technical escalation point for production incidents and complex configurations.
Contribute to OpenTelemetry distributions and maintain open source projects like Refinery.

Honeycomb provides observability for developer tools, helping companies like HelloFresh and Slack understand their software. They have over 200 employees and were named to Forbes' Best Startups in 2022 and 2023, with a culture that values inclusion and autonomy.
