Jobs Similar to Staff Site Reliability Engineer

Intermediate Site Reliability Engineer, Tenant Scale

GitLab 24 days ago

Americas EMEA Unlimited PTO

Design and implement highly scalable infrastructure for GitLab.com to support current and future growth.
Collaborate with cross-functional teams across the Infrastructure organization to plan and deliver projects that shape GitLab’s platform direction.
Operate and improve edge services and Kubernetes workloads, acting as a subject matter expert within the infrastructure department.

GitLab is an open-core software company that develops the most comprehensive AI-powered DevSecOps Platform, used by more than 100,000 organizations. They aim to enable everyone to contribute to and co-create the software that powers our world.

New Staff Site Reliability Engineer

Garner Health 26 days ago

$219,000–$245,000/yr

US Unlimited PTO

Architect, operate, improve and secure the platform the Garner Health app runs on
Boost development velocity and productivity
Build systems to a high engineering standard and hold others to the same high standard

Garner has developed a revolutionary approach to evaluating doctor performance and a unique incentive model that's reshaping the healthcare economy to ensure everyone can afford high-quality care. They have more than doubled their revenue annually over the last 5 years. Garner's award-winning culture is designed to cultivate teamwork, trust, autonomy, exceptional results, and individual growth.

Senior Site Reliability Engineer

Clarifai 20 days ago

US

Ensure the smooth operation and high availability of Clarifai's core services
Monitor system performance, identify bottlenecks, and implement optimizations to enhance reliability and efficiency
Design and implement scalable, secure, and cost-effective infrastructure solutions

Clarifai is a leading AI platform specializing in computer vision and generative AI, empowering organizations to transform unstructured data into actionable insights. Founded in 2013, they have a diverse, globally distributed team with $100M in funding and are committed to building a diverse and inclusive team.

Senior Site Reliability Engineer - Infrastructure

Underdog 20 days ago

$160,000–$240,000/yr

US Unlimited PTO 11w maternity

Own and maintain the incident response process, including defining procedures, tools, and best practices
Guide teams in establishing and monitoring Service Level Objectives (SLOs), including setting up alerts and reporting systems
Lead capacity planning initiatives, focusing on both short- and long-term scalability while optimizing costs

Underdog makes sports more fun by building the best products for sports fans. They are a fast-growing sports company valued at $1.3B, focused on a seamless, simple, intuitive, and fun app.

Site Reliability Engineer

Patreon 12 days ago

US Unlimited PTO

Contribute to high-impact AWS cloud infrastructure initiatives.
Participate in operability and production readiness reviews.
Advocate and implement Site Reliability Engineering practices.

Patreon is a media and community platform where creators give fans access to exclusive work. They have generated over $10 billion for creators and have 25 million+ paid memberships, with a hybrid work model and offices in New York and San Francisco.

Senior Site Reliability Engineer

Cloudbeds 7 days ago

Global

Design and implement reliable and scalable AWS architecture.
Support the CI/CD process with ArgoCD and GitOps, automating deployments with Terraform.
Optimize system performance and troubleshoot issues, collaborating with development teams.

Cloudbeds is transforming hospitality with its intelligently designed platform that powers properties across 150 countries. They are a completely remote team of 650+ employees across 40+ countries, focused on building AI-powered solutions for hotels.

Software Engineer, Platform & Infrastructure

Dune 1 day ago

Europe US 5w PTO 16w maternity 6w paternity

Design, operate, and continuously improve the cloud infrastructure that powers our systems using infrastructure-as-code, monitoring, and observability.
Own critical parts of the platform: identify bottlenecks, propose and implement improvements, and drive reliability and performance at scale.
Run Kubernetes in production and evolve how we operate it.

Dune is on a mission to make crypto data accessible. They’re a collaborative multi-chain analytics platform used by thousands of developers, analysts, & investors to understand the on-chain world and the frontiers of finance. They are a team of ~60 employees working together across Europe and eastern US timezones.

Infrastructure Engineer

Glia 20 days ago

Europe

Maintaining and updating Glia’s core infrastructure.
Troubleshooting and resolving infrastructure-related issues.
Improving our security posture.

Glia provides an AI customer service solution for banks and credit unions, unifying AI and human agents across every voice and digital conversation through its ChannelLess® Architecture. Valued at over $1 billion, Glia powers over 700 financial institutions and is certified as a Great Place to Work, with 98% employee satisfaction.

Senior DevOps Engineer

Scribe 10 days ago

$150,000–$200,000/yr

US Unlimited PTO

Architect, maintain, and scale critical infrastructure.
Ensure system reliability and optimize performance.
Implement modern deployment strategies.

Scribe's Workflow AI platform automatically captures and optimizes workflows so teams work smarter, faster, and more consistently. They are a fast-growing company founded in 2019 with over 5 million users across 600,000 businesses, and they are backed by leading investors.

Site Reliability Engineer

66degrees 15 days ago

US

Ensure near-zero downtime with monitoring and alerting, self-healing automation, and continuous improvement
Create highly automated, available and scalable systems by applying software and infrastructure principles
Employ and advise clients on DevOps and SRE principles and practices, covering deployment pipelines, HA, service reliability, technical debt, and operational toil for live services running at scale

66degrees is an AI transformation partner. They guide enterprises from business challenges to quantifiable outcomes, helping businesses reach their inflection point where chaotic data becomes a strategic asset, complexity becomes clarity, and AI becomes an engine for growth. They believe in thriving through challenges and winning together.

Senior Platform Engineer

Moxie 13 days ago

Global

Own and operate core platform systems across AWS, GCP, Vercel, GitHub, and Cloudflare.
Improve reliability, scalability, and security of production and non-production environments.
Improve local development environments and onboarding experience for engineers.

Moxie empowers ambitious aesthetic entrepreneurs to build profitable, independent practices. A global, remote-first team of more than 140 people supports hundreds of practices nationwide as they unlock sustainable success for aesthetic entrepreneurs.

Senior Site Reliability Engineer

Jobgether 6 days ago

$130,000–$140,000/yr

Global 7w PTO

Act as a primary responder for system incidents and outages, ensuring high availability and fast recovery.
Own and continuously improve monitoring, alerting, and log management systems.
Manage, optimize, and scale database infrastructure including MySQL, PostgreSQL, ClickHouse, and Redis.

Jobgether uses an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. They identify the top-fitting candidates, and this shortlist is then shared directly with the hiring company.

Senior Site Reliability Engineer, Observability

ScienceLogic 7 days ago

US Unlimited PTO

Be a key contributor on an Agile development team, collaboratively realizing business value through an iterative software development lifecycle.
Build and execute the monitoring strategy for ScienceLogic SaaS infrastructure.
Define, deploy, and maintain system and service monitors.

ScienceLogic is a leader in IT Operations Management, giving modern IT operations actionable insights for faster problem resolution and prediction. They see everything across cloud and distributed architectures, contextualizing data through relationship mapping, and acting on this insight through integration and automation.

Senior Site Reliability Engineer, AI Research

Algolia 13 days ago

Australia

Support and evolve the reliability of platforms used by the AI Research team.
Ensure production services meet expectations for availability, latency, and operational readiness.
Build and maintain Kubernetes-based services on GCP using infrastructure-as-code and GitOps.

Algolia is a pioneer and market leader in AI Search, empowering 17,000+ businesses to deliver blazing-fast, predictive search and browse experiences. They have raised $150 million in Series D funding, quadrupling their valuation to $2.25 billion, investing in their market-leading platform.

Senior Site Reliability Engineer (Forward Deployed)

Teleport 1 day ago

$180,800–$311,000/yr

US

Work directly with customers to ensure successful Teleport deployments.
Meet regularly with customers, understand pain points blocking deployments and remove roadblocks.
Work with customers to articulate the problem they are trying to solve, gather requirements, and make the business case to the product and engineering teams to invest in resolving the issue.

Teleport is the Infrastructure Identity Company, modernizing identity, access, and policy for infrastructure, improving engineering velocity and resiliency of critical infrastructure against human factors and/or compromise. They are a fast-growing, well-funded Y-Combinator company that values craft, strongly supports work/life balance, and embraces a culture of humility, honesty, and transparency.

Infrastructure Engineer

Obvious 10 days ago

US

Make deployments boring (in the best way possible)
Own CI/CD pipelines: optimize build times, improve caching, reduce flakiness
Evolve our Kubernetes (EKS) deployment strategy for reliability and speed

Obvious is building an AI-native workspace, an operating system for work that puts co-intelligence at the center. They are a small and talent-dense team with world-class builders, former founders, and leaders from companies like Netflix, Google, and Meta.

SRE Manager

AuthZed 16 days ago

US Canada Europe

Lead a global team of Site Reliability Engineers.
Recruit, hire, onboard and develop engineers.
Guide project planning by defining milestones and identifying dependencies.

AuthZed creates and maintains SpiceDB and the authorization infrastructure. They are a Series A company with a fully remote team across the US, Canada, and Europe and a hardworking, close-knit group with a software-driven culture that values integrity, collaboration, and open-mindedness.

Federal Site Reliability Engineer

Confluent 21 days ago

US

Understand and participate in the changing FedRAMP space.
Own and champion high operational standards of Confluent Cloud systems leveraged by federal agencies.
Innovate and design solutions to reduce toil, bolster operational maturity, and make day-to-day work life easier.

Confluent is rewriting how data moves and what the world can do with it. Their platform puts information in motion, streaming in near real-time so companies can react faster and build smarter. They value team players who ask hard questions, give honest feedback, and show up for each other.

DevOps Engineer

Insider One 16 days ago

Turkey

Take responsibility for Insider One's technological well-being and influence the development lifecycle.
Develop internal solutions and improve site reliability through continuous integration and delivery.
Create analytical tools for application performance insights and ensure projects are completed on time.

Insider One is a platform that provides marketing and customer engagement tools, enabling teams to reach their full potential. They are a B2B SaaS unicorn with 1,500+ team members representing 50+ nationalities across 30+ offices and are dedicated to social responsibility.

Senior Site Reliability Engineer

Juul Labs 12 days ago

$126,000–$184,000/yr

US

Own the operational stability and performance of Juul’s hybrid cloud infrastructure.
Lead automation efforts and architect for reliability.
Act as the final escalation point for critical incidents.

Juul Labs aims to transition the world’s billion adult smokers away from combustible cigarettes and eliminate their use, while also combating underage usage of their products. They are backed by leading technology investors and are committed to hiring great talent and building a diverse team.
