Jobs Similar to Site Reliability Operations

Site Reliability Engineer II

Restaurant365 3 days ago

$98,583–$138,016/yr

US Unlimited PTO

Respond to production incidents and contribute to post-incident analysis.
Identify and automate manual processes to improve efficiency and reduce risk.
Enhance monitoring tools and platforms to improve observability.

Restaurant365 is a SaaS company that provides a unique, centralized solution for accounting and back-office operations for restaurants. They focus on empowering team members to produce top-notch results while elevating their skills.

Senior Site Reliability Engineer, Observability

ScienceLogic 7 days ago

US Unlimited PTO

Be a key contributor on an Agile development team, collaboratively realizing business value through an iterative software development lifecycle.
Build and execute the monitoring strategy for ScienceLogic SaaS infrastructure.
Define, deploy, and maintain system and service monitors.

ScienceLogic is a leader in IT Operations Management, giving modern IT operations actionable insights for faster problem resolution and prediction. They see everything across cloud and distributed architectures, contextualizing data through relationship mapping, and acting on this insight through integration and automation.

Incident Commander

eBay 16 days ago

$103,200–$178,400/yr

US

Serve as Incident Commander, leading real-time response efforts, managing communication across teams, triaging issues, and driving resolution of high-priority incidents.
Execute documented runbooks for troubleshooting and resolving production incidents involving AWS services and Kubernetes Clusters.
Collaborate post-incident with engineering teams, performing root cause analysis, documenting lessons learned, and driving the implementation of durable solutions.

eBay is a global ecommerce leader that is changing the way the world shops and sells. Its platform empowers millions of buyers and sellers in more than 190 markets around the world, and the team fosters an inclusive and collaborative culture, encouraging open communication, continuous learning, and professional growth.

Senior Incident Manager

NetDocuments 6 days ago

$83,000–$96,000/yr

US

Lead the identification, triage, escalation, and resolution of incidents to minimize customer and business impact.
Provide timely, clear, and professional communication to internal stakeholders throughout the incident lifecycle.
Develop, maintain, and improve incident management processes, procedures, runbooks, and playbooks.

NetDocuments is the world’s #1 trusted cloud-based content management and productivity platform that helps legal professionals do their best work. They strive to win together through passionate hard work, exploring new things and recognizing every interaction matters.

IT Service Operations Manager

RWS 2 hours ago

Global

Lead role in major incidents and ensure effective communication to stakeholders.
Monitor, control, and support service delivery, ensuring systems and procedures are followed.
Define and track service measures and KPIs to manage the performance of IT services.

RWS unlocks global understanding by growing the value of ideas, data, and content. The company values every language and culture and has a global reach, providing support services to over 7500 end users worldwide, with a dedicated team of over 500 staff across all regions.

Site Reliability Engineer

66degrees 14 days ago

US

Ensure near-zero downtime with monitoring and alerting, self-healing automation, and continuous improvement.
Create highly automated, available, and scalable systems by applying software and infrastructure principles.
Employ and advise clients on DevOps and SRE principles and practices, covering deployment pipelines, HA, service reliability, technical debt, and operational toil for live services running at scale.

66degrees is an AI transformation partner. They guide enterprises from business challenges to quantifiable outcomes, helping businesses reach their inflection point where chaotic data becomes a strategic asset, complexity becomes clarity, and AI becomes an engine for growth. They believe in thriving through challenges and winning together.

DevOps Engineer

Jobgether 14 days ago

US

Design, build, and maintain secure, scalable cloud infrastructure.
Own CI/CD pipelines and deployment workflows across services and environments.
Improve reliability, availability, and performance through monitoring, alerting, and incident response practices.

Jobgether is a company that uses an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. They identify the top-fitting candidates and share this short list directly with the hiring company.

Senior Site Reliability Engineer

Juul Labs 12 days ago

$126,000–$184,000/yr

US

Own the operational stability and performance of Juul’s hybrid cloud infrastructure.
Lead automation efforts and architect for reliability.
Act as the final escalation point for critical incidents.

Juul Labs aims to transition the world’s billion adult smokers away from combustible cigarettes and eliminate their use, while also combating underage usage of their products. They are backed by leading technology investors and are committed to hiring great talent and building a diverse team.

Site Reliability Engineer

Bobsled 7 days ago

US Canada Europe

Design, build, and maintain highly available, scalable infrastructure.
Manage and optimize infrastructure across GCP, AWS, Azure, and other cloud providers.
Develop comprehensive monitoring, logging, and alerting systems.

Bobsled is seeking a Site Reliability Engineer to enhance its data-sharing platform's reliability and scalability. We're a company that values growth, offering flexible work hours in a fully remote environment and fully sponsored individual coaching for all employees.

Senior Site Reliability Engineer

Cloudbeds 7 days ago

Global

Design and implement reliable and scalable AWS architecture.
Support the CI/CD process with ArgoCD and GitOps, automating deployments with Terraform.
Optimize system performance and troubleshoot issues, collaborating with development teams.

Cloudbeds is transforming hospitality with its intelligently designed platform that powers properties across 150 countries. They are a completely remote team of 650+ employees across 40+ countries, focused on building AI-powered solutions for hotels.

Senior Site Reliability Engineer - Infrastructure

Underdog 20 days ago

$160,000–$240,000/yr

US Unlimited PTO 11w maternity

Own and maintain the incident response process, including defining procedures, tools, and best practices.
Guide teams in establishing and monitoring Service Level Objectives (SLOs), including setting up alerts and reporting systems.
Lead capacity planning initiatives, focusing on both short- and long-term scalability while optimizing costs.

Underdog makes sports more fun by building the best products for sports fans. They are a fast-growing sports company valued at $1.3B with a focus on a seamless, simple, easy-to-use, intuitive, and fun app.

Crypto Production Engineer

Wormhole Foundation 15 days ago

Act as first responder and incident commander during production incidents.
Improve reliability and uptime across all Wormhole services.
Harden infrastructure for security and operational resiliency.

Wormhole Foundation empowers passionate people in the research and development of blockchain interoperability technologies. They support teams building secure, open-source, and decentralized products within the Wormhole ecosystem.

Lead Cloud Infrastructure Manager

Jobgether 8 hours ago

Global

Lead and manage the DevOps team, prioritizing performance and accountability across cloud functions.
Define and enforce DevSecOps standards integrating automation, security, and compliance.
Optimize cloud infrastructure across AWS, GovCloud, and Azure for uptime and cost-effectiveness.

Jobgether is a company using an AI-powered matching process to ensure applications are reviewed quickly, objectively, and fairly. This allows them to identify the top-fitting candidates for companies, and this shortlist is then shared directly with the hiring company.

Site Reliability Engineer III

Veeam 18 days ago

$109,800–$252,500/yr

US Unlimited PTO 16w maternity 8w paternity

Design, implement, and maintain scalable and reliable infrastructure solutions.
Automate deployments and maintain a resilient, secure SaaS application platform.
Develop comprehensive monitoring and alerting solutions, and respond to incidents.

Veeam is the #1 global market leader in data resilience, believing businesses should control all their data whenever and wherever they need it, providing data resilience through data backup, data recovery, data portability, data security, and data intelligence. Based in Seattle, Veeam protects over 550,000 customers worldwide who trust Veeam to keep their businesses running.

Remote Systems Engineer

Jobgether 24 days ago

Europe

Ensure the infrastructure is configured and distributed correctly to meet stability and performance objectives.
Manage the day-to-day operations of the IT infrastructure environment by monitoring performance, configuration, maintenance, and repair.
Deploy and manage Windows/Linux/Unix servers.

Jobgether is connecting tech talent to opportunity. They focus on AI-powered matching processes.

Director of Technical Operations

SugarShot 9 days ago

$125,000–$175,000/yr

US

Oversee daily operations of all technical departments.
Ensure SLA adherence and quality control.
Partner with Client Success on service reviews.

SugarShot is an information technology company with practice areas in Cybersecurity, IT Support, and Professional Services. They are growing quickly, have been honored on the Inc. 5000 three years in a row, and have excellent opportunities for great people who are looking to make a real difference in the marketplace.

Site Reliability Engineer

Patreon 12 days ago

US Unlimited PTO

Contribute to high impact AWS cloud infrastructure initiatives.
Participate in operability and production readiness reviews.
Advocate and implement Site Reliability Engineering practices.

Patreon is a media and community platform where creators give fans access to exclusive work. They have generated over $10 billion for creators and have 25 million+ paid memberships, with a hybrid work model and offices in New York and San Francisco.

Lead DevOps Engineer

Jobgether 3 days ago

India

Configure/operate monitoring, logging, and tracing tools for application performance.
Build dashboards and automation workflows for system reliability and uptime.
Collaborate with software engineering teams to design and implement robust systems.

Jobgether is a platform that uses AI-powered matching to connect job seekers with employers. They ensure applications are reviewed quickly and fairly, then share a shortlist with the hiring company for final decisions.

Infrastructure Engineer

Dataiku 20 days ago

Europe

Operate, maintain, and troubleshoot UNIX/Linux systems running in cloud environments.
Support and maintain existing configuration management and Infrastructure as Code setups.
Assist with the operation of cloud-based infrastructure, including virtual machines, networking components, and managed services.

Dataiku is The Universal AI Platform™, giving organizations control over their AI talent, processes, and technologies to unleash the creation of analytics, models, and agents. Providing no-, low-, and full-code capabilities, Dataiku meets teams where they are today, allowing them to begin building with AI using their existing skills and knowledge.

Senior Manager, Infrastructure

Wealthsimple 28 days ago

North America

Own the strategy and execution for Runtime Platform.
Set the technical direction, build and develop the team, and be accountable for outcomes.
Translate product needs into platform capabilities and build trust through consistent delivery.

Wealthsimple aims to help everyone achieve financial freedom by reimagining how people manage their money. As the largest fintech company in Canada, it has more than 3 million users and manages more than $100 billion in assets, fostering inclusive and high-performing teams.
