Jobs Similar to Site Reliability Engineer | TangerineFeed

Site Reliability Engineer

Jobgether 13 hours ago

LATAM

Monitor production systems, dashboards, logs, and alerts to ensure high availability and performance across distributed environments.
Assist in incident detection, triage, escalation, and resolution, following structured on-call rotations with mentorship support.
Maintain, follow, and continuously improve runbooks, operational procedures, and incident response workflows.

Linux Python Bash Go Cloud

20 jobs similar to Site Reliability Engineer

Jobs ranked by similarity.

Senior Site Reliability Engineer, Observability

ScienceLogic 17 days ago

US Unlimited PTO

Be a key contributor on an Agile development team, collaboratively realizing business value through an iterative software development lifecycle.
Build and execute the monitoring strategy for ScienceLogic SaaS infrastructure.
Define, deploy, and maintain system and service monitors.

ScienceLogic is a leader in IT Operations Management, giving modern IT operations actionable insights for faster problem resolution and prediction. They see everything across cloud and distributed architectures, contextualize data through relationship mapping, and act on this insight through integration and automation.

Site Reliability Operations

Truelogic 10 days ago

US

Lead incident response as Incident Commander, coordinating teams, communications, and service restoration
Produce executive-level incident reports, run RCAs, and drive continuous improvement
Enforce change management and risk assessment for production changes

Truelogic is a leading provider of nearshore staff augmentation services headquartered in New York, delivering top-tier technology solutions to companies of all sizes. Their team of 600+ highly skilled tech professionals, based in Latin America, drives digital disruption by partnering with U.S. companies on their most impactful projects.

Site Reliability Engineer

66degrees 24 days ago

US

Ensure near-zero downtime with monitoring and alerting, self-healing automation, and continuous improvement
Create highly automated, available and scalable systems by applying software and infrastructure principles
Employ and advise clients on DevOps and SRE principles and practices, covering deployment pipelines, HA, service reliability, technical debt, and operational toil for live services running at scale

66degrees is an AI transformation partner. They guide enterprises from business challenges to quantifiable outcomes, helping businesses reach their inflection point where chaotic data becomes a strategic asset, complexity becomes clarity, and AI becomes an engine for growth. They believe in thriving through challenges and winning together.

Senior Site Reliability Engineer

Transcend 2 days ago

$150,000–$167,000/yr

US

Lead reliability-focused design and readiness reviews.
Build, operate, and continuously improve our observability stack.
Own and evolve incident management practices.

Transcend is building the privacy platform that easily embeds privacy into your entire tech stack. They are growing quickly, are backed by top-tier investors, and are proud to serve some of the world's most iconic brands.

Senior Site Reliability Engineer

Jobgether 8 days ago

$113,082–$175,725/yr

Canada

Operate and maintain large-scale data systems, ensuring stability and performance.
Design, implement, and optimize deployment processes using virtualization.
Monitor system health, analyze failures, and identify instability sources.

Jobgether is a platform that uses AI-powered matching to connect candidates with companies. They ensure applications are reviewed quickly, objectively, and fairly, then share a shortlist of top candidates directly with the hiring company.

Systems Reliability Engineering

MEMX 7 hours ago

Global

Provide support for MEMX exchange platforms, including on-call duty, incident response, and issue triage
Help isolate and resolve unplanned system outages
Enhance monitoring and alerting based on symptoms

MEMX is building a next-generation exchange that will bring greater competition, transparency, and efficiency to equity trading. They offer competitive employee benefits and perks and will continue to make this a priority to attract the best.

Site Reliability Engineer II

Restaurant365 13 days ago

$98,583–$138,016/yr

US Unlimited PTO

Respond to production incidents and contribute to post-incident analysis.
Identify and automate manual processes to improve efficiency and reduce risk.
Enhance monitoring tools and platforms to improve observability.

Restaurant365 is a SaaS company that provides a unique, centralized solution for accounting and back-office operations for restaurants. They focus on empowering team members to produce top-notch results while elevating their skills.

Staff Site Reliability Engineer

Juniper Square 10 days ago

US Canada Europe Asia

Automate the provisioning of all of Juniper Square’s infrastructure in code.
Partner with our Platform Engineering team on building developer tooling / improving developer experiences via joint initiatives and enhancements.
Partner with our Data Engineering team on improving our data posture and driving operational excellence.

Juniper Square's mission is to unlock the full potential of private markets by digitizing them to bring efficiency, transparency, and access. They are a values-driven organization with a hybrid workplace strategy, allowing employees to collaborate effectively across multiple countries and offering physical offices in several major cities.

Senior Site Reliability Engineer (Forward Deployed)

Teleport 11 days ago

$180,800–$311,000/yr

US

Work directly with customers to ensure successful Teleport deployments.
Meet regularly with customers, understand pain points blocking deployments and remove roadblocks.
Work with customers to articulate the problem they are trying to solve, gather requirements, and make the business case to the product and engineering teams to invest in resolving the issue.

Teleport is the Infrastructure Identity Company, modernizing identity, access, and policy for infrastructure, improving engineering velocity and resiliency of critical infrastructure against human factors and/or compromise. They are a fast-growing, well-funded Y-Combinator company that values craft, strongly supports work/life balance, and embraces a culture of humility, honesty, and transparency.

Site Reliability Engineer

Bobsled 17 days ago

US Canada Europe

Design, build, and maintain highly available, scalable infrastructure.
Manage and optimize infrastructure across GCP, AWS, Azure, and other cloud providers.
Develop comprehensive monitoring, logging, and alerting systems.

Bobsled is seeking a Site Reliability Engineer to enhance its data-sharing platform's reliability and scalability. They value growth, offering flexible work hours in a fully remote environment and fully sponsored individual coaching for all employees.

Senior Software Engineer, Infrastructure

Engine 17 days ago

Latin America

Design, implement, and manage cloud infrastructure using Infrastructure as Code (IaC) tools.
Design, build, and maintain scalable CI/CD pipelines using tools like CircleCI or GitHub Actions.
Implement and maintain observability tooling (Prometheus, Grafana, Datadog), and lead incident response to ensure system reliability.

Engine is transforming business travel into something personalized, rewarding, and simple. More than 20,000 companies already rely on Engine to support over 1 million travelers and billions in annual bookings.

Site Reliability Engineer

Patreon 21 days ago

US Unlimited PTO

Contribute to high impact AWS cloud infrastructure initiatives.
Participate in operability and production readiness reviews.
Advocate and implement Site Reliability Engineering practices.

Patreon is a media and community platform where creators give fans access to exclusive work. They have generated over $10 billion for creators and have 25 million+ paid memberships, with a hybrid work model and offices in New York and San Francisco.

Crypto Production Engineer

Wormhole Foundation 25 days ago

Act as first responder and incident commander during production incidents
Improve reliability and uptime across all Wormhole services
Harden infrastructure for security and operational resiliency

Wormhole Foundation empowers passionate people in the research and development of blockchain interoperability technologies. They support teams building secure, open-source, and decentralized products within the Wormhole ecosystem.

Senior Platform Engineer

Moxie 22 days ago

Global

Own and operate core platform systems across AWS, GCP, Vercel, GitHub, and Cloudflare.
Improve reliability, scalability, and security of production and non-production environments.
Improve local development environments and onboarding experience for engineers.

Moxie empowers ambitious aesthetic entrepreneurs to build profitable, independent practices. A global, remote-first team of more than 140 people supports hundreds of practices nationwide as they unlock sustainable success for aesthetic entrepreneurs.

New DevOps Engineer

Higher Logic 7 days ago

US

Manage and troubleshoot Linux-based systems in production and non-production environments.
Improve infrastructure automation, monitoring, and operational processes.
Assist with incident response, root cause analysis, and continuous improvement.

Higher Logic provides online communities and communication tools to help organizations build, retain, and grow their member or customer base. They are a global company with offices throughout the US, Canada, and Australia, serving more than 3,000 customers.

Site Reliability Engineer

Linus Health 13 days ago

US

Leverage infrastructure as code (Terraform) to build and maintain complex production and analytics workflows including networking and containerized services.
Rapidly diagnose and resolve faults in system services as part of a 24/7 on-call rotation focused on actionable alerting and eliminating toil.
Improve speed of delivery by developing and maintaining CI/CD pipelines.

Linus Health is a Boston-based digital health company transforming brain health worldwide. They combine cutting-edge neuroscience, clinical expertise, and AI to advance early detection and intervention for cognitive and brain disorders, empowering people to live longer, healthier lives. With 100+ team members and growing, they’re entering a phase of accelerated growth and looking for top talent to help shape their future.

Staff Software Engineer - Observability Knowledge Graph Backend

Grafana Labs 20 days ago

$174,986–$209,983/yr

US Canada 6w PTO

Work with your team to build and roll out new features, then use the results to iterate and improve.
Drive projects from initial ideation all the way to operations once it is in the hands of customers.
Maintain critical systems, and own their reliability, performance, and availability.

Grafana Labs is a remote-first, open-source powerhouse with over 20M users. They provide observability strategies for over 3,000 companies, featuring scalable metrics, logs, and traces, and thrive in an innovation-driven environment with transparency, autonomy, and trust.

Senior Site Reliability Engineer - Infrastructure

Underdog 30 days ago

$160,000–$240,000/yr

US Unlimited PTO 11w maternity

Own and maintain the incident response process, including defining procedures, tools, and best practices
Guide teams in establishing and monitoring Service Level Objectives (SLOs), including setting up alerts and reporting systems
Lead capacity planning initiatives, focusing on both short and long-term scalability while optimizing costs

Underdog makes sports more fun by building the best products for sports fans. They are a fast-growing sports company valued at $1.3B, focused on a seamless, simple, easy-to-use, intuitive, and fun app.

Site Reliability Engineer III

Veeam 27 days ago

$109,800–$252,500/yr

US Unlimited PTO 16w maternity 8w paternity

Design, implement, and maintain scalable and reliable infrastructure solutions.
Automate deployments and maintain a resilient, secure SaaS application platform.
Develop comprehensive monitoring and alerting solutions, and respond to incidents.

Veeam is the #1 global market leader in data resilience, believing businesses should control all their data whenever and wherever they need it, providing data resilience through data backup, data recovery, data portability, data security, and data intelligence. Based in Seattle, Veeam protects over 550,000 customers worldwide who trust Veeam to keep their businesses running.

Senior Site Reliability Engineer

Clarifai 29 days ago

US

Ensure the smooth operation and high availability of Clarifai's core services
Monitor system performance, identify bottlenecks, and implement optimizations to enhance reliability and efficiency
Design and implement scalable, secure, and cost-effective infrastructure solutions

Clarifai is a leading AI platform specializing in computer vision and generative AI, empowering organizations to transform unstructured data into actionable insights. Founded in 2013, they have a globally distributed team and $100M in funding, and are committed to building a diverse and inclusive team.
