Source Job

Global

  • Provide production support on a shift according to the team on-call roster.
  • Work on tickets raised by customers and internal engineering/implementation teams when not on call for production support.
  • Continuously monitor the health and performance of our services, systems, and infrastructure.

Linux AWS Azure Python Bash

20 jobs similar to Site Reliability Engineer 3

Jobs ranked by similarity.

Unlimited PTO

  • Assess and improve visibility by identifying gaps in dashboards, metrics, and logs.
  • Refine alerts and dashboards for critical services to catch issues earlier.
  • Automate routine checks and monitoring tasks to free up engineers.

PlayOn is where high school sports come to life through platforms like GoFan, NFHS Network, and MaxPreps. As a growth-stage company backed by KKR, we build the technology that powers high school athletics from ticketing and streaming to fundraising and merchandise.

Brazil

  • Maintain and optimize AWS EC2 and EKS clusters to ensure high availability and performance.
  • Lead troubleshooting of production outages, providing timely resolution and root cause analysis.
  • Implement and improve CI/CD pipelines using tools like Jenkins and GitHub Actions to streamline deployment processes.

CI&T are tech transformation specialists uniting human expertise with AI to create scalable tech solutions. With over 8,000 CI&Ters globally, they have built partnerships with more than 1,000 clients over 30 years, and Artificial Intelligence is deeply embedded in their work reality.

  • Support and maintain Azure cloud infrastructure.
  • Administer Windows and Linux servers.
  • Troubleshoot infrastructure and application issues.

Tieto Tech Consulting solves clients’ toughest technology challenges and delivers reliable outcomes. Tieto Tech Consulting merges data, cloud, AI and design with deep industry expertise to create impactful digital solutions.

US Global

  • Performing day-to-day operational/DevOps tasks on Wikimedia’s public facing infrastructure.
  • Implementing and utilizing configuration management and deployment tools.
  • Leading continuous improvement by automating the installation, configuration, and maintenance of services on our platform.

The Wikimedia Foundation operates Wikipedia and other Wikimedia free knowledge projects with the vision of a world where every single human can freely share in the sum of all knowledge. As a charitable, not-for-profit organization, it relies on donations and has staff members based in 40+ countries.

$188,550–$212,150/yr
Global Unlimited PTO

  • Own the technical direction of Remote's SRE/Platform domain.
  • Define and drive the reliability strategy across the platform.
  • Identify and lead AI enablement initiatives across the engineering organisation.

Remote is solving modern organizations’ biggest challenge – navigating global employment compliantly with ease. With our core values at heart and a future-focused work culture, our team works tirelessly on ambitious problems, asynchronously, around the world.

Mexico

  • Design systems with resilience, graceful degradation, and capacity in mind.
  • Define and measure SLOs and SLIs that actually reflect what our customers feel.
  • Use Datadog (logging, metrics, APM) together with CloudWatch to build signal-heavy, noise-light observability.

EarnIn is building products that deliver real-time financial flexibility for those with the unique needs of living paycheck to paycheck. They are growing fast and are excited to continue bringing world-class talent onboard to help shape the next chapter of their growth journey.

$29,000–$36,000/yr
India

  • Design, build, and maintain scalable, reliable systems on GCP.
  • Develop automation for infrastructure provisioning using Terraform, Ansible, or Deployment Manager.
  • Manage incident response, conduct postmortems, and implement improvements to reduce recurrence.

SupplyHouse.com is an industry-leading e-commerce company specializing in HVAC, plumbing, heating, and electrical supplies since 2004. They value every individual team member and cultivate a community where people come first with Generosity, Respect, Innovation, Teamwork, and GRIT.

Global

  • Own customer issues from initial response through resolution, escalating when needed with clear documentation.
  • Troubleshoot platform and data ingestion issues across cloud and Kubernetes environments.
  • Support integrations including Filebeat, Fluentd, Fluent Bit, OpenTelemetry, AWS, and Azure services.

Logz.io helps customers keep their observability environments running smoothly across fast-moving cloud and Kubernetes ecosystems. They are focused on AI-powered capabilities.

Latin America

  • Monitor critical production systems using advanced dashboards and proactive alerting.
  • Act as the primary technical responder for live production incidents and Slack escalations.
  • Collaborate deeply with core DevOps and software engineering teams to elevate platform reliability.

Inallmedia.com is a global technology and design firm focused on building impactful digital solutions through remote, distributed teams across LATAM. They partner with international clients across industries, providing long-term technical expertise, product innovation, and team augmentation.

$110,000–$140,000/yr
US

  • Perform systems administration and maintenance including patching and vulnerability scanning.
  • Primarily support AWS environments, including Windows and Linux virtual machines.
  • Troubleshoot issues across network, compute, application, and identity layers.

Tyto Athene delivers mission-focused digital transformation through IT services and solutions. They have over 50 years of experience and foster a collaborative, innovative, and mission-driven environment.

US

  • Administer and support cloud-native infrastructure powering telecommunication systems, with a strong focus on AWS-hosted services.
  • Perform day-to-day system administration tasks within AWS GovCloud, including provisioning, configuring, monitoring, and patching Linux-based virtual machines.
  • Monitor cloud system performance using CloudWatch, CloudTrail, and Splunk, diagnosing and resolving infrastructure issues to maintain 24/7 uptime.

TekSynap is a fast-growing high-tech company that understands the pace of technology and the need for a comprehensive information management environment. They aim to utilize the best of information technology to meet the business needs of Federal Government customers.

Brazil Unlimited PTO

  • Collaborate with a tight-knit development team.
  • Design, deploy, and operate critical systems balancing reliability, cost, and agility.
  • Perform troubleshooting and root-cause analysis of system operation issues.

Loadsmart is a logistics technology company valued at over $1 billion. We are a collection of industry veterans and user-centered engineers using innovative technology to fearlessly reinvent the future of freight.

Europe

  • Designs, develops, tests and implements infrastructure for CI/CD pipelines and IaC.
  • Manages source code, configuration management, release management, build and deployment activities.
  • Consults on and implements new technologies in support of the innovation strategy.

Deutsche Telekom IT Solutions Slovakia has been part of the Košice region since 2006. They are the second largest employer in the eastern part of the country, with more than 3,900 employees, and provide innovative information and communication technology services.

$127,800–$135,900/yr
US

  • Building infrastructure as code and DevOps pipelines and reviewing solutions.
  • Researching and analyzing technical solutions, maintaining and enhancing documentation.
  • Proactively identifying blockers, risks, and issues, proposing solutions or escalating as appropriate.

Nava is a consultancy and public benefit corporation working to make government services simple and effective. They guide agencies constrained by legacy systems to a future with sharp user experiences built on secure, reliable, fault-tolerant cloud infrastructure.

US Unlimited PTO

  • Provide technical support to customers through email, screen sharing, and chat within established SLAs.
  • Own and resolve complex technical customer issues, partnering with Technical Support Specialists.
  • Problem-solve and troubleshoot in a repeatable manner, documenting in the Support CRM to identify trends.

Vanta helps businesses earn and prove trust by enabling companies to practice better security. They have a talented team and empower companies to improve and prove their security.

Engineer

FAL
$180,000–$250,000/yr
US

  • Build and maintain a Python fleet-tracking system that manages the full lifecycle of servers.
  • Build server management tooling that automates provisioning, health checks, GPU diagnostics, recovery and alerting.
  • Create and maintain metrics, dashboards, and alerting for hardware health across the fleet.

FAL is committed to keeping a large fleet of GPU servers healthy and productive. They offer a collaborative and supportive culture with learning and growth opportunities.

$115,200–$172,800/yr
US 8w paternity

  • Build internal tooling to help other engineers and the rest of the company understand and operate our system.
  • Design and implement security best practices for our team and infrastructure.
  • Reduce toil through automation, including building and maintaining CI/CD infrastructure.

Openly is rebuilding insurance from the ground up by re-envisioning and enhancing every aspect of the customer experience. They are a rapidly growing team of exceptional, curious, empathetic people with a wide range of skill sets, spanning many departments.

  • Provide real-time operational support for live/on-air programming across NBCU brands and shows, prioritizing based on impact.
  • Own incident response during your shift, triaging and driving issues forward to restoration without waiting for others.
  • Perform hands-on L1–L3 support for virtualized production systems, using logs and monitoring to identify root cause candidates.

NBCUniversal is a world-leading media and entertainment company that creates world-class content distributed across film, television, and streaming. They own leading entertainment and news brands, operate film and television studios, and manage theme parks and experiences worldwide.

Ireland

  • Design, build, and deploy production systems with a focus on scalability, reliability, observability, and performance.
  • Develop and maintain comprehensive automation solutions to eliminate toil and streamline operational efficiency.
  • Proactively monitor production systems and implement automated incident response mechanisms to minimise downtime.

Arista Networks is an industry leader in data-driven, client-to-cloud networking for large data center, campus and routing environments. The company is well-established and profitable with over $8 billion in revenue and values diversity and inclusivity.

  • Maintain the reliability and performance of customer environments remotely, supporting Mirantis OpenStack/k0s layers.
  • Diagnose and resolve system-level issues, requiring hands-on Linux administration experience.
  • Troubleshoot customer environments based on Linux, OpenStack, Kubernetes, networking, and other cloud technologies; detect, report, and resolve issues.

Mirantis helps enterprises move to the cloud on their terms, delivering a true cloud experience on any infrastructure, powered by Kubernetes. They serve many of the world’s leading enterprises and value openness, collaboration, risk-taking, and continuous growth.