Remote DevOps Jobs · SRE

Job listings

  • Lead Reliability Engineering for User Experience.
  • Architect for Scale, partnering with product and infrastructure teams to design highly available systems.
  • Drive Automation to eliminate repetitive operational work through tooling and systems.

Reddit is a community-based platform where users submit, vote, and comment on various topics. It hosts over 100,000 active communities and attracts millions of daily active users, making it one of the largest and most influential internet platforms.

$160,000–$190,000/yr

  • Own and evolve Launch Potato's cloud infrastructure, CI/CD platform, and compliance posture.
  • Build the SRE function from the ground up so product teams can ship faster without compromising reliability, security, or cost control.
  • Stand up the SRE practice from scratch: on-call rotation, PagerDuty configuration, SLA/SLO definitions for core infrastructure services, runbook library, and observability dashboards that tie site performance to business metrics.

Launch Potato is a digital media company that connects consumers with leading brands through data-driven content and technology. The company is headquartered in South Florida, with a remote-first team spanning over 15 countries and a high-growth, high-performance culture.

6w PTO

  • Build and scale a strong culture of operational excellence by defining standards and coaching teams to own reliability and availability.
  • Drive mature DevOps/SRE practices, including incident response and post-incident reviews (PIRs), on-call readiness, runbooks, alerting, observability, and release/change management.
  • Guide teams in the design, development, evolution, and operation of large-scale, distributed cloud systems.

Grafana Labs is a remote-first, open-source powerhouse with more than 20M users of Grafana around the globe. They help more than 3,000 companies manage their observability strategies with the Grafana LGTM Stack, and their team thrives in an innovation-driven environment.

US Unlimited PTO

  • Design and implement secure, scalable infrastructure in Azure, integrating security best practices.
  • Partner with the infrastructure team to enhance the reliability and performance of systems.
  • Lead security incident response efforts within the Azure ecosystem and automate responses.

Mesh's mission is to enable consumers to pay and be paid with any asset by making crypto payments reliable and ubiquitous. The company is backed by leading investors and combines a powerful orchestration engine with a seamless consumer app to unlock liquidity for the world.

  • Lead the design, implementation, and ongoing improvement of reliable, scalable, performant, and secure production platforms and services.
  • Work closely with cross-functional teams to build and maintain resilient infrastructure and deployment patterns.
  • Provide technical leadership and mentorship to engineers across the organisation, promoting strong engineering standards and operational best practice.

Cision empowers individuals to make an impact and values diverse perspectives. They foster curiosity, collaboration, and innovation while driving meaningful contributions to brands. The company has offices in 24 countries across the Americas, EMEA, and APAC.

US Unlimited PTO

  • Lead software engineering teams providing infrastructure-as-code to manage cloud infrastructure.
  • Hire experienced site reliability staff and a line manager to grow and oversee the SRE team.
  • Establish design-before-build discipline; facilitate lightweight design documents, architectural decision records, and working group reviews.

Horizon3.ai is a cybersecurity company dedicated to enabling organizations to proactively find, fix, and verify exploitable attack vectors. They are a fast-growing company with a culture of respect, collaboration, ownership, and results.