Jobs Similar to Software Engineer / Site Reliability Engineer

Senior Software Engineer (Golang, Kubernetes) - Cloud Compute Team

Canva 4 days ago

ANZ

Designing, building, and operating Kubernetes infrastructure across multiple cloud providers.
Building and maintaining automation for cluster lifecycle management, node provisioning, and provider onboarding.
Developing platform tooling and abstractions that enable other Canva engineers to deploy and scale workloads.

Canva is a design platform redefining how the world experiences design. They have campuses in Sydney and Melbourne, along with co-working spaces in Brisbane, Perth and Adelaide, offering a flexible and inclusive work environment.


Senior Site Reliability Engineer - Remote

ClickHouse 8 days ago

$141,000–$230,000/yr

US

Collaborate with engineering teams to design and implement scalable, secure systems.
Establish and manage service level objectives (SLOs) and service level agreements (SLAs).
Enhance incident response processes and post-mortem analysis for outages.

ClickHouse, recognized on the 2025 Forbes Cloud 100 list, is one of the most innovative and fast-growing private cloud companies. With more than 3,000 customers and ARR that has grown over 250 percent year over year, ClickHouse leads the market in real-time analytics, data warehousing, observability, and AI workloads.


Senior Infrastructure Engineer/SRE

Cresta 18 days ago

$205,000–$270,000/yr

US Unlimited PTO

Partner with engineers to build dev tools that empower developer workflows and deployment infrastructure.
Ensure reliability of multi-cloud Kubernetes clusters and pipelines.
Focus on automation so we can spend energy where it matters.

Cresta is on a mission to turn every customer conversation into a competitive advantage by unlocking the true potential of the contact center. Their platform combines the best of AI and human intelligence to help contact centers discover customer insights and behavioral best practices.


Site Reliability Engineering (SRE) Intern

AWP Safety 13 days ago

$30–$34/hr

US

Help deploy and configure Dynatrace OneAgent and ActiveGates with automated tooling.
Define and instrument user‑centric metrics and objectives in Dynatrace.
Combine Davis® AI with Copilot/Claude to identify root causes and reduce MTTR.

AWP Safety's IT Internship Program is a hands‑on learning experience for early‑career professionals who want to build a future in IT Site Reliability Engineering. They operate at the intersection of Software Engineering and Systems Operations, using Dynatrace to diagnose performance bottlenecks and automate "toil" out of existence.


Senior Site Reliability Engineer

Akuity 1 day ago

US Canada

Own SLI/SLO/SLA definitions for the Akuity SaaS platform and drive continuous improvement.
Participate in an on-call rotation and act as incident commander for high-severity production events.
Partner with engineering teams to build reliability into new features before they ship to production.

Akuity helps enterprises ship software faster and more reliably with modern GitOps best practices. The Akuity Platform enables teams to manage the development and deployment across hundreds – if not thousands – of Kubernetes clusters from a single control plane.


Senior Site Reliability Engineer

Loadsmart 9 days ago

$172,614/yr

US

Design infrastructure, networking, and software platform architecture.
Build and maintain automation of Continuous Integration and Continuous Deployment pipelines.
Troubleshoot infrastructure, internal applications, networking, and security issues.

Loadsmart is a technology company focused on the logistics and supply chain industry. They leverage data and technology to automate and optimize freight transportation, connecting shippers and carriers to streamline the shipping process. They are a mid-sized company passionate about transforming the future of freight.


Infrastructure Reliability Engineer

Whatnot 23 days ago

North America Europe

Build distributed systems that support reliability, resiliency, and safe operation at scale.
Design and operate traffic control mechanisms: circuit breakers, rate limiting, admission control, backpressure, and graceful degradation.
Develop tooling that improves incident detection, response, and automated mitigation.

Whatnot is the largest live shopping platform in North America and Europe to buy, sell, and discover the things you love. They are a remote co-located team, inspired by innovation and anchored in their values.


Senior Site Reliability Engineer

Diagrid 10 days ago

Unlimited PTO

Build and operate cutting-edge cloud infrastructure to support Diagrid's core products.
Define standards and deliver tools, processes, and frameworks to make our products secure, reliable, efficient, and highly available.
Build and maintain CI/CD pipelines that enable delivering software quickly and securely across clouds.

Diagrid believes that open-source software, open standards and APIs are the greatest transformational tools for organizations. They provide developers with APIs and tools that help them focus on their code and not on infrastructure and are founded by the creators of the Dapr and KEDA open-source projects.


Sr. DevOps Engineer

Jobgether 21 days ago

Europe

Implement SLI/SLO frameworks with error budgets to drive reliability decisions.
Design release strategies including blue/green deployments and version tracking.
Lead incident response and develop automated runbooks to reduce MTTR.

Jobgether is a company that helps connect individuals with jobs through an AI-powered matching process. They ensure applications are reviewed quickly, objectively, and fairly against roles' core requirements.


Software Engineer - Solutions Engineering

Canonical 26 days ago

Americas

Work in Python and Golang to design and deliver open source software operations code.
Shape high quality open source monitoring and alerting infrastructure.
Grow a healthy, collaborative engineering culture in line with the company values.

Canonical is a leading provider of open source software and operating systems to the global enterprise and technology markets. As the company that publishes Ubuntu, one of the most important open-source projects and the platform for AI, IoT, and the cloud, it is changing the world of software. The company has 1200+ colleagues in 75+ countries and a globally distributed collaboration culture.


Site Reliability Engineering Manager II

Flywire 14 days ago

$160,000–$200,000/yr

US

Help drive reliability, automation and performance within our cloud-based infrastructure.
Become embedded within an Engineering team helping them navigate production excellence and advocate for best practices.
Debug production issues across services and levels of the stack as well as practice incident response and blameless postmortems.

Flywire is a global payments enablement and software company that was founded over a decade ago. They have over 1,200 global FlyMates, representing more than 40 nationalities, in 12 offices worldwide, and are looking for people to join the next stage of their journey as they continue to grow.


Staff Site Reliability Engineer

SmarterDx 7 days ago

$230,000–$250,000/yr

US Unlimited PTO 12w paternity

Define and evolve reliability standards for the SmarterDx platform.
Enhance observability systems (metrics, logs, traces, alerting) to provide actionable insights and reduce mean time to detect (MTTD) and resolve (MTTR).
Reduce operational toil through automation, self-healing systems, and improved deployment and rollback mechanisms.

SmarterDx, a Smarter Technologies company, builds clinical AI that is transforming how hospitals translate care into payment. Founded by physicians in 2020, their platform connects clinical context with revenue intelligence, helping health systems recover millions in missed revenue, improve quality scores, and appeal every denial.


Senior Product Engineer

Humanitec 17 days ago

Global

Contribute to our core product, working across our stack primarily in Go, on services that power our applications.
Design and refine technical systems, including microservices, customer interfaces, and automated tests.
Collaborate closely across disciplines to explore problems, prototype ideas, and iterate quickly.

Humanitec is at the forefront of the Platform Engineering revolution, as enterprise companies across the globe re-shape how they manage their cloud infrastructure. Their mission is to help platform engineering teams build Internal Developer Platforms that unlock true developer self-service.


Senior Platform Engineer

Propel 1 day ago

$170,000–$240,000/yr

US 4w PTO

Own our fundamental cloud services and tooling.
Own our application platform.
Own our developer experience.

Propel builds technology that strengthens the social safety net. They are a passionate team of ~100 Propellers who envision a future where every American has the tools and resources they need to thrive, offering a remote-first working environment with headquarters in Brooklyn.


Technical Lead - DevEx + Cloud Infrastructure

Traackr 2 days ago

$60,000–$80,000/yr

LATAM Unlimited PTO

Tech lead two teams (DevEx and Cloud Infrastructure) totaling 6–8 engineers: set technical direction, review key designs/changes, and raise engineering standards across both domains.
Own the delivery toolchain end-to-end (Git, CI, deployments/releases): reduce flakiness, improve build/test times, make releases repeatable with clear rollback, and drive adoption of org-wide standards through tooling, docs, and supported migrations.
Improve the software development lifecycle (setup → build/test → PR → deploy → observe) and standardize environments so teams spend less time on tooling and more time shipping.

Traackr is a global SaaS technology company providing a data-driven influencer marketing platform that marketers use to optimize investments, streamline campaigns, and scale programs. They are a remote-first company with offices in San Francisco, New York, Boston, Paris, and London and operate on a culture of mutual respect.


Senior DevOps & Platform Engineer

About Us 23 days ago

Maximize the velocity of our product engineering team.
Ensure platform scalability, reliability, and security.
Champion best practices and shape the engineering culture.

They are building a robust, scalable trading platform to serve high-traffic, latency-sensitive applications. They leverage state-of-the-art technologies to support real-time trading while providing unparalleled reliability and performance.


Engineering Manager, Infrastructure (EVM / Cloud / Security)

Gensyn 20 days ago

US Europe

Build and lead the team responsible for the reliability, security, and scalability of Gensyn’s production infrastructure and developer platform.
Own the availability, scalability, and security posture of production systems: SLOs/SLIs, incident response, postmortems, reliability improvements, and hardening.
Drive delivery across ambiguous, high-stakes initiatives: roadmap planning, prioritization, and execution against tight timelines.

Gensyn is building a protocol that networks together the core resources required for machine intelligence to flourish alongside human intelligence. They value autonomy, independence, direct feedback and an extreme learning rate, and strive to reject mediocrity and waste.


Platform Engineer

Rewe Group 7 days ago

Europe

Analyze, evaluate, and resolve network incidents and service requests (L1–L2).

REWE Group Austria develops innovative IT products and services for all corporate divisions in Austria and abroad, setting the tone for modern trade. They have more than 700 employees. Their culture is family-friendly, with flexible working hours and remote working options.


Staff Software Engineer - Grafana Cloud k6

Grafana Labs 23 days ago

$174,986–$209,983/yr

US 6w PTO

Build and scale a strong culture of operational excellence by defining standards and coaching teams to own reliability and availability.
Drive mature DevOps/SRE practices, including incident response and PIRs, on-call readiness, runbooks, alerting, observability, and release/change management.
Guide teams in the design, development, evolution, and operation of large-scale, distributed cloud systems.

Grafana Labs is a remote-first, open-source powerhouse with more than 20M users of Grafana, the open source visualization tool, around the globe. They help more than 3,000 companies manage their observability strategies with the Grafana LGTM Stack.


Senior Product Engineer

Humanitec 17 days ago

Global

Comfortable working in a fully remote environment.
Value designing solutions to customer problems.
Comfortable rolling up your sleeves to understand incidents.

Humanitec is at the forefront of the Platform Engineering revolution, as enterprise companies across the globe re-shape how they manage their cloud infrastructure. They aim to help platform engineering teams build Internal Developer Platforms that unlock true developer self-service.

