Source Job

US Unlimited PTO

  • Lead the entire Software Development Lifecycle from start to finish.
  • Design and build multi-component, distributed systems that operate at scale.
  • Investigate issues with a methodical approach to identify their root cause.

Kubernetes SQL AWS GCP Go

20 jobs similar to Staff Software Engineer, DevProd (Observability)

Jobs ranked by similarity.

$145,000–$185,000/yr
US Unlimited PTO

  • Be a keen learner, working with cloud-native, highly scalable infrastructure and gaining expertise in container orchestration, networking, and observability.
  • Be a passionate problem solver, tackling scalability, reliability, and troubleshooting challenges in distributed systems.
  • Be a great communicator, engaging directly with developers, engineering teams, and product teams to understand infrastructure challenges and provide solutions.

Temporal provides an open-source programming model that simplifies code, improves application reliability, and helps developers focus on delivering features faster. They aim to be the reliable foundation of every developer’s toolbox and value curiosity, drive, collaboration, genuineness, and humility.

Americas EMEA Unlimited PTO

  • Design and implement highly scalable infrastructure for GitLab.com to support current and future growth.
  • Collaborate with cross-functional teams across the Infrastructure organization to plan and deliver projects that shape GitLab’s platform direction.
  • Operate and improve edge services and Kubernetes workloads, acting as a subject matter expert within the infrastructure department.

GitLab is an open-core software company that develops the most comprehensive AI-powered DevSecOps Platform, used by more than 100,000 organizations. They aim to enable everyone to contribute to and co-create the software that powers our world.

US Unlimited PTO

  • Design, build, and maintain systems that power Temporal's customer acquisition.
  • Collaborate with Finance, RevOps, and Sales teams to transform business requirements.
  • Develop, test, and maintain code for distributed data systems.

Temporal is an open-source programming model company that simplifies code and makes applications reliable. They are a growing company looking for individuals who share their values and want to influence their future.

US Unlimited PTO

  • Design and implement core backend service features.
  • Provide appropriate unit, integration, and performance test coverage for your feature ownership area.
  • Clearly document the design choices and operational knowledge needed to successfully deploy and run the service with those features.

Temporal provides an open-source programming model simplifying code and enhancing application reliability, allowing developers to focus on feature delivery. They are a growing company aiming to be the reliable foundation of every developer's toolbox with a curious, driven, collaborative, genuine, and humble culture.

$180,000–$225,000/yr
US Canada Unlimited PTO

  • Lead the design and implementation of features for our Cloud Operational API.
  • Drive architectural discussions and set direction for the scalability and reliability of our services.
  • Take ownership across the lifecycle of services from design to operations.

Temporal simplifies code and makes applications more reliable. They are building a team to be the reliable foundation of every developer’s toolbox. They value curiosity, drive, collaboration, genuineness and humility and are looking for those who share those values.

US

  • Designs, implements, and continuously improves observability strategies across services.
  • Focuses on understanding system behavior in production, identifying failure modes, performance bottlenecks, and reliability risks.
  • Evolves and maintains shared AWS CDK and CDK8s constructs, with emphasis on observability, autoscaling, and operational safeguards.

Truelogic is a leading provider of nearshore staff augmentation services. They have a team of 600+ highly skilled tech professionals based in Latin America, partnering with U.S. companies on impactful projects and valuing expertise and aspirations.

$169,000–$195,000/yr
US

  • Build and evolve systems that provision and manage Aerospike Cloud clusters.
  • Collaborate with product managers, architects, control-plane engineers, and SREs.
  • Design and implement systems that behave predictably under load and degrade gracefully.

Aerospike is the real-time database for mission-critical use cases and workloads, including machine learning, generative, and agentic AI. They power millions of transactions per second with millisecond latency. At Aerospike, their mission is to unleash the power of the world’s real-time data with a database built for infinite scale, speed, and sustainability.

  • Contribute to our core product, primarily in Go, on services that power our applications.
  • Design and refine technical systems, helping to shape them to remain scalable, reliable, and elegant.
  • Collaborate closely across disciplines to explore problems, prototype ideas, and iterate quickly.

Humanitec is reshaping how enterprises build and run their cloud-native setups and helps teams build Internal Developer Platforms (IDPs) that unlock true developer self-service. They are a fully remote company where small teams work closely.

$141,000–$208,000/yr
US Unlimited PTO

  • Design and develop a highly available, scalable, and secure ClickHouse Cloud platform.
  • Build innovative deployment automation across cloud, hybrid, and on-prem systems.
  • Solve unique scaling, reliability, and performance challenges in regulated environments.

ClickHouse is a fast-growing private cloud company recognized on the 2025 Forbes Cloud 100 list. With over 2,000 customers and ARR that has more than quadrupled over the past year, ClickHouse leads the market in real-time analytics, data warehousing, observability, and AI workloads.

$137,000–$165,000/yr
US Canada

  • Build and extend services to provide new core functionality to the platform.
  • Expand our authentication, authorization, and analytics capabilities.
  • Build, test, deploy, and monitor a distributed set of services.

Censys strives to be the go-to source for understanding everything on the internet, offering comprehensive, accurate, and up-to-date internet mapping. They provide real-time internet intelligence and actionable threat insights to global governments and over 50% of the Fortune 500.

Global

  • Act as a specialized engineer who identifies interesting product opportunities, researches solutions, and quickly builds prototypes to validate if they work.
  • Take ownership of responsibilities beyond coding, including managing complex technical projects, writing technical abstracts, and helping the wider engineering team navigate difficult challenges.
  • Work directly with the CTO and operate outside the traditional engineering organization boundaries to solve high-level problems and ideate on new product capabilities.

Vcluster Labs is a venture-backed tech startup focused on enabling platform engineers. They've raised over $30M from VCs and have a remote-first work culture with a distributed team around the globe.

$167,800–$246,700/yr
US

  • Build and own product capabilities for Stytch’s agentic identity platform on Twilio.
  • Design, implement, and maintain scalable, reliable distributed services, optimizing for security, latency, and developer experience.
  • Partner with Product and Engineering leadership to set direction, translate customer needs into technical plans, and deliver high-impact roadmap features.

Twilio is shaping the future of communications from home. They deliver innovative solutions to hundreds of thousands of businesses and empower millions of developers worldwide to craft personalized customer experiences.

Global

  • Contribute to a core system that millions of end users will rely on.
  • Implement backend services and work on designing an architecture where reliability matters.
  • Take ownership of tasks, identify and address technical challenges proactively, and propose innovative solutions.

Alpaca is a US-headquartered self-clearing broker-dealer and brokerage infrastructure for stocks, ETFs, options, crypto, fixed income, 24/5 trading, and more. Their global team of 230+ members is a diverse group of experienced engineers, traders, and brokerage professionals who are working to achieve their mission of opening financial services to everyone on the planet.

$120,000–$290,000/yr
US

  • Design and build critical systems that power PlanetScale's database platform.
  • Collaborate with a team of expert engineers to solve complex distributed systems challenges.
  • Work directly with customers to understand their needs and translate them into robust technical solutions.

PlanetScale is rapidly growing and reinventing the database space. The PlanetScale platform offers both Postgres and Vitess clusters with a company philosophy centered around building small teams. They are recognized as one of the fastest growing companies in America and strive to build an inclusive environment where all people feel that they are equally respected and valued.

$141,520–$208,000/yr
US

  • Build and own backend product capabilities for Stytch’s identity platform.
  • Design, implement, and maintain scalable, reliable distributed services.
  • Partner with Product and Engineering leadership to set direction.

Twilio is shaping the future of communications. With a dedication to remote-first work and a strong culture of connection and global inclusion, they deliver innovative solutions to hundreds of thousands of businesses and empower millions of developers worldwide.

US Unlimited PTO

  • Design, build, and maintain tools and systems that support release automation and deployment workflows.
  • Write clean, reliable, and concurrent code that supports distributed systems (e.g., build pipelines, deployment tooling).
  • Collaborate with cross-functional teams to understand and improve release quality and developer productivity.

Temporal is an open-source programming model simplifying code and enhancing application reliability. They are a growing company focused on being the reliable foundation of every developer’s toolbox, fostering a curious, driven, and collaborative culture.

US

  • Ensure the smooth operation and high availability of Clarifai's core services.
  • Monitor system performance, identify bottlenecks, and implement optimizations to enhance reliability and efficiency.
  • Design and implement scalable, secure, and cost-effective infrastructure solutions.

Clarifai is a leading AI platform specializing in computer vision and generative AI, empowering organizations to transform unstructured data into actionable insights. Founded in 2013, they have a diverse, globally distributed team with $100M in funding and are committed to building a diverse and inclusive team.

Unlimited PTO

  • Define the technical architecture for Docker's unified enterprise governance platform.
  • Own end-to-end delivery of major platform components.
  • Mentor engineers across the organization, helping them grow their technical skills and judgment.

Docker makes app development easier so developers can focus on what matters. Their remote-first team spans the globe and is united by a passion for innovation and great developer experiences. Docker is the #1 tool for building, sharing, and running apps and is trusted by startups and Fortune 100s alike.

US

  • Understand and participate in the changing FedRAMP space.
  • Own and champion high operational standards of Confluent Cloud systems leveraged by federal agencies.
  • Innovate and design solutions to reduce toil, bolster operational maturity, and make day-to-day work life easier.

Confluent is rewriting how data moves and what the world can do with it. Their platform puts information in motion, streaming in near real-time so companies can react faster and build smarter. They value team players who ask hard questions, give honest feedback, and show up for each other.

Canada

  • Design, create, and maintain software and systems to improve the availability, scalability, and efficiency of Thumbtack's services.
  • Set the architectural direction of infrastructure and platform services while supporting the engineering organization.
  • Design and implement tools and processes used for deployment, change, service, and infrastructure management.

Thumbtack helps millions of people confidently care for their homes through personalized guidance, AI tools, and a hiring experience. They have a growing community of 300,000 local service businesses.