Source Job

US

  • Drive improvements to Dagster’s backend system, underlying open source framework, and product UI.
  • Solve difficult technical problems throughout the software stack and work collaboratively with the rest of the team.
  • Instrument, monitor, debug, and optimize distributed systems from end to end.

Python React AWS Kubernetes Postgres

20 jobs similar to Software Engineer - Observability Product

Jobs ranked by similarity.

  • Architect observability platform: Design, implement, and maintain the LGTM stack as the primary observability platform across all engineering teams.
  • Build internal observability products: Design and develop production-grade internal platform products with React/TypeScript frontends and Python/Rust backends.
  • Develop custom log indexing systems: Architect and build high-performance log indexing solutions using Rust that process logs and provide sub-second search across billions of log lines.

Judi Health is an enterprise health technology company providing a comprehensive suite of solutions for employers and health plans. They are on a mission to rebuild trust in U.S. healthcare and to deploy the infrastructure needed to deliver the care people deserve.

$120,480–$155,950/yr
Europe Unlimited PTO

  • Build and maintain stable, scalable foundational services that can be leveraged by other engineering teams.
  • Collaborate with many internal partners and product teams to influence the design of our API surface.
  • Design and develop reliable, secure, highly available and delightful experiences for the dbt Cloud admin and the end user.

dbt Labs is the pioneer of analytics engineering, helping data teams transform raw data into reliable, actionable insights. They have grown from an open source project into a company serving more than 5,400 dbt Platform customers, including AstraZeneca, Sky, Nasdaq, Volvo, JetBlue, and SafetyCulture.

US 6w PTO

  • Design, implement, and maintain scalable integrations for metrics, logs, and traces across cloud and Kubernetes environments.
  • Build middleware, libraries, and services to simplify development and observability workflows.
  • Lead technical direction and strategic planning for observability projects.

They are currently looking for a Staff Software Engineer - Grafana Cloud Observability, Kubernetes Monitoring in the United States. This role offers a unique opportunity to shape and advance cloud observability solutions for large-scale systems, focusing on metrics, logs, and traces.

Global

  • Collaborate with product management and stakeholders to deliver SaaS solutions.
  • Design, develop, and maintain backend services using Python and FastAPI for agentic workflows.
  • Contribute to developing multi-agent systems and integrate agents with third-party tools.

Granicus is dedicated to transforming the Govtech industry by uniting governments and constituents through technology. They offer cloud-based solutions for communications, website design, meeting management, and digital services, empowering stronger relationships between government and residents. The company has 5,500 federal, state, and local government agency customers.

$130,000–$160,000/yr
US

  • Learn and build expertise across several software engineering disciplines.
  • Solve challenging Airflow problems for our customers.
  • Spend up to 25% of your time on side projects that contribute to Astronomer’s overall success.

Astronomer empowers data teams to bring mission-critical software, analytics, and AI to life and is behind Astro, the industry-leading unified DataOps platform powered by Apache Airflow®. They are trusted by more than 800 of the world's leading enterprises, letting businesses do more with their data.

  • Work directly with enterprise customers to deploy and configure OpenTelemetry instrumentation across their environments.
  • Build custom integrations, dashboards, and tooling to help customers realize the full value of Dash0.
  • Troubleshoot complex issues in distributed systems, Kubernetes clusters, and observability pipelines.

Dash0 is building an AI-centric, OpenTelemetry-native platform that eliminates vendor lock-in and meaningless toil. They are backed by top-tier investors including Balderton Capital, Accel, and Cherry Ventures, and are led by a founding team with decades of experience in observability.

Global 5w PTO 14w maternity

  • Own and evolve the uptime monitoring platform to enhance customer capabilities.
  • Deploy a ClickHouse instance to capture check run logs and design APIs for reporting.
  • Collaborate with customers to resolve bugs affecting their infrastructure.

Jobgether is a platform posting jobs on behalf of partner companies. We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements.

$200,000–$230,000/yr
US

  • Own the technical direction of the Signals platform.
  • Lead a team of engineers and QA, setting priorities and driving execution.
  • Work directly with product, data, revenue, and customers to translate business problems into a technical roadmap.

YipitData is a market research and analytics firm for the disruptive economy. They analyze billions of alternative data points to uncover actionable insights across sectors. The company emphasizes transparency, ownership, and continuous mastery in their award-winning culture recognized by Inc.

$180,000–$220,000/yr
US Unlimited PTO

  • Code, test, debug, document, ship, and maintain software applications using established coding standards and methodologies.
  • Design and build APIs that the UI consumes to surface application data.
  • Partner with product management to drive agile delivery of both existing and new products based on project requirements, UX design, and industry best practices.

SentiLink provides innovative identity and risk solutions, empowering institutions and individuals to transact with confidence. They are building the future of identity verification in the United States, replacing a clunky, ineffective, and expensive status quo with solutions that are 10x faster, smarter, and more accurate.

$160,800–$193,000/yr
US Unlimited PTO

  • Create robust pipelines to process massive daily volumes of data.
  • Build and support scalable pipelines as part of Torc’s Data Factory.
  • Scale Torc’s data lake through a distributed storage system.

Torc has been a leader in autonomous driving since 2007 and is now part of the Daimler family, focused on developing software for automated trucks. Their culture is collaborative, energetic, and team-focused, offering flexibility and valuing work/life balance.

US

  • Design, develop, and deliver high-quality software iteratively and incrementally.
  • Take ownership of key components and services—from hands-on coding to deployment and monitoring.
  • Contribute to a culture of learning, curiosity, and continuous improvement within the engineering team.

Best Egg is a market-leading, tech-enabled financial platform helping people build financial confidence through a variety of installment lending solutions and financial health tools. They offer top-tier benefits and growth opportunities in a culture built on their core values.

$120,000–$210,000/yr
Europe

  • Deep engagement with exploration geologists and data scientists, continual learning about mineral exploration, and tailoring technology development to the needs of exploration project scientists.
  • Building data pipelines and tooling for deriving advanced human and machine insights from exploration data, often leading a small group of software engineers to successful delivery.
  • Developing expertise in KoBold’s Data Systems and deeply understanding how they impact exploration.

KoBold builds AI models for mineral exploration and deploys those models, alongside its novel sensors, to guide decisions on KoBold-owned-and-operated exploration programs. In the six years since its founding, KoBold has become by far both the largest independent mineral exploration company and the largest exploration technology developer.

$103,174–$117,720/yr
Canada

  • Lead efforts to scale and improve our infrastructure.
  • Develop and support internal team tooling.
  • Troubleshoot, debug and resolve issues as part of a shared on-call rotation.

Lillio, formerly HiMama, empowers early childhood educators through innovative tools. They are a Series B, private-equity backed company recognized as an industry leader and selected in 2025 by Time Magazine as one of the world's top EdTech companies.

Global

  • Design, build, and maintain scalable backend services primarily using Python.
  • Develop and operate cloud-native systems on AWS, ensuring reliability, security, and performance.
  • Contribute to infrastructure design and automation using Terraform.

Smart Working connects skilled professionals with global teams for full-time, long-term roles, breaking down geographic barriers. They value growth and well-being, fostering a genuine community and empowering individuals to thrive in a remote-first world.

$200,000–$285,000/yr
US

  • Spearhead development and implementation of observability tools.
  • Drive performance and ensure resilient systems.
  • Provide technical guidance and improve operational efficiencies.

Jobgether is a platform connecting job seekers with employers using AI-powered matching. They aim to ensure applications are reviewed quickly and fairly, focusing on core role requirements.

$123,586–$174,043/yr
Europe

  • Break down larger projects into individual tasks and deliver them in multiple phases.
  • Support peers and stakeholders in the product development lifecycle.
  • Contribute to a sense of community by engaging in growth and development activities.

Affirm is reinventing credit to make it more honest and friendly, giving consumers the flexibility to buy now and pay later without any hidden fees or compounding interest. They are a remote-first company that values its employees by offering competitive benefits.

  • Lead and resolve technically deep Level 2 support cases from initial triage to full root cause analysis and final fix.
  • Diagnose issues across distributed, cloud-native systems, with emphasis on application and API behaviour.
  • Perform code-level debugging (Python, Go, or Java) to pinpoint application defects or misconfigurations.

Mambu is a leading SaaS cloud banking platform. They are on a mission to make banking better for a billion people and shape the future of financial services.

$230,000–$250,000/yr
US Unlimited PTO 12w paternity

  • Define and evolve reliability standards for the SmarterDx platform.
  • Enhance observability systems (metrics, logs, traces, alerting) to provide actionable insights and reduce mean time to detect (MTTD) and resolve (MTTR).
  • Reduce operational toil through automation, self-healing systems, and improved deployment and rollback mechanisms.

SmarterDx, a Smarter Technologies company, builds clinical AI that is transforming how hospitals translate care into payment. Founded by physicians in 2020, their platform connects clinical context with revenue intelligence, helping health systems recover millions in missed revenue, improve quality scores, and appeal every denial.

Australia

  • Support and implement monitoring and alerting strategy across Kraken’s customer business.
  • Define and uphold observability best practices across multiple products and platforms.
  • Partner with product teams to implement observability tooling and improve reliability across the organisation.

Kraken is a technology company focused on creating a smart, sustainable energy system. Their operating system for energy is transforming the industry around the world in a way that benefits everyone. They are a Great Place to Work with genuinely decent, honest, and empathetic people.

$160,000–$200,000/yr
US

  • Lead the design, development, and delivery of enterprise-grade, customer-facing web applications using modern microservices architectures.
  • Provide senior technical leadership through architecture decisions, design reviews, and code reviews, ensuring scalability, reliability, security, and maintainability.
  • Manage, mentor, and grow a distributed team of software and quality engineers, fostering a culture of ownership, accountability, and continuous improvement.

Firstup's mission is to improve the employee experience. They serve 40 of the Fortune 100 companies, reaching and connecting more than 17 million employees daily, and are experts in employee experience, workforce communications, and technology.