Source Job

US Unlimited PTO

  • Invent new managed compute primitives that feel first-class in Temporal Cloud.
  • Design self-optimizing autoscaling systems that scale worker fleets safely and predictably.
  • Architect, build, and operate services on the hot path of task execution where performance and correctness are customer-visible.

Distributed Systems Cloud Infrastructure Kubernetes

20 jobs similar to Senior Software Engineer, Compute (Temporal Cloud)

Jobs ranked by similarity.

$230,000–$285,000/yr
US Canada

  • Own the technical vision, roadmap, and delivery of scalable, multi-tenant distributed systems for Workflow and Function Runtime primitives.
  • Lead and grow multiple engineering teams, being hands-on with architecture, code reviews, incidents, and driving operational excellence including SLOs and on-call.
  • Recruit, develop, and retain senior engineering talent, collaborating across product, security, and infrastructure groups to meet diverse workload needs.

GitLab is an intelligent orchestration platform for DevSecOps that helps organizations improve developer productivity, operational efficiency, and security. The company has over 50 million registered users, a high-performance culture driven by values like collaboration and efficiency, and embraces AI as a core productivity multiplier.

Canada

  • Design, develop, and maintain core infrastructure supporting large-scale optimization engines and planning workflows to improve scalability and performance.
  • Analyze and optimize performance bottlenecks in optimization pipelines, focusing on compute, memory usage, and data flow for complex planning problems.
  • Contribute to evolving platform architecture, designing systems for large datasets and parallel execution while ensuring enterprise-grade reliability and maintainability.

Kinaxis is a global leader in modern supply chain orchestration, providing an AI-powered platform for end-to-end supply chain transparency and faster decision-making. The company has over 2000 employees globally, is a multi-time Top Employer award winner, and fosters a culture of innovation with a serious focus on technology, customers, and a collaborative, not-too-serious internal environment.

Canada

  • Set technical strategy for your team on a year-long scale and tie it to business-impacting projects.
  • Collaborate across product management, design, and analytics to ensure technical sustainability and manage risks.
  • Foster a culture of quality and ownership by setting code review standards and developing team talent.

Affirm is reinventing credit to make it more honest and friendly, giving consumers the flexibility to buy now and pay later without any hidden fees or compounding interest. They are a remote-first company that provides competitive benefits anchored to their core value of people coming first.

$152,000–$190,000/yr
US

  • Lead architectural design and technical discovery for complex, distributed systems across our platform.
  • Define and evolve system boundaries, service interactions, and data flow within our event-driven ecosystem.
  • Guide the design of scalable, fault-tolerant systems leveraging asynchronous communication patterns (e.g., RabbitMQ, Kafka, SNS/SQS).

Fanatics is building a leading global digital sports platform that ignites the passions of global sports fans. We offer products and services across Fanatics Commerce, Fanatics Collectibles, and Fanatics Betting & Gaming. Our more than 22,000 employees are committed to relentlessly enhancing the fan experience and delighting sports fans globally.

Europe 6w PTO

  • Design, build, and operate reconciliation systems to track desired stack state, detect and repair drift across stack templates, grafana.com state, Hosted Grafana, and actual customer stack configuration.
  • Collaborate across SSS, grafana.com, and deployment configurations to ensure stack lifecycle workflows remain reliable, observable, and resilient.
  • Improve operational efficiency by reducing deployment complexity and contributing to the Stack Config Reconciliation project.

Grafana Labs is a remote-first, open-source powerhouse with over 20M users of Grafana. They help more than 3,000 companies manage their observability strategies with the Grafana LGTM Stack, featuring scalable metrics (Grafana Mimir), logs (Grafana Loki), and traces (Grafana Tempo).

$160,000–$180,000/yr
US Unlimited PTO

  • Identify systemic engineering challenges across our platforms and drive their resolution.
  • Write code, review PRs, debug production issues, and optimize system performance.
  • Partner with engineering teams as a technical point of contact on complex projects.

Zeta Global is an AI-Powered Marketing Cloud that leverages advanced artificial intelligence (AI) and trillions of consumer signals to help marketers acquire, grow, and retain customers more efficiently. They were founded in 2007 and are headquartered in New York City with offices around the world.

$210,000–$278,000/yr
US Unlimited PTO

  • Architect future iterations of core systems, addressing scaling requirements.
  • Design and implement developer tools to enhance deployment safety and reproducibility.
  • Drive excellence in monitoring and guide incident response for quick issue resolution.

Found provides tools for self-employed individuals, offering a business bank account that automates taxes and expense tracking. They aim to give self-employed people the security and peace of mind historically available only at large corporations and are looking for kind, resourceful, and passionate people.

$180,000–$200,000/yr
US

  • Own and evolve a scalable observability platform spanning metrics, logs, traces, and events.
  • Design telemetry pipelines ingesting data from GPUs, CPUs, networking, containers, APIs, and BMC/Redfish.
  • Design and implement noise-resistant alerting systems to improve signal quality and reduce operational load.

Lightning AI builds an end-to-end platform for developing, training, and deploying AI systems, designed to take ideas from research to production with less friction. They combine developer-first software with cost-efficient, large-scale compute, serving solo researchers, startups, and large enterprises.

5w PTO

  • Drive major technical initiatives from design through production, improving scalability, reliability, and correctness across critical systems.
  • Design and evolve backend services, APIs, event-driven workflows, and data models that support complex business processes at scale.
  • Improve the operational foundations of the platform through better observability, testing, deployment safety, and incident reduction.

Tem is rebuilding the energy transaction, making it transparent and fair. They aim to put power back in the hands of customers and tackle the critical problem of access to low-cost electricity, leveraging AI-driven infrastructure for efficient and sustainable energy markets.

US 6w PTO

  • Design, build, and operate reconciliation systems to track desired stack state, detect and repair drift across stack templates, grafana.com state, Hosted Grafana, and actual customer stack configuration.
  • Collaborate across SSS, grafana.com, and deployment configurations to ensure stack lifecycle workflows remain reliable, observable, and resilient.
  • Improve operational efficiency by reducing deployment complexity and contributing to the Stack Config Reconciliation project.

Grafana Labs is a remote-first, open-source powerhouse with more than 20M users of Grafana around the globe. They help more than 3,000 companies manage their observability strategies with the Grafana LGTM Stack. Their team thrives in an innovation-driven environment where transparency, autonomy, and trust fuel everything they do.

US 4w PTO

  • Architect complex systems and make critical technical decisions.
  • Solve challenging technical problems with innovative solutions.
  • Mentor engineers and promote engineering excellence across teams.

Aledade, a public benefit corporation, empowers independent primary care. Founded in 2014, it's the largest network of independent primary care in the country, helping practices deliver better care and thrive in value-based care with a collaborative, inclusive, and remote-first culture.

Germany 6w PTO

  • Build and scale a strong culture of operational excellence by defining standards and coaching teams to own reliability and availability.
  • Drive mature DevOps/SRE practices, including incident response and PIRs, on-call readiness, runbooks, alerting, observability, and release/change management.
  • Guide teams in the design, development, evolution, and operation of large-scale, distributed cloud systems.

Grafana Labs is a remote-first, open-source powerhouse with more than 20M users of Grafana around the globe. They help more than 3,000 companies manage their observability strategies with the Grafana LGTM Stack, and their team thrives in an innovation-driven environment.

$245,000–$295,000/yr
US

  • Build, lead, and grow the platform team, setting the pace and creating an environment where strong engineers want to stay.
  • Remain hands-on by writing code, reviewing architecture decisions, and debugging production issues while owning the platform's technical direction.
  • Steer projects through ambiguity, solving technical problems, resourcing gaps, and prioritization calls to ensure the infrastructure scales effectively.

OpenRouter is the leading AI routing and infrastructure layer that enterprises use to access, manage, and optimize the best large language models across providers. It's a fast-scaling technology company powering advanced AI teams by providing flexibility, scalability, and future-proof infrastructure.

$212,000–$286,000/yr
US Unlimited PTO

  • Design and build Temporal Cloud's identity platform end-to-end.
  • Scale the auth hot path to meet Temporal Cloud's SLOs.
  • Integrate with enterprise IdPs and threat-model identity flows.

Temporal provides an open-source programming model simplifying code and enhancing application reliability. They aim to be the reliable foundation of every developer’s toolbox. They value curiosity, drive, collaboration, authenticity, and humility and are growing.

APAC

  • Partner directly with customer engineering teams running training and inference workloads in production.
  • Investigate failures involving distributed training, Kubernetes orchestration, GPU allocation, networking, and storage systems.
  • Identify recurring patterns across customer issues and drive long-term reliability improvements.

Lightning AI is the company behind PyTorch Lightning, building an end-to-end platform for developing, training, and deploying AI systems. They serve solo researchers, startups, and large enterprises, operating globally with offices in New York City, San Francisco, Seattle, and London.

US

  • Lead the design and evolution of Fieldguide's core platform services.
  • Build platform capabilities that compound the leverage of product and AI engineers.
  • Define the architecture for how new product capabilities get delivered across environments.

Fieldguide is automating and streamlining the work of assurance and audit practitioners. They are based in San Francisco, CA, remote-first and backed by Goldman Sachs Alternatives, Bessemer Venture Partners, 8VC, Floodgate, Y Combinator, and more.

US Canada 6w PTO

  • Earn the trust of our large-scale operator customers to further Grafana's "big tent" philosophy of data accessibility and to meet clear business objectives.
  • Design and lead the development of backend services, distributed systems, and enterprise features at scale.
  • Drive continuous improvement of our engineering culture through words and actions.

Grafana Labs is a remote-first, open-source powerhouse with more than 20M users of Grafana, the open source visualization tool, around the globe. They help more than 3,000 companies manage their observability strategies with the Grafana LGTM Stack, which can be run fully managed with Grafana Cloud or self-managed with the Grafana Enterprise Stack. The Grafana team thrives in an innovation-driven environment where transparency, autonomy, and trust fuel everything they do.

Hungary

  • Participate in OTC operations (OpenStack-based cloud technology).
  • Administer tickets in our ticketing tools (SNOW, Jira) and work on issues.
  • Implement changes on our platform.

Deutsche Telekom IT Solutions is a subsidiary of the Deutsche Telekom Group and was ranked as Hungary’s most attractive employer in 2025. The company provides a wide portfolio of IT and telecommunications services with more than 5300 employees.

$180,000–$300,000/yr
US 20w maternity 12w paternity

  • Act as a trusted advisor to clients, providing technical expertise and guidance throughout engagements.
  • Conduct PoCs, workshops, presentations, and training sessions on GPU cloud technologies and best practices.
  • Collaborate with clients to understand their business requirements and develop solution architectures.

Lavendo partners with startups and high‑growth companies to help them hire top‑tier sales, go‑to-market, and technical talent. They are an equal opportunity workplace and consider all qualified applicants without regard to race, color, religion, national origin, age, sex, marital status, ancestry, disability, genetic information, veteran or military status, gender identity or expression, sexual orientation, or any other characteristic protected by law.

$188,550–$212,150/yr
Global Unlimited PTO

  • Own the technical direction of Remote's SRE/Platform domain.
  • Define and drive the reliability strategy across the platform.
  • Identify and lead AI enablement initiatives across the engineering organisation.

Remote is solving modern organizations’ biggest challenge – navigating global employment compliantly with ease. With our core values at heart and a future-focused work culture, our team works tirelessly on ambitious problems, asynchronously, around the world.