You will own the design and implementation of high-availability patterns, state replication and recovery, and robust observability for our real-time planning and coordination services.
This is a senior engineering role for someone who wants to be the technical owner of a reliable distributed system in production.
Engineer restart-safe, idempotent workflows for trip/ticket handling and routing decisions so replays, retries, and partial failures do not cause double assignment or missing states.
Own the vision and roadmap for how developers build and run workflows.
Define the primitives, APIs, and developer experience that make Temporal the best platform.
Drive coherence across Temporal’s primitives so workflows, activities, and messaging feel like parts of one operating system for distributed apps.
Temporal is an open source programming model that can simplify code, make applications more reliable, and help developers focus on the important things.
Serve as the critical bridge between our engineering team and customer deployments by focusing on integration and deployment automation. Debug complex distributed systems issues in the field, analyzing logs, network traffic, and system performance to resolve software failures quickly. Develop automated test frameworks and scripts to validate software functionality, performance, and integration with third-party systems.
Swarm Aero is redefining air power by building the largest swarming UAV and the most versatile swarming aircraft network in the world.
Observe work across teams and surface patterns limiting speed and clarity.
Translate complex decision-making into clear structures and scaling mechanisms.
Redesign information flows to improve reliability and reduce judgment concentration.
Formic is revolutionizing American manufacturing by making automation accessible to all manufacturers and increasing their factory productivity. The company is backed by leading global investors, is experiencing rapid growth in production hours, and aims to grow the share of "Made in America" products.
Work on distributed build systems and automated testing services. Improve workflows and deliver content to players. Tackle the unique challenges of continuously delivering live service games.
Join Bungie’s Central Technology organization, supporting Bungie projects like Destiny and Marathon, and become part of an innovative environment where your expertise can thrive.
Lead the Reliability & Operations function within the Developer & Production Enablement (DPE) division of RWS’s Product & Technology organization. Take ownership of global production operations and lead the transition from manual, ticket-based workflows to platform-integrated automation. Ensure stability today, while designing for scalability and autonomy in the future.
RWS's purpose is to unlock global understanding. The company values every language and culture, and it regards diversity and inclusion as a source of its strength.
Lead the full Software Development Lifecycle from requirements and design to implementation, deployment, and long-term operation.
Define and uphold high standards of technical excellence and reliability.
Work closely with all engineering teams to evolve the release process and ownership model.
Create automated testing approaches and infrastructure for validating the reliability, performance, and resilience of cloud orchestration tools and applications.
Enable engineering teams across Canonical to develop software with confidence by making distributed system testing tooling available across the company.
Enhance continuous integration pipelines for deploying and testing Canonical's cloud native products, such as Kubeflow.
Deploy, manage, and debug highly distributed systems.
Canonical is a software company that provides open-source software solutions. They are known for their Ubuntu operating system and Juju cloud orchestration tools.