You will play a critical role in building and maintaining scalable infrastructure, managing CI/CD pipelines, and deploying cloud-native and bare-metal applications. This role requires a high level of technical expertise, a collaborative mindset, and a strong desire to continuously improve systems and processes.
Job listings
Observability Platforms is focused on creating visibility into DigitalOcean’s services and infrastructure by designing, building, and operating the internal logging, metrics, distributed tracing, error reporting, monitoring, and alerting platforms that DigitalOcean depends on to ensure good, reliable experiences for its customers. The senior engineer will integrate and operate open source observability software and implement features that improve operability and support the next generation of metrics and logging.
As a Software Engineer on our Infrastructure team, you’ll help design, build, and operate the systems that power our real-time collaborative design tools used by millions of people worldwide. We’re scaling fast, and we’re looking for experienced infrastructure engineers across a variety of teams. Whether you’re passionate about storage, compute orchestration, developer tooling, networking, or real-time data systems, this role offers an opportunity to shape the technical foundation of one of the most beloved design platforms in the world.
Design and implement solutions to problems of scale for multi-site deployment and management of CoreWeave’s global server hardware fleet. Build and maintain backend services and APIs (gRPC/REST) in Go or Python to interact with Kubernetes and other infrastructure systems. Develop provisioning services, automation workflows, and fleet management tools that span from bare metal to container orchestration.
We are seeking an experienced Senior DevOps-Networking Engineer who is comfortable working in multiple cloud environments and experienced in cloud networking components. The engineer should be comfortable across the full Software Development Lifecycle (SDLC), with networking experience and a DevOps mindset. The DevOps-Networking Engineer will work in a fast-paced, results-driven environment and be responsible for highly scalable, secure enterprise applications.
In this hands-on "player/coach" role, you will lead the team responsible for the operational reliability of the bare metal infrastructure, networking, and system configuration that power our product offerings. You will help shape a critical function in a growing company, evolving the Network Operations Center (NOC) into a modern, proactive SRE function that leverages automation, data science, and reliability engineering principles.
Lead the Platform Team to enhance engineering productivity, enable fast, reliable, and secure software delivery, and ensure that infrastructure, tooling, and processes scale with the company’s needs. Lead a high-impact team of 6 that builds and maintains the engineering platform, providing the foundation for 15+ product development teams. Drive developer productivity improvements by optimizing workflows, automation, and developer experience across build, test, and deployment processes.
We are looking for a skilled and motivated Lead Infrastructure Engineer to lead our Platform Engineering team. As the team leader, you will direct the planning, design, development, and implementation of our platform architecture, ensuring it meets the needs of our growing product portfolio. You will guide a talented team of engineers, driving best practices and fostering a culture of excellence and innovation.
The Reliability Engineering team helps realize our vision by supporting Coinbase engineering teams in building software that is world-class in its reliability. As a core service team, Coinbase Reliability Engineers work closely with the rest of engineering. Improve observability, reliability, and availability by defining and measuring key metrics. Build automation and improve systems to eliminate toil and operations work.
Play a key role in shaping the future of our global infrastructure. Overseeing a fleet of ~10,000 on-prem servers, you’ll tackle unique technical challenges, engineer scalable systems, and have a direct impact on the reliability and performance of our products. Build Reliable Infrastructure, Automate Everything, Ensure Observability, Solve Complex Issues, and Collaborate & Innovate.