Play a key role in building the next-generation AI cloud platform: a highly available, global, blazing-fast cloud infrastructure that virtualizes cutting-edge ML hardware (GB200s/GB300s, BlueField DPUs) and enables state-of-the-art ML practitioners with self-serve AI cloud services, such as on-demand and managed Kubernetes and Slurm clusters.
Job listings
We are looking for a Site Reliability Engineer to join our growing global team at Sectigo and to design and implement solutions that reduce toil and ensure the reliability of our critical services. This is a full-time, remote position, with the ideal candidate located within a one-hour drive of the Newark, New Jersey area. The core functions are to ensure the reliability of critical products and services and to automate deployments by following CI/CD practices.
Design and implement solutions to problems of scale for multi-site deployment and management of CoreWeave's global server hardware fleet. Build and maintain backend services and APIs (gRPC/REST) in Go or Python to interact with Kubernetes and other infrastructure systems. Develop provisioning services, automation workflows, and fleet management tools that span from bare metal to container orchestration.
We are seeking an experienced Senior DevOps-Networking Engineer who is comfortable working in multiple cloud environments and experienced with cloud networking components. Candidates should be comfortable across the full Software Development Lifecycle (SDLC), with networking experience and a DevOps mindset. The DevOps-Networking Engineer will work in a fast-paced, results-driven environment and be responsible for highly scalable, secure enterprise applications.
Deploy new servers. Implement infrastructure changes, and plan and execute infrastructure maintenance. Diagnose, localize, and resolve issues related to both software and hardware in the infrastructure. Enhance infrastructure lifecycle pipelines. Strong proficiency in Linux systems and experience with OpenStack are required.
EasyPost is seeking a highly experienced and skilled Senior Engineer to work with our DevOps team. This role involves designing, building, and optimizing our cloud infrastructure, ensuring scalability, reliability, and high availability in a multi-cloud environment. The ideal candidate will have deep expertise in cloud platforms and a strong background in DevOps and automation.
As a Senior Security (DevSecOps) Engineer I for Product Security, you will play a central role in helping secure our enterprise, cloud-native environments, applications, and data. You will work with various engineering and infrastructure teams to ensure our cloud environments are secure and scalable. We're looking for engineers who understand cloud, data, and automation and know how to actively employ these ingredients at scale.
This role is for a Cloud Developer (Senior) on a team supporting a government customer. The team uses Agile practices to advance the organization's Enterprise Services capabilities, focusing on delivering secure, scalable, and high-quality solutions that meet evolving business needs. It follows the Scaled Agile Framework (SAFe) methodology, ensuring a continuous delivery model. Team members will be responsible for understanding complex technical and organizational requirements.
The CNS Systems Delivery team is seeking a highly skilled Staff Network Development Engineer with extensive experience in network enablement, automation, and scripting. The ideal candidate will possess advanced network engineering expertise in both on-premise and hyperscaler cloud operations, adept at developing tools, scripts, and applications to enhance automation and improve network deployments. You will collaborate with a dedicated team of network engineers to expand our datacenter and cloud production network globally.
Seeking an individual with experience in AWS to join our Service team to design, implement and support public cloud solutions for our customers. Responsibilities include designing, deploying, and maintaining AWS infrastructure using best practices, focusing on reliability, scalability, and security, automating infrastructure provisioning, and implementing a Kubernetes-based platform.