Manage and maintain our AWS infrastructure (ECS, SQS, RDS, Lambda, etc.). Implement and optimize CI/CD pipelines for multiple environments. Automate infrastructure provisioning with Terraform or similar IaC tools. Monitor and manage resource usage, scaling, and cost optimization.
Job listings
We are hiring a Site Reliability Engineering Manager who aspires to a world-class DevOps and GitOps engineering management challenge, bringing together operations management, software engineering and product development, and team leadership in a single high-value role. You will need to be a Linux and operations expert, as well as a great manager capable of leading a high-performance team, to excel in this role.
You will join a team working with cutting-edge technologies and striving to utilize cloud services to the maximum. Your role will be to provide the necessary infrastructure using Terraform, develop CI/CD pipelines in GitLab or Jenkins, and create automations to support the development teams. As a senior engineer, you will also take ownership of specific domains or projects within the business, leading technical direction and ensuring alignment with strategic goals.
As a Site Reliability Engineer, you'll be an integral member of product teams, helping to build, deploy, and monitor cloud services reliably, actively developing code and build frameworks to monitor services deployed in production. You will be responsible for ensuring the reliability, availability, and performance of our Elasticsearch infrastructure.
The Astronomer Customer Reliability Engineering (CRE) team is responsible for the success of our customers' usage of our managed Airflow service. As an infrastructure specialist within the team, you will learn to become an expert on the reliability of Kubernetes and the underlying cloud infrastructure. You will create strong relationships with customers and help them achieve their reliability goals.
The Infrastructure Engineer will design and build automation, tooling, and systems that bridge the gap between physical infrastructure and the platforms that power large-scale AI/ML and HPC workloads. The role combines the breadth of a core infrastructure engineer with a specialty in high-performance networking and GPU communication. The engineer will help ensure the InfiniBand fabric and NCCL stack are tuned, reliable, and efficient at scale, supporting some of the world's largest GPU clusters.
Join our engineering team responsible for developing and operating the Secure Remote Access platform for DT Technik, enabling secure connections to DT Technik's internal systems for suppliers, partners, and internal colleagues worldwide. You will play a key role in infrastructure development, ensure security compliance, and enable automation.
The Cloud Reliability Engineer will write and integrate various open-source and closed-source tools and will be responsible for configuration management, containerization, and scripting. Duties include developing, configuring, and deploying tools for cloud-based systems and services, containerizing new and legacy applications, and providing LOE/scoping for projects.
The Sr. Manager, Site Reliability & DevOps leads, manages, and coaches endpoint's Site Reliability, DBA, and Cloud Engineering teams to help build, operate, and ensure reliability of endpoint's software systems. This individual will work across departments and disciplines to deliver cloud infrastructure, database code, and deployment pipelines facilitating quality, reliability, and availability of endpoint's systems.
We're looking for a mid-level DevOps Engineer to join our globally distributed team and help scale and operate the infrastructure that powers millions of trades daily across CeFi and DeFi venues. You'll play a mission-critical role in ensuring the performance, reliability, and security of our systems in one of the most demanding environments in tech: digital asset trading.