The Senior DevOps Engineer will take complete ownership of architecting, deploying, and managing our sophisticated AWS-based infrastructure, including services like EC2, EKS, S3, IAM, Fargate, Lambda, and SQS. Independently drive and own the entire infrastructure-as-code strategy, expertly using Terraform and Terragrunt to build and maintain a fully automated environment.
Job listings
As a Senior DevOps Engineer, you will continuously improve our development operations and support the reliability and availability of all our applications and services deployed to the cloud. Partner with various engineering teams to own and manage availability, latency, performance, reliability and scalability of all services to maintain SLAs that our customers expect from us. Provide strong technical leadership and people management to the team.
Design, deploy, and operate large-scale distributed systems across compute, storage, networking, and AI/ML environments. Lead projects from architecture to automation to intelligent monitoring, collaborating with both clients and teammates to build resilient, high-performing infrastructure. You'll operate and optimize Kubernetes clusters, Istio service mesh, and Linux-based systems, automating workflows using Go, Python, and Shell scripting.
Develop and enhance services and offerings on the cloud platform in a customer-oriented manner, planning, implementing, and maintaining a stable technical infrastructure to support the organization's business processes. Solve complex problems in the daily operation of a hyper-scaler's cloud backend and support development of cloud native apps.
In this role, you will address critical challenges such as efficiently operating and managing over 200 Kafka clusters, supporting and monitoring over 1 million Kafka topics, and handling over 1PB of data daily, ensuring minimal latency and high reliability.
As a Senior DevOps Engineer in 3Cloud's managed services team, you will be responsible for ongoing support, addressing escalations from the monitoring team, and performing proactive maintenance for our client's Azure platforms, utilizing the Managed Services team's processes, procedures, and tools. You will play a critical role in our core team, essential to the success of our Managed Services division.
As a Site Reliability Engineer, you will be part of a platform team, providing internal tools and products to all technology teams in the company to increase productivity, stability, and efficiency and to minimize risks for internal clients. You will contribute to cloud and on-premise infrastructure, observability, automation, alert and incident management, delivery pipelines, and other technologies to ensure the availability of the company's systems.
As a Staff Engineering Operations Specialist, you will play a key role in the operations and improvement of engineering practices across ecobee's progressive CDM Engineering team by helping us grow our engineering maturity and culture. You will be an important partner to all levels of engineering leadership as we continue to create an amazing work environment in which we can consistently and effectively deliver high-quality software.
Join Granicus as a Site Reliability Engineer! You will be pivotal in ensuring the reliability, scalability, and performance of our services, leading efforts in building and maintaining a robust infrastructure, automating processes, and guiding the team to implement best practices in site reliability. This role involves on-call production support, monitoring systems, automating processes, incident management, and collaboration with software engineers.
This role provides technical direction, implements new tools to maintain and improve service, ensures the continuous reliability of the production environment, and identifies opportunities for improvement. A senior engineer will be 'tuned in' to the wider team to understand frustrations and pain points. Senior DevOps Engineers are expected to mentor and support more junior team members.