The Senior DevOps Engineer will own the design, implementation, and optimization of our cloud infrastructure and deployment pipelines for our IoT-enabled AI platform, ensuring reliable, scalable, and secure operations. This role bridges hardware innovation with software intelligence and collaborates closely with Software Engineering, Integration, and Product teams.
Remote DevOps Jobs
319 results
Troubleshoot complex Kubernetes and vCluster issues directly with customers via Slack and ticketing as a crucial part of the small support team. This role involves working closely with engineering to take ownership of customer problems and helping shape how support is delivered as the company grows.
Strengthen backend systems and ensure the stability, scalability, and performance of the core marketplace platform. This role is critical as we modernize our architecture, build new services, and improve reliability across our Django- and AWS-powered infrastructure. This engineer will occasionally partner on frontend work to support cross-pod needs.
We are searching for an engineer focused on DevOps, infrastructure, reliability, and tooling. In this DevOps position, you'll be part of a team that builds and maintains enterprise-level cloud infrastructure for microservices and cloud-native applications running on Kubernetes. The person who joins our team will be encouraged and empowered to influence new solutions that keep our technology running smoothly.
The Cloud Platform Team ensures that all our systems are reliable, secure, and meet their uptime targets. As a member, you'll manage our cloud infrastructure; design, build, and run distributed, fault-tolerant systems; and improve and automate existing processes. In this role, you will manage, monitor, and improve existing infrastructure and related tools, and support and enhance release processes.
Deliver platform components in a clean and consolidated build. This position within the DevOps team will be responsible for process workflow (monitoring and documentation), continuous integration with the code repository (Jenkins pipelines), configuration management (Ansible, Pipelines), and vendor management (cloud providers). The ideal candidate will be comfortable in a dynamic environment and possess excellent troubleshooting and organizational skills, along with the ability to deliver complete solutions for multiple product development pipelines.
As a Senior AI Platform engineer at Vanta, you will play a crucial role in shaping Vanta’s AI offerings, improving our systems, and setting the long-term technical strategy for our end-to-end AI architecture. You’ll be part of the core AI team, working alongside a multidisciplinary group to implement, scale, and maintain pipelines and systems that accelerate AI innovations.
As a Senior Site Reliability Engineer, you will partner with development teams to manage infrastructure, improve CI/CD pipelines, and support operational excellence across Growth. You will help ensure the reliability, scalability, and performance of the systems that power Kraken’s growth initiatives, bringing your expertise in infrastructure, monitoring, and automation to keep Kraken’s services performant, resilient, and continuously improving.
As a Site Reliability Engineer (SRE) at Alpaca, you will be responsible for ensuring the reliability, scalability, and performance of our systems and services. You will work closely with development, operations and DevOps teams to build and maintain robust applications, ensuring they run smoothly and efficiently. This role requires a blend of software engineering and operations skills, with a strong ability to troubleshoot technical issues and resolve problems before they impact our users.
As a founding member of the Site Reliability Engineering (SRE) team, this engineer helps define the culture and build the systems that keep regulated, cloud-based production environments reliable. Designs, implements, and operates observability, reliability, and incident management systems. Partners with engineering teams to define SLIs, SLOs, and error budgets, build runbooks and operational playbooks, and develop the monitoring and automation needed to ensure systems are reliable and compliant.