Remote DevOps Jobs · Kubernetes

Job listings

$85,000–$120,000/yr

The Senior DevOps Engineer will own the design, implementation, and optimization of our cloud infrastructure and deployment pipelines for our IoT-enabled AI platform, ensuring reliable, scalable, and secure operations. This role bridges hardware innovation with software intelligence and collaborates closely with Software Engineering, Integration, and Product teams.

We are searching for an engineer focused on DevOps, infrastructure, reliability, and tooling. In this DevOps position, you'll be part of a team that builds and maintains enterprise-level cloud infrastructure for microservices and cloud-native applications running on Kubernetes. The person who joins our team will be encouraged and empowered to influence new solutions that keep our technology running smoothly.

$77,324–$94,712/yr
Canada US Unlimited PTO

The Cloud Platform Team ensures that all our systems are reliable, secure, and meet their uptime targets. As a member, you'll manage our cloud infrastructure; design, build, and run distributed, fault-tolerant systems; and improve and automate existing processes. In this role, you will manage, monitor, and improve existing infrastructure and related tools, and support and enhance release processes.

Deliver platform components in a clean and consolidated build. This position within the DevOps team will be responsible for process workflow (monitoring and documentation), continuous integration with the code repository (Jenkins pipelines), configuration management (Ansible, Pipelines), and vendor management (cloud providers). The ideal candidate will be comfortable in a dynamic environment and possess excellent troubleshooting and organizational skills, along with the ability to deliver complete solutions for multiple product development pipelines.

As a Site Reliability Engineer (SRE) at Alpaca, you will be responsible for ensuring the reliability, scalability, and performance of our systems and services. You will work closely with development, operations and DevOps teams to build and maintain robust applications, ensuring they run smoothly and efficiently. This role requires a blend of software engineering and operations skills, with a strong ability to troubleshoot technical issues and resolve problems before they impact our users.

$131,325–$201,000/yr

As a founding member of the Site Reliability Engineering (SRE) team, you will help define the culture and build the systems that keep regulated, cloud-based production environments reliable. You will design, implement, and operate observability, reliability, and incident management systems. You will also partner with engineering teams to define SLIs, SLOs, and error budgets, build runbooks and operational playbooks, and develop the monitoring and automation needed to ensure systems are reliable and compliant.

The Engineering Manager for the Platform Team is the operational owner of the infrastructure layer that enables Onebrief to move quickly and securely. The role involves managing and mentoring a team of Platform Engineers and defining the methodology for how they support software delivery in commercial, government, and classified environments. Working cross-functionally with DevOps, AppSec, and Engineering leadership is essential to ensure tooling is a strategic asset.

As a Machine Learning Engineer focusing on MLOps, you will play a pivotal role in operationalising our ML models, ensuring they are scalable, reliable, and easy to monitor. You'll build feature stores for model development, facilitate model refit/retrain iterations, enhance the model deployment (CI/CD) pipeline, and strengthen model monitoring, working in a small team of Data Scientists and Data Engineers in APAC.

ENS Labs is seeking an experienced DevOps Engineer to manage the infrastructure for ENS, ensuring fast and reliable resolution across chains. The work spans infrastructure management, automation/CI/CD, observability, and security. You'll run the metadata service, CCIP-Read gateways, multichain indexers, and emerging Namechain/L2 nodes in a small, remote team.