This role is perfect for someone with a strong foundation in Linux systems, modern monitoring and observability tools, and a passion for ensuring system reliability at scale. You will be designing, implementing, and maintaining robust observability solutions, developing and managing logging systems, and collaborating with teams to build and maintain automation scripts.
Remote Devops Jobs · Docker
Build ClickHouse's next frontier: secure, air-gapped deployments for government and enterprise clients who can't use the public cloud. Architect solutions for environments with zero internet connectivity, ensuring compliance with security frameworks while delivering ClickHouse Cloud's elastic scaling, high performance, and serverless capabilities in isolated environments.
As a Senior Site Reliability Engineer, you will help build a meaningful engineering discipline, combining software and systems to develop creative engineering solutions to operations problems. You’ll join a team of curious problem solvers with a diverse set of perspectives who are thinking big and taking risks. You’ll be focused on running better production applications and systems.
We need a Senior DevOps Engineer with expertise in infrastructure as code (IaC), using Terraform to build and manage scalable infrastructure across cloud platforms. You’ll play a leading role in evolving our platform while driving cost efficiency and operational excellence across our environments.
LILT is seeking a world-class DevOps engineer to build highly reliable software systems by applying software skills to our infrastructure and operations needs, including bare-metal Linux deployments and commercial deployments in GCP and AWS. You'll continuously improve our multi-region configuration and help us expand into new regions. Depending on experience, the person in this role may take on technical leadership responsibilities and will help shape the DevOps organization's roadmap.
The Cloud Developer is responsible for designing, building, and maintaining cloud-hosted services and platforms to support Smile Digital Health’s SaaS offering. This includes owning the development and maintenance of deployment artifacts such as Helm charts, Docker container and Compose configurations, infrastructure as code, and build, deployment, and automation pipelines. The role works closely with platform, infrastructure, architecture, and security teams to ensure cloud deployments are scalable, reliable, secure, and aligned with enterprise architecture patterns.
You will play an essential role in building, maintaining, and improving the infrastructure that supports our growing suite of products. You’ll work closely with software engineers to ensure our systems are scalable, reliable, and secure, while driving continuous improvement in our deployment and monitoring processes. You’ll be part of a collaborative and fast-paced environment where you’ll automate workflows, optimize CI/CD pipelines, and contribute to the stability and performance of our cloud infrastructure.
As a Staff SRE, you will own the design, deployment, and continuous improvement of high-performance Solana validator infrastructure. This is a high-impact role for an engineer who thrives at the intersection of distributed systems, performance engineering, and blockchain protocol operations. You will implement and continuously improve validation, monitoring, alerting, and logging frameworks. You will also optimize validator timing and block production to improve APY and overall performance.
Maintain, optimize, and evolve our cloud applications and infrastructure. You’ll work closely with cross-functional teams to ensure our cloud environments are secure, reliable, scalable, and at the forefront of cloud technology. Design, deploy, and maintain secure, scalable, and high-performing cloud infrastructure using AWS services. Lead cloud migration initiatives.
As a Blockchain Site Reliability Engineer, you will be responsible for ensuring the reliability, availability, and performance of blockchain nodes and related infrastructure. You’ll monitor, troubleshoot, and resolve incidents in production environments, while also building automation tools to improve efficiency and reduce operational risks. This role requires strong Linux system expertise, solid on-call and incident response experience, and the ability to work under pressure to quickly restore services.