Monitor public and internal IT services around the clock. Process events in the incident management system promptly. Diagnose issues and fix them where possible. Develop and maintain the existing systems: Terraform for managing resources on AWS and VMware vSphere, Ansible for configuration management, TeamCity for continuous delivery, and the Prometheus + Kubernetes stack. Perform DevOps tasks for other teams.
Remote DevOps Jobs · Linux
51 results
The SRE II will collaborate with engineering, product, and operations teams to embed reliability practices into day-to-day development and operations while contributing to tools and processes that improve efficiency and reduce manual effort. This role focuses on building automation, maintaining observability, and supporting incident response to keep customer-facing systems performing at their best and help ensure the stability, scalability, and reliability of our services and infrastructure.
You would be working in our pre-training team focused on building out our distributed training and inference of Large Language Models (LLMs). This is a hands-on role that focuses on software development best practices, maintenance, and code architecture. You will have access to thousands of GPUs to verify changes.
Lead Infrastructure Engineers help clients build and evolve systems that client organizations use to deliver and run software, combining technical expertise and understanding with consideration of different situational needs. They champion technical quality and effective ways of working as a means to better outcomes for clients. You will explore the client’s needs and drive the building of a technical roadmap and impactful solution that will support their ambitious business goals.
Responsible for designing, building, and maintaining development setups and tools. Work closely with developers, DevOps, and infrastructure teams to ensure the efficiency and quality of our software development. Opportunity to join a fast-growing company and support the success of enterprise clients.
As a Senior Platform Engineer, you are a champion for DevOps and SRE culture and industry best practice within Megaport. You will work alongside talented team members in multiple time zones, ensuring that systems are secure, maintainable, and available. Outside the team, you will engage with stakeholders in requirements analysis and demonstrations. Technically, you will be very hands-on and will continually evolve your skills through peer reviews and research.
Troubleshoot complex Kubernetes and vCluster issues directly with customers via Slack and ticketing as a crucial part of the small support team. This role involves working closely with engineering to take ownership of customer problems and helping shape how support is delivered as the company grows.
The Cloud Platform Team ensures that all our systems are reliable, secure, and meet their uptime targets. As a member, you'll manage our cloud infrastructure; design, build, and run distributed, fault-tolerant systems; and improve and automate existing processes. In this role, you will manage, monitor, and improve existing infrastructure and related tools, and support and enhance release processes.
As a Site Reliability Engineer (SRE) at Alpaca, you will be responsible for ensuring the reliability, scalability, and performance of our systems and services. You will work closely with development, operations and DevOps teams to build and maintain robust applications, ensuring they run smoothly and efficiently. This role requires a blend of software engineering and operations skills, with a strong ability to troubleshoot technical issues and resolve problems before they impact our users.
Join Vultr's Platform Services team as a Senior Platform Engineer, where you'll drive developer experience and infrastructure automation. You will design systems for continuous delivery and internal developer tooling, optimizing CI/CD pipelines and automating infrastructure deployments using Terraform and Puppet or Kubernetes. Collaborate with engineering teams to integrate observability and reliability into delivery pipelines, enhance development environments, and contribute to configuration management. Author documentation and improve DevOps practices.