Source Job

$110,000–$125,000/yr
US

  • Monitor cloud infrastructure and application health using observability tools; respond to alerts.
  • Perform Tier 1 incident triage, document findings, and escalate appropriately to Development or SRE teams.
  • Monitor and support CI/CD pipelines to ensure successful builds and deployments.

Grafana AWS

20 jobs similar to Cloud Operations Engineer

Jobs ranked by similarity.

Global

  • Design and enhance proactive monitoring capabilities for AWS Amazon Connect CCaaS platforms.
  • Collaborate with developers, architects, and platform owners to establish logging standards.
  • Troubleshoot and resolve production incidents, performing root cause analysis and implementing preventive measures.

Miratech helps visionaries change the world. They are a global IT services and consulting company that brings together enterprise and start-up innovation. Miratech retains nearly 1000 full-time professionals, and their annual growth rate exceeds 25% with a culture of relentless performance.

Global

  • Supporting and maintaining AWS-based CCaaS and contact center environments.
  • Monitoring, troubleshooting, and resolving production issues.
  • Using observability tools such as Splunk and Zabbix.

Miratech is a global IT services and consulting company that brings together enterprise and start-up innovation. They are a values-driven organization with a culture of relentless performance that has enabled over 99% of Miratech's engagements to succeed. The company has nearly 1000 full-time professionals, and their annual growth rate exceeds 25%.

$120,000–$140,000/yr
US Unlimited PTO

  • Architect and manage scalable cloud infrastructure within AWS.
  • Implement and maintain infrastructure using Terraform.
  • Develop automation scripts to improve operational efficiency.

Attune empowers insurance agents with their technology solutions. We foster a remote-first culture and value employee development.

Australia

  • Support and implement monitoring and alerting strategy across Kraken’s customer business.
  • Define and uphold observability best practices across multiple products and platforms.
  • Partner with product teams to implement observability tooling and improve reliability across the organisation.

Kraken is a technology company focused on creating a smart, sustainable energy system. Their operating system for energy is transforming the industry around the world in a way that benefits everyone. They are a Great Place to Work with genuinely decent, honest, and empathetic people.

Europe

  • Developing infrastructure to support cloud-based applications.
  • Creating deployment architecture and continuous delivery pipelines.
  • Designing high-availability approaches, and implementing monitoring architecture.

Nearform is a digital and AI engineering consultancy with a reputation for experience-led modernization. They focus on creating transformative digital products for enterprise customers across the UK and Ireland. Nearformers form a close-knit community built on trust and camaraderie.

$72,000–$111,000/yr
US Unlimited PTO

  • Enabling faster incident response by improving monitoring coverage, alert accuracy, and root cause visibility.
  • Helping teams shift from reactive to proactive operations by applying telemetry data and AI-driven insights.
  • Empowering service owners with clear dashboards and actionable insights that guide performance improvements.

HealthEquity's mission is to save and improve lives by empowering healthcare consumers. They envision making HSAs as widespread and popular as retirement accounts by 2030, value individuals more than their positions, and are passionate about connecting health and wealth for American families.

US

  • Design and maintain scalable cloud environments using tools like Terraform, CloudFormation, or Ansible.
  • Build and optimize automated deployment pipelines to ensure rapid and reliable software delivery.
  • Implement robust monitoring, logging, and alerting frameworks to ensure 24/7 system health.

CodeRoad offers end-to-end software development services, helping businesses scale with infrastructure solutions. They provide staff augmentation, dedicated IT teams, and software engineering to empower businesses in a digital landscape.

Global

  • Provide incident response within defined SLAs, troubleshoot production issues, and perform root cause analysis.
  • Monitor and maintain observability using Splunk, CloudWatch, Zabbix, and similar tools.
  • Manage Amazon Connect configurations, contact flows, bots (Lex), and integrations with Lambda, S3, QuickSight, and DynamoDB.

Miratech is a global IT services and consulting company that brings together enterprise and start-up innovation. They support digital transformation for some of the world's largest enterprises with their values-driven organization and a culture of relentless performance. Miratech retains nearly 1000 full-time professionals, and their annual growth rate exceeds 25%.

  • Maximize the velocity of our product engineering team.
  • Ensure platform scalability, reliability, and security.
  • Champion best practices and shape the engineering culture.

They are building a robust, scalable trading platform to serve high-traffic, latency-sensitive applications. They leverage state-of-the-art technologies to support real-time trading while providing unparalleled reliability and performance.

Europe

  • Design and maintain scalable, fault-tolerant infrastructure that supports our SaaS platform and keeps pace with business growth.
  • Define, document, and maintain SLIs, SLOs, and SLAs in partnership with product engineering, translating business commitments into technical guardrails.
  • Lead incident response with steady judgment, facilitate blameless postmortems, and drive remediation efforts that prevent recurrence.

Fixify is on a mission to reimagine how IT teams support companies. They need a Senior Site Reliability Engineer who finds joy in building systems that fade into the background, empowering product engineers to ship with confidence and their customers to work without interruption.

Europe

  • Build and maintain CI/CD pipelines and GitOps workflows across a diverse set of engineering teams.
  • Own observability — monitoring, alerting, logging — and support development teams in instrumenting their services.
  • Optimise infrastructure for security, cost, performance and reliability.

1inch is a decentralized finance (DeFi) platform. We empower users to access the best rates and execute efficient and secure trades across multiple liquidity sources.

Global

  • Design and implement comprehensive monitoring strategies.
  • Take ownership of production incident response, lead incident handling, and drive remediation.
  • Continuously improve operational processes, reliability practices, and team readiness.

InvestorFlow delivers industry-specialized CRM and digital portals to help alternative asset firms find opportunities, create and manage relationships, and turn relationship insights into action. They serve over 175 clients, including 25 of the top 50 alternative asset managers, managing more than $6 trillion in assets.

India

  • Provide day-to-day support, administration, and monitoring of clients’ AWS cloud infrastructure.
  • Participate in weekly status calls with clients to review open issues, planned changes, and improvement recommendations.
  • Assist in designing and developing automation solutions for monitoring, scaling, and managing cloud workloads.

AHEAD builds platforms for digital business by weaving together advances in cloud infrastructure, automation and analytics, and software delivery, helping enterprises deliver on the promise of digital transformation. They prioritize creating a culture of belonging where all perspectives and voices are represented, valued, respected, and heard.

$230,000–$250,000/yr
US Unlimited PTO 12w paternity

  • Define and evolve reliability standards for the SmarterDx platform.
  • Enhance observability systems (metrics, logs, traces, alerting) to provide actionable insights and reduce mean time to detect (MTTD) and resolve (MTTR).
  • Reduce operational toil through automation, self-healing systems, and improved deployment and rollback mechanisms.

SmarterDx, a Smarter Technologies company, builds clinical AI that is transforming how hospitals translate care into payment. Founded by physicians in 2020, their platform connects clinical context with revenue intelligence, helping health systems recover millions in missed revenue, improve quality scores, and appeal every denial.

US

  • Own and scale AWS and Kubernetes infrastructure.
  • Build and maintain CI/CD pipelines and infrastructure-as-code.
  • Lead observability and monitoring initiatives.

Truelogic is a nearshore staff augmentation services provider headquartered in New York. They deliver technology solutions to companies of all sizes, helping them achieve their digital transformation goals with a team of 600+ highly skilled tech professionals based in Latin America.

US

  • Collaborate with application engineering teams on platform infrastructure.
  • Enhance observability and spearhead the adoption of SRE best practices.
  • Build and maintain reliable CI/CD pipelines, tooling, and infrastructure.

Rula strives to provide quality, evidence-based, compassionate mental healthcare and aims to create a world where mental health is no longer stigmatized. They are a remote-first company operating in most U.S. states, and are dedicated to having a culture of inclusion that supports their employees.

Global

  • Build Reliable Cloud Infrastructure: Implement and maintain AWS infrastructure using Terraform across EKS, Lambda, EC2, and S3.
  • Improve Developer Workflows: Contribute to CI/CD pipelines, starter kits, and internal tooling that reduce manual effort and improve deployment confidence.
  • Strengthen Observability & Operations: Add monitoring, logging, and alerting (DataDog) to platform services and participate in an on-call rotation.

Spreetail helps brands increase their ecommerce market share globally while improving operational costs. They are building one of the fastest-growing ecommerce companies in history with a focus on innovation.

US

  • Design and implement resilient, secure, and scalable cloud environments to support client platforms in production.
  • Drive production readiness and operations: monitoring and alerting, incident support, runbooks, capacity planning, reliability improvements, and release readiness.
  • Build and maintain CI/CD workflows and reconfigure/enhance an existing proprietary pipeline using Argo.

Kunai builds full-stack technology solutions for banks, credit and payment networks, infrastructure providers, and their customers. The company helps its clients modernize, capitalize on emerging trends, and evolve their business for the coming decades by remaining tech-agnostic and human-centered.

$90,000–$125,000/yr
US 3w PTO

  • Support Engineering and Platform automation efforts with development and scripting skills.
  • Automate operational processes using scripting languages.
  • Develop, implement, and continually improve system and network monitoring and alerting capabilities and procedures.

Cotiviti is focused on providing payment accuracy and analytics-driven solutions that drive measurable results. They offer team members a competitive benefits package and have a culture of valuing individual qualifications without regard to race, gender, or other protected characteristics.

$100,000–$140,000/yr
US

  • Architect and scale our AWS infrastructure.
  • Build our observability and alerting platform from the ground up.
  • Lead infrastructure builds for compliance (SOC 2, HIPAA).

Truv is transforming the financial data industry with a secure and real-time API platform for payroll account access. Backed by $30M from top investors, they're disrupting a $2B legacy market with cutting-edge innovation and a customer-first approach.