Source Job

$155,000–$170,000/yr
US Canada Europe UK

  • Design, deploy, and manage scalable and highly available cloud infrastructure on AWS.
  • Design reusable Terraform/OpenTofu modules following DRY principles and organizational standards.
  • Implement AIOps practices, leveraging AI tools to enhance monitoring, incident response, and predictive alerting.

AWS Terraform Python Linux AIOps

20 jobs similar to Senior Systems Operations Engineer

Jobs ranked by similarity.

US Canada Ireland UK Mexico Argentina

  • Partner with engineering leadership, EMs, and Product Managers to define and deliver AI products.
  • Architect scalable, high-performance systems that support a growing number of AI-powered products.
  • Drive technical strategy and make architectural decisions that compound, enabling the team to ship more AI experiences faster.

Webflow is building the world’s leading AI-native Digital Experience Platform. It is a remote-first company built on trust, transparency, and a whole lot of creativity. Webflow empowers teams to design, launch, and optimize for the web without barriers, from entrepreneurs launching their first idea to global enterprises scaling their digital presence.

Global Unlimited PTO

  • Build and maintain Infrastructure as Code to power our production systems, Python tools to automate toil, and monitoring systems to detect problems early.
  • Independently execute large DevOps projects such as major migrations, product rollouts, and infrastructure enhancements.
  • Participate in the infrastructure on-call rotation and incident response process, including triaging alerts, coordinating responders, and contributing to blameless RCAs. Leverage senior-level expertise to drive rapid resolutions.

Super.com aims to maximize the lives of both customers and employees, providing opportunities to unlock potential through learning and impact. They are a fast-paced, high-growth tech company that values career progression and supports employees through various programs.

Europe

  • Design and maintain scalable, fault-tolerant infrastructure that supports our SaaS platform and keeps pace with business growth.
  • Define, document, and maintain SLIs, SLOs, and SLAs in partnership with product engineering, translating business commitments into technical guardrails.
  • Lead incident response with steady judgment, facilitate blameless postmortems, and drive remediation efforts that prevent recurrence.

Fixify is on a mission to reimagine how IT teams support companies. They need a Senior Site Reliability Engineer who finds joy in building systems that fade into the background, empowering product engineers to ship with confidence and customers to work without interruption.

US

  • Implement cloud infrastructure, automation, and DevOps best practices.
  • Support platform and engineering teams specializing in AWS Bedrock AgentCore.
  • Contribute to building & maintaining CI/CD pipelines using Bitbucket Pipelines.

Nagarro is a rapidly scaling Digital Product Engineering company. They build products, services, and experiences that inspire, excite, and delight, operating across all devices and digital mediums with more than 17,000 experts across 39 countries, and they foster a dynamic, non-hierarchical work culture.

$120,000–$140,000/yr
US Unlimited PTO

  • Architect and manage scalable cloud infrastructure within AWS.
  • Implement and maintain infrastructure using Terraform.
  • Develop automation scripts to improve operational efficiency.

Attune empowers insurance agents with its technology solutions. They foster a remote-first culture and value employee development.

$80,300–$109,500/yr
Canada 3w PTO

  • Lead and mentor a team of DevOps engineers.
  • Design, implement, and manage scalable cloud infrastructure.
  • Automate and optimize infrastructure management tasks.

Rival Group is a forward-thinking, results-driven organization obsessed with helping innovative brands get closer to their customers. They combine a fast-growing tech company with an award-winning market research agency and have offices in Chicago, Toronto, and Vancouver.

Global

  • Build Reliable Cloud Infrastructure: Implement and maintain AWS infrastructure using Terraform across EKS, Lambda, EC2, and S3.
  • Improve Developer Workflows: Contribute to CI/CD pipelines, starter kits, and internal tooling that reduce manual effort and improve deployment confidence.
  • Strengthen Observability & Operations: Add monitoring, logging, and alerting (Datadog) to platform services and participate in an on-call rotation.

Spreetail helps brands increase their ecommerce market share globally while improving operational costs. They are building one of the fastest-growing ecommerce companies in history with a focus on innovation.

$165,000–$195,000/yr
US

  • Support and operate Legion’s AWS-based cloud platform and Kubernetes (EKS) environments.
  • Build and maintain infrastructure-as-code using Terraform.
  • Improve CI/CD pipelines to increase deployment safety and velocity.

Legion Technologies delivers the industry’s most innovative workforce management platform. The AI-driven Legion WFM platform maximizes labor efficiency and employee engagement. They are a remote, mission-driven team that embraces a collaborative, fast-paced, and entrepreneurial culture.

UK

  • Design and maintain AWS infrastructure using best practices.
  • Develop, operate, and improve CI/CD pipelines using GitHub Actions.
  • Lead initiatives around infrastructure security and compliance.

Bluefish is building the platform that helps brands engage consumers on the new AI channel, with enterprise tools to manage AI brand safety and engage consumers with personalized AI marketing experiences. The Bluefish team is a tight-knit group of mar-tech industry veterans.

Canada

  • Designing and implementing SLI/SLO frameworks with error budgets to guide reliability and performance decisions.
  • Building and maintaining AWS-based production infrastructure using Infrastructure as Code (Terraform, CloudFormation), including ECS, EKS/Kubernetes, and microservices orchestration.
  • Developing internal tools, automation frameworks, and reliability services in TypeScript, Python, or similar languages to enhance operational efficiency.

Jobgether uses an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. They identify the top-fitting candidates, and this shortlist is then shared directly with the hiring company.

Europe

  • Manage cloud infrastructure and optimize costs, particularly in AWS environments using Terraform and Python.
  • Design, develop, and maintain CI/CD pipelines and infrastructure for AI model training and deployment.
  • Ensure platform scalability and efficient resource utilization.

NEORIS, now part of EPAM Systems, is a Digital Accelerator that helps companies step into the future. With more than 20 years of experience as Digital Partners to some of the world’s leading organizations, they have more than 4,000 professionals across 11 countries and foster a multicultural, startup-minded culture that promotes innovation, continuous learning, and the delivery of high-impact solutions for their clients.

India

  • Deploy, manage, and administer web services in public cloud environments.
  • Design and develop solutions for secure, highly available, performant, and scalable services in elastic environments.
  • Own all operational aspects of web services: automation, monitoring, alerting, reliability, and performance.

Jumio is the leading provider of online identity verification, eKYC, and AML solutions. With a global footprint, they are expanding to meet strong client demand across industries such as Financial Services, Travel, Sharing Economy, Fintech, Gaming, and more. They welcome applications from people of all backgrounds and statuses.

Europe

  • Implement SLI/SLO frameworks with error budgets to drive reliability decisions.
  • Design release strategies, including blue/green deployments and version tracking.
  • Lead incident response and develop automated runbooks to reduce MTTR.

Jobgether is a company that helps connect individuals with jobs through an AI-powered matching process. They ensure applications are reviewed quickly, objectively, and fairly against each role's core requirements.

US

  • Design and maintain scalable cloud environments using tools like Terraform, CloudFormation, or Ansible.
  • Build and optimize automated deployment pipelines to ensure rapid and reliable software delivery.
  • Implement robust monitoring, logging, and alerting frameworks to ensure 24/7 system health.

CodeRoad offers end-to-end software development services, helping businesses scale with infrastructure solutions. They provide staff augmentation, dedicated IT teams, and software engineering to empower businesses in a digital landscape.

$146,200–$212,000/yr
US Unlimited PTO

  • Collaborate with service engineering teams to design, implement, and maintain scalable and resilient infrastructure solutions.
  • Implement SRE principles to improve system reliability and reduce downtime.
  • Improve developer workflows by creating self-service tools, optimizing CI/CD pipelines, and enhancing deployment processes.

Flex is a growth-stage FinTech company creating the best rent payment experience. They empower renters with flexibility over their most significant recurring expense and are growing quickly with a focus on building an inclusive culture.

Asia Australia Japan South Korea

  • Deploy, configure, and manage blockchain networks (e.g., Bitcoin, Ethereum, Solana).
  • Design and implement cloud infrastructure on AWS in line with best practices.
  • Administer and scale Kubernetes clusters (EKS) for deploying blockchain nodes and related services.

Binance is a leading global blockchain ecosystem behind the world’s largest cryptocurrency exchange by trading volume and registered users. Trusted by 300+ million people in 100+ countries, they offer trading, finance, education, research, payments, institutional services, Web3 features, and more.

  • Maximize the velocity of our product engineering team.
  • Ensure platform scalability, reliability, and security.
  • Champion best practices and shape the engineering culture.

They are building a robust, scalable trading platform to serve high-traffic, latency-sensitive applications. They leverage state-of-the-art technologies to support real-time trading while providing unparalleled reliability and performance.

US

  • Design and implement resilient, secure, and scalable cloud environments to support client platforms in production.
  • Drive production readiness and operations: monitoring and alerting, incident support, runbooks, capacity planning, reliability improvements, and release readiness.
  • Build and maintain CI/CD workflows and reconfigure/enhance an existing proprietary pipeline using Argo.

Kunai builds full-stack technology solutions for banks, credit and payment networks, infrastructure providers, and their customers. The company helps its clients modernize, capitalize on emerging trends, and evolve their business for the coming decades by remaining tech-agnostic and human-centered.

$100,000–$140,000/yr
US

  • Architect and scale our AWS infrastructure.
  • Build our observability and alerting platform from the ground up.
  • Lead infrastructure builds for compliance (SOC 2, HIPAA).

Truv is transforming the financial data industry with a secure and real-time API platform for payroll account access. Backed by $30M from top investors, they're disrupting a $2B legacy market with cutting-edge innovation and a customer-first approach.

$80,547–$106,026/yr
North America

  • Develop and maintain resilient, cost-efficient infrastructure using AWS and other cloud services to meet evolving business needs.
  • Use IaC solutions to enable automated provisioning and ensure consistency across all environments.
  • Design, develop, and maintain advanced pipelines, ensuring automated testing integration and deployment efficiency at scale.

Pagefreezer's vision is to make the Internet a safer place by delivering solutions that transform how people protect integrity online, ensure accountability, and pursue justice. They simplify compliance and litigation by automatically archiving websites, social media, mobile text messages, and enterprise collaboration platforms. They have been named Canada’s Most Admired Culture in 2023, 2024, and 2025, one of BC’s Top Employers for 2024, and one of Canada’s Top Small & Medium Employers for 2024.