Source Job

Europe

  • Lead the Infrastructure Engineering team, taking full ownership of cloud infrastructure, Kubernetes platforms, DevOps tooling, and CI/CD pipelines.
  • Drive reliability, scalability, and security across the production environment while maintaining a sharp focus on developer velocity and business impact.
  • Mentor and guide engineers across SRE, DevOps, and Database Reliability functions, fostering a culture of operational excellence and pragmatic problem-solving.

SRE DevOps Kubernetes PostgreSQL GCP

20 jobs similar to Head of Infrastructure & Reliability

Jobs ranked by similarity.

Unlimited PTO

  • Build and operate cutting-edge cloud infrastructure to support Diagrid's core products
  • Define standards, deliver tools, processes, and frameworks to make our products secure, reliable, efficient, and highly available
  • Build and maintain CI/CD pipelines that enable delivering software quickly and securely across clouds

Diagrid believes that open-source software, open standards and APIs are the greatest transformational tools for organizations. They provide developers with APIs and tools that let them focus on their code rather than on infrastructure, and the company was founded by the creators of the Dapr and KEDA open-source projects.

  • Maximize the velocity of our product engineering team.
  • Ensure platform scalability, reliability, and security.
  • Champion best practices and shape the engineering culture.

They are building a robust, scalable trading platform to serve high-traffic, latency-sensitive applications. They leverage state-of-the-art technologies to support real-time trading while providing unparalleled reliability and performance.

US Europe

  • Build and lead the team responsible for the reliability, security, and scalability of Gensyn’s production infrastructure and developer platform.
  • Own the availability, scalability, and security posture of production systems: SLOs/SLIs, incident response, postmortems, reliability improvements, and hardening.
  • Drive delivery across ambiguous, high-stakes initiatives: roadmap planning, prioritization, and execution against tight timelines.

Gensyn is building a protocol that networks together the core resources required for machine intelligence to flourish alongside human intelligence. They value autonomy, independence, direct feedback and an extreme learning rate, and strive to reject mediocrity and waste.

US

  • Collaborate with application engineering teams on platform infrastructure.
  • Enhance observability and spearhead the adoption of SRE best practices.
  • Build and maintain reliable CI/CD pipelines, tooling, and infrastructure.

Rula strives to provide quality, evidence-based, compassionate mental healthcare and aims to create a world where mental health is no longer stigmatized. They are a remote-first company operating in most U.S. states, and are dedicated to fostering a culture of inclusion that supports their employees.

US Canada

  • Own SLI/SLO/SLA definitions for the Akuity SaaS platform and drive continuous improvement.
  • Participate in an on-call rotation and act as incident commander for high-severity production events.
  • Partner with engineering teams to build reliability into new features before they ship to production.

Akuity helps enterprises ship software faster and more reliably with modern GitOps best practices. The Akuity Platform enables teams to manage development and deployment across hundreds, if not thousands, of Kubernetes clusters from a single control plane.

South America

  • Own the end-to-end lifecycle of core platform components, including cloud infrastructure primitives and Kubernetes clusters.
  • Design platform components to be resilient by default, applying SRE principles like fault isolation and capacity planning.
  • Drive Infrastructure-as-Code and GitOps-first practices to ensure platform components are reproducible and auditable.

Pismo, founded in 2016, provides a comprehensive processing platform for banking, card issuing, and financial market infrastructure, helping customers innovate in banking and payments. With over 500 employees across 10+ countries, Pismo joined Visa in 2024, leveraging Visa’s solutions to advance financial technology.

Global

  • Own the end-to-end lifecycle (design, provisioning, upgrades, and decommissioning) of core platform components.
  • Lead the design and implementation of infrastructure bootstrap orchestration, including automated cluster and environment provisioning.
  • Apply and promote SRE practices across the platform, including clear ownership and runbooks for platform components.

Pismo provides a comprehensive processing platform for banking, card issuing and financial market infrastructure and helps customers innovate and build the next generation of banking and payment solutions. Pismo’s 500+ employees are located in more than 10 countries around the world.

$205,000–$270,000/yr
US Unlimited PTO

  • Partner with engineers to build developer tools and deployment infrastructure that empower developer workflows.
  • Ensure reliability of multi-cloud Kubernetes clusters and pipelines.
  • Focus on automation so we can spend energy where it matters.

Cresta is on a mission to turn every customer conversation into a competitive advantage by unlocking the true potential of the contact center. Their platform combines the best of AI and human intelligence to help contact centers discover customer insights and behavioral best practices.

$160,000–$200,000/yr
US

  • Help drive reliability, automation and performance within our cloud-based infrastructure.
  • Embed within an engineering team, helping it achieve production excellence and advocating for best practices.
  • Debug production issues across services and layers of the stack, and practice incident response and blameless postmortems.

Flywire is a global payments enablement and software company that was founded over a decade ago. They have over 1,200 global FlyMates, representing more than 40 nationalities, in 12 offices worldwide, and are looking for people to join the next stage of their journey as they continue to grow.

India

  • Leading infrastructure strategy and driving DevOps best practices across the engineering organization
  • Helping engineers build reliable products by improving infrastructure and application monitoring, alerting, and tooling
  • Building tools and frameworks that help developers better understand and debug their systems and data

Aspire provides influencer marketing software and services for social commerce. They have helped brands build and manage relationships with millions of influencers and are trusted by over 800 top brands.

$230,000–$250,000/yr
US Unlimited PTO 12w paternity

  • Define and evolve reliability standards for the SmarterDx platform.
  • Enhance observability systems (metrics, logs, traces, alerting) to provide actionable insights and reduce mean time to detect (MTTD) and resolve (MTTR).
  • Reduce operational toil through automation, self-healing systems, and improved deployment and rollback mechanisms.

SmarterDx, a Smarter Technologies company, builds clinical AI that is transforming how hospitals translate care into payment. Founded by physicians in 2020, their platform connects clinical context with revenue intelligence, helping health systems recover millions in missed revenue, improve quality scores, and appeal every denial.

US

  • Own and scale AWS and Kubernetes infrastructure.
  • Build and maintain CI/CD pipelines and infrastructure-as-code.
  • Lead observability and monitoring initiatives.

Truelogic is a nearshore staff augmentation services provider headquartered in New York. They deliver technology solutions to companies of all sizes, helping them achieve their digital transformation goals with a team of 600+ highly skilled tech professionals based in Latin America.

Europe

  • Own the container-based application lifecycle, bi-weekly releases, and CI/CD pipelines for GMS.
  • Manage deployments on customer-isolated Kubernetes clusters running stateful applications.
  • Ensure high availability and performance by meeting contractual SLAs through proactive monitoring and alert response.

Planet designs, builds, and operates the largest constellation of imaging satellites in history, delivering data via a cloud-based platform. They are both a space company and data company with a people-centric approach, striving to put their team members first.

$165,000–$195,000/yr
US

  • Support and operate Legion’s AWS-based cloud platform and Kubernetes (EKS) environments.
  • Build and maintain infrastructure-as-code using Terraform.
  • Improve CI/CD pipelines to increase deployment safety and velocity.

Legion Technologies delivers the industry’s most innovative workforce management platform. The AI-driven Legion WFM platform maximizes labor efficiency and employee engagement. They are a remote, mission-driven team that embraces a collaborative, fast-paced, and entrepreneurial culture.

India

  • Deploy, manage, and administer web services in public cloud environments.
  • Design and develop solutions for secure, highly available, performant, and scalable services in elastic environments.
  • Own all operational aspects of web services: automation, monitoring, alerting, reliability, and performance.

Jumio is the leading provider of online identity verification, eKYC, and AML solutions. With a global footprint, they are expanding to meet strong client demand across industries such as Financial Services, Travel, Sharing Economy, Fintech, Gaming, and more. They welcome applications from candidates of all backgrounds and statuses.

Europe

  • Work closely with developers and operations teams to scale and optimize their infrastructure for sustained growth.
  • Design, deploy, and operate their core backend infrastructure using an automated, Infrastructure-as-Code approach.
  • Prioritize and own delivery in a small, highly efficient team: you set the bar rather than just maintain it.

Relai is Europe's fastest growing Bitcoin-only app. They are looking for an experienced, results-oriented and impact-driven Senior DevOps Engineer who can help them scale their infrastructure and pursue their mission of bringing the best store of value to more people.

Europe

  • Standardize CI/CD pipelines (GitHub Actions) and Helm charts across 10+ microservices
  • Build centralized logging, metrics, and alerting (currently a gap)
  • Extend Terraform to cover full AWS infrastructure

Kiefer Tech delivers cutting-edge AI, robotics, and enterprise solutions across Greece and the EU, leveraging over 20 years of engineering heritage from the Green Energy sector. As the technology arm of Kiefer, they are guided by innovation, quality, and long-term client partnerships and are building sovereign AI infrastructure.

Europe

  • Build and maintain CI/CD pipelines and GitOps workflows across a diverse set of engineering teams.
  • Own observability (monitoring, alerting, logging) and support development teams in instrumenting their services.
  • Optimise infrastructure for security, cost, performance and reliability.

1inch is a decentralized finance (DeFi) platform. They empower users to access the best rates and execute efficient, secure trades across multiple liquidity sources.

US

  • Design, build, and maintain our core cloud infrastructure on AWS/GCP using Infrastructure as Code.
  • Manage and scale our mission-critical services on Kubernetes, ensuring high availability and resilience.
  • Enhance and operate our CI/CD systems and developer tools within a GitLab-based workflow.

Mambu is a leading SaaS cloud banking platform that is on a mission to make banking better for a billion people. They empower customers to build innovative and secure financial products, and power billions of transactions for millions of end-users.

$80,547–$106,026/yr
North America

  • Develop and maintain resilient, cost-efficient infrastructure using AWS and other cloud services to meet evolving business needs.
  • Use IaC solutions to enable automated provisioning and ensure consistency across all environments.
  • Design, develop, and maintain advanced pipelines, ensuring automated testing integration and deployment efficiency at scale.

Pagefreezer's vision is to make the Internet a safer place by delivering solutions that transform how people protect integrity online, ensure accountability, and enable the pursuit of justice. They simplify compliance and litigation by automatically archiving websites, social media, mobile text messages, and enterprise collaboration platforms. Their culture has been widely recognized: they were named Canada’s Most Admired Culture in 2023, 2024 and 2025, one of BC’s Top Employers for 2024, and one of Canada’s Top Small & Medium Employers for 2024.