Source Job

Global

  • Leading a team focused on designing, building, and evolving cloud-native, containerized infrastructure.
  • Driving complex technical initiatives and ensuring the availability, security, scalability, and reliability of our data ecosystem.
  • Guiding and developing engineering talent, setting priorities, driving execution, and partnering across teams.

AWS Azure Kubernetes IaC Data

20 jobs similar to Senior Manager, Data Reliability Engineering

Jobs ranked by similarity.

Europe

  • Lead the Infrastructure Engineering team, taking full ownership of cloud infrastructure, Kubernetes platforms, DevOps tooling, and CI/CD pipelines.
  • Drive reliability, scalability, and security across the production environment while maintaining a sharp focus on developer velocity and business impact.
  • Mentor and guide engineers across SRE, DevOps, and Database Reliability functions, fostering a culture of operational excellence and pragmatic problem-solving.

Finom is a European tech startup headquartered in Amsterdam, revolutionizing financial services for entrepreneurs with an all-in-one B2B platform. They have raised $346 million, are expanding across key EU markets, and foster innovation, prioritizing research and solutions that benefit users, employees, partners, and the business.

US

  • Design, build, and operate core cloud infrastructure across compute, storage, databases, and networking layers.
  • Own and improve the reliability, scalability, and security of Valon’s production systems as we scale to support major enterprise deployments.
  • Evaluate, adopt, and operationalize new infrastructure technologies (e.g., Vitess, Clickhouse, Redis) to meet evolving product and scale requirements.

Valon is building the AI-native operating system for regulated finance, starting with mortgage servicing. They are a Series C company backed by a16z, transforming industries that others have written off as too complex to innovate.

US Unlimited PTO 12w maternity 12w paternity

  • Help define and drive the technical direction of our Cloud Infrastructure team within Platform Engineering.
  • Work across Valon’s production systems—compute, databases, storage, and networking—shaping the infrastructure foundations that every product and team depends on.
  • Set the technical direction for how we meet those challenges.

Valon is building the AI-native operating system for regulated finance, starting with mortgage servicing. They are a Series C company backed by a16z, transforming industries that others have written off as too complex to innovate.

Global Unlimited PTO

  • Lead, mentor, and grow a team of 8-10 skilled and globally distributed engineers, supporting their technical success, career development, and personal growth.
  • Plan and deliver high-quality solutions that meet business and technical goals
  • Collaborate with Product, the Senior Director of Engineering - Cloud, and other Engineering teams to align database capabilities with business needs

Ditto is redefining how data moves at the edge, aiming to make building resilient, real-time applications seamless regardless of network conditions. As a globally distributed, fast-growing startup with over $145 million in funding, they are committed to building a diverse and inclusive team to solve connectivity problems.

US

  • Lead the design and implementation of scalable, secure, and resilient cloud infrastructure across AWS and Azure.
  • Drive the architectural vision and strategy, ensuring alignment with long-term business goals.
  • Take the lead on automating and accelerating SDLC processes by identifying bottlenecks.

Candidly flips the script on planning, borrowing, repaying, and saving for college and is a category leader with an AI-driven student debt and savings optimization platform. They partner with hundreds of top employers and have a fully remote, international team of 70+ including alumni from Google, UBS, and Twitter.

Global

  • Deliver a scalable internal infrastructure platform on public cloud environments.
  • Establish and evolve Kubernetes-based platform capabilities to support high-availability, production-grade workloads at scale.
  • Build a secure and reliable foundation that supports CI/CD pipelines and minimizes operational risk across engineering teams.

Chainlink is the industry-standard oracle platform bringing the capital markets onchain and powering the majority of decentralized finance (DeFi). Since inventing decentralized oracle networks, Chainlink has enabled tens of trillions in transaction value and now secures the vast majority of DeFi.

$230,000–$250,000/yr
US Unlimited PTO 12w paternity

  • Define and evolve reliability standards for the SmarterDx platform.
  • Enhance observability systems (metrics, logs, traces, alerting) to provide actionable insights and reduce mean time to detect (MTTD) and resolve (MTTR).
  • Reduce operational toil through automation, self-healing systems, and improved deployment and rollback mechanisms.

SmarterDx, a Smarter Technologies company, builds clinical AI that is transforming how hospitals translate care into payment. Founded by physicians in 2020, their platform connects clinical context with revenue intelligence, helping health systems recover millions in missed revenue, improve quality scores, and appeal every denial.

Unlimited PTO

  • Build and operate cutting-edge cloud infrastructure to support Diagrid's core products.
  • Define standards and deliver tools, processes, and frameworks to make our products secure, reliable, efficient, and highly available.
  • Build and maintain CI/CD pipelines that enable delivering software quickly and securely across clouds.

Diagrid believes that open-source software, open standards, and APIs are the greatest transformational tools for organizations. They provide developers with APIs and tools that help them focus on their code rather than on infrastructure, and the company was founded by the creators of the Dapr and KEDA open-source projects.

US

  • Own all technical aspects of a customer environment from implementation through ongoing operations.
  • Oversee and approve all changes for assigned customers.
  • Consult with customers as you learn their business and understand how SAS drives their business outcomes.

SAS is the leader in analytics, providing software and services that help customers transform data into intelligence. They are a debt-free multi-billion-dollar organization aiming to provide a dynamic and fulfilling career coupled with flexibility.

US

  • Set the technical strategy and operating model for the database platform engineering team.
  • Grow engineering talent and develop a high-leverage engineering organization.
  • Drive measurable improvements in developer experience and security.

Jobgether is a platform that helps connect job seekers with companies. They use an AI-powered matching process to ensure applications are reviewed quickly and fairly.

US EMEA

  • Design and implement the complex distributed infrastructure that powers our core AI engine and distributed analysis systems.
  • Tune and optimize cloud services across compute, storage, networking, and observability to drive performance and reliability.
  • Develop our core services, written in TypeScript, Kotlin, and Go, to support our unique deployment and infrastructure requirements.

XBOW is building the future of offensive security. They create the platform that puts security ahead in the arms race, using AI to autonomously discover, validate, and exploit vulnerabilities. Founded by Oege de Moor, the company is backed by Sequoia, Altimeter, and other leading investors.

US Unlimited PTO

  • Define long-term architectural strategy for multi-cloud compute and traffic platforms.
  • Provide mentorship to engineers through design reviews and code contributions.
  • Partner with Security to build ‘secure by default’ systems.

Temporal Technologies develops an open-source programming model that simplifies code and enhances application reliability. With a focus on developer experience and open-source software, they foster a culture of curiosity, collaboration, and genuine impact.

Canada Global

  • Lead, mentor, and foster a healthy, high-performing globally distributed engineering team.
  • Own the execution and delivery of highly critical, complex yearly roadmap items centered around large-scale foundational infrastructure upgrades, high availability, and platform resilience.
  • Own and drive the change management processes across engineering and product domains.

Alpaca is a US-headquartered self-clearing broker-dealer and brokerage infrastructure for stocks, ETFs, options, crypto, fixed income, 24/5 trading, and more. Their global team of 230+ members is a diverse group of experienced engineers, traders, and brokerage professionals fostering a vibrant community.

$170,000–$240,000/yr
US 4w PTO

  • Own our fundamental cloud services and tooling.
  • Own our application platform.
  • Own our developer experience.

Propel builds technology that strengthens the social safety net. They are a passionate team of ~100 Propellers who envision a future where every American has the tools and resources they need to thrive, offering a remote-first working environment with headquarters in Brooklyn.

US Canada

  • Own SLI/SLO/SLA definitions for the Akuity SaaS platform and drive continuous improvement.
  • Participate in an on-call rotation and act as incident commander for high-severity production events.
  • Partner with engineering teams to build reliability into new features before they ship to production.

Akuity helps enterprises ship software faster and more reliably with modern GitOps best practices. The Akuity Platform enables teams to manage the development and deployment across hundreds – if not thousands – of Kubernetes clusters from a single control plane.

$134,000–$149,000/yr
US

  • Design, implement, and operate cloud-native infrastructure for production workloads.

PointClickCare's mission is to help providers deliver exceptional care. They are a leading health tech company, founder-led and privately held, that empowers their employees to push boundaries, innovate, and shape the future of healthcare. They have the largest long-term and post-acute care dataset and a Marketplace of 400+ integrated partners, and their platform serves over 30,000 provider organizations.

Global

  • Cooperate closely with other Platform and Engineering teams on strategic initiatives.
  • Improve, automate, and grow the SmartRecruiters cloud platform.
  • Respond to client threats and remediate issues.

SmartRecruiters is the Recruiting AI Company that transforms hiring for the world’s leading enterprises. An SAP company, they deliver an AI-powered hiring platform that automates and optimizes the entire talent acquisition process. They are a values-driven tech company with strong financial backing and a bold vision.

Canada

  • Design and manage CI/CD and deployment pipelines.
  • Collaborate with product teams to implement cloud best practices.
  • Automate code changes, testing, and analysis using CI tools.

Jobgether is a platform that uses AI to match candidates with jobs. They ensure applications are reviewed quickly, objectively, and fairly against the role's core requirements.

$130,000–$150,000/yr
US

  • Lead and mentor a team responsible for managing and maintaining the company's IT infrastructure.
  • Collaborate with cross-functional teams to define IT strategies, roadmaps, and solutions aligned with business objectives.
  • Develop and implement IT policies, procedures, and standards to ensure security, availability, and performance of IT systems.

AHEAD builds platforms for digital business. By weaving together advances in cloud infrastructure, automation and analytics, and software delivery, they help enterprises deliver on the promise of digital transformation. At AHEAD, they prioritize creating a culture of belonging, where all perspectives and voices are represented, valued, respected, and heard.

US Canada 16w maternity

  • Build and deploy computing services and infrastructure in customer environments.
  • Clarify and surface requirements from ambiguous use cases defined by cross-functional stakeholders.
  • Improve reliability and scalability by resolving edge cases, studying failure modes, and writing tests.

Planet designs, builds, and operates the largest constellation of imaging satellites in history. They deliver an unprecedented dataset of empirical information via a revolutionary cloud-based platform to authoritative figures in commercial, environmental, and humanitarian sectors. Planet has a people-centric approach toward culture and community, and it strives to iterate in a way that puts its team members first and prepares the company for growth.