Define and evolve reliability standards for the SmarterDx platform.
Enhance observability systems (metrics, logs, traces, alerting) to provide actionable insights and reduce mean time to detect (MTTD) and resolve (MTTR).
Reduce operational toil through automation, self-healing systems, and improved deployment and rollback mechanisms.
Architect new and existing systems to enhance performance, reliability, and scalability.
Build, implement, and iterate on CI/CD pipelines.
Assist with the management, development, design, and deployment of microservice and containerized applications.
AbbVie's mission is to discover and deliver innovative medicines and solutions that solve serious health issues today and address the medical challenges of tomorrow. They strive to have a remarkable impact on people's lives across several key therapeutic areas.
Build and operate cutting-edge cloud infrastructure to support Diagrid's core products
Define standards, deliver tools, processes, and frameworks to make our products secure, reliable, efficient, and highly available
Build and maintain CI/CD pipelines that enable delivering software quickly and securely across clouds
Diagrid believes that open-source software, open standards, and APIs are the greatest transformational tools for organizations. They provide developers with APIs and tools that help them focus on their code rather than on infrastructure. The company was founded by the creators of the Dapr and KEDA open-source projects.
Designing and implementing SLI/SLO frameworks with error budgets to guide reliability and performance decisions.
Building and maintaining AWS-based production infrastructure using Infrastructure as Code (Terraform, CloudFormation), including ECS, EKS/Kubernetes, and microservices orchestration.
Developing internal tools, automation frameworks, and reliability services in TypeScript, Python, or similar languages to enhance operational efficiency.
Jobgether uses an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. They identify the top-fitting candidates, and this shortlist is then shared directly with the hiring company.
Design and maintain scalable, fault-tolerant infrastructure that supports our SaaS platform and keeps pace with business growth.
Define, document, and maintain SLIs, SLOs, and SLAs in partnership with product engineering, translating business commitments into technical guardrails.
Lead incident response with steady judgment, facilitate blameless postmortems, and drive remediation efforts that prevent recurrence.
Fixify is on a mission to reimagine how IT teams support companies. They need a Senior Site Reliability Engineer who finds joy in building systems that fade into the background, empowering product engineers to ship with confidence and their customers to work without interruption.
Define and execute the reliability engineering roadmap.
Establish SLO/SLI/error budget frameworks for system stability.
Drive continuous improvement through DORA metrics and analysis.
Jobgether leverages AI for HR solutions. They focus on connecting talent with opportunities, using AI-driven matching to ensure fair and objective application reviews.
Maximize the velocity of our product engineering team.
Ensure platform scalability, reliability, and security.
Champion best practices and shape the engineering culture.
They are building a robust, scalable trading platform to serve high-traffic, latency-sensitive applications. They leverage state-of-the-art technologies to support real-time trading while providing unparalleled reliability and performance.
Own the end-to-end lifecycle (design, provisioning, upgrades, and decommissioning) of core platform components.
Lead the design and implementation of infrastructure bootstrap orchestration, including automated cluster and environment provisioning.
Apply and promote SRE practices across the platform, including clear ownership and runbooks for platform components.
Pismo provides a comprehensive processing platform for banking, card issuing and financial market infrastructure and helps customers innovate and build the next generation of banking and payment solutions. Pismo’s 500+ employees are located in more than 10 countries around the world.
Collaborate with service engineering teams to design, implement, and maintain scalable and resilient infrastructure solutions.
Implement SRE principles to improve system reliability and reduce downtime.
Improve developer workflows by creating self-service tools, optimizing CI/CD pipelines, and enhancing deployment processes.
Flex is a growth-stage FinTech company creating the best rent payment experience. They empower renters with flexibility over their most significant recurring expense and are growing quickly with a focus on building an inclusive culture.
Own the end-to-end lifecycle of core platform components, including cloud infrastructure primitives and Kubernetes clusters.
Design platform components to be resilient by default, applying SRE principles like fault isolation and capacity planning.
Drive Infrastructure-as-Code and GitOps-first practices to ensure platform components are reproducible and auditable.
Pismo, founded in 2016, provides a comprehensive processing platform for banking, card issuing, and financial market infrastructure, helping customers innovate in banking and payments. With over 500 employees across 10+ countries, Pismo joined Visa in 2024, leveraging Visa’s solutions to advance financial technology.
Design, build, and maintain scalable infrastructure and tooling that improves reliability, performance, and availability across OnePay’s platform
Contribute to the evolution of our observability stack, platform libraries, cloud architecture, and CI/CD pipelines
Develop automation and monitoring systems to detect, prevent, and remediate incidents before they impact customers
OnePay is a consumer fintech company trusted by millions of Americans to make money better through an all-in-one financial services platform. Backed by Walmart and Ribbit Capital, OnePay provides banking, savings, credit cards, lending, investing, and crypto services, as well as embedded financial services for frontline workers.
Partner with engineers to build dev tools that empower developer workflows and deployment infrastructure.
Ensure reliability of multi-cloud Kubernetes clusters and pipelines.
Focus on automation so we can spend energy where it matters.
Cresta is on a mission to turn every customer conversation into a competitive advantage by unlocking the true potential of the contact center. Their platform combines the best of AI and human intelligence to help contact centers discover customer insights and behavioral best practices.
Implement SLI/SLO frameworks with error budgets to drive reliability decisions
Design release strategies including blue/green deployments and version tracking
Lead incident response and develop automated runbooks to reduce MTTR
Jobgether is a company that helps connect individuals with jobs through an AI-powered matching process. They ensure applications are reviewed quickly, objectively, and fairly against roles' core requirements.
Build Reliable Cloud Infrastructure: Implement and maintain AWS infrastructure using Terraform across EKS, Lambda, EC2, and S3.
Improve Developer Workflows: Contribute to CI/CD pipelines, starter kits, and internal tooling that reduce manual effort and improve deployment confidence.
Strengthen Observability & Operations: Add monitoring, logging, and alerting (Datadog) to platform services and participate in an on-call rotation.
Spreetail helps brands increase their ecommerce market share globally while improving operational costs. They are building one of the fastest-growing ecommerce companies in history with a focus on innovation.
Design and maintain scalable cloud environments using tools like Terraform, CloudFormation, or Ansible.
Build and optimize automated deployment pipelines to ensure rapid and reliable software delivery.
Implement robust monitoring, logging, and alerting frameworks to ensure 24/7 system health.
CodeRoad offers end-to-end software development services, helping businesses scale with infrastructure solutions. They provide staff augmentation, dedicated IT teams, and software engineering to empower businesses in a digital landscape.
Build and maintain CI/CD pipelines and infrastructure-as-code.
Lead observability and monitoring initiatives.
Truelogic is a nearshore staff augmentation services provider headquartered in New York. They deliver technology solutions to companies of all sizes, helping them achieve their digital transformation goals with a team of 600+ highly skilled tech professionals based in Latin America.
Build and maintain Infrastructure as Code to power our production systems, Python tools to automate toil, and monitoring systems to detect problems early.
Independently execute on large DevOps projects such as major migrations, product rollouts, and infrastructure enhancements
Participate in the infrastructure on-call rotation and incident response process, including triaging alerts, coordinating responders, and contributing to blame-free RCAs. Leverage senior-level expertise to drive rapid resolutions.
Super.com aims to maximize the lives of both customers and employees, providing opportunities to unlock potential through learning and impact. They are a fast-paced, high-growth tech company that values career progression and supports employees through various programs.
Develop and maintain resilient, cost-efficient infrastructure using AWS and other cloud services to meet evolving business needs.
Use IaC solutions to enable automated provisioning and ensure consistency across all environments.
Design, develop, and maintain advanced pipelines, ensuring automated testing integration and deployment efficiency at scale.
Pagefreezer's vision is to make the Internet a safer place by delivering solutions that transform how people protect integrity online, ensure accountability, and enable the pursuit of justice. They simplify compliance and litigation by automatically archiving websites, social media, mobile text messages, and enterprise collaboration platforms. Pagefreezer has been named Canada’s Most Admired Culture for 2023, 2024, and 2025, one of BC’s Top Employers for 2024, and one of Canada’s Top Small & Medium Employers for 2024.
Help drive reliability, automation and performance within our cloud-based infrastructure.
Embed within an engineering team, helping it achieve production excellence and advocating for best practices.
Debug production issues across services and levels of the stack, and practice incident response and blameless postmortems.
Flywire is a global payments enablement and software company that was founded over a decade ago. They have over 1,200 global FlyMates, representing more than 40 nationalities, in 12 offices worldwide, and are looking for people to join the next stage of their journey as they continue to grow.
Design, build, and maintain our core cloud infrastructure on AWS/GCP using Infrastructure as Code.
Manage and scale our mission-critical services on Kubernetes, ensuring high availability and resilience.
Enhance and operate our CI/CD systems and developer tools within a GitLab-based workflow.
Mambu is a leading SaaS cloud banking platform that is on a mission to make banking better for a billion people. They empower customers to build innovative and secure financial products, and power billions of transactions for millions of end-users.