Own the end‑to‑end lifecycle of core platform components, including cloud infrastructure primitives and Kubernetes clusters.
Design platform components to be resilient by default, applying SRE principles like fault isolation and capacity planning.
Drive Infrastructure‑as‑Code and GitOps‑first practices to ensure platform components are reproducible and auditable.
Pismo, founded in 2016, provides a comprehensive processing platform for banking, card issuing, and financial market infrastructure, helping customers innovate in banking and payments. With over 500 employees across 10+ countries, Pismo joined Visa in 2024, leveraging Visa’s solutions to advance financial technology.
Maintain and continuously improve production uptime, supporting our ≥99.9% target for 2026.
Monitor systems proactively and respond effectively to production incidents.
Drive improvements in MTTR (Mean Time to Resolution).
Infiterra's B2B SaaS platform simplifies subscription service delivery, helping IT Distributors and Managed Service Providers (MSPs) automate and grow their subscription business. With 100+ customers in 75 countries, Infiterra is known for its collaborative and growth-oriented culture.
Maximize the velocity of our product engineering team.
Ensure platform scalability, reliability, and security.
Champion best practices and shape the engineering culture.
They are building a robust, scalable trading platform to serve high-traffic, latency-sensitive applications. They leverage state-of-the-art technologies to support real-time trading while providing unparalleled reliability and performance.
Design, deploy and maintain a cloud infrastructure to support a Dataiku SaaS offering mainly on AWS and Azure and GCP
Continuously improve the infrastructure, deployment and configuration to deliver more reliable, resilient, scalable and secure services
Automate as much as possible all technical operations
Dataiku is The Universal AI Platform™, giving organizations control over their AI talent, processes, and technologies to unleash the creation of analytics, models, and agents. They connect many data science technologies and integrate the best of data and AI tech.
Contribute to building and operating the infrastructure that supports the HackerOne platform.
Improve the reliability, security, and scalability of our systems.
Design and operate highly available cloud systems and apply best practices for reliability, observability, and security.
HackerOne is a global leader in Continuous Threat Exposure Management (CTEM). The HackerOne Platform unites agentic AI solutions with the ingenuity of the world’s largest community of security researchers to continuously discover, validate, prioritize, and remediate exposures across code, cloud, and AI systems. They combine the ingenuity of the largest security research community with a best-in-class AI-powered platform, trusted by the world’s top organizations.
Designing and implementing SLI/SLO frameworks with error budgets to guide reliability and performance decisions.
Building and maintaining AWS-based production infrastructure using Infrastructure as Code (Terraform, CloudFormation), including ECS, EKS/Kubernetes, and microservices orchestration.
Developing internal tools, automation frameworks, and reliability services in TypeScript, Python, or similar languages to enhance operational efficiency.
Jobgether uses an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. They identify the top-fitting candidates, and this shortlist is then shared directly with the hiring company.
Design and evolve production environments, define standards and best practices.
Partner with engineering and IT teams to build scalable, reliable systems.
Lead incident response practices, and set guardrails around security, reliability, and cost management.
They are looking for a Senior Site Reliability Engineer who can own the architecture, governance, and cost efficiency of their cloud and platform infrastructure. This role is a remote contractor role and they are seeking candidates located in LATAM.
Lead reliability-focused design and readiness reviews.
Build, operate, and continuously improve our observability stack.
Own and evolve incident management practices.
Transcend is building the privacy platform that easily embeds privacy into your entire tech stack. They are growing quickly, backed by top-tier investors and are proud to serve some of the world's most iconic brands.
Own the reliability, scalability, and performance of Peec AI’s core systems and infrastructure
Design, build, and maintain the tooling, automation, and monitoring that keep our services fast, secure, and highly available
Partner closely with product and engineering teams to ensure new features are reliable, observable, and easy to operate from day one
Peec AI is one of Europe’s fastest-growing Series A startups (no employee count/culture details given). They provide exciting and challenging work in the AI space.
Own and deliver infrastructure projects end-to-end.
Build and improve platform primitives for service teams.
Improve observability and implement cost and performance improvements.
Afresh is the leading AI company in fresh food, partnering with grocers to order fresh food. They've experienced record-breaking growth and are on a mission to eliminate food waste. They have over 148 million in funding and embody values of proactivity, kindness, candor, and humility.
Design and maintain scalable, fault-tolerant infrastructure that supports our SaaS platform and keeps pace with business growth.
Define, document, and maintain SLIs, SLOs, and SLAs in partnership with product engineering, translating business commitments into technical guardrails.
Lead incident response with steady judgment, facilitate blameless postmortems, and drive remediation efforts that prevent recurrence.
Fixify is on a mission to reimagine IT teams support companies. They need a Senior Site Reliability Engineer who finds joy in building systems that fade into the background, empowering product engineers to ship with confidence and their customers to work without interruption.
Lead the implementation and optimization of CI/CD pipelines.
Develop and maintain Infrastructure as Code (IaC) scripts to automate infrastructure provisioning and management.
Identify and implement automation opportunities to improve efficiency and reduce manual effort.
Pismo, founded in 2016, provides a comprehensive processing platform for banking, card issuing, and financial market infrastructure, helping customers innovate and build next-generation banking and payment solutions. Pismo joined Visa in 2024 and has over 500 employees in more than 10 countries.
Build and maintain Infrastructure as Code to power our production systems, Python tools to automate toil, and monitoring systems to detect problems early.
Independently execute on large DevOps projects such as major migrations, product rollouts, and infrastructure enhancements
Participate in the infrastructure on-call rotation & incident response process, including triaging alerts, coordinating responders, and contributing to blame-free RCAs. Leverage senior level expertise to drive rapid resolutions.
Super.com aims to maximize the lives of both customers and employees, providing opportunities to unlock potential through learning and impact. They are a fast-paced, high-growth tech company that values career progression and supports employees through various programs.
Partner with engineers to build dev tools that empower developer workflows and deployment infrastructure.
Ensure reliability of multi-cloud Kubernetes clusters and pipelines.
Focus on automation so we can spend energy where it matters.
Cresta is on a mission to turn every customer conversation into a competitive advantage by unlocking the true potential of the contact center. Their platform combines the best of AI and human intelligence to help contact centers discover customer insights and behavioral best practices.
Design and implement comprehensive monitoring strategies.
Take ownership of production incident response, lead handling, and drive remediation.
Continuously improve operational processes, reliability practices, and team readiness.
InvestorFlow delivers industry specialized CRM and digital portals to help alternative asset firms find opportunities, create and manage relationships, and turn relationship insights into action. They serve over 175 clients, including 25 of the top 50 alternative asset managers, managing more than $6 trillion in assets.
Collaborate with service engineering teams to design, implement, and maintain scalable and resilient infrastructure solutions.
Implement SRE principles to improve system reliability and reduce downtime.
Improve developer workflows by creating self-service tools, optimizing CI/CD pipelines, and enhancing deployment processes.
Flex is a growth-stage FinTech company creating the best rent payment experience. They empower renters with flexibility over their most significant recurring expense and are growing quickly with a focus on building an inclusive culture.
Design, build, and maintain our core cloud infrastructure on AWS/GCP using Infrastructure as Code.
Manage and scale our mission-critical services on Kubernetes, ensuring high availability and resilience.
Enhance and operate our CI/CD systems and developer tools within a GitLab-based workflow.
Mambu is a leading SaaS cloud banking platform that is on a mission to make banking better for a billion people. They empower customers to build innovative and secure financial products, and power billions of transactions for millions of end-users.
Design, implement, and maintain cloud-based infrastructure using AWS, Azure, or GCP.
Build, optimize, and manage continuous integration and continuous deployment (CI/CD) pipelines.
Integrate AI-powered tooling into engineering workflows to accelerate delivery and improve code quality.
Givebutter is a nonprofit fundraising and CRM platform. They empower millions to raise more, pay less, and give better by offering tools like fundraisers, donation forms, donor management, emails, and text blasts all in one place.
Design and maintain scalable cloud environments using tools like Terraform, CloudFormation, or Ansible.
Build and optimize automated deployment pipelines to ensure rapid and reliable software delivery.
Implement robust monitoring, logging, and alerting frameworks to ensure 24/7 system health.
CodeRoad offers end-to-end software development services, helping businesses scale with infrastructure solutions. They provide staff augmentation, dedicated IT teams, and software engineering to empower businesses in a digital landscape.
Develop and maintain resilient, cost-efficient infrastructure using AWS and other cloud services to meet evolving business needs.
Use IaC solutions to enable automated provisioning and ensure consistency across all environments.
Design, develop, and maintain advanced pipelines, ensuring automated testing integration and deployment efficiency at scale.
Pagefreezer's vision is to make the Internet a safer place by delivering solutions that transform how people protect integrity online, ensuring accountability, and enabling the pursuit of justice. They simplify compliance and litigation by automatically archiving websites, social media, mobile text messages, and enterprise collaboration platforms. It appears they have a good company culture as they have been named Canada’s Most Admired Culture 2023, 2024 and 2025, one of BC’s Top Employers 2024 and as one of Canada’s Top Small & Medium Employers for 2024.