Design, build, and maintain scalable, reliable systems on GCP.
Develop automation for infrastructure provisioning using Terraform, Ansible, or Deployment Manager.
Manage incident response, conduct postmortems, and implement improvements to reduce recurrence.
SupplyHouse.com is an industry-leading e-commerce company specializing in HVAC, plumbing, heating, and electrical supplies since 2004. They value every individual team member and cultivate a community where people come first with Generosity, Respect, Innovation, Teamwork, and GRIT.
Lead software engineering teams providing infrastructure-as-code to manage cloud infrastructure.
Hire experienced site reliability staff and a line manager to grow and oversee the SRE team.
Establish design-before-build discipline; facilitate lightweight design documents, architectural decision records, and working group reviews.
Horizon3.ai is a cybersecurity company dedicated to enabling organizations to proactively find, fix, and verify exploitable attack vectors. They are a fast-growing company with a culture of respect, collaboration, ownership, and results.
Build and maintain CI/CD pipelines and deployment infrastructure.
Leverage AI to automate analysis and resolution of production issues.
Fal is the generative media ecosystem powering the next generation of AI products. They build the infrastructure, tools, and model access that teams need to move from idea to production.
Design and operate our Kubernetes ecosystem with a focus on high availability and zero-downtime operations.
Own and evolve our PaaS strategy, using GitOps and CI/CD to empower domain teams to deploy independently.
Define and implement our observability strategy across metrics, logs, and tracing.
Finom is a European tech startup headquartered in Amsterdam, revolutionizing financial services for entrepreneurs. They offer an all-in-one financial B2B solution integrating banking, accounting, financial management, and invoicing into a mobile-first platform, with about 346 million in funding.
Design, deploy, and operate critical systems balancing reliability, cost, and agility.
Perform troubleshooting and root-cause analysis of system operation issues.
Loadsmart is a logistics technology company valued at over $1 billion. We are a collection of industry veterans and user-centered engineers using innovative technology to fearlessly reinvent the future of freight.
Own and evolve Launch Potato's cloud infrastructure, CI/CD platform, and compliance posture.
Build the SRE function from the ground up so product teams can ship faster without compromising reliability, security, or cost control.
Stand up the SRE practice from scratch: on-call rotation, PagerDuty configuration, SLA/SLO definitions for core infrastructure services, runbook library, and observability dashboards that tie site performance to business metrics.
Launch Potato is a digital media company that connects consumers with leading brands through data-driven content and technology. They are headquartered in South Florida with a remote-first team spanning over 15 countries, with a high-growth, high-performance culture.
Lead the design, implementation, and ongoing improvement of reliable, scalable, performant, and secure production platforms and services.
Work closely with cross-functional teams to build and maintain resilient infrastructure and deployment patterns.
Provide technical leadership and mentorship to engineers across the organisation, promoting strong engineering standards and operational best practice.
Cision empowers individuals to make an impact and values diverse perspectives. They foster curiosity, collaboration, and innovation while driving meaningful contributions to brands; they have offices in 24 countries throughout the Americas, EMEA and APAC.
Deploy and maintain infrastructure using Terraform on AWS.
Operate and govern production-grade platforms running on Kubernetes / EKS.
Build and maintain CI/CD pipelines using GitHub Actions.
Muttdata is a dynamic startup committed to crafting innovative systems using cutting-edge Big Data and Machine Learning technologies. They are looking for a hands-on DevOps engineer to join a strategic initiative focused on deploying and operating Data & AI platforms.
Provide technical leadership for infrastructure, reliability, and observability.
Own the observability stack using Datadog and CloudWatch.
Design and evolve AWS infrastructure for reliability, security, scalability, and cost efficiency.
Topstep is an engaging working environment that ranges from fully remote to hybrid. They foster a culture of collaboration by keeping cameras on during meetings and maintaining a robust Slack environment for communication.
Own and evolve CI/CD pipelines using GitHub Actions and OIDC-based authentication for microservices and agentic workloads.
Automate infrastructure provisioning using Infrastructure as Code tools such as Terraform and CloudFormation.
Operate and scale our Kubernetes platform, including autoscaling, ingress, and multi-tenant isolation for enterprise customers.
Zingtree is a next-generation intelligent process automation platform reimagining customer experience operations for enterprise support leaders. It is a small team with high ownership, emphasizing automation, collaboration, and transparency.
Own and evolve Quansight's cloud infrastructure across AWS, Azure, and GCP.
Build, deploy, and maintain internal dashboards and reporting for operations and project management.
Lead infrastructure engagements for clients from scoping and architecture through delivery, upskilling client teams.
Quansight is rooted in the Python and PyData ecosystems. They provide services ranging from open-source software development to training and consulting, believing in a culture of do-ers, learners, and collaborators.
Architect and maintain infrastructure as code with Terraform.
Set up monitoring, alerting, and incident response.
We're a UK fintech building high-throughput digital infrastructure for the mortgage and property space. We recently acquired Trussle and are taking our platform to the next level. The company values innovation and building high-quality products.
Lead the architecture of a high-scale AWS environment optimized for AI workloads.
Manage and mentor a high-performing team of 8 engineers, providing technical leadership and career coaching.
Conduct user research with internal Natera developers to identify friction points.
Natera is a global leader in cell-free DNA (cfDNA) testing, dedicated to oncology, women’s health, and organ health. The Natera team consists of statisticians, geneticists, doctors, laboratory scientists, business professionals, software engineers, and many other professionals from world-class institutions.
Designing and managing cloud-based infrastructure on AWS.
Creating and maintaining deployment architectures and continuous delivery pipelines.
Automating infrastructure provisioning and management using Infrastructure as Code (IaC) tools such as Terraform or CloudFormation.
Nearform is an independent team of data & AI experts, engineers, and designers who build intelligent digital solutions and capability at pace. Our team of 500 experts in 20+ countries is trusted by leading enterprises.
Support the Platform Infrastructure by managing container environments on EKS, implementing GitOps workflows, and maintaining CI/CD pipelines.
Build for Reliability by defining SLIs/SLOs, leading incident response, and contributing to disaster recovery planning.
Drive Observability by designing and maintaining monitoring and logging stacks with Datadog, Sentry, and CloudWatch.
Turquoise Health is a Series C price transparency platform for finance leaders across healthcare, building the infrastructure for a more open, efficient healthcare marketplace. The company is a remote-first, US-based team serving over 300 enterprise organizations, and it values transparency, empathy, inclusivity, creativity, and ownership.
Construct infrastructure as code, developing and enforcing best practice across configurations while preventing drift between Terraform configurations and infrastructure deployments.
SentiLink provides innovative identity and risk solutions, empowering institutions and individuals to transact with confidence. They are building the future of identity verification in the United States, replacing a clunky, ineffective, and expensive status quo with solutions that are 10x faster, smarter, and more accurate.
Design, build, and maintain infrastructure using Infrastructure as Code tools such as Terraform.
Improve system reliability, scalability, resilience, and performance across the Mast platform.
Build systems and tooling that automate infrastructure management and operational workflows wherever possible.
Mast is on a mission to make complex lending simple by building modern, cloud-native lending technology purpose-built for specialist lenders. It is a high-performance team of engineers and lending experts that values radical honesty, transparency, and speed.
Lead the design, implementation, and continuous improvement of our cloud infrastructure and DevOps practices.
Ensure that our systems are scalable, reliable, and secure, enabling seamless software delivery across environments.
Improve development velocity while increasing system reliability.
Cadence is building a remote care delivery system that keeps older people healthy, out of the hospital, and at home. They support tens of thousands of active patients nationwide with their AI‑powered system and scalable clinical model enabling proactive, population‑level care.
Maintain and optimize AWS EC2 and EKS clusters to ensure high availability and performance.
Lead troubleshooting of production outages, providing timely resolution and root cause analysis.
Implement and improve CI/CD pipelines using tools like Jenkins and GitHub Actions to streamline deployment processes.
CI&T are tech transformation specialists uniting human expertise with AI to create scalable tech solutions. With over 8,000 CI&Ters globally, they have built partnerships with more than 1,000 clients over 30 years, and Artificial Intelligence is deeply embedded in their work reality.
Oversee a specialized SRE team focused on the design, deployment, and maintenance of automation toolsets.
Establish and enforce standards for IaC to ensure consistent, repeatable, and secure deployments.
Drive the automated lifecycle of both physical and virtual assets, from initial template creation/deployment to automated patching, scaling, and decommissioning.
Galaxy is a global leader in digital assets and data center infrastructure, delivering solutions that accelerate progress in finance and artificial intelligence. Led by CEO and Founder Michael Novogratz, their team blends deep crypto expertise with institutional experience and a shared commitment to shaping the future of Web3 and AI.