Design, build, and maintain scalable infrastructure and tooling that improves reliability, performance, and availability across OnePay’s platform
Contribute to the evolution of our observability stack, platform libraries, cloud architecture, and CI/CD pipelines
Develop automation and monitoring systems to detect, prevent, and remediate incidents before they impact customers
OnePay is a consumer fintech company trusted by millions of Americans to make money better, providing an all-in-one financial services platform. Backed by Walmart and Ribbit Capital, OnePay provides banking, savings, credit cards, lending, investing, and crypto services and embedded financial services to frontline workers.
Define and evolve reliability standards for the SmarterDx platform.
Enhance observability systems (metrics, logs, traces, alerting) to provide actionable insights and reduce mean time to detect (MTTD) and resolve (MTTR).
Reduce operational toil through automation, self-healing systems, and improved deployment and rollback mechanisms.
SmarterDx, a Smarter Technologies company, builds clinical AI that is transforming how hospitals translate care into payment. Founded by physicians in 2020, their platform connects clinical context with revenue intelligence, helping health systems recover millions in missed revenue, improve quality scores, and appeal every denial.
Design infrastructure, networking, and software platform architecture.
Build and maintain automation of Continuous Integration and Continuous Deployment pipelines.
Troubleshoot infrastructure, internal applications, networking, and security issues.
Loadsmart is a technology company focused on the logistics and supply chain industry. They leverage data and technology to automate and optimize freight transportation, connecting shippers and carriers to streamline the shipping process. They are a mid-sized company passionate about transforming the future of freight.
Help deploy and configure Dynatrace OneAgent and ActiveGates with automated tooling.
Define and instrument user‑centric metrics and objectives in Dynatrace.
Combine Davis® AI with Copilot/Claude to identify root causes and reduce MTTR.
AWP Safety's IT Internship Program is a hands‑on, learning experience for early‑career professionals who want to build a future in IT Site Reliability Engineering. They operate at the intersection of Software Engineering and Systems Operations, using Dynatrace to diagnose performance bottlenecks and automate "toil" out of existence.
Collaborate with engineers in supporting new features and services.
Build tools to monitor site stability and performance.
Troubleshoot site issues using industry-leading tools like Splunk, Prometheus and OpenTelemetry.
Yelp's engineering culture is cooperative and values individual authenticity. They encourage creative solutions to problems and help users, grow as engineers, and have fun in a collaborative environment.
Architect new and existing systems to enhance performance, reliability, and scalability.
Build, implement, iterate over CI/CD pipelines.
Assist with the Management, Development, Design, and Deployment of microservice and containerized applications.
AbbVie's mission is to discover and deliver innovative medicines and solutions that solve serious health issues today and address the medical challenges of tomorrow. They strive to have a remarkable impact on people's lives across several key therapeutic areas.
Enhance system monitoring with tools like Prometheus, Grafana, and ELK Stack, ensuring visibility and alignment with business objectives.
Transition manual processes to automated solutions using IaC tools (e.g., Terraform, Ansible) to streamline deployments and improve operational efficiency.
Improve pipeline architecture for fast, reliable releases, ensuring scalability and resilience to handle high volumes of changes.
Upsun (formerly Platform.sh) is a cloud application platform designed for hybrid teams, enabling developers, DevOps engineers, and platform teams to build, ship, and scale confidently without backend infrastructure hassles. Upsunners are a remote, global workforce committed to open source and an open, welcoming environment, valuing curiosity, knowledge, and innovative ideas.
Developing infrastructure to support cloud-based applications.
Creating deployment architect and continuous delivery pipelines.
Designing high-availability approaches, and implementing monitoring architecture.
Nearform is a digital and AI engineering consultancy with a reputation for experience-led modernization. They focus on creating transformative digital products for enterprise customers across the UK and Ireland. Nearformers form a close-knit community built on trust and camaraderie.
Design and implement comprehensive monitoring strategies.
Take ownership of production incident response, lead handling, and drive remediation.
Continuously improve operational processes, reliability practices, and team readiness.
InvestorFlow delivers industry specialized CRM and digital portals to help alternative asset firms find opportunities, create and manage relationships, and turn relationship insights into action. They serve over 175 clients, including 25 of the top 50 alternative asset managers, managing more than $6 trillion in assets.
Collaborate with cross-functional teams to translate product requirements into technical solutions.
Develop and maintain core services for Chainguard.
Practice continuous improvement by iterating on how services are deployed, configured, monitored, and maintained on our platform.
Chainguard provides a secure foundation for software development and deployment by offering guarded open source software that is built from source and continuously updated. Founded by industry experts, they aim to be the safe source for open source and have built the largest library of open source software that is secure by default.
Deploy, manage, and secure Ivanti’s production Software-as-a-Service (SaaS) environments in AWS and Azure
Automate common and repetitive tasks
Participate in on-call rotations for 24x7 coverage (follow-the-sun model) for incident response, issue triage, and problem resolution
Ivanti's mission is to elevate human potential within organizations by managing, protecting and automating technology for continuous innovation. They are committed to building a diverse team and fostering an inclusive environment where everyone belongs.
Maximize the velocity of our product engineering team.
Ensure platform scalability, reliability, and security.
Champion best practices and shape the engineering culture.
They are building a robust, scalable trading platform to serve high-traffic, latency-sensitive applications. They leverage state-of-the-art technologies to support real-time trading while providing unparalleled reliability and performance.
Build and scale infrastructure to support billions of messages per day and real-time events
Automate deployments, alerting, and incident response
Tune MySQL and other datastore performance and improve reliability across distributed systems
Customer.io's platform enables over 8,000 companies, from scrappy startups to global brands, to send billions of automated emails, push notifications, in-app messages, and SMS every day. They foster a culture that values empathy, transparency, and responsibility.
Collaborate with service engineering teams to design, implement, and maintain scalable and resilient infrastructure solutions.
Implement SRE principles to improve system reliability and reduce downtime.
Improve developer workflows by creating self-service tools, optimizing CI/CD pipelines, and enhancing deployment processes.
Flex is a growth-stage FinTech company creating the best rent payment experience. They empower renters with flexibility over their most significant recurring expense and are growing quickly with a focus on building an inclusive culture.
Partner with engineers to build dev tools that empower developer workflows and deployment infrastructure.
Ensure reliability of multi-cloud Kubernetes clusters and pipelines.
Focus on automation so we can spend energy where it matters.
Cresta is on a mission to turn every customer conversation into a competitive advantage by unlocking the true potential of the contact center. Their platform combines the best of AI and human intelligence to help contact centers discover customer insights and behavioral best practices.
Designing and implementing SLI/SLO frameworks with error budgets to guide reliability and performance decisions.
Building and maintaining AWS-based production infrastructure using Infrastructure as Code (Terraform, CloudFormation), including ECS, EKS/Kubernetes, and microservices orchestration.
Developing internal tools, automation frameworks, and reliability services in TypeScript, Python, or similar languages to enhance operational efficiency.
Jobgether uses an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. They identify the top-fitting candidates, and this shortlist is then shared directly with the hiring company.
Implement SLI/SLO frameworks with error budgets to drive reliability decisions
Design release strategies including blue/green deployments and version tracking
Lead incident response and develop automated runbooks to reduce MTTR
Jobgether is a company that helps connect individuals with jobs through an AI-powered matching process. They ensure applications are reviewed quickly, objectively, and fairly against roles' core requirements.
Own the end-to-end lifecycle (design, provisioning, upgrades, and decommissioning) of core platform components.
Lead the design and implementation of infrastructure bootstrap orchestration, including: Automated cluster and environment provisioning.
Apply and promote SRE practices across the platform, including: Clear ownership and runbooks for platform components.
Pismo provides a comprehensive processing platform for banking, card issuing and financial market infrastructure and helps customers innovate and build the next generation of banking and payment solutions. Pismo’s 500+ employees are located in more than 10 countries around the world.
Support and operate Legion’s AWS-based cloud platform and Kubernetes (EKS) environments.
Build and maintain infrastructure-as-code using Terraform.
Improve CI/CD pipelines to increase deployment safety and velocity.
Legion Technologies delivers the industry’s most innovative workforce management platform. The AI-driven Legion WFM platform maximizes labor efficiency and employee engagement. They are a remote, mission-driven team that embraces a collaborative, fast-paced, and entrepreneurial culture.