Develop automation to eliminate manual and repetitive operational tasks.
Investigate and resolve customer complaints escalated beyond L1 and L2 support.
Moniepoint is an all-in-one financial services platform for emerging markets. Since 2019, Moniepoint’s technology has served over 3 million people, offering personal and business banking, payment, credit and business management tools to help them succeed.
Design, build, and maintain scalable infrastructure and tooling that improves reliability, performance, and availability across OnePay’s platform
Contribute to the evolution of our observability stack, platform libraries, cloud architecture, and CI/CD pipelines
Develop automation and monitoring systems to detect, prevent, and remediate incidents before they impact customers
OnePay is a consumer fintech company trusted by millions of Americans to make money better, providing an all-in-one financial services platform. Backed by Walmart and Ribbit Capital, OnePay provides banking, savings, credit cards, lending, investing, and crypto services, as well as embedded financial services for frontline workers.
Design infrastructure, networking, and software platform architecture.
Build and maintain automation of Continuous Integration and Continuous Deployment pipelines.
Troubleshoot infrastructure, internal applications, networking, and security issues.
Loadsmart is a technology company focused on the logistics and supply chain industry. They leverage data and technology to automate and optimize freight transportation, connecting shippers and carriers to streamline the shipping process. They are a mid-sized company passionate about transforming the future of freight.
Build and operate cutting-edge cloud infrastructure to support Diagrid's core products
Define standards and deliver tools, processes, and frameworks to make our products secure, reliable, efficient, and highly available
Build and maintain CI/CD pipelines that enable delivering software quickly and securely across clouds
Diagrid believes that open-source software, open standards and APIs are the greatest transformational tools for organizations. Founded by the creators of the Dapr and KEDA open-source projects, they provide developers with APIs and tools that help them focus on their code rather than on infrastructure.
Build and scale infrastructure to support billions of messages per day and real-time events
Automate deployments, alerting, and incident response
Tune MySQL and other datastore performance and improve reliability across distributed systems
Customer.io's platform enables over 8,000 companies, from scrappy startups to global brands, to send billions of automated emails, push notifications, in-app messages, and SMS every day. They foster a culture that values empathy, transparency, and responsibility.
Develop automation code to provision and operate infrastructure at scale.
Build resilient, scalable, secure, and observable services with cost optimization.
Proactively identify and address security concerns across systems and infrastructure.
Globality uses AI to transform enterprise spending into a more efficient and inclusive process. They aim to revolutionize enterprise procurement with AI and have a culture built on trust, collaboration, and innovation, fostering an environment where every individual feels valued and included.
Monitor production systems, dashboards, logs, and alerts to ensure high availability and performance across distributed environments.
Assist in incident detection, triage, escalation, and resolution, following structured on-call rotations with mentorship support.
Maintain, follow, and continuously improve runbooks, operational procedures, and incident response workflows.
Jobgether is a platform that helps job seekers find the right opportunities. They use an AI-powered matching process to ensure applications are reviewed quickly and fairly.
Partner with engineers to build dev tools that empower developer workflows and deployment infrastructure.
Ensure reliability of multi-cloud Kubernetes clusters and pipelines.
Focus on automation so we can spend energy where it matters.
Cresta is on a mission to turn every customer conversation into a competitive advantage by unlocking the true potential of the contact center. Their platform combines the best of AI and human intelligence to help contact centers discover customer insights and behavioral best practices.
Support teammates with goal-setting, professional development, and mentoring.
Ensure delivery of maintainable, high-quality platform systems.
Build and sustain a healthy team culture where ownership and collaboration are the norm.
onX is a pioneer in digital outdoor navigation solutions through its suite of apps. With over 400 employees, they foster a fast-paced, tech-forward environment valuing ownership, accountability, and teamwork.
Implement SLI/SLO frameworks with error budgets to drive reliability decisions
Design release strategies including blue/green deployments and version tracking
Lead incident response and develop automated runbooks to reduce MTTR
Jobgether is a company that helps connect individuals with jobs through an AI-powered matching process. They ensure applications are reviewed quickly, objectively, and fairly against roles' core requirements.
Build and scale a strong culture of operational excellence by defining standards and coaching teams to own reliability and availability.
Drive mature DevOps/SRE practices, including incident response and PIRs, on-call readiness, runbooks, alerting, observability, and release/change management.
Guide teams in the design, development, evolution, and operation of large-scale, distributed cloud systems.
Grafana Labs is a remote-first, open-source powerhouse with more than 20M users of Grafana, the open-source visualization tool, around the globe. They help more than 3,000 companies manage their observability strategies with the Grafana LGTM Stack.
Help deploy and configure Dynatrace OneAgent and ActiveGates with automated tooling.
Define and instrument user‑centric metrics and objectives in Dynatrace.
Combine Davis® AI with Copilot/Claude to identify root causes and reduce MTTR.
AWP Safety's IT Internship Program is a hands‑on learning experience for early‑career professionals who want to build a future in IT Site Reliability Engineering. They operate at the intersection of Software Engineering and Systems Operations, using Dynatrace to diagnose performance bottlenecks and automate "toil" out of existence.
Maximize the velocity of our product engineering team.
Ensure platform scalability, reliability, and security.
Champion best practices and shape the engineering culture.
They are building a robust, scalable trading platform to serve high-traffic, latency-sensitive applications. They leverage state-of-the-art technologies to support real-time trading while providing unparalleled reliability and performance.
Designing and implementing SLI/SLO frameworks with error budgets to guide reliability and performance decisions.
Building and maintaining AWS-based production infrastructure using Infrastructure as Code (Terraform, CloudFormation), including ECS, EKS/Kubernetes, and microservices orchestration.
Developing internal tools, automation frameworks, and reliability services in TypeScript, Python, or similar languages to enhance operational efficiency.
Jobgether uses an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements. They identify the top-fitting candidates, and this shortlist is then shared directly with the hiring company.
Design and maintain scalable, fault-tolerant infrastructure that supports our SaaS platform and keeps pace with business growth.
Define, document, and maintain SLIs, SLOs, and SLAs in partnership with product engineering, translating business commitments into technical guardrails.
Lead incident response with steady judgment, facilitate blameless postmortems, and drive remediation efforts that prevent recurrence.
Fixify is on a mission to reimagine how IT teams support companies. They need a Senior Site Reliability Engineer who finds joy in building systems that fade into the background, empowering product engineers to ship with confidence and their customers to work without interruption.
Help drive reliability, automation and performance within our cloud-based infrastructure.
Become embedded within an Engineering team helping them navigate production excellence and advocate for best practices.
Debug production issues across services and levels of the stack as well as practice incident response and blameless postmortems.
Flywire is a global payments enablement and software company that was founded over a decade ago. They have over 1,200 global FlyMates, representing more than 40 nationalities, in 12 offices worldwide, and are looking for people to join the next stage of their journey as they continue to grow.
Collaborate with engineers in supporting new features and services.
Build tools to monitor site stability and performance.
Troubleshoot site issues using industry-leading tools like Splunk, Prometheus and OpenTelemetry.
Yelp's engineering culture is cooperative and values individual authenticity. Engineers are encouraged to find creative solutions to problems, help users, grow as engineers, and have fun in a collaborative environment.
Design and implement comprehensive monitoring strategies.
Take ownership of production incident response, lead incident handling, and drive remediation.
Continuously improve operational processes, reliability practices, and team readiness.
InvestorFlow delivers industry-specialized CRM and digital portals to help alternative asset firms find opportunities, create and manage relationships, and turn relationship insights into action. They serve over 175 clients, including 25 of the top 50 alternative asset managers, managing more than $6 trillion in assets.
Partner closely with product engineering squads (embedded model)
Own production reliability for high-SLA and complex customer environments
Design and implement automation to scale our reliability practices
Grafana Labs is a remote-first, open-source powerhouse that helps more than 3,000 companies manage their observability strategies. They are scaling fast and staying true to what makes them different: an open-source legacy, a global collaborative culture, and a passion for meaningful work.
Build and maintain stable, scalable foundational services that can be leveraged by other engineering teams.
Collaborate with many internal partners and product teams to influence the design of our API surface.
Design and develop reliable, secure, highly available and delightful experiences for the dbt Cloud admin and the end user.
dbt Labs is the pioneer of analytics engineering, helping data teams transform raw data into reliable, actionable insights. They've grown from an open source project and now serve more than 5,400 dbt Platform customers, including AstraZeneca, Sky, Nasdaq, Volvo, JetBlue, and SafetyCulture.