Manage a team of T1 technicians and cultivate a culture of curiosity.
Oversee all facets of cloud and desktop support, ensuring effective ticket queue management.
Act as Incident Commander during major service disruptions, ensuring swift resolution.
Atmosera empowers businesses to redefine what's possible with modern technology and human expertise. They provide cutting-edge, integrated solutions that deliver business value, with experience across Applications, Data & AI, DevOps, Security, and the Microsoft Azure platform.
Monitor cloud infrastructure and application health using observability tools; respond to alerts.
Perform Tier 1 incident triage, document findings, and escalate appropriately to Development or SRE teams.
Monitor and support CI/CD pipelines to ensure successful builds and deployments.
Lumin Digital empowers credit unions and banks by creating cutting-edge digital experiences. They are a trailblazer in digital banking solutions with a culture that fosters trust, respect, and boldness, encouraging team members to explore and experiment with new ideas.
Help deploy and configure Dynatrace OneAgent and ActiveGates with automated tooling.
Define and instrument user‑centric metrics and objectives in Dynatrace.
Combine Davis® AI with Copilot/Claude to identify root causes and reduce MTTR.
AWP Safety's IT Internship Program is a hands‑on learning experience for early‑career professionals who want to build a future in IT Site Reliability Engineering. They operate at the intersection of Software Engineering and Systems Operations, using Dynatrace to diagnose performance bottlenecks and automate "toil" out of existence.
Act as the primary responder for high-priority production incidents during the Australian business day.
Work with the core product team to identify recurring support patterns and develop automated fixes or feature enhancements.
Participate in daily handovers with EMEA and US teams to ensure seamless continuity of operations.
EngFlow helps developers save time by accelerating software builds and tests. They are backed by top investors and are redefining how companies build software, with solutions that speed up builds and an observability platform for actionable insights.
Deep expertise in reliability engineering and automation.
Experience with cloud platforms is a plus.
Arthur Grand is an IT services firm specializing in Digital Transformation initiatives for Federal, Commercial, State & local customers. With a culture of delivery excellence and a commitment to bringing the best talent, they have earned an unparalleled reputation for delivering transformative results.
Define and execute the Customer Experience and Success coverage model for the US region.
Recruit, onboard, and develop Customer Experience and Success professionals within the region.
Personally manage complex enterprise accounts through the full engagement cycle.
Dash0 is building an AI-centric, OpenTelemetry-native platform that eliminates vendor lock-in and toil. Dash0 is growing rapidly, with the United States as a primary strategic market.
Design and implement comprehensive monitoring strategies.
Take ownership of production incident response, lead incident handling, and drive remediation.
Continuously improve operational processes, reliability practices, and team readiness.
InvestorFlow delivers industry-specialized CRM and digital portals to help alternative asset firms find opportunities, create and manage relationships, and turn relationship insights into action. They serve over 175 clients, including 25 of the top 50 alternative asset managers, managing more than $6 trillion in assets.
Lead a technically diverse team in complex customer environments to ensure optimal performance, scalability, and reliability.
Collaborate with teams across SAS to improve our products based on customer experiences.
Define, monitor, and analyze KPIs related to the SAS Cloud Operational service.
SAS is a leader in data and AI. Through their software and services, they inspire customers around the world to transform data into intelligence - and questions into answers. They're recognized around the world for their inclusive, meaningful culture and innovative technologies.
Enabling faster incident response by improving monitoring coverage, alert accuracy, and root cause visibility.
Helping teams shift from reactive to proactive operations by applying telemetry data and AI-driven insights.
Empowering service owners with clear dashboards and actionable insights that guide performance improvements.
HealthEquity's mission is to save and improve lives by empowering healthcare consumers. They envision making HSAs as widespread and popular as retirement accounts by 2030. They value individuals more than their positions and are passionate about connecting health and wealth for American families.
Fueled is a leading digital strategy, design, and engineering agency. They are a 300+ person team that has designed and built hundreds of digital products and experiences for brands.
Collaborate with engineers in supporting new features and services.
Build tools to monitor site stability and performance.
Troubleshoot site issues using industry-leading tools like Splunk, Prometheus and OpenTelemetry.
Yelp's engineering culture is cooperative and values individual authenticity. They encourage creative solutions to problems, and their engineers help users, grow in their craft, and have fun in a collaborative environment.
Own SLI/SLO/SLA definitions for the Akuity SaaS platform and drive continuous improvement.
Participate in an on-call rotation and act as incident commander for high-severity production events.
Partner with engineering teams to build reliability into new features before they ship to production.
Akuity helps enterprises ship software faster and more reliably with modern GitOps best practices. The Akuity Platform enables teams to manage the development and deployment across hundreds – if not thousands – of Kubernetes clusters from a single control plane.
Developing infrastructure to support cloud-based applications.
Creating deployment architecture and continuous delivery pipelines.
Designing high-availability approaches and implementing monitoring architecture.
Nearform is a digital and AI engineering consultancy with a reputation for experience-led modernization. They focus on creating transformative digital products for enterprise customers across the UK and Ireland. Nearformers form a close-knit community built on trust and camaraderie.
Design and enhance proactive monitoring capabilities for AWS Amazon Connect CCaaS platforms.
Collaborate with developers, architects, and platform owners to establish logging standards.
Troubleshoot and resolve production incidents, performing root cause analysis and implementing preventive measures.
Miratech helps visionaries change the world. They are a global IT services and consulting company that brings together enterprise and start-up innovation. Miratech retains nearly 1000 full-time professionals, and their annual growth rate exceeds 25% with a culture of relentless performance.
Managing and optimizing client cloud environments.
Ensuring the reliability and performance of infrastructure supporting cloud-based applications.
Driving technical discussions and advising on best practices and approaches.
Nearform is an independent team of data & AI experts, engineers, and designers who build intelligent digital solutions and capability at pace. With 500 experts in 20+ countries, they are trusted by leading enterprises and have a close-knit community built on trust and camaraderie.
Support and implement monitoring and alerting strategy across Kraken’s customer business.
Define and uphold observability best practices across multiple products and platforms.
Partner with product teams to implement observability tooling and improve reliability across the organisation.
Kraken is a technology company focused on creating a smart, sustainable energy system. Their operating system for energy is transforming the industry around the world in a way that benefits everyone. They are a Great Place to Work with genuinely decent, honest, and empathetic people.
Resolve technical issues across infrastructure, deployments, databases, caching, and web performance.
Manage support tickets via Zendesk, with occasional live chat or voice support where needed.
Contribute to platform reliability by monitoring alert queues and participating in on-call rotation during working hours.
Upsun is a cloud application platform designed for hybrid teams, where AI agents and humans collaborate to solve complex problems, allowing developers, DevOps engineers, and platform teams to build, ship, and scale confidently. They are a remote, global workforce committed to open source and an open, welcoming environment.
Act as a senior escalation point for SOC investigations, providing guidance aligned to Copperleaf’s security architecture and operational practices.
Lead investigations into security alerts across Copperleaf’s Azure‑hosted environments, identity systems, corporate endpoints, and product infrastructure.
Track emerging threats relevant to SaaS providers, cloud platforms, Kubernetes, identity infrastructure, and AI‑driven attack techniques.
IFS is a billion-dollar revenue company with 7000+ employees across all continents, specializing in AI technology. They enable customers to be their best when it really matters, at the Moment of Service™, and are committed to promoting an inclusive workforce that fully represents diverse cultures, backgrounds, and viewpoints.
Own the support experience for complex integration tickets, coaching the team on troubleshooting best practices.
Serve as a primary escalation point for the integrations support squad, acting as incident commander and driving issues to resolution.
Lead root cause analysis and technical debugging for production issues across our integrations platform.
Vanta's mission is to help businesses earn and prove trust by making security monitored and verified continuously. They have a kind and talented team of various backgrounds, and they empower companies to practice better security and prove it with ease.
Help drive reliability, automation and performance within our cloud-based infrastructure.
Become embedded within an Engineering team helping them navigate production excellence and advocate for best practices.
Debug production issues across services and levels of the stack as well as practice incident response and blameless postmortems.
Flywire is a global payments enablement and software company that was founded over a decade ago. They have over 1,200 global FlyMates, representing more than 40 nationalities, in 12 offices worldwide, and are looking for people to join the next stage of their journey as they continue to grow.