Troubleshoot and resolve issues in customer environments based on Linux, OpenStack, Kubernetes, and networking technologies, owning escalations end-to-end.
Reproduce customer issues in labs, confirm bug reports, and collaborate with the development team to improve product stability.
Communicate with customers during incidents via email and remote sessions, guiding them through troubleshooting and resolution processes.
Mirantis is the Kubernetes-native AI infrastructure company, enabling organizations to build and operate scalable, secure, and sovereign infrastructure for modern AI and data-intensive applications. With deep expertise in open source and Kubernetes, Mirantis empowers platform engineering teams across enterprises worldwide.
Improve cybersecurity posture through secure network design, firewall policy management, segmentation, and access controls.
Slingshot Aerospace is dedicated to making space safer and more secure. They are a team of builders, thinkers, and problem-solvers who believe that the next generation of space operations will be powered by better data and smarter software.
Deliver network stack projects end-to-end including service mesh, DNS, CDN, and edge protection while shaping technical vision and maintaining operability.
Integrate networking into self-service platforms to streamline workflows and enable engineering teams to operate independently.
Participate in on-call rotation, driving incident resolution and continuous improvement through postmortem analysis.
1Password builds a safe, productive digital future by unleashing employee productivity without compromising security. Over 180,000 businesses trust them, and they've earned a spot on the Forbes Cloud 100 for four consecutive years, fostering a collaborative, curious, and driven culture.
Build and maintain a Python fleet tracking system that manages the full lifecycle of servers.
Build server management tooling that automates provisioning, health checks, GPU diagnostics, recovery, and alerting.
Create and maintain metrics, dashboards, and alerting for hardware health across the fleet.
FAL is committed to keeping a large fleet of GPU servers healthy and productive. They offer a collaborative and supportive culture with learning and growth opportunities.
Lead the design and architecture of AWS network solutions for a large-scale migration project.
Plan and execute on-premises-to-AWS migration strategies, ensuring minimal disruption and high availability.
Collaborate with cross-functional teams to define and implement robust network security controls and performance optimization.
RWS provides technology and support services to help organizations unlock global understanding. With a global reach and a dedicated team of over 500 staff, the Product & Technology division ensures efficient support to operations worldwide.
Lead onboarding end-to-end and extend it with additional use cases.
Own a small portfolio of customer accounts and act as a trusted technical partner year-round.
Provide technical support and communicate crisply with customers throughout.
OpsMill is building the next generation of infrastructure data management, focusing on helping automation teams unify data and scale automation reliably. As a commercial open-source company, they are practitioners who understand the real-world challenges of scaling infrastructure automation.
Act as the final escalation point for complex Cloud infrastructure issues, analyzing logs and metrics to identify root causes.
Own high-severity incidents, coordinate resolution with Engineering, DevOps, and SRE teams, and contribute to preventive actions.
Mentor L1 and L2 support engineers, create runbooks and SOPs, and collaborate with Product teams to reproduce issues.
Gcore provides infrastructure and software solutions for AI, cloud, network, and security, powering real-time communication, streaming, enterprise AI, and secure web applications. With 550+ professionals globally, they collaborate with partners like Intel, NVIDIA, Dell, and Equinix to support the digital ecosystem.
Implement highly available, scalable infrastructure across AWS, GCP, and bare-metal environments.
Drive an "automation-first" culture by writing code in Python/Go to build self-healing systems.
Act as lead Incident Commander, develop response playbooks, and conduct post-incident analyses.
Zscaler accelerates digital transformation to secure customers with a cloud-native Zero Trust Exchange platform. The company processes over 200 billion transactions daily and fosters a culture of execution, collaboration, and accountability.
Improve the reliability, performance, and scalability of the production platform.
Operate reliable infrastructure, improve observability, and drive incident response.
Use data-driven reliability practices such as SLIs, SLOs, SLAs, and DORA metrics.
VRChat is a game-changing platform that provides an endless collection of social VR experiences. They empower their community to bring their imaginations to life and help shape the metaverse. Their team includes people from Netflix, Twitter, Meta, and Microsoft.
Design, build, and implement robust infrastructure solutions aligned with business needs and security best practices.
Automate resource deployment, compute and storage allocation, and optimize delivery of key infrastructure services.
Troubleshoot escalated issues, perform root-cause analyses, and drive process improvements using AI and automation.
Hyland is the pioneer of the Content Innovation Cloud™, delivering ubiquitous enterprise intelligence to organizations through solutions that unlock actionable insights and drive automation. Trusted by thousands of organizations worldwide, including many of the Fortune 100, Hyland has grown to nearly 4,000 employees with a culture focused on employee initiatives, wellbeing, and innovation.
Design, deploy, and manage production Kubernetes clusters with workload scheduling, resource quotas, network policies, and RBAC.
Build and optimize CI/CD pipelines using Infrastructure as Code and GitOps principles.
Implement observability solutions using Prometheus, Grafana, and OpenTelemetry for performance tuning and reliability.
VerTALENTS is a subsidiary of VerSprite Cybersecurity, specializing in technology staffing. The company connects top technical talent with industry clients through various methods, adding value to both clients and candidates for full-time and contracting opportunities.
Own Render's core network infrastructure across multiple data centers and cloud providers, shaping how networking evolves as Render rapidly scales.
Design and build customer-facing networking capabilities that give users greater flexibility in how their services connect and communicate, and how traffic is routed.
Investigate complex networking issues across the stack, from the kernel and data plane to distributed systems and edge networking.
Render is building a modern cloud platform for developers creating AI-native, full-stack, multi-service applications, eliminating the tradeoff between hyperscaler power and developer-friendliness. They are a diverse and talented team that values craft, velocity, and user experience.
Design, provision, and manage AWS infrastructure using Terraform and Kubernetes.
Build, operate, and improve observability, monitoring, and incident response processes.
Collaborate with engineering teams on capacity planning, performance optimization, and resilient system design.
Vynca provides comprehensive care for individuals with complex needs, focusing on quality days at home. The company is a close-knit community guided by core values of Excellence, Compassion, Curiosity, and Integrity.
Lead the investigation and resolution of complex infrastructure, networking, and platform-related incidents.
Provide technical leadership for Kubernetes platform operations and supporting infrastructure services.
Mentor and support AI Infrastructure & Platform Operations Engineers, sharing technical knowledge through documentation and training.
Mirantis helps organizations ship code faster on public and private clouds, providing a public cloud experience on any infrastructure from the data center to the edge. The company serves many of the world's leading enterprises, including Adobe, DocuSign, Liberty Mutual, and PayPal, and is a leader in container management.
Be the technical anchor for your customers: build deep familiarity with their environments, participate in architecture reviews, and help translate network management goals into workflows.
Own technical issues end-to-end: diagnose, reproduce, and resolve issues across installation, configuration, integrations, and upgrades, meeting SLAs for response and resolution.
Make the team and product better: surface customer feedback with context, build runbooks, identify patterns in support volume, and collaborate on deep technical escalations.
NetBox Labs helps companies build and manage complex networks, accelerating network automation with open, composable products. Backed by Notable Capital, the CEO of Grafana Labs, and others, they are the commercial steward of open source NetBox and support a thriving community of thousands of companies.
Develop and maintain automated provisioning pipelines for bare-metal servers across global data centers.
Perform security monitoring, and repair and recover systems from hardware or software failures.
Act as technical lead, mentor engineers, and report directly to the CTO.
Kayzen is a mobile demand-side platform (DSP) that democratizes programmatic advertising. With 160B+ daily ad requests and 1B+ ads served per day globally, it powers top mobile marketing teams with a focus on performance, transparency, and control.
Act as the 3rd-level escalation point for complex technical issues related to CDN and Edge Network products.
Diagnose and resolve advanced issues involving caching, DNS, routing, load balancing, SSL/TLS, and web security.
Take ownership of high-severity incidents (P1/P2) and drive resolution in collaboration with Engineering, Network, and Operations teams.
Gcore provides infrastructure and software solutions for AI, cloud, network, and security. They have 550+ professionals and offer a global team environment.
Lead consultative discovery engagements and executive workshops.
Architect and position observability and infrastructure intelligence solutions.
Enable reseller and channel partner technical teams through onsite and remote engagements.
BlueCat is a key player in Intelligent Network Operations, offering a combination of systems of understanding and change. They have an award-winning culture with "Great Place to Work" certification and are recognized as a top workplace in Canada.
Support and improve hybrid production infrastructure for 15+ development teams handling 100+ products, 10K+ domains, and billions of hits per day.
Architect and plan improvements of a multi-datacenter development environment, advocating for migration to automated, elastic infrastructures using cloud, Kubernetes, and serverless technologies.
Aylo is a tech pioneer that offers world-class adult entertainment and games on safe, popular platforms. With an international team of dynamic innovators, the company focuses on trust-and-safety protocols and has offices in Montreal, Austin, and Nicosia.
Design, build, and maintain CI/CD pipelines and Infrastructure as Code using tools like CloudFormation, Ansible, and Terraform.
Monitor and respond to infrastructure and application health, troubleshoot operational issues, and provide on-call support.
Maintain operational documentation, communicate proactively with teams, and ensure service delivery meets client expectations.
NICE Ltd. provides software used by 25,000+ global businesses, including 85 of the Fortune 100, to deliver customer experiences, fight financial crime, and ensure public safety. With over 8,500 employees across 30+ countries, NICE is recognized as a market leader in AI, cloud, and digital innovation.