Anticipate and support the Solutions Engineering team by designing technical presentations, demos, and white papers.
Create and deliver training materials, product workshops, and webinars for internal teams and customers.
Partner with Product, Marketing, and Engineering to enable the field with deep technical expertise and strategic support.
Grafana Labs is the company behind the open-source observability platform, providing a fully managed cloud service for monitoring and analytics. With over 1,600 team members across 40+ countries, they foster a global collaborative culture rooted in open source, transparency, and autonomy.
Define and lead the end-to-end observability strategy covering logging, metrics, tracing, and alerting.
Architect and evolve a unified observability platform ensuring scalability and reliability.
Build and lead a high-performing observability engineering team with strong technical standards.
The company operates a high-scale developer-facing platform focused on reliability and performance. It is a remote-first organization with a globally distributed engineering team committed to building best-in-class developer infrastructure.
Act as a trusted technical partner, guiding organizations through onboarding, implementation, and expansion with white-glove support and best practices.
Deliver high-impact training, jumpstart engagements, and provide tailored technical consulting to help customers succeed.
Identify recurring issues, monitor support needs, and advocate for product improvements in close collaboration with internal teams.
Grafana Labs is the company behind Grafana, the open observability platform. With over 1,600 team members across 40+ countries, we are a 100% remote company backed by leading investors and trusted by more than 35 million users and 7,000+ customers.
Design and operate enterprise-grade observability platforms across metrics, logs, traces, and events.
Build scalable monitoring stacks with Prometheus, Grafana, Loki, Tempo, OpenTelemetry, and Datadog.
Define SLOs, SLIs, error budgets, and alerting strategies to ensure system reliability.
Our partner is a technology company focused on building scalable observability platforms for distributed systems. They are an engineering-driven organization with a strong emphasis on automation, scalability, and developer experience.
Own and evolve observability strategy including monitoring, alerting, dashboards, logging, and distributed tracing.
Define and manage SLIs, SLOs, and reliability metrics, improving MTTD and MTTR through automation.
Build and maintain reliable cloud infrastructure on AWS and Kubernetes while mentoring engineers on SRE best practices.
Filevine is a Legal AI company delivering Legal Operating Intelligence for legal work. Fueled by a team of exceptional collaborators and innovators, Filevine’s rapid growth has earned AI awards and recognition from Deloitte and Inc. as one of the most innovative and fastest-growing technology companies in the country.
Design and build cloud-native infrastructure for reliability, observability, and automation across GCP, GKE, and Cloud Run.
Own incident response, root cause analysis, escalation workflows, and runbooks to prevent hard problems from recurring.
Develop Infrastructure as Code, CI/CD pipelines, and operational tooling to improve developer velocity and platform efficiency.
CertifyOS is building the data infrastructure that powers modern healthcare, automating provider licensing, enrollment, credentialing, and network monitoring through an API-first platform. The company is backed by leading investors with a team of deep experience in provider data systems, valuing authenticity, accountability, collaboration, results, and openness to feedback.
Lead the design, development and operation of large-scale, secure observability systems to keep services online and performant.
Deploy and scale Prometheus, ElasticSearch clusters, and high-throughput Kafka data pipelines for millions of customer devices.
Collaborate with the Observability team to build alerting systems, APIs, and self-service monitoring tools using Terraform and multiple languages.
ItD is a new generation consulting and software development company that blends diversity, innovation, and integrity with real business results. It is a woman- and minority-led firm with a global community, empowering employees and offering benefits like medical, dental, vision, 401(k), and career development.
Build and operate the internal engineering platform that provides application engineers with the tools, systems, and Kubernetes clusters they need to deploy and run their workloads.
Focus on cloud infrastructure, capacity management, security, engineering productivity, monitoring, and US Federal compliance across squads.
Participate in on-call rotations to ensure the health of the system and understand how people use our products.
Grafana Labs, the company behind the open observability cloud, is founded on the principles of open source, open standards, open ecosystems, and open culture. We are a 100% remote company with 1,600+ team members across 40+ countries, backed by leading investors including Lightspeed Venture Partners, Sequoia Capital, GIC, Coatue, J.P. Morgan, CapitalG, and Lead Edge Capital.
Design and develop CI/CD systems for websites, services, and release workflows, and operate an EKS-based Kubernetes platform.
Diagnose debug production incidents, drive root-cause analysis, and implement improvements to enhance system reliability.
Write and maintain infrastructure as code using Pulumi or Terraform/OpenTofu across multiple AWS accounts with security-conscious practices.
Thunderbird is one of the world’s most trusted open-source email applications, empowering more than 20 million people globally. Our small but growing distributed team includes 65+ people across seven countries, and we build privacy-respecting communication tools with a collaborative, inclusive, and user-first spirit.
Co-own the architecture of cloud infrastructure on Azure and Kubernetes clusters for high throughput and availability.
Drive resilience strategy for global scaling, zero-downtime deployments, and disaster recovery.
Evolve observability stack with LGTM (Loki, Grafana, Tempo, Mimir) and lead incident response.
Flip is an AI-powered employee experience platform for frontline workers in retail, manufacturing, and logistics. The company is a young, rapidly growing tech company with a remote-first culture and offices in Berlin and Stuttgart.
Build and operate secure agent runtimes with sandboxing, runtime isolation, and RBAC.
Design and maintain integration surfaces with MCP-style adapters and gateways across marketplace teams.
Implement observability and cost control including traces, telemetry, and cost-per-workflow.
Zartis is a global AI transformation and technology consulting partner that designs, builds, and scales technology solutions for ambitious organizations. With engineering hubs across EMEA and LATAM and long-term partnerships in financial services, healthcare, and energy, they foster an inclusive culture based on trust and innovation.
Design, build, and maintain CI/CD pipelines and Infrastructure as Code using tools like CloudFormation, Ansible, and Terraform.
Monitor and respond to infrastructure and application health, troubleshoot operational issues, and provide on-call support.
Maintain operational documentation, communicate proactively with teams, and ensure service delivery meets client expectations.
NICE Ltd. provides software used by 25,000+ global businesses, including 85 of the Fortune 100, to deliver customer experiences, fight financial crime, and ensure public safety. With over 8,500 employees across 30+ countries, NICE is recognized as a market leader in AI, cloud, and digital innovation.
Lead client discovery, architecture workshops, and solution design across observability, telemetry, reliability, and operational intelligence initiatives.
Define scalable standards for telemetry onboarding, naming, tagging, RBAC, service ownership, dashboards, alert governance, runbooks, and operational handoff.
AHEAD builds platforms for digital business by weaving together cloud infrastructure, automation, analytics, and software delivery to help enterprises achieve digital transformation. The company prioritizes a culture of belonging and is an equal opportunity employer that values diversity and inclusion.
Design and implement high-quality, scalable integrations for observability solutions.
Collaborate with cross-functional teams to deliver features aligned with product strategy.
Participate in on-call rotations and contribute to open-source communities.
Grafana Labs provides an open-source observability platform, Grafana Cloud, that integrates metrics, logs, and traces. With over 1,600 team members across 40+ countries, they maintain a remote-first, collaborative culture backed by leading investors.
Maintain and support core infrastructure with deep Linux expertise.
Design scalable networks using VLANs, routing, VPNs, and UniFi equipment.
Automate provisioning with Ansible, Bash/Python, and MAAS-based workflows.
A European deep-tech company is developing a decentralized, energy-efficient cloud platform using distributed bare-metal infrastructure. It is a startup or hyper-growth environment that values autonomy, speed, and problem-solving.
Design, provision, and manage AWS infrastructure using Terraform and Kubernetes.
Build, operate, and improve observability, monitoring, and incident response processes.
Collaborate with engineering teams on capacity planning, performance optimization, and resilient system design.
Vynca provides comprehensive care for individuals with complex needs, focusing on quality days at home. The company is a close-knit community guided by core values of Excellence, Compassion, Curiosity, and Integrity.
Own the operational excellence and infrastructure strategy for Remote Build's platform, ensuring reliability, performance, and security.
Lead incident response, build observability systems, and drive continuous improvement in system reliability.
Embed security into infrastructure, optimize costs, and automate operational toil to scale efficiently.
Remote solves modern organizations' biggest challenge of navigating global employment compliantly. With a fully distributed team across 6 continents, the company fosters a future-focused culture with core values of innovation and async work.
Design and implement a SUSE-based virtualization platform using KVM as the hypervisor.
Standardize on Infrastructure as Code using Terraform and Ansible for automation.
Integrate automation with CI/CD pipelines and maintain living architecture documentation.
Deutsche Telekom IT Solutions is a subsidiary of the Deutsche Telekom Group providing IT and telecommunications services. With over 5300 employees across four sites in Hungary, the company is recognized as a top employer and emphasizes ethical practices and continuous development.
Own and evolve AWS infrastructure using Terraform, managing EKS clusters, databases, and core services.
Maintain CI/CD reliability and developer tooling across the full engineering org.
Lead incident response, drive post-incident reviews, and improve monitoring and alerting standards.
Babylist is the leading platform for expecting and new families, helping parents feel confident, connected, and cared for at every step. As a modern, AI-forward tech company with over 10 million yearly shoppers, Babylist has expanded into a full ecosystem and generated $750M in revenue in 2025, reshaping the $235B kids and baby market.
Design, deploy, and manage production Kubernetes clusters with workload scheduling, resource quotas, network policies, and RBAC.
Build and optimize CI/CD pipelines using Infrastructure as Code and GitOps principles.
Implement observability solutions using Prometheus, Grafana, and OpenTelemetry for performance tuning and reliability.
VerTALENTS is a subsidiary of VerSprite Cybersecurity, specializing in technology staffing. The company connects top technical talent with industry clients through various methods, adding value to both clients and candidates for full-time and contracting opportunities.