Lead client discovery, architecture workshops, and solution design across observability, telemetry, reliability, and operational intelligence initiatives.
Own and evolve observability strategy including monitoring, alerting, dashboards, logging, and distributed tracing.
Define and manage SLIs, SLOs, and reliability metrics, improving MTTD and MTTR through automation.
Build and maintain reliable cloud infrastructure on AWS and Kubernetes while mentoring engineers on SRE best practices.
Filevine is a Legal AI company delivering Legal Operating Intelligence for legal work. Fueled by a team of exceptional collaborators and innovators, Filevine’s rapid growth has earned AI awards and recognition from Deloitte and Inc. as one of the most innovative and fastest-growing technology companies in the country.
Latin America
Unlimited PTO
16 weeks maternity leave
16 weeks paternity leave
Lead customers in strategic application of Honeycomb and observability practices to meet technical and business goals.
Act as a trusted advisor on telemetry schema design, data modeling, and sampling strategies.
Coach and mentor engineering teams on observability, SRE concepts, and instrumentation best practices.
Honeycomb defines observability for developer tools, working with companies like HelloFresh, Slack, and Vanguard. They are a fully distributed company of over 200 employees, named to Forbes' America's Best Startups in 2022 and 2023, with a culture focused on impact, inclusion, and autonomy.
Define and lead the end-to-end observability strategy covering logging, metrics, tracing, and alerting.
Architect and evolve a unified observability platform ensuring scalability and reliability.
Build and lead a high-performing observability engineering team with strong technical standards.
The company operates a high-scale developer-facing platform focused on reliability and performance. It is a remote-first organization with a globally distributed engineering team committed to building best-in-class developer infrastructure.
Design and operate enterprise-grade observability platforms across metrics, logs, traces, and events.
Build scalable monitoring stacks with Prometheus, Grafana, Loki, Tempo, OpenTelemetry, and Datadog.
Define SLOs, SLIs, error budgets, and alerting strategies to ensure system reliability.
Our partner is a technology company focused on building scalable observability platforms for distributed systems. They are an engineering-driven organization with a strong emphasis on automation, scalability, and developer experience.
Lead the Site Reliability Operations team, overseeing observability, monitoring, incident response, and operational excellence for key enterprise services.
Partner with product, engineering, and infrastructure teams to embed CI/CD and release best practices, automating build/test/deploy and release monitoring.
Own problem management, driving root cause analysis and corrective actions to improve system resilience and reduce incident impact.
Mercury Insurance helps people reduce risk and overcome unexpected events, and has served customers for over 60 years. Recognized as one of America's Best Midsize Employers for 2026, it has a collaborative culture focused on growth and inclusion.
Act as a trusted technical partner, guiding organizations through onboarding, implementation, and expansion with white-glove support and best practices.
Deliver high-impact training, jumpstart engagements, and provide tailored technical consulting to help customers succeed.
Identify recurring issues, monitor support needs, and advocate for product improvements in close collaboration with internal teams.
Grafana Labs is the company behind Grafana, the open observability platform. With over 1,600 team members across 40+ countries, it is a 100% remote company backed by leading investors and trusted by more than 35 million users and 7,000+ customers.
Design and maintain Grafana dashboards and telemetry visualizations to monitor system performance and platform health.
Develop and maintain modular Ansible playbooks to automate infrastructure provisioning and configuration.
Configure observability solutions with Prometheus monitoring and alerting, and participate in Agile ceremonies.
Miratech is a global IT services and consulting company that helps visionaries change the world by supporting digital transformation for large enterprises. With nearly 1,000 full-time professionals across 5 continents and 25 countries, the company has a culture of Relentless Performance with a 99% project success rate and over 25% annual growth.