Define and evolve reliability standards for the SmarterDx platform.
Enhance observability systems (metrics, logs, traces, alerting) to provide actionable insights and reduce mean time to detect (MTTD) and resolve (MTTR).
Reduce operational toil through automation, self-healing systems, and improved deployment and rollback mechanisms.
SmarterDx, a Smarter Technologies company, builds clinical AI that is transforming how hospitals translate care into payment. Founded by physicians in 2020, their platform connects clinical context with revenue intelligence, helping health systems recover millions in missed revenue, improve quality scores, and appeal every denial.
Manage technical inquiries and troubleshoot complex issues.
Work closely with clients and developers to provide industry-leading client communications.
Analyze and interpret data to identify trends and patterns.
Entersekt is a leader in digital banking fraud prevention and payment security, including mobile authentication, mobile app security, and 3-D Secure authentication. They enable secure digital transactions for leading financial institutions globally and protect the digital transactions of over 210 million active users.
Write code, automate everything, design for reliability, and deeply understand the systems.
Build or extend Terraform modules and contribute to Platform Engineering around Observability.
Collaborate with developers to shape feature design so that reliability is built in, not added later.
InPost Group is an innovative European out-of-home delivery company that is revolutionizing how parcels reach customers. With over 10,000 employees worldwide, InPost Group is one of the largest out-of-home delivery providers in Europe and is committed to sustainable and efficient delivery solutions.
Implementing improvements to the reliability, fault tolerance, scalability, and performance of our infrastructure
Managing incidents using your technical know-how to involve the appropriate teams and automate away manual practices
Improving observability across our systems (metrics, logs, tracing) to reduce time to detection and resolution
Newton is changing how Canadians trade crypto, with the goal of making financial freedom achievable for everyone by giving customers the tools and knowledge needed to navigate the crypto world. They are a remote team spread across Canada that values pushing boundaries and getting things done.
Own the support experience for complex integration tickets, coaching the team on troubleshooting best practices.
Serve as a primary escalation point for the integrations support squad, acting as incident commander and driving issues to resolution.
Lead root cause analysis and technical debugging for production issues across our integrations platform.
Vanta's mission is to help businesses earn and prove trust by making security continuously monitored and verified. They have a kind and talented team from diverse backgrounds, and they empower companies to practice better security and prove it with ease.
Help deploy and configure Dynatrace OneAgent and ActiveGates with automated tooling.
Define and instrument user-centric metrics and objectives in Dynatrace.
Combine Davis® AI with Copilot/Claude to identify root causes and reduce MTTR.
AWP Safety's IT Internship Program is a hands-on learning experience for early-career professionals who want to build a future in IT Site Reliability Engineering. They operate at the intersection of Software Engineering and Systems Operations, using Dynatrace to diagnose performance bottlenecks and automate "toil" out of existence.
Collaborate and work closely with engineering, product and support teams
Identify the underlying causes of critical issues and improve processes to prevent recurrence
Camunda is the leader in enterprise agentic automation, orchestrating complex business processes across agents, people, and systems. Over 700 leading innovators rely on Camunda to slash time-to-value from months to days, boost operational efficiency, and elevate customer experiences. As a fully remote, global company, they’re rewriting the rules of modern business and growing fast, looking for top talent to join their team.
Act as the first point of contact for customer support inquiries.
Resolve common customer issues related to API usage, onboarding, authentication, integration, and billing.
Escalate issues to L2/L3 support or engineering with clear documentation, reproduction steps, and logs.
Databento provides modern APIs for financial market data, making it dramatically easier for firms of all sizes to access and use market data. As a Series A company, they’ve raised $37.8M to date and grown revenue by over 400% YoY.
Research, diagnose, and resolve customer issues with Product, DevOps, and engineering teams.
Validate customer-specific fixes and releases with the quality assurance team.
Analyze customer feedback for product issues and bugs, creating Jira tickets for the engineering team.
Zimperium provides mobile threat defense solutions. They cater to enterprises and governments globally, protecting them against mobile cyberattacks.
Handle escalated cases requiring high-level expertise and advanced troubleshooting.
Maintain an in-depth understanding of all product areas, like Driver Safety and Fleet Management.
Streamline workflows between engineering, product, and technical support teams.
Motive empowers those who run physical operations with tools that improve safety, productivity, and profitability. They serve nearly 100,000 customers from enterprises to small businesses across industries like transportation, construction, energy, and agriculture.
Build and own the foundational infrastructure that our products run upon.
Work directly on our products' Go codebase to implement SRE-related objectives.
Take a data-driven approach to quantifying system performance and reliability.
LiveKit provides the network infrastructure for multimodal AI interfaces, enabling seamless audio and visual interactions. Founded in 2021, LiveKit supports over 3 billion calls annually, with 100,000+ developers and industry giants like OpenAI, Spotify, and Meta.
Comfortable working in a fully remote environment.
Value designing solutions to customer problems.
Comfortable rolling up your sleeves to understand incidents.
Humanitec is at the forefront of the Platform Engineering revolution, as enterprise companies across the globe re-shape how they manage their cloud infrastructure. They aim to help platform engineering teams build Internal Developer Platforms that unlock true developer self-service.
Own and evolve the uptime monitoring platform to enhance customer capabilities.
Deploy a ClickHouse instance to capture check run logs and design APIs for reporting.
Collaborate with customers to resolve bugs affecting their infrastructure.
Jobgether is a platform that posts jobs on behalf of partner companies. We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements.
Build and maintain stable, scalable foundational services that can be leveraged by other engineering teams.
Collaborate with many internal partners and product teams to influence the design of our API surface.
Design and develop reliable, secure, highly available and delightful experiences for the dbt Cloud admin and the end user.
dbt Labs is the pioneer of analytics engineering, helping data teams transform raw data into reliable, actionable insights. They've grown from an open source project and now serve more than 5,400 dbt Platform customers, including AstraZeneca, Sky, Nasdaq, Volvo, JetBlue, and SafetyCulture.
Design, implement, and maintain scalable integrations for metrics, logs, and traces across cloud and Kubernetes environments.
Build middleware, libraries, and services to simplify development and observability workflows.
Lead technical direction and strategic planning for observability projects.
They are currently looking for a Staff Software Engineer - Grafana Cloud Observability, Kubernetes Monitoring in the United States. This role offers a unique opportunity to shape and advance cloud observability solutions for large-scale systems, focusing on metrics, logs, and traces.
Work directly with enterprise customers to deploy and configure OpenTelemetry instrumentation across their environments.
Build custom integrations, dashboards, and tooling to help customers realize the full value of Dash0.
Troubleshoot complex issues in distributed systems, Kubernetes clusters, and observability pipelines.
Dash0 is building an AI-centric, OpenTelemetry-native platform that eliminates vendor lock-in and meaningless toil. They are backed by top-tier investors including Balderton Capital, Accel and Cherry Ventures and led by a founding team with decades of experience in observability.
Design, build, and maintain scalable infrastructure and tooling that improves reliability, performance, and availability across OnePay’s platform
Contribute to the evolution of our observability stack, platform libraries, cloud architecture, and CI/CD pipelines
Develop automation and monitoring systems to detect, prevent, and remediate incidents before they impact customers
OnePay is a consumer fintech company trusted by millions of Americans to make money better, providing an all-in-one financial services platform. Backed by Walmart and Ribbit Capital, OnePay provides banking, savings, credit cards, lending, investing, and crypto services and embedded financial services to frontline workers.
Design infrastructure, networking, and software platform architecture.
Build and maintain automated Continuous Integration and Continuous Deployment (CI/CD) pipelines.
Troubleshoot infrastructure, internal applications, networking, and security issues.
Loadsmart is a technology company focused on the logistics and supply chain industry. They leverage data and technology to automate and optimize freight transportation, connecting shippers and carriers to streamline the shipping process. They are a mid-sized company passionate about transforming the future of freight.
Provide daily expert guidance to existing customers
Develop strong relationships with customers, sales, and product teams
Proactively guide customers through their architectural and product setup and decisions
Solo enables companies to Connect, Secure and Observe modern applications – APIs, Microservices and Data – with the industry’s leading API and Service Mesh Management Platform (“Gloo”). Solo is a VC-backed company that was founded in 2017 by Idit Levine and valued at $1B in 2021.
Collaborate with engineering teams to design and implement scalable, secure systems.
Establish and manage service level objectives (SLOs) and service level agreements (SLAs).
Enhance incident response processes and post-mortem analysis for outages.
ClickHouse, recognized on the 2025 Forbes Cloud 100 list, is one of the most innovative and fast-growing private cloud companies. With more than 3,000 customers and ARR that has grown over 250 percent year over year, ClickHouse leads the market in real-time analytics, data warehousing, observability, and AI workloads.