Developing infrastructure to support cloud-based applications.
Creating deployment architecture and continuous delivery pipelines.
Designing high-availability approaches and implementing monitoring architecture.
Nearform is a digital and AI engineering consultancy with a reputation for experience-led modernization. They focus on creating transformative digital products for enterprise customers across the UK and Ireland. Nearformers form a close-knit community built on trust and camaraderie.
Design, implement, and maintain scalable integrations for metrics, logs, and traces across cloud and Kubernetes environments.
Build middleware, libraries, and services to simplify development and observability workflows.
Lead technical direction and strategic planning for observability projects.
They are currently looking for a Staff Software Engineer - Grafana Cloud Observability, Kubernetes Monitoring in the United States. This role offers a unique opportunity to shape and advance cloud observability solutions for large-scale systems, focusing on metrics, logs, and traces.
Design and implement high-quality, scalable integrations for various infrastructure components, applications, and data ingestion pipelines.
Create middleware components and libraries that simplify development and maintenance of observability solutions.
Lead the technical direction and vision of the team, contributing to strategic discussions and future development of observability solutions.
Grafana Labs is a remote-first, open-source powerhouse with more than 20M users of Grafana. They help more than 3,000 companies manage their observability strategies with the Grafana LGTM Stack, which features scalable metrics, logs, and traces, and they thrive in an innovation-driven environment.
Monitor cloud infrastructure and application health using observability tools; respond to alerts.
Perform Tier 1 incident triage, document findings, and escalate appropriately to Development or SRE teams.
Monitor and support CI/CD pipelines to ensure successful builds and deployments.
Lumin Digital empowers credit unions and banks by creating cutting-edge digital experiences. They are a trailblazer in digital banking solutions with a culture that fosters trust, respect, and boldness, encouraging team members to explore and experiment with new ideas.
Define and evolve reliability standards for the SmarterDx platform.
Enhance observability systems (metrics, logs, traces, alerting) to provide actionable insights and reduce mean time to detect (MTTD) and resolve (MTTR).
Reduce operational toil through automation, self-healing systems, and improved deployment and rollback mechanisms.
SmarterDx, a Smarter Technologies company, builds clinical AI that is transforming how hospitals translate care into payment. Founded by physicians in 2020, their platform connects clinical context with revenue intelligence, helping health systems recover millions in missed revenue, improve quality scores, and appeal every denial.
Enabling faster incident response by improving monitoring coverage, alert accuracy, and root cause visibility
Helping teams shift from reactive to proactive operations by applying telemetry data and AI-driven insights
Empowering service owners with clear dashboards and actionable insights that guide performance improvements
HealthEquity's mission is to save and improve lives by empowering healthcare consumers. They envision making HSAs as widespread and popular as retirement accounts by 2030. They value individuals more than their positions and are passionate about connecting health and wealth for American families.
Build and maintain CI/CD pipelines and GitOps workflows across a diverse set of engineering teams.
Own observability — monitoring, alerting, logging — and support development teams in instrumenting their services.
Optimise infrastructure for security, cost, performance and reliability.
1inch is a decentralized finance (DeFi) platform. We empower users to access the best rates and execute efficient and secure trades across multiple liquidity sources.
Maximize the velocity of our product engineering team.
Ensure platform scalability, reliability, and security.
Champion best practices and shape the engineering culture.
They are building a robust, scalable trading platform to serve high-traffic, latency-sensitive applications. They leverage state-of-the-art technologies to support real-time trading while providing unparalleled reliability and performance.
Build Reliable Cloud Infrastructure: Implement and maintain AWS infrastructure using Terraform across EKS, Lambda, EC2, and S3.
Improve Developer Workflows: Contribute to CI/CD pipelines, starter kits, and internal tooling that reduce manual effort and improve deployment confidence.
Strengthen Observability & Operations: Add monitoring, logging, and alerting (DataDog) to platform services and participate in an on-call rotation.
Spreetail helps brands increase their ecommerce market share globally while improving operational costs. They are building one of the fastest-growing ecommerce companies in history with a focus on innovation.
Design, build, and manage our cloud infrastructure using modern tools (Pulumi) to ensure all infrastructure changes are reproducible, secure, and easily auditable.
Orchestrate and optimize our Kubernetes clusters for complex, compute-heavy AI workloads, guaranteeing maximum efficiency and fault tolerance.
Implement a flawless monitoring setup using Datadog and OpenTelemetry to make the black box of our distributed systems transparent, hunting down latency spikes or bottlenecks before they impact users.
Deepslate is building Speech to Speech Voice AI models that sound and act indistinguishable from a human, with the belief that everyone should be able to use it. Backed by top-tier investors from the Tech and AI sectors, we are incredibly well-funded and moving fast.
Design and implement comprehensive monitoring strategies.
Take ownership of production incident response, lead incident handling, and drive remediation.
Continuously improve operational processes, reliability practices, and team readiness.
InvestorFlow delivers industry specialized CRM and digital portals to help alternative asset firms find opportunities, create and manage relationships, and turn relationship insights into action. They serve over 175 clients, including 25 of the top 50 alternative asset managers, managing more than $6 trillion in assets.
Support and operate Legion’s AWS-based cloud platform and Kubernetes (EKS) environments.
Build and maintain infrastructure-as-code using Terraform.
Improve CI/CD pipelines to increase deployment safety and velocity.
Legion Technologies delivers the industry’s most innovative workforce management platform. The AI-driven Legion WFM platform maximizes labor efficiency and employee engagement. They are a remote, mission-driven team that embraces a collaborative, fast-paced, and entrepreneurial culture.
Lead infrastructure initiatives across the engineering organization.
Define the technical quality bar and architectural standards.
Build platforms and AI-enabled systems for multiple teams.
Fieldguide is automating and streamlining the work of assurance and audit practitioners specifically within cybersecurity, privacy, and financial audit, building software for the people who enable trust between businesses. They are based in San Francisco, CA, but built as a remote-first company with an inclusive, driven, humble and supportive team.
Design and maintain scalable cloud environments using tools like Terraform, CloudFormation, or Ansible.
Build and optimize automated deployment pipelines to ensure rapid and reliable software delivery.
Implement robust monitoring, logging, and alerting frameworks to ensure 24/7 system health.
CodeRoad offers end-to-end software development services, helping businesses scale with infrastructure solutions. They provide staff augmentation, dedicated IT teams, and software engineering to empower businesses in a digital landscape.
Provide and own automation of the provisioning of CSP resources, including networking, Kubernetes clusters and specific CSP resources required by our application teams.
Work with users (Grafana Cloud application teams) to help understand their needs and ensure investment in the right capabilities.
Participate in the Platform department Infrastructure wing on-call rotation.
Grafana Labs is a remote-first, open-source powerhouse with more than 20M users of Grafana around the globe. The team thrives in an innovation-driven environment where transparency, autonomy, and trust fuel everything that they do.
Build our observability and alerting platform from the ground up.
Lead infrastructure builds for compliance (SOC 2, HIPAA).
Truv is transforming the financial data industry with a secure and real-time API platform for payroll account access. Backed by $30M from top investors, they're disrupting a $2B legacy market with cutting-edge innovation and a customer-first approach.
Partner closely with product engineering squads (embedded model)
Own production reliability for high-SLA and complex customer environments
Design and implement automation to scale our reliability practices
Grafana Labs is a remote-first, open-source powerhouse that helps more than 3,000 companies manage their observability strategies. They are scaling fast and staying true to what makes them different: an open-source legacy, a global collaborative culture, and a passion for meaningful work.
Design and maintain scalable, fault-tolerant infrastructure that supports our SaaS platform and keeps pace with business growth.
Define, document, and maintain SLIs, SLOs, and SLAs in partnership with product engineering, translating business commitments into technical guardrails.
Lead incident response with steady judgment, facilitate blameless postmortems, and drive remediation efforts that prevent recurrence.
Fixify is on a mission to reimagine how IT teams support companies. They need a Senior Site Reliability Engineer who finds joy in building systems that fade into the background, empowering product engineers to ship with confidence and their customers to work without interruption.
Standardize CI/CD pipelines (GitHub Actions) and Helm charts across 10+ microservices
Build centralized logging, metrics, and alerting (currently a gap)
Extend Terraform to cover full AWS infrastructure
Kiefer Tech delivers cutting-edge AI, robotics, and enterprise solutions across Greece and the EU, leveraging over 20 years of engineering heritage from the Green Energy sector. As the technology arm of Kiefer, they are guided by innovation, quality, and long-term client partnerships and are building sovereign AI infrastructure.