Source Job

Romania

  • Build and operate large-scale cloud infrastructure for product metrics and real-time analytics.
  • Design, develop, and improve fault-tolerant distributed systems processing trillions of events.
  • Collaborate cross-functionally, perform code reviews, and participate in on-call rotation.

Golang Kubernetes AWS Terraform

20 jobs similar to Senior Cloud Engineer - Product Metrics

Jobs ranked by similarity.

US

  • Design, build, operate, and maintain large-scale distributed systems for product metrics.
  • Develop backend services using Golang within a cloud-native, Kubernetes-based environment.
  • Ensure high performance, reliability, and cost efficiency of critical data processing systems.

The company builds a large-scale distributed data platform for real-time analytics and customer-facing insights. It is a remote-first organization with a strong engineering culture focused on ownership, learning, and technical excellence.

US Unlimited PTO

  • Provide frontline technical expertise to help developers deploy and scale Temporal in cloud-native environments.
  • Troubleshoot complex infrastructure issues, optimize performance, and develop automation solutions.
  • Collaborate with engineering and product teams to influence platform improvements and enhance developer experience.

Temporal provides an open source programming model that simplifies code and makes applications more reliable. The company is a growing team driven by values of curiosity, collaboration, and humility, focused on improving developer experience.

Global Unlimited PTO 16w maternity 16w paternity

  • Design, implement, and operate core services that power Docker’s Cloud Sandboxes platform.
  • Build scalable systems for microVM orchestration, workload scheduling, and lifecycle management.
  • Ensure system reliability, observability, and performance across Docker’s Cloud Sandbox infrastructure.

Docker is a globally distributed, remote-first company that builds tools for developers to build, share, and run applications. With over 20 million monthly users and 20 billion container image pulls, it has a collaborative culture focused on innovation and reliability.

Global

  • Lead a distributed engineering team to deliver reliable, secure, and scalable cloud infrastructure solutions.
  • Drive the design, development, and delivery of cloud image pipelines and infrastructure automation using Python and Golang.
  • Mentor engineers, perform code reviews, and champion agile and engineering best practices.

The company develops and manages public cloud infrastructure solutions, focusing on reliable, secure, and scalable cloud-native software and automation. It operates with a distributed, remote-first engineering team that fosters collaboration and technical excellence.

Netherlands

  • Design and scale highly reliable platform systems supporting complex cloud-native workloads across multiple deployment environments.
  • Build and enhance core platform services while contributing to distributed systems, event-driven architectures, and cloud-native infrastructure.
  • Optimize cloud resources, networking, storage, compute, and observability to improve system performance, scalability, reliability, and maintainability.

Jobgether uses an AI-powered matching process to connect candidates with hiring companies. They operate as a job platform, processing applications and sharing top candidates with employers.

US

  • Perform operational deployments, implementations, and maintenance for production systems.
  • Implement and maintain monitoring, reporting, and alerting systems for Core Speech products.
  • Be part of an on-call rotation and work collaboratively to improve system performance and architecture.

Solventum is a new healthcare company with a long legacy of solving big challenges to improve lives and enable healthcare professionals to perform at their best. They are a large company that values empathy, insight, and clinical intelligence, collaborating with top minds in healthcare.

United States Canada UK Unlimited PTO 18w maternity 12w paternity

  • Build and maintain core components of the clearing house in Go on GCP, including customer onboarding flows and data ingestion pipelines.
  • Take ownership of ambiguous problems and drive features from design through production with appropriate testing and observability.
  • Participate in on-call rotation, contribute to incident response, and become a go-to engineer for core subsystems.

Chainguard is the trusted source for secure open source software, delivering hardened builds for enterprise customers. The company is venture-backed by leading investors and serves Fortune 500 enterprises.

United States

  • Design and build core platform infrastructure for large-scale cloud-native data and analytics systems.
  • Own and improve CI/CD pipelines, testing frameworks, and deployment in a high-scale PaaS environment.
  • Contribute to reliability engineering, observability, and operational excellence across distributed systems.

Jobgether uses an AI-powered matching process to connect candidates with roles. The company is a growing platform focused on efficient job matching and data privacy compliance.

Poland

  • Design, write, and deliver software to implement and support large web-scale, highly performant, highly available infrastructure on GCP/AWS.
  • Monitor infrastructure, respond to incidents, correct and improve systems to prevent incidents, and plan capacity.
  • Tune large-scale clusters for optimal performance and efficiency and support system deployments and product releases.

OpenX develops digital advertising marketplaces and technologies to optimize ad delivery for publishers and advertisers. The company operates a large-scale cloud infrastructure in Poland and values teamwork, customer centricity, and continuous learning.

Canada Unlimited PTO

  • Design, build, and operate distributed systems powering observability across ClickHouse Cloud.
  • Own reliability, performance, and cost-efficiency of the telemetry pipeline and storage systems.
  • Take part in on-call rotation and drive root-cause resolution and long-term fixes.

ClickHouse is a real-time analytics and data warehousing company recognized on the 2025 Forbes Cloud 100 list. With over 3,000 customers and rapid growth, the company fosters an innovative and fast-paced culture.

US Unlimited PTO

  • Own the US-only production environment end-to-end, including infrastructure deployment, maintenance, scaling, and reliability.
  • Lead and grow the US-based DevOps team, design scalable AWS infrastructure, and build CI/CD pipelines for safe, fast shipping.
  • Partner with engineering on application error investigations, improve monitoring and alerting, and coordinate with the Tel Aviv team on shared platform standards.

Zafran de-risks 90% of critical vulnerabilities overnight across hybrid environments using existing security tools. Backed by Sequoia Capital and Cyberstarts, it is one of the fastest-growing companies in cybersecurity, scaling to meet demand from advanced organizations.

Global Unlimited PTO

  • Design and build resilient, scalable platform services like authentication and rate limiting.
  • Collaborate with engineers across teams to deliver infrastructure solutions.
  • Optimize systems for security, performance, and always-on availability.

Constructor is an AI-first ecommerce search and discovery platform that helps shoppers find products and enables brands to drive revenue. The company is fully remote and fosters a culture of growth, offering training budgets and regular team offsites.

US

  • Design, code, test, and debug software applications with attention to performance, scalability, and security.
  • Collaborate with cross-functional teams to translate business requirements into technical specifications and architectural designs.
  • Participate in code reviews, troubleshoot issues, and maintain comprehensive technical documentation.

EasyPost is a YC unicorn that makes shipping simple for businesses through a developer-friendly REST API. The company is rapidly growing, with a culture of builders and problem-solvers who value elegant architecture and fast decisions.

US

  • Manage a scrum team of 4-6 engineers building and operating high-volume bidder systems.
  • Oversee AWS-based cloud infrastructure processing over 1 billion HTTP requests per hour.
  • Drive improvements in reliability, performance, and cost efficiency across production systems.

Jamloop builds high-scale advertising technology for real-time bidding systems. It is a remote-first company focused on reliability and operational excellence.

Germany Unlimited PTO

  • Design and maintain scalable infrastructure-as-code solutions using Terraform and Kubernetes.
  • Build and operate observability systems while leading incident response and reliability improvements.
  • Embed security and compliance practices into infrastructure and optimize system performance and cloud costs.

This partner company builds a next-generation platform enabling AI-driven services across global employment infrastructure. It is a highly distributed, async-first organization where engineers thrive in ownership and autonomy.

US 4w PTO 14w maternity 14w paternity

  • Own core compute infrastructure across multiple cloud providers and regions.
  • Design capabilities for greater performance and flexibility in service deployment.
  • Investigate and resolve challenging cloud and compute issues across the stack.

Render is a cloud platform for developers building AI-native, full-stack, multi-service applications. Trusted by over 6 million developers, the company has raised $257M in funding and values craft, velocity, and user experience.

US

  • Architect and ship reliable, high-velocity features handling critical traffic with low latency.
  • Drive rigorous code reviews and maintain high testing standards in a Go codebase.
  • Manage and enhance cloud configurations across AWS and Azure using Infrastructure as Code (Terraform).

Twilio is shaping the future of communications with innovative solutions for hundreds of thousands of businesses and millions of developers worldwide. The company fosters a remote-first culture with a strong commitment to global inclusion and diverse experiences, making every team member feel part of a vibrant community.

India

  • Design and deliver robust, high-scale routing experiences for Data Pipelines for Twilio Segment.
  • Operate always-available, complex distributed systems in cloud environments.
  • Collaborate cross-functionally with design, product, and other engineers to define solutions.

Twilio is shaping the future of communications, delivering innovative solutions to hundreds of thousands of businesses and empowering millions of developers worldwide. The company is remote-first with a strong culture of connection and global inclusion, and employs a diverse team of Twilions.

Global

  • Design and implement AI inference and training cloud products optimized for Kubernetes, including autoscaling and distributed jobs across GPU fleets.
  • Write clean, efficient Go code for Kubernetes controllers, operators, and custom resources supporting AI workloads.
  • Build APIs, CLIs, and developer tools to simplify deployment, lifecycle management, and monitoring of AI applications.

Gcore is a global provider of infrastructure and software solutions for AI, cloud, network, and security, powering digital experiences worldwide. With 550+ professionals and 210+ edge locations, the company collaborates with partners like Intel, NVIDIA, and Equinix to build the foundation for an AI-driven world.

India

  • Design and automate cloud infrastructure for scalable, secure deployments across public cloud environments.
  • Develop and maintain AI-powered services and cloud-native solutions for enterprise platforms.
  • Build monitoring, alerting, and observability solutions to proactively resolve infrastructure and application issues.

This position is listed on behalf of a partner company. They are looking for a Cloud Platform & AI Engineer based in India.