Remote Software Engineering Jobs · Kubernetes

Job listings

  • Build and improve scalable, fault-tolerant, self-serve data infrastructure technologies to support ML and analytics workflows.
  • Own the Data Movement Platform for batch and stream data processing, and invest in building new infrastructure for Spark, Flink, and Airflow.
  • Collaborate with teammates on on-call responsibilities and monitoring/alerting to improve reliability, scalability, latency, and efficiency.

Reddit is a community of communities built on shared interests, passion, and trust, hosting the most open and authentic conversations on the internet. With over 100,000 active communities and approximately 126 million daily active unique visitors, Reddit is one of the internet's largest sources of information.

  • Evolve and maintain our Kubeflow, Feast, and Spark-on-Kubernetes ML infrastructure.
  • Design tools and APIs that let teams move from depending on centralized bottlenecks to working self-service.
  • Collaborate with Data Science teams to apply software engineering best practices to ML workflows.

Wellhub revolutionizes workplace wellness by connecting employees to partners for fitness, mindfulness, therapy, nutrition, and sleep in one subscription. Headquartered in NYC with team members across the globe, we value wellbeing, collaboration, and different perspectives.

United States Unlimited PTO

  • Design and build scalable backend systems powering AI agents in real-time enterprise environments.
  • Develop agent orchestration frameworks and low-latency inference pipelines integrating LLMs and SLMs.
  • Build robust APIs and work with cross-functional teams to productionize agentic AI at scale.

Level AI is an AI-native platform that helps enterprises transform contact centers into engines of customer intelligence and operational efficiency. The company is a Series C startup backed by Battery Ventures and ENIAC, based in Mountain View, California, with a globally distributed team.

US Unlimited PTO

  • Provide frontline technical expertise to help developers deploy and scale Temporal in cloud-native environments.
  • Troubleshoot complex infrastructure issues, optimize performance, and develop automation solutions.
  • Collaborate with engineering and product teams to influence platform improvements and enhance developer experience.

Temporal provides an open source programming model that simplifies code and makes applications more reliable. The company is a growing team driven by values of curiosity, collaboration, and humility, focused on improving developer experience.

US 3w PTO

  • Design, implement, and maintain software enabling autonomous satellite operations and real-time tasking.
  • Develop and integrate services interfacing with ground station infrastructure and satellite communication protocols.
  • Collaborate with cross-functional teams to refine technical requirements and write clean, maintainable code with emphasis on safety and reliability.

BlackSky is a real-time intelligence company that owns and operates the world's most advanced space-based intelligence platform, providing satellite imagery and automated analytics. The company has a global team that works with cutting-edge technology and prides itself on being people-first, customer-focused, and fun.

  • Design and deliver robust, high-scale routing experiences for Twilio Segment's Data Pipelines.
  • Operate always-available, complex distributed systems in cloud environments.
  • Collaborate cross-functionally with design, product, and other engineers to define solutions.

Twilio is shaping the future of communications, delivering innovative solutions to hundreds of thousands of businesses and empowering millions of developers worldwide. The company is remote-first with a strong culture of connection and global inclusion, and employs a diverse team of Twilions.

  • Manage a team of engineers in a distributed, remote-friendly environment, handling 1:1s, performance reviews, hiring, and career development.
  • Own the technical roadmap for shared cloud infrastructure across Azure and AWS, balancing reliability work against longer-term platform improvements.
  • Set and enforce standards for infrastructure-as-code (Terraform, Helm, Kubernetes), documentation, and operational readiness.

Delinea is a pioneer in securing human and machine identities through intelligent, centralized authorization, empowering organizations to seamlessly govern their interactions across the modern enterprise. They value diversity, innovation, and a culture of respect and fairness, with a global team supported by strategic investment from TPG.

North America 6w PTO 26w maternity 26w paternity

  • Lead and mentor a team of Forward Deployed Engineers deploying the North platform.
  • Drive end-to-end deployments in private cloud and on-premises environments to ensure customer success.
  • Collaborate with Product, Engineering, and Sales while optimizing cloud infrastructure and K8s services.

Cohere is a security-first enterprise AI company building cutting-edge foundation AI models and end-to-end products for real-world business problems. They are a global technology company with offices in Toronto, San Francisco, London, New York City, Montreal, Seoul, Germany, and Paris, employing a team of researchers, engineers, and designers.

  • Design, build, and maintain highly available, scalable, and secure blockchain products, systems, and infrastructure.
  • Collaborate with cross-functional teams to improve infrastructure, monitoring, automation, and incident response.
  • Research emerging trends in web3/blockchain and identify new product opportunities.

Galaxy is a global leader in digital assets and data center infrastructure, delivering solutions that accelerate progress in finance and artificial intelligence. The company is headquartered in New York City, with offices across North America, Europe, the Middle East, and Asia, and blends deep crypto expertise with institutional experience.

  • Design and build the control plane that provisions, scales, and heals Neki clusters with minimal customer-visible downtime.
  • Build and maintain high availability, disaster recovery, and data protection solutions for customer databases.
  • Build tooling and automation for database operations, backup, restore, and migration workflows, and participate in an on-call rotation.

PlanetScale is reinventing the database space, offering PostgreSQL and Vitess clusters for horizontally scaling MySQL and PostgreSQL. The company is profitable, one of the fastest-growing companies in America, and builds small teams of p99 individuals.