AI Compute and Infrastructure Engineer

Kraken

Remote regions: Global

The Team:

  • This dedicated AI Compute and Infrastructure team owns the backbone for running AI workloads with control, speed, and reliability.
  • You will join a small, senior team working directly with AI researchers, platform engineers, and product teams to build production-grade infrastructure.

Key Responsibilities:

  • Design and operate GPU clusters, including scheduling, configuration, workload isolation, and cost optimization.
  • Optimize inference pipelines for performance and cost using advanced serving frameworks and tooling.
  • Build observability systems for utilization, latency, and capacity, and drive reliability and incident response improvements.

Required Skills:

  • 5+ years in infrastructure engineering, including hands-on experience operating GPU clusters and ML infrastructure in production.
  • Strong systems fundamentals in Linux, networking, containers, and Kubernetes, plus proficiency in Python for automation.
  • Experience with ML serving frameworks, performance tradeoffs, cost optimization, and building observable, high-availability systems.

Kraken

Kraken is a crypto exchange platform building premium financial products for traders and institutions, accelerating global crypto adoption. It is a mission-driven, fully remote company with a world-class team of crypto experts spread across more than 70 countries.
