Platform Engineering:
- Build and operate the multi-tenant orchestration, scheduling, and customer-facing platform layer.
- Manage multi-tenant isolation including namespaces, networking, storage, and quotas.
Infrastructure and Automation:
- Implement and operate image management, GPU operator, and node provisioning automation.
- Drive infrastructure-as-code and configuration management across the platform stack.
Collaboration and Roadmap:
- Partner with SRE on platform reliability, SLO definition, and observability.
- Participate in architecture reviews, design discussions, and technical roadmap planning.
- Assist TAM and Support engineers with customer-impacting platform issues.
GPU One
GPU One provides GPU-as-a-Service (GPUaaS), turning raw GPU infrastructure into a usable cloud platform. The company is building a multi-tenant orchestration layer to serve customers at scale, with a focus on platform engineering and AI infrastructure.