Infrastructure Ownership:

  • Own cloud infrastructure on AWS, including EC2, EKS, RDS, S3, IAM, and VPC.
  • Manage Kubernetes clusters and container orchestration end-to-end, ensuring system reliability and scalability.

Development and Reliability:

  • Build and maintain CI/CD pipelines with tools such as GitHub Actions, and implement monitoring stacks with Prometheus or Datadog.
  • Improve the reliability, performance, and security of production systems and automate infrastructure with Terraform or similar IaC tools.

Collaboration and Support:

  • Work closely with engineering and ML teams to support AI data pipelines and debug issues across complex, distributed systems.
  • Participate in design reviews and help elevate the overall infrastructure standards of the team.

Bespoke Labs

Bespoke Labs is an AI research and data company building the datasets, benchmarks, and evaluation infrastructure that power frontier AI models. It is a small, fast-moving team that is backed by leading investors, trusted by top AI labs, and publishing research at leading conferences.
