Infrastructure and Orchestration:

  • Design resource management systems that provision and orchestrate compute across AWS, GCP, and Azure using infrastructure-as-code (Pulumi/Terraform).
  • Handle dynamic scaling, state synchronization, and concurrent operations across hundreds of heterogeneous nodes.
  • Architect fault-tolerant infrastructure for distributed ML, including GPU clusters, the NVIDIA runtime, and S3 checkpointing.

Networking and Data Handling:

  • Build systems that simulate and handle real-world network conditions — bandwidth shaping, latency injection, packet loss.
  • Manage dynamic node churn and ensure efficient data flow across workers with heterogeneous connectivity. Our training runs on consumer nodes and non-co-located infrastructure, not in a datacenter.
  • Handle large dataset management and streaming.

Required Skills:

  • Experience in a startup environment with an emphasis on microservices orchestration, or a big-tech background.
  • Deep understanding of multi-cloud infra & distributed training systems.
  • Excellent team player with high attention to detail.

Pluralis Research

Pluralis Research is pioneering Protocol Learning, a fully decentralised way to train and deploy AI models that opens this layer to individuals rather than only well-resourced corporations.
