Reliability & Performance:

  • Own availability, latency, and throughput SLOs across a large fleet of generative media model APIs.
  • Build monitoring, alerting, and observability to catch ML-specific failures and regressions.
  • Improve capacity planning, autoscaling, and GPU fleet efficiency for inference workloads.

Security & Safety:

  • Drive the security posture of the model fleet, including secure model serving and abuse detection.
  • Operationalize content moderation pipelines, safety classifiers, and guardrails for inference.
  • Lead incident response for model API outages and run blameless postmortems.

Collaboration & Culture:

  • Partner with model and infrastructure teams to embed reliability requirements into onboarding.
  • Work alongside a team dedicated to rapidly iterating on AI breakthroughs.
  • Contribute to a culture of automation, blameless postmortems, and continuous improvement.

Fal

Fal is the generative media ecosystem powering the next generation of AI products, providing infrastructure, tools, and model access for developers and enterprises. As a unified platform for high-performance inference, orchestration, and observability, fal is becoming the ecosystem that ambitious teams build on, in a market projected to grow by hundreds of billions of dollars over the next decade.
