Inference Performance:

Make our models faster and cheaper with techniques like speculative decoding and quantization.
Optimize serving configurations, GPU selection, and batching strategies.
Address spiky traffic patterns from meeting schedules.

Fine-tuning Pipelines:

Build repeatable infrastructure for AI Engineers to deploy models faster.
Create pipelines to train adapters for domain-specific behavior.
Distill large teacher models for classification tasks.

How You’ll Help Us Win:

Discover performance bottlenecks across GPU families.
Evaluate serving frameworks with speculative decoding.
Optimize GPU spend based on workload characteristics.

Fathom

Fathom eliminates the needless overhead of meetings with an AI assistant that captures, summarizes, and organizes key moments. They are a small company that creates magical experiences through focused builders and values a supportive environment.

Apply for This Position