Inference Performance:
- Make our models faster and cheaper with techniques like speculative decoding and quantization.
- Optimize serving configurations, GPU selection, and batching strategies.
- Address spiky traffic patterns from meeting schedules.
Fine-tuning Pipelines:
- Build repeatable infrastructure for AI Engineers to deploy models faster.
- Create pipelines to train adapters for domain-specific behavior.
- Distill large teacher models for classification tasks.
How You’ll Help Us Win:
- Discover performance bottlenecks across GPU families.
- Evaluate serving frameworks with speculative decoding.
- Optimize GPU spend based on workload characteristics.
Fathom
Fathom eliminates the needless overhead of meetings with an AI assistant that captures, summarizes, and organizes key moments. They are a small company that creates magical experiences through focused builders and values a supportive environment.