We're looking for a Senior Infrastructure Engineer to help us design, build, and scale the foundational architecture that powers our next-generation AI systems. This role is ideal for someone who thrives in a fast-paced, engineering-driven environment and finds joy in creating robust, elegant systems from scratch.
Build and maintain stable, scalable, and highly available compute infrastructure, spanning cloud (AWS) and bare metal environments.
Design and operate efficient storage solutions for large-scale AI training datasets and checkpoints.
Develop high-performance online inference systems, optimizing for diverse GPU environments (e.g., H100, B200).
Automate infra workflows to maximize reliability, observability, and performance across our platform.
Collaborate closely with AI researchers and backend engineers to support evolving model deployment and experimentation needs.
Lead and contribute to internal tooling, CI/CD pipelines (e.g., GitHub Actions), and monitoring infrastructure (e.g., Grafana, Prometheus, OpenTelemetry).