Collaborate with data scientists and engineers to build scalable ML pipelines, troubleshoot infrastructure issues from Linux to Kubernetes, and optimize model performance.
Drive high engineering standards, design on-premises MLOps solutions, and maintain tools for deployment and monitoring.
Refine CI/CD workflows, incorporate ML model training and evaluation into testing, and ensure seamless handover between research and production.
Collaborate with data scientists and software engineers to build scalable data pipelines and ML deployment systems.
Troubleshoot issues across the ML infrastructure stack, from Linux and Docker to Kubernetes and model serving.
Drive high engineering standards through code reviews, testing, and CI/CD enhancements.
QuillBot helps students and professionals strengthen their writing with AI-powered tools. We serve over 56 million users globally and foster a collaborative, virtual-first culture.
Own the ML serving API and deploy models to production with CI/CD and infrastructure as code.
Build monitoring, alerting, and reliability for NBA models and LLM agents.
Drive architectural decisions and mentor engineers on MLOps patterns.
Clutch is a vertical SaaS company backed by Andreessen Horowitz, revolutionizing how credit unions engage with members via fintech lending software. The company is small and ambitious, with a lean data team of five that values pragmatism and fast shipping.
Build and operate the ML lifecycle platform, including tooling for experiment tracking, model registry, and versioned pipelines.
Own CI/CD and deployment for ML workloads, building automated pipelines from notebook to production.
Make models observable and reliable in production with monitoring for latency, drift, data quality, and cost signals.
dv01 provides a data analytics platform for the structured finance market, offering transparency into investment performance and risk for lenders and Wall Street investors. With over 400 clients and coverage of over 100 million loans, dv01 is a data-first company with a diverse and innovative culture.
Evolve and maintain our Kubeflow, Feast, and Spark-on-Kubernetes ML infrastructure.
Design tools and APIs that let teams move from centralized bottlenecks to self-service workflows.
Collaborate with Data Science teams to apply software engineering best practices to ML workflows.
Wellhub revolutionizes workplace wellness by connecting employees to partners for fitness, mindfulness, therapy, nutrition, and sleep in one subscription. Headquartered in NYC with team members across the globe, we value wellbeing, collaboration, and different perspectives.
Design and build scalable ML training, deployment, and inference pipelines using CI/CD and cloud infrastructure.
Implement MLOps for model versioning, monitoring, and automated retraining to detect drift and performance degradation.
Partner with Data Scientists and Product teams to productionise models and integrate ML into customer-facing products.
We develop solutions that make an impact for companies around the globe. Our culture embraces openness, acts with respect, shows grit & guts, and combines employment with enjoyment.
Design and maintain scalable ML infrastructure including data pipelines, training workflows, and model deployment systems.
Own end-to-end ML lifecycle operations, ensuring reliable delivery of models into production at scale.
Implement monitoring, telemetry, and feedback loops for ML models running across large-scale device fleets.
Our partner company develops ML systems for connected hardware products used by customers worldwide. They operate in a fast-paced, product-driven environment with a collaborative and technically ambitious culture focused on real-world ML impact.
Own and scale AI compute and deployment platforms including Kubernetes and GitOps pipelines.
Build inference infrastructure and observability stacks for LLM-powered workflows.
Drive security, compliance, and governance at the systems level in a regulated healthcare environment.
Hims & Hers is a leading health and wellness platform focused on making healthcare accessible and personal. As a publicly traded company on the NYSE (HIMS), it offers flexible/remote work and a culture centered on innovation and employee well-being.
Design and operate core AI platform components for training, deploying, and serving ML models at scale.
Own model serving and inference workflows end-to-end, optimizing for reliability, latency, throughput, and cost.
Collaborate with product, infrastructure, and security teams to build scalable platform capabilities for AI-powered features.
Mozilla Corporation is the non-profit-backed technology company behind Firefox and Pocket, with over 225 million monthly users. A wholly owned subsidiary of the Mozilla Foundation, the company is mission-driven and focused on privacy and open standards.
Lead and develop a high-performing team of MLOps engineers, fostering technical excellence and collaboration.
Define and execute the MLOps roadmap, aligning infrastructure initiatives with research, engineering, and product goals.
Design and maintain scalable ML infrastructure including automated training pipelines, CI/CD, and model serving platforms.
Our partner is a company focused on cutting-edge machine learning infrastructure for large-scale AI systems. They foster an inclusive, mission-driven culture with international collaboration and value innovation, diversity, and continuous learning.
Design and develop production-grade AI/ML services and web applications from proof-of-concept to scalable platforms.
Implement MLOps best practices, CI/CD pipelines, and cloud deployment for AI/ML workloads.
Collaborate with cross-functional teams to integrate AI capabilities into engineering workflows.
Cayuse Civil Services, LLC provides enterprise AI and engineering solutions for government and infrastructure clients. The company values innovation, excellence, collaboration, adaptability, and integrity, fostering a culture of teamwork and quality.
Build and operate production-grade model serving infrastructure using vLLM, TGI, or Triton frameworks.
Design and implement auto-scaling, multi-model architectures, and intelligent request routing for ML inference.
Optimize GPU utilization, memory efficiency, and observability to ensure low-latency, cost-effective systems.
They are a distributed cloud infrastructure startup building AI-native cloud services with GPU-powered compute. The company is well-funded, fast-scaling, and operates in a remote-first environment with a focus on sustainability and decentralization.
Design, build, and maintain scalable machine learning infrastructure on AWS, including training and deployment pipelines.
Develop and deploy ML models for recommendation systems, fraud detection, credit risk, and personalization use cases.
Implement monitoring, logging, and alerting systems to ensure model performance, stability, and reliability in production.
Our partner is a fast-growing, innovation-driven company where machine learning and AI systems directly power large-scale fintech and commerce experiences. They foster a highly dynamic environment with strong emphasis on experimentation, rapid iteration, and measurable business impact.
Design, build, and deploy AI/ML solutions from prototype to production for client business problems.
Apply generative AI and LLMs, and establish MLOps best practices including CI/CD and model monitoring.
Serve as a trusted technical advisor, translating ambiguous problems into well-scoped solutions and presenting to stakeholders.
DevIQ builds modern cloud and data solutions for mid-market companies focused on energy reduction, healthcare, education, and smart cities. The company offers competitive benefits, a strong team culture, and opportunities to work on end-to-end solutions with multi-disciplinary teams.
Drive end-to-end ML development for customer-facing SaaS products, from pipelines to production deployment and monitoring.
Design evaluation strategies and A/B tests to show that ML features improve customer outcomes and deliver business impact.
Influence product roadmap by communicating ML capabilities and trade-offs to cross-functional teams.
WorkWave provides field service and logistics software solutions that help businesses manage their operations and serve their customers. They are a global company with a remote-first culture, recognized as a Best Place to Work and named among the top software companies worldwide.
Build and operate the real-time inference service for the risk decision engine with low latency and high availability.
Own model deployment infrastructure including CI/CD, shadow mode, and staged rollouts.
Build model observability and partner with Risk Data Science on operating models in production.
Mercury is a fintech company that provides banking services for startups via partner banks. The company is committed to creating a safe environment and values diversity, with a growing team focused on innovation.
Lead operational excellence, reliability, and support of enterprise AI and data platforms, ensuring stability, scalability, and observability.
Design and implement automation, monitoring, and operational tooling for AI/ML platforms including Palantir Foundry, AWS Bedrock, and SageMaker.
Serve as a senior escalation point for complex production issues, driving root cause analysis and improving platform stability.
CSAA Insurance Group, a AAA insurer, offers personal lines of property and casualty insurance to AAA members across 23 states and DC. Founded in 1914, they are one of the top personal lines insurers in the US with over 3,800 employees, known for a values-based culture and recognition in leadership development and community involvement.
Assist in managing multiregion and multicloud infrastructure, ensuring resiliency, scalability, and performance.
Support infrastructure provisioning and deployments primarily on GCP, while gaining exposure to other cloud providers.
Collaborate with development teams to design and maintain CI/CD pipelines in GitLab CI and contribute to GitOps-based deployments using ArgoCD.
Learneo is a platform of builder-driven businesses, including Course Hero, CliffsNotes, LitCharts, QuillBot, Symbolab, and Scribbr, united around supercharging productivity and learning. Each team innovates independently, supported by centralized corporate operations functions, and the company values collaboration and growth.
Build, ship, and own product features end-to-end using cutting-edge AI/ML techniques.
Apply classical ML and LLM-based approaches like RAG, prompt engineering, and fine-tuning to enhance the audit and risk platform.
Collaborate with cross-functional teams in an Agile environment to deliver scalable, production-quality code.
Optro is a leading audit, risk, ESG, and InfoSec platform trusted by over 50% of the Fortune 500. The company has been named one of the 500 fastest-growing tech companies in North America for seven consecutive years, fostering a culture of innovation and collaboration.
Build and maintain backend services, Python libraries, and model lifecycle tooling for internal ML teams.
Design and operate distributed systems for model serving, evaluation, and feature engineering.
Focus on developer experience and reliability to help teams train, deploy, and serve ML models safely.
Monzo is on a mission to make money work for everyone, offering personal and business bank accounts, savings, investments, and more through a modern digital banking platform. With around 600 engineers out of roughly 5,000 employees, we value flexibility, collaboration, and open source contributions.
Develop, deploy, maintain, operate, and support an Agentic AI Developer Platform.
Focus on hands-on technical implementation and operation of the platform.
Collaborate with and mentor less experienced team members as needed.
We build modern machine learning systems for demand planning and budget forecasting, offering custom AI solutions to optimize cloud-based systems. We are a remote startup with a culture that values data nerds, open team players, ownership, and a positive mindset.