Research and implement state-of-the-art techniques to accelerate AI inference: quantization, sparsity, distillation, speculative decoding, and caching.
Partner closely with hardware and compiler teams to ensure algorithmic improvements translate to real gains on custom silicon.
Build profiling tools and comprehensive benchmarking frameworks to measure model quality and efficiency.
Optimize production LLM serving with vLLM and SGLang to maximize throughput and minimize latency through batching and quantization.
Profile training runs to find bottlenecks and resolve them with optimized attention implementations such as FlashAttention on H200 and GB200 hardware.
Deploy and operate multiple models on shared GPU clusters with autoscaling, bin-packing, and efficient handling of mixed workloads.
Egen is a fast-growing technology company with a data-first mindset, partnering with clients on Google Cloud and Salesforce to drive action through data and insights. We are a team of dedicated engineers who thrive on solving tough problems and continually innovate to achieve fast, effective results.
Build and operate production-grade model serving infrastructure using vLLM, TGI, or Triton frameworks.
Design and implement autoscaling, multi-model architectures, and intelligent request routing for ML inference.
Optimize GPU utilization, memory efficiency, and observability to ensure low-latency, cost-effective systems.
The company is a distributed cloud infrastructure startup building AI-native cloud services with GPU-powered compute. It is well-funded, fast-scaling, and remote-first, with a focus on sustainability and decentralization.
Own the technical design and delivery of subsystems in a high-throughput, low-latency inference platform.
Develop robust API layers and SDKs that abstract complex distributed inference orchestration.
Build and harden a multi-tenant control plane for metering, rate limiting, and tenant isolation.
Stack develops revolutionary AI and autonomous systems to enhance safety and efficiency in trucking. The team has decades of experience deploying real-world systems and is committed to inclusion, entrepreneurship, and innovation.
Design and build systems that improve the efficiency of ML training and inference workloads.
Develop tooling that helps ML engineers debug, profile, optimize, and monitor model performance.
Partner with ML researchers and product teams to identify bottlenecks and drive performance improvements.
Reddit is a community of communities built on shared interests, passion, and trust, hosting the most open and authentic conversations on the internet. With over 100,000 active communities and approximately 126 million daily active users, Reddit is one of the internet's largest sources of information.
Lead AI innovation by researching and prototyping solutions using LLMs and Computer Vision for complex data extraction.
Architect scalable, cost-effective AI services and data pipelines that process millions of documents daily.
Act as a force multiplier by mentoring engineering teams and driving mission-critical initiatives to production.
AlphaSense provides AI-driven market intelligence and search to help companies make informed decisions. Founded in 2011, it employs over 2,000 people globally and is trusted by over 6,000 enterprise customers, including a majority of the S&P 500.
Develop and fine-tune large language models for intelligent, safe, and responsive browser interactions.
Apply retrieval-augmented generation, summarization, classification, and intent modeling in real-world browser workflows.
Collaborate with product and engineering partners to design, iterate, and launch user-facing AI features aligned with Mozilla's values.
Mozilla Corporation is a non-profit-backed technology company that makes Firefox and Pocket, with a mission to reclaim an internet built for people. With over 225 million monthly users and a focus on AI, security, and open-source software, Mozillians work in a collaborative, mission-driven culture.
Evaluate and select cutting-edge AI models to enhance product capabilities and user experience.
Design evaluation frameworks and configure observability for AI performance in production.
Collaborate with the CTO and the data science and engineering teams to fine-tune and integrate AI models.
Vetcove modernizes veterinary software and pet healthcare with a procurement marketplace, home delivery ecommerce, and practice management system. Over 25,000 hospitals across all 50 states use the platform daily, and the company is backed by Y Combinator and top venture investors.
Design and develop machine learning models for localization workflows, including machine translation and LLM fine-tuning.
Implement and optimize models using Python and TensorFlow, and deploy them via Docker and AWS services.
Evaluate and select ML techniques, perform statistical analysis, and maintain clear documentation.
Welo Global is a leader in multilingual AI, technology, and content solutions serving over 2,000 clients in 300 languages. The company combines globally scaled multilingual infrastructure with a network of over 500,000 linguists and domain experts, backed by seven ISO certifications.
Own end-to-end AI product strategy and build adaptive AI systems for personalized tutoring.
Prototype new AI features, define the product roadmap, and establish success metrics.
Partner with Engineering, Growth, and Commercial teams to drive product outcomes.
Smart Working connects skilled professionals with outstanding global teams for full-time, long-term roles. It is a remote-first company with a genuine community that values growth and well-being, and is one of the highest-rated workplaces on Glassdoor.