Develop prompts and guardrails for domain-specific LLM applications.
Implement hallucination detection, mitigation, and fact-checking mechanisms.
Robots & Pencils builds meaningful, scalable digital products by blending strategy, design, and engineering. They are a small, senior team with direct access to enterprise clients.
Design, build, and deploy AI Agents including custom tools, prompt engineering, orchestration workflows, and agent design patterns.
Contribute to the backend infrastructure powering Candidly's AI capabilities, including API development, data integrations, and data pipelines.
Work closely with stakeholders across product, design, engineering, and leadership to translate complex AI concepts into actionable strategies and features.
Candidly, founded in 2016, is the category leader with the market’s most comprehensive AI-driven student debt and savings optimization platform. They partner with hundreds of top employers, financial institutions, and retirement record keepers, positioning Candidly to serve more than 35 million Americans. Candidly is a high-growth Series B startup, funded by leading investors, with an international team of 70 (and counting).
Implement features for AI applications such as conversational assistants, copilots, text generation, summarization, and content classification.
Design and optimize prompts and system instructions to improve task completion, reliability, and latency; minimize hallucinations and toxic/unsafe outputs; and implement structured outputs.
Write unit, integration, and regression tests for AI features, run evaluation scripts and log results for model quality metrics, and work with AI observability tools under guidance.
RealPage is at the forefront of the Generative AI revolution, dedicated to shaping the future of artificial intelligence within the Property Tech domain. Our Agentic AI team is focused on driving innovation by building next generation AI applications and enhancing existing systems with Generative AI capabilities.
Designing complex, dynamic prompt templates with conditional logic.
Implementing various response schemes to ensure AI outputs are predictable.
Building robust evaluation pipelines and using Langfuse to collect feedback.
Ruby Labs is a leading tech company that creates and operates innovative consumer products. We offer a diverse range of opportunities across the health, education, and entertainment industries, and our innovative teams are driving the future of consumer-led products.
Design and implement AI-powered features end to end, including prompts, agents, tools, retrieval, evaluation, and feedback loops.
Build agent systems that interact safely with infrastructure, codebases, and deployment pipelines.
Integrate LLMs deeply into product workflows as core platform primitives.
SuperPlane is an AI-native DevOps control plane with a mission to build the platform teams use to ship and manage software in the AI era. They are a fast-moving company aiming high, rethinking DevOps from first principles for the AI era to create a single control layer for engineers and agents to collaborate safely.
Own the LLM + retrieval + context layer that makes copilots accurate and fast.
Design and ship the end-to-end pipeline, improving quality and trust via evaluation.
Reduce cost/latency with a concrete inference optimization plan shipped to production.
Ethos is built to make it faster and easier to get life insurance. They blend industry expertise, technology, and the human touch to find the right policy to protect loved ones, and they have been named on CB Insights' Global Insurtech 50 list and BuiltIn's Top 100 Midsize Companies in San Francisco.
Design, build, and scale enterprise-grade AI/ML systems that power internal workflows and external-facing AI/ML platforms.
Develop a production-ready Generative AI and MLOps platform with reusable components used to deploy multiple AI solutions across Natera’s business units.
Implement cloud-native infrastructure for large-scale model training and serving using Kubernetes, MLflow, Terraform, and AWS-native services.
Natera is a global leader in cell-free DNA (cfDNA) testing. They are dedicated to oncology, women’s health, and organ health, aiming to make personalized genetic testing and diagnostics part of the standard of care. The Natera team consists of highly dedicated statisticians, geneticists, doctors, laboratory scientists, business professionals, software engineers and many other professionals from world-class institutions.
Build and maintain an internal LLM gateway that handles routing, fallbacks, and rate limiting
Create reusable components for common AI patterns (RAG, function calling, streaming responses)
Develop SDKs or libraries that simplify AI integration for application developers
ButterflyMX empowers people to open and manage doors & gates from a smartphone, and their products are installed in multifamily, commercial, and gated communities. As a distributed workforce, they're looking for intelligent, collaborative, and down-to-earth individuals to join their growing team.
Own complex, full-stack AI solutions end-to-end, from applied research to production deployment.
Set technical direction for ambiguous and high-impact use cases, while scaling the AI systems.
Mentor others, lead architectural decisions, and deepen Komodo’s AI-first culture.
Komodo Health is dedicated to reducing the global burden of disease by leveraging data. They have built the Healthcare Map, the industry’s largest view of the U.S. healthcare system. At Komodo, employees are ambitious, supportive, and passionate about delivering on its mission.
Design and deliver AI-powered advisors, assistants, and analytic agents.
Build and maintain high-quality, production-ready Python services.
Apply, adapt, and fine-tune foundation models to deliver reliable AI experiences.
Energage helps organizations turn employee feedback into useful business intelligence and credible employer recognition through Top Workplaces. Built on culture research and the results from 23 million employees surveyed across more than 70,000 organizations, Energage delivers the most accurate competitive benchmark available.
Work with other engineers on a wide variety of AI engineering tasks to improve our existing applied AI systems
Identify new opportunities to apply emerging AI capabilities to different parts of the Poe product
Take end-to-end ownership of applied AI systems, from prototyping, data pipelines, and model optimization/evaluation to reliable deployment at scale
Quora's mission is to grow the world's collective intelligence. They have two platforms: Quora, a global knowledge sharing platform, and Poe, a platform to chat, explore and build with AI language models. They have a culture rooted in transparency, idea-sharing, and experimentation.
Work with our team to understand our Archie capability roadmap and decompose capabilities into technical development.
Turn capability prototypes and PoCs from our AI research team into robust, scalable implementations.
Diagnose and solve technical problems identified by our team or users.
P-1 AI is building an engineering AGI, focusing on the built world. They are a small team tackling an ambitious problem, aiming to put an Archie on every engineering team at every industrial company on earth.
Design, implement, and maintain high-performance ML training and inference platforms.
Ship tools that allow any ML engineer to deploy a model in minutes, not days.
Improve scalability, reliability, and cost efficiency of model training and serving systems.
Speechify's mission is to make sure that reading is never a barrier to learning. With nearly 200 people around the globe working in a 100% distributed setting, Speechify's team includes frontend and backend engineers, AI research scientists, and others.
Design agentic systems & ship AI to production: Turn prototypes into resilient, observable services with clear SLAs, rollback/fallback strategies, and cost/latency budgets.
Build tool‑using LLM “agents” (task planning, function/tool calling, multi‑step workflows, guardrails) for tasks like grant discovery, application drafting, and research assistance.
Own RAG end‑to‑end: Ingest and normalize content, choose chunking/embedding strategies, implement hybrid retrieval, re‑ranking, citations, and grounding.
Instrumentl is a hyper-growth YC-backed startup that provides a SaaS platform to help nonprofits discover, track, and manage grants efficiently. They have over 4,000 nonprofit clients and are cash flow positive, doubling year-over-year, with customers who love them.
Design, optimize, and version prompts for production voice and chat LLM applications.
Architect and orchestrate multi-agent systems for complex conversations.
Build automated testing and validation frameworks for LLM outputs.
Tuotempo transforms healthcare experiences through intelligent digital solutions and is a trusted patient engagement platform powering some of Europe and Latin America's leading healthcare institutions. They have a remote-first culture with vibrant hubs in Bologna and Barcelona.
Build and operate scalable backend services and internal APIs for the AI platform.
Integrate LLMs and AI tool execution into reliable, production-ready workflows.
Own production reliability for AI platform infrastructure through observability, alerting, and incident response.
MaintainX is the world's leading Asset and Work Intelligence platform for industrial and frontline environments. They offer a modern, IoT-enabled, cloud-based tool for reliability, safety, and operations on physical equipment and facilities, powering operational excellence for 13,000+ businesses. MaintainX recently completed a $150 million Series D round at a valuation of $2.5 billion.
Drive Prompt’s mission to improve healthcare through modern technology including AI
Lead AI projects from ideation → architecture → production → iteration until tools are widely adopted and loved!
Design, build, and deploy end-to-end AI systems across both traditional ML and LLM-based workflows
Prompt delivers highly automated and modern B2B enterprise software to rehab therapy businesses, their teams, and most importantly the patients they serve. They’ve established themselves as the go-to platform in the space and are rapidly growing their market share by delivering software people love.
Rollstack is revolutionizing how businesses share data and insights by fully automating the creation of slide decks and documents. They are a remote-friendly workplace backed by Insight Partners and Y Combinator, with a diverse team that values intelligence and kindness.
Design and implement comprehensive evaluation frameworks that reflect real-world task success for agentic systems, with a focus on human+AI collaboration outcomes
Build benchmarking pipelines that capture nuanced success indicators including trust calibration, intervention frequency, and agent handoff quality
Collaborate with researchers, engineers, and product teams to align evaluation methodologies with business and user goals
Upwork is the world’s human and AI-powered work marketplace that connects businesses with highly skilled, AI-enabled independent talent from across the globe. From entrepreneurs to Fortune 100 enterprises, companies rely on Upwork’s trusted platform to find and hire expert talent. They have facilitated more than $25 billion in economic opportunity for talent around the world and their culture is built on trust, risk-taking, customer focus, and excellence.
Build AI-Powered Features: Design, develop, and deploy production-grade AI applications that solve real customer problems.
Architect Scalable Systems: Create robust backend architectures that support AI workloads, ensuring low latency and high reliability.
Drive AI Innovation: Implement and optimize agentic AI systems, RAG pipelines, and multi-agent workflows using modern LLM frameworks.
Procurify is the AI-enhanced procurement and AP automation platform for mid-market organizations. They help organizations take control of spend and save money. Procurify is a remote-first company with a big heart and a strong ambition to modernize the way organizations manage business spend.