Source Job

$150,000–$220,000/yr
US Unlimited PTO

  • Incorporating the best research work on agents and code generation into the OpenHands framework
  • Performing novel improvements in areas of interest to improve agent performance and efficiency
  • Running and implementing evaluations to ensure agent quality

LLM AI

20 jobs similar to Member of the Technical Staff

Jobs ranked by similarity.

Europe

  • Design, implement, and maintain SFT and RL post-training pipelines for multi-step coding agents.
  • Train and adapt LLMs for agent workflows, including planning, tool use, and multi-step interactions inside JetBrains IDEs.
  • Build and develop evaluation and simulation environments where coding agents can act, be measured, and compared on realistic developer tasks.

JetBrains is passionate about code and strives to make the strongest, most effective developer tools on earth. Today, AI-powered assistance and agents are becoming a core part of how developers work in their IDEs.

  • Review AI-generated responses and evaluate technical accuracy.
  • Provide expert feedback to train AI systems to write better code.
  • Work with various programming languages and coding challenges.

G2i connects subject-matter experts, students, and professionals with flexible, remote AI training work such as annotation, evaluation, fact-checking, and content review.

  • Design, develop, and maintain a robust platform to enable users to create and manage AI agents.
  • Integrate and work with multiple LLMs, ensuring seamless orchestration and scalability.
  • Develop and implement evaluation frameworks for testing AI agents in challenging and complex scenarios.

ClickUp is building the first truly converged AI workspace, unifying tasks, docs, chat, calendar, and enterprise search, all supercharged by context-driven AI.

  • Build AI agents and tools that transform how developers write code and debug issues.
  • Architect and implement AI-powered tools such as code review assistants and automated test generators.
  • Collaborate with the Principal Engineer and product/design teams in a remote-first environment.

Docker makes app development easier so developers can focus on what matters.

  • Build resilient AI Agents using LangGraph and microservices.
  • Develop complex automation workflows in n8n.
  • Collaborate with Internal Business Analysts to focus on coding, not guessing requirements.

Gcore is a global provider of infrastructure and software solutions for AI, cloud, network, and security. At Gcore, you'll help design and deliver the foundation for an AI-driven world.

  • Design and implement agentic architecture, defining context management, data flow, and action orchestration.
  • Build AI variables capable of autonomous action loops to enrich leads and trigger actions.
  • Deliver Copilot v1, initially semi-agentic, with potential for autonomous workflows, while implementing monitoring of all output.

lemlist is a global B2B SaaS business with $43M ARR, fully bootstrapped, profitable, and growing fast, shipping one of the most loved Sales Engagement Platforms worldwide.

US

  • Architect and scale production-grade Generative AI powering patient conversations, provider decision support, and clinical operations.
  • Design systems that are safe, observable, and built for scale in a regulated environment.
  • Set technical direction, mentor other engineers, and help shape Rula’s AI roadmap.

Rula is committed to treating the whole person and aims to create a world where mental health is no longer stigmatized. They are passionate about making a positive impact on the lives of those struggling with mental health issues.

$200,000–$225,000/yr
US Unlimited PTO

  • Support the emerging product, Night Shift, an AI research assistant.
  • Own the AI evaluation framework, working closely with Engineering (Backend, Frontend, and Design).
  • Contribute to the system architecture for agentic AI, aiming for faster, more accurate leads for officers.

Flock Safety is the leading safety technology platform, helping communities thrive by taking a proactive approach to crime prevention and security.

North America Canada

  • Lead domain-specific model optimization using PEFT (LoRA/QLoRA) and knowledge distillation to balance cost, latency, and reasoning capability.
  • Build next-gen Retrieval-Augmented Generation pipelines using hybrid search, cross-encoders, and self-correcting retrieval loops.
  • Design and deploy multi-agent systems using frameworks like LangGraph or CrewAI, enabling autonomous task planning and tool-use (Function Calling).

ServiceNow is a global market leader, bringing innovative AI-enhanced technology to over 8,100 customers, including 85% of the Fortune 500®. Their intelligent cloud-based platform seamlessly connects people, systems, and processes to empower organizations to find smarter, faster, and better ways to work.

US

  • Fix bugs without writing code or waiting for engineering.
  • Build AI agents that handle entire support scenarios end-to-end.
  • Consult with users on achieving real outcomes, building infrastructure that scales, and collaborating with product and engineering.

We're building an AI‑native workspace—an operating system for work that puts co‑intelligence at the center.

$80,000–$150,000/yr

  • Research, Document, Test, and Ideate: Explore the best ways to achieve our customers’ goals using LLMs and other AI tools.
  • Master Our Dialogue Platform: Become an expert, answer questions, and train others on prompting both within and outside of our platform.
  • Train Our AIs: Utilize prompting, knowledge-base creation, and fine-tuning to enhance our AI capabilities.

1mind is a platform that deploys multimodal Superhumans for revenue teams, combining a face, a voice, and a GTM brain. The company has a remote-first, fast-moving culture with ownership, autonomy, and impact from day one.

$115,747–$208,344/yr
US

  • Design, implement, and refine prompts and conversational flows for agentic automation, ensuring high-quality consumer self-service experiences.
  • Develop innovative, scalable components with JavaScript in a specialized scripting framework.
  • Leverage Generative AI coding assistants to accelerate development, improve code quality, and enhance productivity throughout the software lifecycle.

Experian is a global data and technology company, powering opportunities for people and businesses around the world.

  • Operate and maintain AI systems based on PIPE44 in client environments.
  • Monitor AI behavior, execution outcomes, usage patterns, and costs.
  • Support the rollout of new agents and controlled scaling of existing systems.

SPACE44 builds and operates software systems for companies that need technology to work reliably in real, day-to-day operations.

$85,000–$225,000/yr
US Canada

  • Validate Veeva AI Agents through evaluation.
  • Define strategies for new AI Agents.
  • Analyze model behaviors to identify defects.

Veeva Systems is a mission-driven organization and pioneer in industry cloud, helping life sciences companies bring therapies to patients faster.

US

  • Build the shared AI execution platform that powers every AI product.
  • Shape the unified architecture, guide tough tradeoffs, and elevate engineering standards.
  • Ensure the platform is scalable, safe, cost-efficient, and easy to build on.

Zapier builds a platform to help millions of businesses scale with automation and AI, aiming to make automation work for everyone.

US Europe Unlimited PTO

  • Contribute to and review PRs for dapr/dapr-agents, dapr/python-sdk, dapr/durabletask-python and dapr/docs upstream
  • Collaborate with partners on integration of Agentic Frameworks into the Dapr ecosystem
  • Participate in promoting the Dapr project with Blog posts, recordings and demos

Diagrid provides developers with APIs and tools that help them focus on their code and not on infrastructure.

Canada

  • Shape AI-enabled development at Jane by setting a clear strategy for how engineers ideate, code, test, review, and ship with AI.
  • Prototype often, share what you learn, and model best practices by building small, high-impact tools that others can use.
  • Lead and support a small senior team while continuing to contribute technically, whether that means pairing with engineers, reviewing designs, or jumping into code when it matters most.

Jane is a team that's all about fostering growth, spreading delight, and serving our healthcare community by simplifying the lives of healthcare practitioners and patients daily.

  • Leverage professional experience to evaluate AI models' output in your field.
  • Assess content and deliver feedback to strengthen the model’s understanding.
  • Work independently from anywhere, with flexible hours and no minimum commitment.

Handshake is a recruiting platform that connects students and recent graduates with employers.

$60,000–$90,000/yr
US

  • Formulate and execute small, high-leverage research projects aligned with our product roadmap.
  • Independently build and validate end-to-end prototypes.
  • Design and run experimental pipelines autonomously, including setting up research environments and defining evaluation metrics.

ZetaChain is building the first universal blockchain and AI platform that connects everything—Bitcoin, Ethereum, Solana, and more—while pioneering in the GenAI space. They are backed by top investors, live on mainnet, and building the future of blockchain and AI technology.