Source Job

20 jobs similar to Senior Machine Learning Engineer - Evaluations (Design Generation)

Jobs ranked by similarity.

Global

Evaluating Canva’s AI-generated designs across formats, reviewing layouts, hierarchy, spacing, colour, typography, and tone. Running structured testing workflows to validate generative quality across presentations, video, social, print, and more. Flagging design logic failures, template structure gaps, and visual inconsistencies that automated checks might miss.

Canva collaborates with talented contractors and freelancers from all over the world to help us achieve our crazy big goals.

$85,000–$225,000/yr
US Canada

This role validates Veeva AI Agents through evaluation. You will define strategies for new AI Agents and analyze model behaviors to identify defects.

Veeva Systems is a mission-driven organization and pioneer in industry cloud, helping life sciences companies bring therapies to patients faster.

US

  • Build and maintain gen AI prompts aligned with ad formats and community dynamics.
  • Improve the quality and brand safety of model outputs across text, images, and video.
  • Partner with Product and Engineering to prioritize improvements and accelerate feature development.

Reddit is a community built on shared interests and trust, home to open conversations and one of the internet’s largest sources of information.

$200,000–$225,000/yr
US Unlimited PTO

  • Support the emerging product, Night Shift, an AI research assistant.
  • Own the AI evaluation framework, working closely with Engineering (Backend, Frontend, and Design).
  • Contribute to the system architecture for agentic AI, aiming for faster, more accurate leads for officers.

Flock Safety is the leading safety technology platform, helping communities thrive by taking a proactive approach to crime prevention and security.

US UK

As a Principal Decision Scientist, you will define high-level business objectives directly with clients, then develop and execute the project plan to meet those objectives. You will provide technical leadership to guide development work across teams while also owning and delivering specific technical components yourself. You will design and develop feature engineering pipelines, build ML & AI infrastructure, deploy models, and orchestrate advanced analytical insights.

Aimpoint Digital is a premier analytics consulting firm with a mission to drive business value for clients through expertise in data strategy, data analytics, decision sciences

Define solution architecture and analytical frameworks for complex client business challenges. Incorporate Generative AI and advanced techniques into scalable, repeatable frameworks. Lead the integration of Generative AI into solution frameworks for information retrieval and content generation.

WNS (Holdings) Limited is a leading Business Process Management (BPM) company that combines industry knowledge with technology and analytics expertise.

$172,000–$215,000/yr

  • Design, build, and productionize ML models for fine-tuned, Retrieval-Augmented Generation (RAG), and generative AI features.
  • Build and maintain scalable data pipelines to collect high-quality training and evaluation datasets, including annotation systems and human-in-the-loop workflows.
  • Collaborate with product and engineering to iterate on datasets, evaluation metrics, and model architectures to improve quality and relevance.

Mural is pioneering how generative AI transforms visual collaboration and decision-making.

  • Review AI-generated responses and evaluate technical accuracy.
  • Provide expert feedback to train AI systems to write better code.
  • Work with various programming languages and coding challenges.

G2i connects subject-matter experts, students, and professionals with flexible, remote AI training work such as annotation, evaluation, fact-checking, and content review.

Design and implement agentic architecture, defining context management, data flow, and action orchestration. Build AI variables capable of autonomous action loops to enrich leads and trigger actions. Deliver Copilot v1, initially semi-agentic, with potential for autonomous workflows, while implementing monitoring of all output.

lemlist is a global B2B SaaS business with $43M ARR, fully bootstrapped, profitable, and growing fast, shipping one of the most loved Sales Engagement Platforms worldwide.

Europe

Shape the future of AI-powered search across all OLX verticals. Lead the design and evolution of OLX’s Search AI Platform, developing LLM- and GenAI-based systems that power discovery and relevance. Mentor and inspire data scientists and ML engineers across OLX’s global hubs, sharing best practices and shaping the future of applied ML at scale.

At OLX, we work together to build a more sustainable world through trade.

Europe 6w PTO

  • Design multi-step AI prompt chains to generate high-quality educational content.
  • Orchestrate and debug multi-step AI flows, managing the technical tooling.
  • Build and maintain automated AI workflows using platforms such as n8n.

Kognity is a 125-person EdTech scale-up powering learning in 120+ countries through its intelligent platform that combines rich pedagogy with smart AI.

  • Design, build, and optimize high-performance systems in Python supporting AI data pipelines and evaluation workflows.
  • Develop full-stack tooling and backend services for large-scale data annotation, validation, and quality control.
  • Improve reliability, performance, and safety across existing Python codebases.

Alignerr connects top technical experts with leading AI labs to build, evaluate, and improve next-generation models. They work on real production systems and high-impact research workflows across data, tooling, and infrastructure.

$172,000–$215,000/yr

  • Build and refine early-stage AI systems and frameworks.
  • Translate complex problems into structured engineering approaches.
  • Champion technical excellence and mentorship in AI development.

The AI Innovation Team at Mural is pioneering how generative AI transforms visual collaboration and decision-making.

North America Canada

Distill customer feedback into a cohesive product vision. Own end-to-end feature development by defining product requirements and managing development & testing. Maintain a perspective on the evolving generative AI landscape to feed product evolution.

ServiceNow stands as a global market leader, bringing innovative AI-enhanced technology to over 8,100 customers, including 85% of the Fortune 500®.

Drive global AI research delivery across distributed teams. Coordinate cross-time-zone research programs and ensure high-quality research operations. Partner with Product, Engineering, and Go-to-Market teams to identify research output suitable for productisation.

Canva is an online design and publishing tool, founded to democratize design, with a mission to empower everyone in the world to design.

  • Design and implement interfaces across the platform for compute orchestration and RL training.
  • Translate complex backend systems into intuitive, production-ready product experiences.
  • Build for technical audiences, including AI and general software engineers.

Prime Intellect makes frontier AI accessible to everyone and enables individuals/organizations to train models using their agentic training infrastructure.

$175,000–$200,000/yr
US

The AI Architect will drive Magna’s AI vision from concept to commercialization. You will design and implement AI systems and lead development teams. You will also partner with executives to identify opportunities, drive adoption, and deliver measurable value.

Magna Legal Services provides end-to-end legal support services to law firms, corporations, and governmental agencies throughout the nation.

$149,000–$350,000/yr
US

  • Drive fundamental and applied research in AI.
  • Build cutting-edge Generative AI models, using techniques like Supervised Fine-Tuning (SFT), Reinforcement Learning (RL), prompt improvements, and synthetic data generation.
  • Collaborate closely with product managers and engineers to transform user feedback into requirements for AI systems.

Figma’s platform helps teams bring ideas to life—whether you're brainstorming, creating a prototype, translating designs into code, or iterating with AI.

US North America

  • Design complex LLM prompts that accurately represent real customer journeys and service interactions.
  • Partner with Field Engineers to transform raw data into structured, high-quality tasks for model training.
  • Annotate and review tasks to ensure strict quality standards and alignment with expected customer outcomes.

Welo Data works with technology companies to provide datasets that are high-quality, ethically sourced, relevant, diverse, and scalable to supercharge their AI models.