Review and label content for sentiment, factual accuracy, and reasoning issues.
Evaluate model outputs across quality dimensions using scoring frameworks.
Validate automated assessments and identify discrepancies or errors.
Welo Data provides AI services that help develop and evaluate large language models (LLMs). The job posting does not provide information about the company's size or culture.
Engage with clients to understand their business goals, customer personas, and resolution paths.
Write, test, and refine AI prompts and system instructions to optimize clarity, tone, and model performance across different channels.
Design experiences that gracefully handle multi-intent queries, context switching, and proactive interactions.
Netomi is the leading agentic AI platform for enterprise customer experience. They work with the largest global brands to enable agentic automation at scale across the entire customer journey and help enterprises drive efficiency, lower costs, and deliver higher-quality customer experiences.
Architect and build agentic workflows that combine large language models, reasoning components, and data pipelines.
Lead the design and development of advanced ML/NLP products, from ideation to production.
Drive experimentation with new approaches for agentic reasoning, coordination, and autonomous system design.
SmartRecruiters is the Recruiting AI Company that transforms hiring for the world’s leading enterprises. Built for global scale, SmartRecruiters, an SAP company, delivers an AI-powered hiring platform that automates and optimizes the entire talent acquisition process. They are a values-driven, globally focused tech company with strong financial backing and a bold vision for the future of work, and they foster a collaborative and inclusive work environment.
Be responsible for the end-to-end technical migration workflow for transitioning templates to LLM autoraters.
Use the client’s internal tools and prompt engineering techniques to maximize model performance.
Solve edge-case scenarios by designing and refining manual prompts.
Welo Data provides AI services and focuses on data validation. Although the job posting says nothing about headcount or culture, they seem to be a fast-growing company.
Osano is an innovative B-Corporation focused on giving modern enterprises the ability to innovate quickly and earn customer trust by respecting data privacy and complying with consent guidelines. They are scaling fast with a multi-year runway and ambitious growth plans.
n8n is the open workflow orchestration platform built for the new era of AI. They give technical teams the freedom of code with the speed of no-code, so they can automate faster, smarter, and without limits. Since their founding in 2019, they’ve grown into a diverse team of over 220 working across Europe and the US, connected by a shared builder spirit and with their centre of gravity in Berlin.
Design and curate evaluation datasets for retrieval quality.
Measure retrieval quality using metrics like Recall@k, Precision@k, MRR, and NDCG@k.
Conduct systematic error analysis on AI/ML system outputs; build structured failure taxonomies.
Jump empowers financial advisors, firms, and clients to thrive in the age of AI by automating tasks like meeting prep and compliance. As a Series A company, Jump has raised $30M and grown to 100+ employees including leaders from top companies and schools, fostering a culture of velocity, world-class standards, direct communication, and kindness.
Architect and implement AI-powered capabilities: code generation, intelligent node creation, and workflow optimization.
Integrate LLM APIs and embedding models for text-to-workflow and natural language code suggestions.
Design and iterate on prompts to improve model output and user experience.
n8n is an open workflow orchestration platform built for the new era of AI. They give technical teams the freedom of code with the speed of no-code, so they can automate faster, smarter, and without limits. The company has a diverse team of over 220 working across Europe and the US.
You'll work with AI tools, test model outputs, and evaluate responses.
Document errors and gaps, and collaborate with our team.
Spot inconsistencies and provide structured feedback.
Project World Wide is involved in shaping the future of AI through training data. They seek motivated individuals to contribute to the development of cutting-edge AI systems.
Research & Train: Design, train, and evaluate our proprietary deep learning models.
High-Performance ML Systems: Optimize our models for maximum inference speed and efficiency, ensuring they can handle massive datasets and real-time workloads at scale.
Deepslate is building speech-to-speech voice AI models that sound and act indistinguishable from a human, and believes everyone should be able to use them. Backed by top-tier investors from the tech and AI sectors, as well as a major German VC fund, they are well funded and moving fast.
Design and develop machine learning infrastructure, tooling, and models to help teams deliver world-class experiences.
Help product and development teams understand the data lifecycle and the inherent experimental nature of machine learning.
Build internal products and platforms to enable teams to incorporate AI into their features and customer-facing products.
Weave provides an all-in-one platform for small businesses to streamline communications and patient experiences. The company has a phenomenal culture, and its teams are cross-functional and agile, each composed of a product owner, backend and frontend developers, and DevOps engineers.
Develop and enhance our AI-powered learning platform using TypeScript/React on the frontend and Python/FastAPI on the backend.
Build responsive web applications for diverse learners across desktop and mobile.
Integrate Large Language Models for content classification, skill assessment, learner feedback, and personalization.
EnGen offers an AI-powered approach to English instruction, designed to solve a systemic access issue. A Certified B Corporation, EnGen partners with employers, adult educators, workforce development organizations, and state governments.
Review, analyze, and rank AI models' chains of thought for correctness and approach.
Provide clear, constructive feedback to improve AI-generated responses.
An enterprise client is seeking people fluent in English to help train generative artificial intelligence models. They seem to maintain a contractor-based work environment.
Migrate and test existing bulk flashcard creation prompts.
Run test suites and manually review AI outputs for quality and correctness.
Analyze real user data to identify failure patterns and improve prompts.
Brainscape is the world's leading web and mobile EdTech study platform, helping millions of learners create better flashcards. The company is looking for an AI Prompt Engineer to join their team.
Build AI-powered systems to automate and improve workflows.
Work closely with business teams to understand processes and pain points.
Use AI coding agents to build software more rapidly than traditional methods.
M3 USA delivers digital solutions to the healthcare, life sciences, and pharmaceutical industries. They focus on physician communities globally and have a dynamic, innovative work environment.
Collaborate with Cohere’s Modelling Safety team on implementing novel research ideas.
Conduct cutting-edge machine learning research, training and evaluating production large language models.
Focus on research projects aimed at making models better understood, safer, more reliable, more inclusive, and more beneficial for the world.
Cohere scales intelligence to serve humanity. They train and deploy frontier models for developers and enterprises who are building AI systems to power magical experiences like content generation, semantic search and agents. They are a team of researchers, engineers, and designers.
Build and deploy AI models with RAG and tool calling for various product features.
Collaborate with frontend and backend engineers to bring AI features into production.
Stay updated with the latest AI research and propose innovative applications relevant to Finom’s mission.
Finom is a European tech startup headquartered in Amsterdam, revolutionizing financial services for entrepreneurs. They offer an all-in-one B2B financial solution integrating banking, accounting, financial management, and invoicing into a seamless, mobile-first platform, actively expanding across key EU markets.
Utilize Automatic Prompt Generation (APG) tools to create baseline prompts.
Run and supervise the Automated Prompt Optimization (APO) tool.
Manually draft, test, and refine prompts to navigate complex template architectures.
Welo Data is an AI services company. They focus on data validation. The company seems to have a modern and innovative culture, based on the description.
Building a truly flexible and scalable conversational AI platform.
Fine-tuning and evaluating LLM-based models to improve performance.
Contributing to platform engineering across both ML and backend systems.
Canva is a design platform that allows users to create social media graphics, presentations, posters, documents, and other visual content. They have a campus in Sydney, a second campus in Melbourne, and co-working spaces in Brisbane, Perth, Adelaide, and Auckland, NZ.