Source Job

$150,000–$225,000/yr
Global Unlimited PTO 26w maternity

  • Implement AI benchmarks within our evaluation infrastructure to expand the suite of capabilities we track.
  • Develop our existing suite of benchmarks so we can quickly and painlessly evaluate new model releases.
  • Contribute to the development of brand-new benchmarks and prototype your own ideas.

Software Engineering AI Benchmarking Problem Solving

20 jobs similar to Software Engineer, Benchmarking

Jobs ranked by similarity.

US Unlimited PTO

  • Drive the development and deployment of advanced AI models and systems that power Abacus products.
  • Oversee software development teams (30+ engineers) across multiple product lines, guiding architecture decisions and coding standards.
  • Lead critical projects in AI and data engineering, ensuring roadmap delivery and maintaining high quality standards.

Abacus Insights aims to unlock the power of data, enabling health plans to deliver the right care at the right time. Backed by $100M, they foster a bold, curious, and collaborative environment where innovation starts with people working together to make an impact.

NAMER EMEA

In this role, you will drive the strategic and technical direction of AI engineering across multiple product lines, shaping how advanced AI capabilities power user-facing automation experiences. You will lead high-impact engineering teams, guiding experimentation, architecture decisions, and end-to-end delivery of AI-driven solutions. Working closely with product, design, and go-to-market teams, you will help define the roadmap, unify core AI initiatives, and turn bold ideas into scalable outcomes.

We use an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly against the role's core requirements.

North America 4w PTO

  • Establish the technical vision for our AI product infrastructure.
  • Develop frameworks that make LLM integration seamless and reliable, building APIs and SDKs that allow LLMs to interface with Wealthsimple data and functionality.
  • Build new AI-powered product capabilities from 0 → 1, collaborating directly with product teams to bring AI features to life.

Wealthsimple is on a mission to help everyone achieve financial freedom by reimagining what it means to manage your money.

$220,000–$250,000/yr
US Unlimited PTO

  • Partner with the Senior Product Manager to translate AI product strategy into a clear technical roadmap.
  • Design, implement, and maintain backend services and APIs that power AI-enabled features for customers and internal users.
  • Mentor and level up engineers in AI/ML best practices, distributed systems design, and platform thinking.

Binance.US is America’s home to buy, trade, and earn digital assets and is a licensed and regulated U.S. crypto platform.

$160,000–$190,000/yr

  • Design, implement, and deploy AI-powered features, including model training, fine-tuning, and prompt engineering workflows.
  • Translate product requirements into robust, production-ready AI solutions, working with Product Managers, Software Engineers, and Data Scientists.
  • Optimize models and infrastructure for scalability, latency, and cost efficiency, partnering with DevOps and MLOps to ensure reliable and maintainable AI pipelines.

Paper is reimagining how schools support students so that every learner can reach their full potential.

$145,052–$164,014/yr
Europe Unlimited PTO

  • Lead an engineering team delivering high-quality products customers love.
  • Drive an AI-first approach across product development and engineering workflows.
  • Coach engineers effectively, support career growth, and raise the talent bar by developing current team members and hiring new ones.

HackerOne is a global leader in Continuous Threat Exposure Management (CTEM).

$160,000–$220,000/yr
US Unlimited PTO

  • Develop and maintain internal tools and systems that automate existing work and increase employee productivity using AI.
  • Collaborate with teams across the business to understand pain points and identify high-impact automation opportunities.
  • Rapidly prototype small AI-enabled utilities or automations and deploy them into production.

Pomelo Care is a multi-disciplinary team of clinicians, engineers and problem solvers who are passionate about improving care for moms and babies.

US

  • Architect and scale production-grade Generative AI powering patient conversations, provider decision support, and clinical operations.
  • Design systems that are safe, observable, and built for scale in a regulated environment.
  • Set technical direction, mentor other engineers, and help shape Rula’s AI roadmap.

Rula is committed to treating the whole person and aims to create a world where mental health is no longer stigmatized. They are passionate about making a positive impact on the lives of those struggling with mental health issues.

$163,000–$247,000/yr
US Canada

  • Architect, build, and test Gusto’s product suite, which spans Payroll, Benefits, HR, Time, Tax Credits, and more.
  • Mentor other engineers to help solve some of the hardest technical problems in very complex domains and at large scale.
  • Collaborate with Product Management and Design teams to understand customer pain points and iterate to launch daily.

Gusto is on a mission to grow the small business economy by handling payroll, health insurance, 401(k)s, and HR for over 400,000 small businesses.

$200,000–$225,000/yr
US Unlimited PTO

  • Support the emerging product, Night Shift, an AI research assistant.
  • Own the AI evaluation framework, working closely with Engineering (Backend, Frontend, and Design).
  • Contribute to the system architecture for agentic AI, aiming for faster, more accurate leads for officers.

Flock Safety is the leading safety technology platform, helping communities thrive by taking a proactive approach to crime prevention and security.

$259,300–$305,000/yr
US Unlimited PTO

  • Own end-to-end implementation of AI-powered product features, from prototypes to production.
  • Mentor other engineers on the team, leveling up the team as a whole.
  • Collaborate across the organization to support shipping these features to production.

Honeycomb is a service for the near and present future, defining observability and raising expectations of what developer tools can do!

  • Design, develop, and deploy AI-driven applications and services.
  • Build and integrate LLM-based solutions using frameworks such as LangChain.
  • Collaborate with product managers, designers, and data teams to translate business requirements into technical AI solutions.

CI&T are tech transformation specialists, uniting human expertise with AI to create scalable tech solutions. With over 8,000 employees around the world, they value diverse identities and life experiences, fostering a diverse, inclusive, and safe work environment.

North America Canada

  • Lead domain-specific model optimization using PEFT (LoRA/QLoRA) and knowledge distillation to balance cost, latency, and reasoning capability.
  • Build next-gen Retrieval-Augmented Generation pipelines using hybrid search, cross-encoders, and self-correcting retrieval loops.
  • Design and deploy multi-agent systems using frameworks like LangGraph or CrewAI, enabling autonomous task planning and tool use (function calling).

ServiceNow is a global market leader, bringing innovative AI-enhanced technology to over 8,100 customers, including 85% of the Fortune 500®. Their intelligent cloud-based platform seamlessly connects people, systems, and processes to empower organizations to find smarter, faster, and better ways to work.

As a Senior Software Engineer at Fluxon, you'll bring products to market while learning and growing with our team. You'll drive end-to-end implementations, collaborating with your team in a dynamic environment. You'll also engage directly with clients to understand business goals and debug production issues.

We are Fluxon, a product development team founded by ex-Googlers and startup founders offering full-cycle software development.

US

  • Oversee the development and delivery of innovative software products aligned with customer needs and business objectives.
  • Lead and mentor a team of Software Product Managers while engaging with cross-functional stakeholders.
  • Drive impactful technology solutions and contribute significantly to the team's professional development and product success.

Jobgether uses an AI-powered matching process to ensure your application is reviewed quickly, objectively, and fairly. They use AI tools to support parts of the hiring process, such as reviewing applications, analyzing resumes, or assessing responses, with final decisions made by humans.

US

  • Build the shared AI execution platform that powers every AI product.
  • Shape the unified architecture, guide tough tradeoffs, and elevate engineering standards.
  • Ensure the platform is scalable, safe, cost-efficient, and easy to build on.

Zapier builds a platform to help millions of businesses scale with automation and AI, aiming to make automation work for everyone.

$160,000–$190,000/yr
US Europe Unlimited PTO 12w maternity

Write reliable, secure, and scalable code for Lithic's products with minimal tech debt. Own projects from planning to launch, keeping stakeholders informed and aligned along the way. Engage in design discussions and collaborate across teams to build well-reasoned solutions.

Lithic creates card-issuing and payment infrastructure that just works for technology companies.

$230,000–$265,000/yr
US

  • Lead a cross-functional AI product engineering team, owning strategy and execution across multiple Copilot experiences.
  • Drive technical direction and product evolution for the AI stack, inference backend, and user-facing AI products.
  • Interface closely with the AI Research and Model Training team to align on model capabilities, evaluation methodology, data/training pipelines, and the productionization path for new models and techniques.

Cribl is a data engine for IT and Security. Many big names in demanding industries trust Cribl to solve their pressing data needs. The company is growing rapidly and is collaborative, with curious and motivated team members who are passionate about putting customers first.

$120,000–$150,000/yr

  • Architect, build, and deploy LLM-powered applications that augment and automate key workflows.
  • Design autonomous AI systems that can execute technical analysis, testing, troubleshooting, and decision-making at scale.
  • Develop AI-driven tools that create measurable business impact: improving efficiency, accelerating innovation, and driving revenue growth.

Sierra Studio connects talented Brazilian professionals with exciting career opportunities in a highly vetted, small community of growing companies in the US. With over 250 people, they specialize in enabling merchants, consumers, and partners to operate with flexibility, intelligence, and trust.