Source Job

Europe North America 7w PTO

  • Design a Python framework for implementing internal and public benchmarks.
  • Build and maintain a pipeline that runs distributed evaluations at scale.
  • Collaborate with modeling and product teams to improve experimentation and evaluation tooling.

Python Kafka GCP AWS Azure

20 jobs similar to Software Engineer - Evaluations

Jobs ranked by similarity.

5w PTO

  • Drive major technical initiatives from design through production, improving scalability, reliability, and correctness across critical systems.
  • Design and evolve backend services, APIs, event-driven workflows, and data models that support complex business processes at scale.
  • Improve the operational foundations of the platform through better observability, testing, deployment safety, and incident reduction.

Tem is rebuilding the energy transaction, making it transparent and fair. They aim to put power back in the hands of customers and tackle the critical problem of access to low-cost electricity, leveraging AI-driven infrastructure for efficient and sustainable energy markets.

$107,000–$168,000/yr
US

  • Translate product vision into production-ready code, working closely with Product Managers to turn business goals into actionable plans.
  • Help drive the transition to a self-service model, ensuring infrastructure remains performant across new global regions as the team and technology scale.
  • Take end-to-end ownership of service health, including participating in design docs and implementing robust monitoring/alerting.

Addepar is a global data and AI platform that empowers investment professionals to turn complex financial information into actionable intelligence. More than 1,400 firms use it to manage and advise on nearly $9 trillion in assets. The company strives to promote a welcoming environment in which inclusion and belonging are held as a shared responsibility.

Argentina

  • Help build the platform that lets people across Greenhouse build, deploy, and run their own agents and automations against Greenhouse's data and tools.
  • Stand up whatever infrastructure EA's application engineers need to ship reliably, including services, runtime, deployment, scheduling, and observability.
  • Provide services related to EA's AWS footprint end-to-end, including networking, IAM, secrets, security posture, and deployment automation.

Greenhouse's mission is to make hiring work for everyone; they hire great people because they believe that they’re the foundation of their success. The company collaborates purposefully, fosters inclusivity, and communicates with transparency and accountability.

$120,000–$210,000/yr
Europe

  • Enable systematic exploration and materially improve exploration success rates.
  • Build data pipelines and tooling for deriving advanced human and machine insights from exploration data.
  • Develop expertise in KoBold’s Data Systems and a deep understanding of how they impact exploration.

KoBold builds AI models for mineral exploration and deploys those models to guide decisions in exploration programs. In the six years since founding, KoBold has become the largest independent mineral exploration company and the largest exploration technology developer.

Europe

  • Lead the design and implementation of key projects, delivering innovative solutions.
  • Collaborate with Product Managers to define project scope, timelines, and resources.
  • Mentor others and assist in hiring and onboarding new talent.

Prolific is building the human data infrastructure that's reshaping the landscape of AI development. They connect researchers and companies with a global pool of participants, enabling the collection of high-quality, ethically sourced human behavioral data and feedback.

  • Shaping the Python language ecosystem with a strong product and platform mindset.
  • Architecting, building and delivering high-impact solutions that uplift the Python developer experience.
  • Advocating for Python engineering best practices across the organization.

Canva is a design platform that empowers users to create professional-quality graphics. They offer an inclusive culture with employees across multiple locations.

$160,000–$180,000/yr
US Unlimited PTO

  • Identify systemic engineering challenges across our platforms and drive their resolution.
  • Write code, review PRs, debug production issues, and optimize system performance.
  • Partner with engineering teams as a technical point of contact on complex projects.

Zeta Global is an AI-powered marketing cloud that leverages advanced artificial intelligence (AI) and trillions of consumer signals to help marketers acquire, grow, and retain customers more efficiently. They were founded in 2007 and are headquartered in New York City with offices around the world.

Europe 5w PTO

  • Design, build, and maintain scalable backend services and APIs that power Chattermill’s core analytics platform.
  • Architect reliable, maintainable distributed systems and contribute to the evolution of backend service design and infrastructure.
  • Own end-to-end delivery of backend engineering workstreams, from technical scoping and architecture through to implementation, testing, observability, and production support.

Chattermill helps large successful brands like Uber, Amazon, and Wise put their customers at the centre of everything they do. Using best-in-class tech in a fast-evolving AI space, their Customer Experience Intelligence platform continuously analyses feedback to help clients identify what to do next.

$160,000–$190,000/yr
US Canada Unlimited PTO

  • Own and maintain data pipeline architectures, ensuring reliability and monitoring.
  • Manage and evolve data modeling environments for analysts and engineers.
  • Implement observability for data systems, detecting issues early and continuously monitoring data quality.

Voltus unlocks the full value of distributed energy resources for customers and the grid. They are a fast-growing climate-tech company with a bright, gritty, and good team that values innovation, impact, and integrity.

$124,200–$198,700/yr
US

  • Design, build, test, deploy, and maintain scalable, reliable platform services and shared libraries.
  • Contribute to platform and system architecture decisions with a focus on reliability, scalability, and developer experience.
  • Write high-quality, maintainable code and set a strong example of engineering best practices.

Oportun is a mission-driven financial services company that aims to help its members reach their financial goals. They have provided more than $21.3 billion in responsible and affordable credit and saved their members more than $2.5 billion in interest and fees.

$125,000–$150,000/yr
US Unlimited PTO

  • Design and build systems and manage scalable ML pipelines using Vertex AI Pipelines for training, evaluation, and deployment to support ranking, retrieval, and recommendation personalization use cases.
  • Develop and maintain data pipelines that support feature generation, model training, and analytics workflows. Own vector generation via Milvus, storage, and retrieval workflows.
  • Implement model serving solutions using KServe and build APIs using FastAPI for low-latency inference. Build observability and monitoring for models and pipelines.

People Inc. is America’s largest digital and print publisher. Our 40+ iconic and fast-growing brands harness the best intent-driven content, the fastest sites, and the fewest ads to help nearly 200 million people every month.

Canada

  • Set technical strategy for your team over a year-long horizon and tie it to business-impacting projects.
  • Collaborate across product management, design, and analytics to ensure technical sustainability and manage risks.
  • Foster a culture of quality and ownership by setting code review standards and developing team talent.

Affirm is reinventing credit to make it more honest and friendly, giving consumers the flexibility to buy now and pay later without any hidden fees or compounding interest. They are a remote-first company that provides competitive benefits anchored to their core value of people coming first.

US Canada

  • Design, build, and ship agentic workflows across multiple domains.
  • Build multi-step agents capable of autonomous planning, context tracking, memory, tool use, and API orchestration.
  • Drive technical and architectural decisions to meet product requirements while also anticipating and designing for future needs.

Cority helps customers see and prevent risks across their operations in real time. Our EHS+ platform converges people, data, and AI agents to provide a clear view of information people can trust. For 40 years, Cority has been the market leader in EHS+, recognized by top analysts and trusted by more than 1,500 of the most complex organizations worldwide.

Unlimited PTO

  • Assess and improve visibility by identifying gaps in dashboards, metrics, and logs.
  • Refine alerts and dashboards for critical services to catch issues earlier.
  • Automate routine checks and monitoring tasks to free up engineers.

PlayOn is where high school sports come to life through platforms like GoFan, NFHS Network, and MaxPreps. As a growth-stage company backed by KKR, we build the technology that powers high school athletics from ticketing and streaming to fundraising and merchandise.

$125,000–$175,000/yr
US Unlimited PTO

  • Act as a trusted advisor to customers, building relationships with technical and business stakeholders.
  • Advise on GenAI and ML best practices, giving product demos to technical and business stakeholders.
  • Partner with product and engineering teams to drive the product roadmap and spearhead new opportunities within existing accounts.

Arize AI is transforming the world by helping teams monitor, troubleshoot, and optimize their AI systems with its AI & Agent Engineering observability and evaluation platform. They are a Series C company backed by top-tier investors, with over $135M in funding and a rapidly growing customer base.

$154,384–$198,893/yr
Europe

  • Design, build, and own core components of the agent platform, from the orchestration layer to the tool integrations connecting it to internal systems.
  • Build and evolve the capabilities layer: APIs, data access patterns, and service integrations for agents to execute operational workflows.
  • Architect the knowledge and memory infrastructure, allowing agents to retrieve the data and act across our systems.

Justworks helps businesses get off the ground by solving their HR issues so they can focus on running their business. The company embraces a supportive, entrepreneurial environment where employees are encouraged to build something meaningful and have fun.

Canada

  • Define, drive, design, and build/ship end-to-end solutions that solve real customer problems.
  • Contribute to the end-to-end AI/ML software development lifecycle, ensuring reproducible research.
  • Drive architecture, design, and delivery of advanced ML systems in the Product R&D team.

Kinaxis is a global leader in modern supply chain orchestration. Known for its AI-infused platform and transparency across end-to-end supply chains, Kinaxis helps customers make faster, better decisions. The company has over 2,000 employees worldwide and is recognized with Top Employer awards.

$190,000–$280,500/yr
US Canada

  • Design and build high-quality APIs and services for the international adaptation of Life360.
  • Work with AI (Claude Code) as a first-class collaborator.
  • Define and codify AI-Native engineering practices for the International team.

Life360's mission is to keep people close to the ones they love. Its category-leading mobile app, Tile tracking devices, and Pet GPS trackers empower members to protect the people, pets, and things they care about most. Life360 has more than 500 (and growing!) remote-first employees.

  • Test large-scale data processing systems.
  • Analyze and improve the efficiency, scalability, and stability of applications.
  • Create, maintain, and improve automation test suites and frameworks.

OpenX is focused on unleashing the full economic potential of digital media companies by making digital advertising markets and technologies. They are a team uniquely experienced in designing and operating high-scale ad marketplaces, constantly looking for thoughtful, creative executors.

US

  • Lead the design and evolution of Fieldguide's core platform services.
  • Build platform capabilities that compound the leverage of product and AI engineers.
  • Define the architecture for how new product capabilities get delivered across environments.

Fieldguide is automating and streamlining the work of assurance and audit practitioners. They are based in San Francisco, CA, remote-first and backed by Goldman Sachs Alternatives, Bessemer Venture Partners, 8VC, Floodgate, Y Combinator, and more.