Source Job

  • Design, build, and optimize high-performance systems in Python supporting AI data pipelines and evaluation workflows.
  • Develop full-stack tooling and backend services for large-scale data annotation, validation, and quality control.
  • Improve reliability, performance, and safety across existing Python codebases.

Python

20 jobs similar to Python Infrastructure Engineer - Model Evaluation

Jobs ranked by similarity.

  • Design, build, and optimize high-performance systems in Python supporting AI data pipelines and evaluation workflows
  • Develop full-stack tooling and backend services for large-scale data annotation, validation, and quality control
  • Improve reliability, performance, and safety across existing Python codebases

Alignerr connects top technical experts with leading AI labs to build, evaluate, and improve next-generation models. They work on real production systems and high-impact research workflows across data, tooling, and infrastructure.

  • Design, build, and optimize high-performance systems in Python supporting AI data pipelines and evaluation workflows
  • Develop full-stack tooling and backend services for large-scale data annotation, validation, and quality control
  • Improve reliability, performance, and safety across existing Python codebases

Alignerr connects top technical experts with leading AI labs to build, evaluate, and improve next-generation models. They work on real production systems and high-impact research workflows across data, tooling, and infrastructure.

$104,000–$156,000/hr

  • Design, build, and optimize high-performance systems in Python supporting AI data pipelines and evaluation workflows
  • Develop full-stack tooling and backend services for large-scale data annotation, validation, and quality control
  • Improve reliability, performance, and safety across existing Python codebases

Alignerr connects top technical experts with leading AI labs to build, evaluate, and improve next-generation models. They work on real production systems and high-impact research workflows across data, tooling, and infrastructure.

$104,000–$156,000/hr

  • Design, build, and optimize high-performance systems in Rust supporting AI data pipelines and evaluation workflows
  • Develop full-stack tooling and backend services for large-scale data annotation, validation, and quality control
  • Improve reliability, performance, and safety across existing Rust codebases

Alignerr connects top technical experts with leading AI labs to build, evaluate, and improve next-generation models. They focus on real production systems and high-impact research workflows across data, tooling, and infrastructure.

  • Design, build, and optimize high-performance systems in C++ supporting AI data pipelines and evaluation workflows.
  • Develop full-stack tooling and backend services for large-scale data annotation, validation, and quality control.
  • Improve reliability, performance, and safety across existing C++ codebases.

Alignerr connects top technical experts with leading AI labs to build, evaluate, and improve next-generation models. They work on real production systems and high-impact research workflows across data, tooling, and infrastructure.

$104,000–$156,000/hr

  • Design, build, and optimize high-performance systems in Rust supporting AI data pipelines and evaluation workflows.
  • Develop full-stack tooling and backend services for large-scale data annotation, validation, and quality control.
  • Improve reliability, performance, and safety across existing Rust codebases.

Alignerr connects top technical experts with leading AI labs to build, evaluate, and improve next-generation models. They work on real production systems and high-impact research workflows across data, tooling, and infrastructure.

$141,487–$184,800/yr
Europe

  • Design scalable, future-proof data platforms optimized for AI research workloads.
  • Build efficient self-serve data processing pipelines leveraging GCP's advanced services.
  • Implement guardrails for cost, quality, and performance.

AssemblyAI is at the forefront of Speech AI, creating powerful models for speech-to-text and speech understanding via an API. They're a remote team of startup veterans and AI researchers looking to build one of the next great AI companies.

US Europe

  • Design, develop, and deploy AI-driven applications to make our software more accessible.
  • Own the software from requirements development through deployment and maintenance.
  • Design, build, test, and deploy a scalable system architecture.

Epistemix empowers organizations to make smarter decisions by simulating real-world outcomes using synthetic populations.

World Wide

  • Challenge advanced language models on realistic infrastructure and platform scenarios.
  • Verify architectural soundness and logical correctness, and assess code quality and testing strategies.
  • Analyze performance bottlenecks and deployment risks, capture reproducible failure cases, and suggest improvements.

The company is hiring a SWE Infrastructure Specialist on a contract basis. Contractors must supply their own secure computer and high-speed internet, and company-sponsored benefits such as health insurance and PTO do not apply.

$60–$90/hr

  • Evaluate AI-generated code across the full stack.
  • Design and build full-stack tooling for AI data annotation and quality control.
  • Review complex system designs, providing feedback on scalability and performance.

Alignerr partners with leading AI research teams and labs to build and train cutting-edge AI models. The company offers hourly contract positions, emphasizing innovation and collaboration within the AI field.

  • Design, build, and optimize high-performance systems in C# supporting AI data pipelines and evaluation workflows
  • Develop full-stack tooling and backend services for large-scale data annotation, validation, and quality control
  • Improve reliability, performance, and safety across existing C# codebases

Alignerr connects top technical experts with leading AI labs to build, evaluate, and improve next-generation models. They focus on real production systems and high-impact research workflows across data, tooling, and infrastructure.

$85,000–$225,000/yr
US Canada

This role validates Veeva AI Agents through evaluation. You will define strategies for new AI Agents and analyze model behavior to identify defects.

Veeva Systems is a mission-driven organization and pioneer in industry cloud, helping life sciences companies bring therapies to patients faster.

US Canada 11w maternity

  • Build FastAPI services, Pydantic models, evaluation tooling, integrations with device APIs, and guardrails to keep responses safe and useful.
  • Drive implementation for Phase 2 of the AI Sleep Chat targeting Q2 2026.
  • Translate voice and text commands into device API calls, and design and implement a multi-agent architecture for the command-processing pipeline.

At Hatch, they’re on a mission to help people build better sleep habits—so they can feel more focused, energized, and present in their lives.

  • Design, build, and optimize high-performance systems in C++ supporting AI data pipelines and evaluation workflows
  • Develop full-stack tooling and backend services for large-scale data annotation, validation, and quality control
  • Improve reliability, performance, and safety across existing C++ codebases

Alignerr connects top technical experts with leading AI labs to build, evaluate, and improve next-generation models. They work on real production systems and high-impact research workflows across data, tooling, and infrastructure.

Europe 4w PTO

  • Explore and preprocess raw, messy datasets and design data strategies.
  • Prototype model ideas and translate prototypes into production.
  • Collaborate with cross-functional teams to turn ideas into impactful features.

Hostinger, with over 4 million clients in 150 countries, is shaping the future of online success, powered by AI and driven by people.

Canada

  • Design, implement, test, and deploy offline object detection, tracking, and fusion modules that automatically create annotations on Cloud Services from recorded sensor data.
  • Define and implement the ingestion, preparation, curation, and governance of large multidimensional datasets in support of analytical models and workflows.
  • Evaluate and recommend technical advances that improve productivity and quality, reduce turnaround time, and strengthen operational reliability.

Torc has always believed that autonomous vehicle technology will transform the way we travel, transport goods, and do business, and it is now part of the Daimler family.

  • Design, build, and optimize high-performance systems in Rust supporting AI data pipelines and evaluation workflows
  • Develop full-stack tooling and backend services for large-scale data annotation, validation, and quality control
  • Improve reliability, performance, and safety across existing Rust codebases

Alignerr connects top technical experts with leading AI labs to build, evaluate, and improve next-generation models. The job description does not provide information about company size or culture.

Global

  • Define the vision and feature set for our internal Robotics Data Platform.
  • Act as the Product Owner for a dedicated team of software engineers.
  • Participate in discussions with leading robotics labs and foundation model builders.

Turing is the world’s leading research accelerator for frontier AI labs and a trusted partner for global enterprises looking to deploy advanced AI systems. Recognized by Forbes, The Information, and Fast Company among the world’s top innovators, Turing’s leadership team includes AI technologists from Meta, Google, Microsoft, Apple, Amazon, McKinsey, Bain, Stanford, Caltech, and MIT.

  • Own the end-to-end lifecycle of ML model deployment—from training artifacts to production inference services.
  • Design, build, and maintain scalable inference pipelines using modern orchestration frameworks (e.g., Kubeflow, Airflow, Ray, MLflow).
  • Implement and optimize model serving infrastructure for latency, throughput, and cost efficiency across GPU and CPU clusters.

MARA is building a modular platform that unifies IaaS, PaaS, and SaaS, enabling governments, enterprises, and AI innovators to deploy, scale, and govern workloads across data centers, edge environments, and sovereign clouds. They are redefining the future of sovereign, energy-aware AI infrastructure.

  • Review AI-generated responses and evaluate technical accuracy.
  • Provide expert feedback to train AI systems to write better code.
  • Work with various programming languages and coding challenges.

G2i connects subject-matter experts, students, and professionals with flexible, remote AI training work such as annotation, evaluation, fact-checking, and content review.