Staff Research Engineer, Pre-training Data

Reddit

Remote regions

US

Salary range

$230,000–$322,000/yr

Benefits

Similar Jobs

See all

AI Team:

  • Build Reddit-native foundational Large Language Models (LLMs).
  • Sit at the intersection of applied research and massive-scale infrastructure.
  • Tasked with training models that truly understand the Reddit culture.

Responsibilities:

  • Architect and implement high-throughput, deterministic data sampling systems.
  • Formulate and validate statistical hypotheses regarding data mixtures.
  • Design the 'Safety-First' ingestion layer for PII redaction and toxicity signals.

Qualifications:

  • 8+ years of software engineering experience with a focus on machine learning infrastructure.
  • Expert proficiency in Python and distributed data processing frameworks.
  • Strong mathematical foundation in probability, statistics, and importance sampling theory.

Reddit

Reddit is a community-driven platform where users submit, vote, and comment on what interests them. With over 100,000 active communities and 116 million daily active users, they foster open conversations and shared interests.

Apply for This Position