Research Engineer, Interpretability

Anthropic

Salary range

$315,000–$560,000/year

Job Description

The Interpretability team at Anthropic seeks to understand how modern language models work and how far we can trust them. The team reverse-engineers trained models, on the premise that mechanistic understanding is key to making advanced systems safe. The focus is mechanistic interpretability: discovering how neural network parameters map to meaningful algorithms.

Responsibilities include implementing and analyzing research experiments, building tools for research experimentation, and improving infrastructure to support model safety.

The role requires 5–10+ years of experience building software and proficiency in a programming language such as Python, Rust, Go, or Java. Experience contributing to empirical AI research projects is expected, as are strong prioritization and collaboration skills and an interest in machine learning research, its applications, and its societal impacts. Experience with codebase design, performance optimization of large-scale distributed systems, language modeling with transformers, GPUs, or PyTorch is beneficial.

About Anthropic

Anthropic’s mission is to create reliable, interpretable, and steerable AI systems, ensuring that AI is safe and beneficial for its users and for society.
