Research Scientist / Engineer, Alignment Finetuning

Anthropic

San Francisco Bay Area, USA
$280,000 - $425,000 USD

About This Role

In this role, you will lead the development and implementation of techniques aimed at training language models that are more aligned with human values. Responsibilities include:

- Develop and implement novel finetuning techniques using synthetic data generation and advanced training pipelines.

- Train models to improve alignment properties such as honesty, character, and harmlessness.

- Create and maintain evaluation frameworks to measure alignment properties in models.

- Collaborate across teams to integrate alignment improvements into production models.

Requirements
- Master's degree