Research Scientist / Engineer, Alignment Finetuning

Anthropic

San Francisco Bay Area, USA
$280,000 - $425,000 USD

About This Role

In this role, you will lead the development and implementation of techniques aimed at training language models that are more aligned with human values. Responsibilities include:

- Develop and implement novel finetuning techniques using synthetic data generation and advanced training pipelines.

- Train models to improve alignment properties such as honesty, character, and harmlessness.

- Create and maintain evaluation frameworks to measure alignment properties in models.

- Collaborate across teams to integrate alignment improvements into production models.

Requirements
- Master's degree