Research Scientist / Engineer, Alignment Finetuning
Anthropic
San Francisco Bay Area, USA
$280,000 - $425,000
About This Role
- Lead the development and implementation of techniques for training language models to be more aligned with human values.
- Develop and implement novel finetuning techniques using synthetic data generation and advanced training pipelines.
- Train models to improve alignment properties, such as honesty, character, and harmlessness.
- Create and maintain evaluation frameworks to measure alignment properties in models.
- Collaborate across teams to integrate alignment improvements into production models.
Requirements
- Master's degree
