Anthropic’s Evaluation of Chain-of-Thought Faithfulness: Investigating Hidden Reasoning, Reward Hacks, and the Limitations of Verbal AI Transparency in Reasoning Models
A key advancement in AI capabilities is the development and use of chain-of-thought (CoT) reasoning, where models explain their steps ...