Chain-of-Thought Spoofing Targets Reasoning AI Models
Researchers [Charles Ye], [Jasmine Cui], and [Dylan Hadfield-Menell] have shown that large language models (LLMs) can fail to distinguish between different instruction sources because they prioritize writing style...
