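"Even the most advanced frontier models struggle to move beyond their pretraining priors, no matter how compelling the new evidence."
We train PhD students to do exactly this! Can transformers do it without changing their weights?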


August 8, 07:29
Is Chain-of-Thought Reasoning of LLMs a Mirage?
... Our results reveal that CoT reasoning is a brittle mirage that vanishes when it is pushed beyond training distributions. This work offers a deeper understanding of why and when CoT reasoning fails, emphasizing the ongoing challenge of achieving genuine and generalizable reasoning.
... Our findings reveal that CoT reasoning works effectively when applied to in-distribution or near in-distribution data but becomes fragile and prone to failure even under moderate distribution shifts.
In some cases, LLMs generate fluent yet logically inconsistent reasoning steps. The results suggest that what appears to be structured reasoning can be a mirage, emerging from memorized or interpolated patterns in the training data rather than logical inference.
... Together, these findings suggest that LLMs are not principled reasoners but rather sophisticated simulators of reasoning-like text.
