Critical Phase Transition in a Large Language Model

arXiv - PHYS - Disordered Systems and Neural Networks. Pub Date: 2024-06-08. DOI: arxiv-2406.05335

Kai Nakaishi, Yoshihiko Nishikawa, Koji Hukushima

Citations: 0

Abstract

The performance of large language models (LLMs) strongly depends on the temperature parameter. Empirically, at very low temperatures LLMs generate sentences with clear repetitive structures, while at very high temperatures the generated sentences are often incomprehensible. In this study, using GPT-2, we numerically demonstrate that the difference between the two regimes is not just a smooth change but a phase transition with singular, divergent statistical quantities. Our extensive analysis shows that critical behaviors, such as a power-law decay of correlation in a text, emerge in the LLM at the transition temperature as well as in a natural language dataset. We also argue that several statistical quantities characterizing the criticality should be useful for evaluating the performance of LLMs.
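
To make the role of the temperature parameter concrete, the following is a minimal sketch of temperature-scaled softmax, the usual way a sampling temperature is applied to next-token logits. It is not the authors' code: the function names and the four example logits are hypothetical, chosen only to show how a low temperature concentrates probability on one token while a high temperature flattens the distribution.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Return probabilities proportional to exp(logit / temperature)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical next-token logits, for illustration only.
logits = [2.0, 1.0, 0.0, -1.0]

low = softmax_with_temperature(logits, 0.1)    # nearly deterministic
high = softmax_with_temperature(logits, 10.0)  # nearly uniform

print("T=0.1 :", [round(p, 4) for p in low], "entropy:", round(entropy(low), 4))
print("T=10.0:", [round(p, 4) for p in high], "entropy:", round(entropy(high), 4))
```

In this toy setting the entropy of the distribution rises with temperature, which matches the qualitative picture in the abstract: repetitive output at low temperature and incomprehensible output at high temperature. Detecting the phase transition itself requires the statistical analysis of generated text described in the paper, which this sketch does not attempt.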
