Critical Phase Transition in a Large Language Model

arXiv - PHYS - Disordered Systems and Neural Networks | Pub Date: 2024-06-08 | DOI: arxiv-2406.05335

Kai Nakaishi, Yoshihiko Nishikawa, Koji Hukushima



Abstract

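The performance of large language models (LLMs) strongly depends on the temperature parameter. Empirically, at very low temperatures, LLMs generate sentences with clear repetitive structures, while at very high temperatures, generated sentences are often incomprehensible. In this study, using GPT-2, we numerically demonstrate that the difference between the two regimes is not just a smooth change but a phase transition with singular, divergent statistical quantities. Our extensive analysis shows that critical behaviors, such as a power-law decay of correlation in a text, emerge in the LLM at the transition temperature as well as in a natural language dataset. We also argue that several statistical quantities characterizing the criticality should be useful for evaluating the performance of LLMs.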
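For readers unfamiliar with the temperature parameter, the following is a minimal sketch of standard temperature-scaled softmax sampling, in which the probability of token i is proportional to exp(logit_i / T). It is not the authors' code and does not reproduce their GPT-2 experiments; the logits and the function names `softmax_with_temperature`, `entropy`, and `sample_token` are hypothetical and chosen only for illustration.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Return probabilities p_i proportional to exp(logit_i / temperature)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy (in nats) of a probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def sample_token(logits, temperature, rng=random):
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


if __name__ == "__main__":
    # Hypothetical next-token logits; a real model would supply these.
    logits = [2.0, 1.0, 0.5, -1.0]
    for T in (0.1, 1.0, 10.0):
        probs = softmax_with_temperature(logits, T)
        print(f"T={T}: probs={[round(p, 3) for p in probs]}, "
              f"entropy={entropy(probs):.3f}")
```

At low temperature the distribution concentrates on the highest-logit token, which is consistent with the repetitive output described in the abstract; at high temperature it approaches a uniform distribution, consistent with incomprehensible output. The sketch illustrates only this single-step effect, not the phase transition itself.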
