The Information of Large Language Model Geometry

Zhiquan Tan, Chenghai Li, Weiran Huang
{"title":"The Information of Large Language Model Geometry","authors":"Zhiquan Tan, Chenghai Li, Weiran Huang","doi":"arxiv-2402.03471","DOIUrl":null,"url":null,"abstract":"This paper investigates the information encoded in the embeddings of large\nlanguage models (LLMs). We conduct simulations to analyze the representation\nentropy and discover a power law relationship with model sizes. Building upon\nthis observation, we propose a theory based on (conditional) entropy to\nelucidate the scaling law phenomenon. Furthermore, we delve into the\nauto-regressive structure of LLMs and examine the relationship between the last\ntoken and previous context tokens using information theory and regression\ntechniques. Specifically, we establish a theoretical connection between the\ninformation gain of new tokens and ridge regression. Additionally, we explore\nthe effectiveness of Lasso regression in selecting meaningful tokens, which\nsometimes outperforms the closely related attention weights. Finally, we\nconduct controlled experiments, and find that information is distributed across\ntokens, rather than being concentrated in specific \"meaningful\" tokens alone.","PeriodicalId":501433,"journal":{"name":"arXiv - CS - Information Theory","volume":"127 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Information Theory","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2402.03471","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

This paper investigates the information encoded in the embeddings of large language models (LLMs). We conduct simulations to analyze the representation entropy and discover a power-law relationship with model size. Building upon this observation, we propose a theory based on (conditional) entropy to elucidate the scaling law phenomenon. Furthermore, we delve into the auto-regressive structure of LLMs and examine the relationship between the last token and the previous context tokens using information theory and regression techniques. Specifically, we establish a theoretical connection between the information gain of new tokens and ridge regression. Additionally, we explore the effectiveness of Lasso regression in selecting meaningful tokens, which sometimes outperforms the closely related attention weights. Finally, we conduct controlled experiments and find that information is distributed across tokens, rather than being concentrated in specific "meaningful" tokens alone.
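To make the first claim concrete, here is a minimal sketch (not the authors' code) of how the entropy-versus-size power law could be checked: measure an entropy for a matrix of token embeddings, then fit entropy ≈ a · size^b by linear regression in log-log space. The spectral entropy definition and all numbers below are placeholder assumptions, not values from the paper.

    import numpy as np

    def representation_entropy(X: np.ndarray) -> float:
        # Entropy of the normalized eigenvalue spectrum of the embedding
        # covariance (X: tokens x dims). This spectral definition is an
        # assumption for illustration; the paper's entropy may differ.
        eig = np.linalg.eigvalsh(np.cov(X, rowvar=False))
        p = np.clip(eig, 1e-12, None)
        p /= p.sum()
        return float(-(p * np.log(p)).sum())

    # Hypothetical (model size, measured entropy) pairs; in practice the
    # entropies would come from representation_entropy over real embeddings.
    # A power law entropy ~ a * size^b is a line in log-log coordinates.
    sizes = np.array([125e6, 350e6, 1.3e9, 2.7e9, 6.7e9])
    entropies = np.array([4.1, 4.9, 6.2, 6.9, 7.8])
    b, log_a = np.polyfit(np.log(sizes), np.log(entropies), deg=1)
    print(f"entropy ~= {np.exp(log_a):.3g} * size^{b:.3f}")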
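The regression probes can be sketched in the same hedged spirit: express the last token's hidden state as a linear combination of the previous context tokens' states. Ridge regression yields the dense per-token weights that the paper connects to information gain, while Lasso yields sparse weights whose support picks out candidate "meaningful" tokens for comparison against attention weights. The shapes and synthetic data below are illustrative assumptions.

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(0)
    T, d = 32, 64                        # context length, hidden size (assumed)
    H = rng.standard_normal((T, d))      # placeholder hidden states
    context, last = H[:-1], H[-1]        # previous tokens vs. the last token

    # Ridge in closed form: one weight per context token,
    # w = (C C^T + lam I)^{-1} C y with C of shape (T-1, d).
    lam = 1e-2
    w_ridge = np.linalg.solve(context @ context.T + lam * np.eye(T - 1),
                              context @ last)

    # Lasso over the same design (columns = context tokens), so nonzero
    # coefficients select tokens; alpha controls how many survive.
    lasso = Lasso(alpha=0.05, fit_intercept=False, max_iter=10_000)
    lasso.fit(context.T, last)           # (d samples, T-1 token features)
    print("tokens selected by Lasso:", np.flatnonzero(lasso.coef_))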