arXiv - PHYS - Disordered Systems and Neural Networks: Latest Papers, Page 6

Critical Phase Transition in a Large Language Model

arXiv - PHYS - Disordered Systems and Neural Networks

Pub Date : 2024-06-08 DOI: arxiv-2406.05335

Kai Nakaishi, Yoshihiko Nishikawa, Koji Hukushima

The performance of large language models (LLMs) strongly depends on the temperature parameter. Empirically, at very low temperatures, LLMs generate sentences with clear repetitive structures, while at very high temperatures, generated sentences are often incomprehensible. In this study, using GPT-2, we numerically demonstrate that the difference between the two regimes is not just a smooth change but a phase transition with singular, divergent statistical quantities. Our extensive analysis shows that critical behaviors, such as a power-law decay of correlation in a text, emerge in the LLM at the transition temperature as well as in a natural language dataset. We also argue that several statistical quantities characterizing the criticality should be useful for evaluating the performance of LLMs.
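
For readers unfamiliar with the temperature parameter, the sketch below shows the usual way it enters sampling: logits are divided by the temperature T before the softmax. This is a generic illustration, not the authors' code; the function name `softmax_with_temperature` and the three example logits are assumptions made here.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Return probabilities p_i proportional to exp(logit_i / T)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical scores for three candidate tokens (illustrative values only).
logits = [2.0, 1.0, 0.0]
cold = softmax_with_temperature(logits, 0.1)    # nearly deterministic
hot = softmax_with_temperature(logits, 100.0)   # nearly uniform
```

At low T almost all probability mass sits on the highest-scoring token, which is consistent with the repetitive regime described in the abstract; at high T the distribution flattens toward uniform.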

Citations: 0

Highly Versatile FPGA-Implemented Cyber Coherent Ising Machine

arXiv - PHYS - Disordered Systems and Neural Networks

Pub Date : 2024-06-08 DOI: arxiv-2406.05377

Toru Aonishi, Tatsuya Nagasawa, Toshiyuki Koizumi, Mastiyage Don Sudeera Hasaranga Gunathilaka, Kazushi Mimura, Masato Okada, Satoshi Kako, Yoshihisa Yamamoto

In recent years, quantum Ising machines have drawn a lot of attention, but due to physical implementation constraints, it has been difficult to achieve dense coupling, such as full coupling with sufficient spins to handle practical large-scale applications. Consequently, classically computable equations have been derived from quantum master equations for these quantum Ising machines. Parallel implementations of these algorithms using FPGAs have been used to rapidly find solutions to these problems on a scale that is difficult to achieve in physical systems. We have developed an FPGA-implemented cyber coherent Ising machine (cyber CIM) that is much more versatile than previous implementations using FPGAs. Our architecture is versatile since it can be applied to the open-loop CIM, which was proposed when CIM research began, to the closed-loop CIM, which has been used recently, as well as to the Jacobi successive over-relaxation method. By modifying the sequence control code for the calculation control module, other algorithms such as Simulated Bifurcation (SB) can also be implemented. Earlier research on large-scale FPGA implementations of SB and CIM used binary or ternary discrete values for connections, whereas the cyber CIM uses FP32 values. The cyber CIM also uses Zeeman terms represented as FP32, which were not present in other large-scale FPGA systems. Our implementation with continuous interaction realizes N=4096 on a single FPGA, comparable to the single-FPGA implementation of SB with binary interactions, with N=4096. The cyber CIM enables applications such as CDMA multi-user detection and L0 compressed sensing, which were not possible with earlier FPGA systems, while enabling superior calculation speeds, more than ten times faster than a GPU implementation. The calculation speed can be further improved by increasing parallelism, such as through clustering.
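
To make the roles of real-valued couplings and Zeeman terms concrete, here is a minimal sketch of an Ising objective with both, together with a brute-force search that is only feasible for a handful of spins. It is not the cyber CIM algorithm; the sign convention, the function names, and the two-spin example values are assumptions chosen for illustration.

```python
from itertools import product


def ising_energy(spins, couplings, fields):
    """E(s) = -sum_{i<j} J_ij s_i s_j - sum_i h_i s_i (assumed sign convention)."""
    n = len(spins)
    pair = sum(
        couplings[i][j] * spins[i] * spins[j]
        for i in range(n)
        for j in range(i + 1, n)
    )
    zeeman = sum(h * s for h, s in zip(fields, spins))
    return -pair - zeeman


def brute_force_ground_state(couplings, fields):
    """Enumerate all 2^N spin configurations; practical only for very small N."""
    n = len(fields)
    return min(
        product((-1, 1), repeat=n),
        key=lambda s: ising_energy(s, couplings, fields),
    )


# Hypothetical two-spin instance with a real-valued coupling and Zeeman fields.
couplings = [[0.0, 1.0], [1.0, 0.0]]
fields = [0.5, 0.5]
best = brute_force_ground_state(couplings, fields)
best_energy = ising_energy(best, couplings, fields)
```

With binary or ternary couplings the entries of `couplings` would be restricted to a few discrete values; allowing arbitrary floats, plus the `fields` vector, is what the abstract refers to as continuous interactions and Zeeman terms.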

Citations: 0

Reconsideration of optimization for reduction of traffic congestion

arXiv - PHYS - Disordered Systems and Neural Networks

Pub Date : 2024-06-08 DOI: arxiv-2406.05448

Masayuki Ohzeki

One of the most impressive applications of a quantum annealer was optimizing a group of Volkswagen to reduce traffic congestion using a D-Wave system. A simple formulation of a quadratic term was proposed to reduce traffic congestion. This quadratic term was useful for determining the shortest routes among several candidates. The original formulation produced decreases in the total lengths of car tours and traffic congestion. In this study, we reformulated the cost function with the sole focus on reducing traffic congestion. We then found a unique cost function for expressing the quadratic function with a dead zone and an inequality constraint.

Citations: 0

Unified one-parameter scaling function for Anderson localization transitions in non-reciprocal non-Hermitian systems

arXiv - PHYS - Disordered Systems and Neural Networks

Pub Date : 2024-06-04 DOI: arxiv-2406.01984

C. Wang, Wenxue He, X. R. Wang, Hechen Ren

By using dimensionless conductances as scaling variables, the conventional one-parameter scaling theory of localization fails for non-reciprocal non-Hermitian systems such as the Hatano-Nelson model. Here, we propose a one-parameter scaling function using the participation ratio as the scaling variable. Employing a highly accurate numerical procedure based on exact diagonalization, we demonstrate that this one-parameter scaling function can describe Anderson localization transitions of non-reciprocal non-Hermitian systems in one and two dimensions of symmetry classes AI and A. The critical exponents of correlation lengths depend on symmetries and dimensionality only, a typical feature of universality. Moreover, we derive a complex-gap equation based on the self-consistent Born approximation that can determine the disorder at which the point gap closes. The obtained disorders match perfectly the critical disorders of Anderson localization transitions from the one-parameter scaling function. Finally, we show that the one-parameter scaling function is also valid for Anderson localization transitions in reciprocal non-Hermitian systems such as two-dimensional class AII$^\dagger$ and can, thus, serve as a unified scaling function for disordered non-Hermitian systems.
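
The participation ratio used as the scaling variable can be illustrated with the textbook definition PR = (sum_i |psi_i|^2)^2 / sum_i |psi_i|^4. The sketch below assumes that definition; the paper may use a variant suited to non-Hermitian eigenvectors, and the function name and the two test states are hypothetical.

```python
def participation_ratio(psi):
    """PR = (sum |psi_i|^2)^2 / sum |psi_i|^4 for a list of (complex) amplitudes."""
    norm2 = sum(abs(a) ** 2 for a in psi)
    norm4 = sum(abs(a) ** 4 for a in psi)
    if norm4 == 0:
        raise ValueError("state must have at least one nonzero amplitude")
    return norm2 ** 2 / norm4


n_sites = 8
extended = [1 / n_sites ** 0.5] * n_sites    # equal weight on every site
localized = [1.0] + [0.0] * (n_sites - 1)    # all weight on a single site
pr_extended = participation_ratio(extended)
pr_localized = participation_ratio(localized)
```

Under this definition a fully extended state gives a value equal to the number of sites and a state confined to one site gives 1, which is why the quantity distinguishes extended from localized eigenstates.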

Citations: 0

Prototype Analysis in Hopfield Networks with Hebbian Learning

arXiv - PHYS - Disordered Systems and Neural Networks

Pub Date : 2024-05-29 DOI: arxiv-2407.03342

Hayden McAlister, Anthony Robins, Lech Szymanski

We discuss prototype formation in the Hopfield network. Typically, Hebbian learning with highly correlated states leads to degraded memory performance. We show this type of learning can lead to prototype formation, where unlearned states emerge as representatives of large correlated subsets of states, alleviating capacity woes. This process has similarities to prototype learning in human cognition. We provide a substantial literature review of prototype learning in associative memories, covering contributions from psychology, statistical physics, and computer science. We analyze prototype formation from a theoretical perspective and derive a stability condition for these states based on the number of examples of the prototype presented for learning, the noise in those examples, and the number of non-example states presented. The stability condition is used to construct a probability of stability for a prototype state as the factors of stability change. We also note similarities to traditional network analysis, allowing us to find a prototype capacity. We corroborate these expectations of prototype formation with experiments using a simple Hopfield network with standard Hebbian learning. We extend our experiments to a Hopfield network trained on data with multiple prototypes and find the network is capable of stabilizing multiple prototypes concurrently. We measure the basins of attraction of the multiple prototype states, finding attractor strength grows with the number of examples and the agreement of examples. We link the stability and dominance of prototype states to the energy profile of these states, particularly when comparing the profile shape to target states or other spurious states.
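
A toy version of prototype formation can be sketched as follows: a Hopfield network is trained with the standard Hebbian rule on noisy copies of a hidden prototype, and the prototype is then checked for stability under the sign update. This is an illustrative sketch, not the authors' experimental setup; the network size, number of examples, flip probability, and function names are all assumptions.

```python
import random


def hebbian_weights(patterns):
    """Standard Hebbian rule: W_ij = (1/N) * sum over patterns of x_i * x_j, zero diagonal."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / n
    return w


def is_stable(w, state):
    """A state is stable if every neuron's local field has the same sign as the neuron."""
    for i, s_i in enumerate(state):
        field = sum(w_ij * s_j for w_ij, s_j in zip(w[i], state))
        if field * s_i <= 0:
            return False
    return True


# Assumed toy parameters (not taken from the paper).
rng = random.Random(0)
n_neurons, n_examples, flip_prob = 100, 40, 0.1

prototype = [rng.choice([-1, 1]) for _ in range(n_neurons)]
# Each training example is the prototype with every bit flipped independently.
examples = [
    [-s if rng.random() < flip_prob else s for s in prototype]
    for _ in range(n_examples)
]

weights = hebbian_weights(examples)  # only the noisy examples are presented
prototype_stable = is_stable(weights, prototype)
```

Only the noisy examples enter `hebbian_weights`, so a stable `prototype` here is a state the network was not explicitly shown (barring the very unlikely case of an example with no flipped bits). Varying `n_examples` and `flip_prob` is one way to probe the dependence on example count and noise that the abstract describes.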

Citations: 0

Towards a theory of how the structure of language is acquired by deep neural networks

arXiv - PHYS - Disordered Systems and Neural Networks

Pub Date : 2024-05-28 DOI: arxiv-2406.00048

Francesco Cagnetta, Matthieu Wyart

How much data is required to learn the structure of a language via next-token prediction? We study this question for synthetic datasets generated via a Probabilistic Context-Free Grammar (PCFG), a hierarchical generative model that captures the tree-like structure of natural languages. We determine token-token correlations analytically in our model and show that they can be used to build a representation of the grammar's hidden variables, the longer the range the deeper the variable. In addition, a finite training set limits the resolution of correlations to an effective range, whose size grows with that of the training set. As a result, a Language Model trained with increasingly many examples can build a deeper representation of the grammar's structure, thus reaching good performance despite the high dimensionality of the problem. We conjecture that the relationship between training set size and effective range of correlations holds beyond our synthetic datasets. In particular, our conjecture predicts how the scaling law for the test loss behaviour with training set size depends on the length of the context window, which we confirm empirically for a collection of lines from Shakespeare's plays.
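
As a rough picture of what sampling from a PCFG involves, the sketch below expands a start symbol recursively using weighted production rules. The grammar in `RULES` is a hypothetical two-level toy, far smaller than the synthetic grammars studied in the paper, and the function name `sample` is likewise an assumption.

```python
import random

# Hypothetical toy grammar: each nonterminal maps to (probability, expansion) pairs.
RULES = {
    "S": [(0.5, ["A", "B"]), (0.5, ["B", "A"])],
    "A": [(0.5, ["a", "b"]), (0.5, ["b", "a"])],
    "B": [(0.5, ["c", "d"]), (0.5, ["d", "c"])],
}


def sample(symbol, rng):
    """Recursively expand a symbol; symbols without a rule are terminal tokens."""
    if symbol not in RULES:
        return [symbol]
    probs, expansions = zip(*RULES[symbol])
    chosen = rng.choices(expansions, weights=probs, k=1)[0]
    tokens = []
    for child in chosen:
        tokens.extend(sample(child, rng))
    return tokens


rng = random.Random(0)
sentence = sample("S", rng)  # a list of four terminal tokens
```

In this toy, adjacent tokens within a pair share a depth-one hidden variable (`A` or `B`), while tokens in different pairs are related only through the root `S`, which mirrors the abstract's statement that longer-range correlations reflect deeper hidden variables.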

Citations: 0

Fundamental limits of weak learnability in high-dimensional multi-index models

arXiv - PHYS - Disordered Systems and Neural Networks

Pub Date : 2024-05-24 DOI: arxiv-2405.15480

Emanuele Troiani, Yatin Dandi, Leonardo Defilippis, Lenka Zdeborová, Bruno Loureiro, Florent Krzakala