arXiv - PHYS - Disordered Systems and Neural Networks: Latest Papers

Critical Phase Transition in a Large Language Model
Pub Date : 2024-06-08 DOI: arxiv-2406.05335
Kai Nakaishi, Yoshihiko Nishikawa, Koji Hukushima
The performance of large language models (LLMs) strongly depends on the temperature parameter. Empirically, at very low temperatures, LLMs generate sentences with clear repetitive structures, while at very high temperatures, generated sentences are often incomprehensible. In this study, using GPT-2, we numerically demonstrate that the difference between the two regimes is not just a smooth change but a phase transition with singular, divergent statistical quantities. Our extensive analysis shows that critical behaviors, such as a power-law decay of correlation in a text, emerge in the LLM at the transition temperature as well as in a natural language dataset. We also discuss that several statistical quantities characterizing the criticality should be useful to evaluate the performance of LLMs.
Citations: 0
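The temperature parameter discussed in the abstract above is usually applied by dividing the model's logits by T before the softmax. The following is a minimal sketch of that standard sampling rule, not the authors' code; the logits are hypothetical values chosen for illustration, and no GPT-2 model is loaded.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Return probabilities p_i proportional to exp(logit_i / T)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature, rng=random):
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


# Hypothetical logits for a three-token vocabulary (assumption, not from the paper).
logits = [2.0, 1.0, 0.0]
cold = softmax_with_temperature(logits, 0.1)   # nearly deterministic
hot = softmax_with_temperature(logits, 10.0)   # nearly uniform
```

At low T the distribution concentrates on the top token, which is consistent with the repetitive regime the abstract describes; at high T it flattens toward uniform, consistent with the incomprehensible regime.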
Highly Versatile FPGA-Implemented Cyber Coherent Ising Machine
Pub Date : 2024-06-08 DOI: arxiv-2406.05377
Toru Aonishi, Tatsuya Nagasawa, Toshiyuki Koizumi, Mastiyage Don Sudeera Hasaranga Gunathilaka, Kazushi Mimura, Masato Okada, Satoshi Kako, Yoshihisa Yamamoto
In recent years, quantum Ising machines have drawn a lot of attention, but due to physical implementation constraints, it has been difficult to achieve dense coupling, such as full coupling with sufficient spins to handle practical large-scale applications. Consequently, classically computable equations have been derived from quantum master equations for these quantum Ising machines. Parallel implementations of these algorithms using FPGAs have been used to rapidly find solutions to these problems on a scale that is difficult to achieve in physical systems. We have developed an FPGA-implemented cyber coherent Ising machine (cyber CIM) that is much more versatile than previous implementations using FPGAs. Our architecture is versatile since it can be applied to the open-loop CIM, which was proposed when CIM research began, to the closed-loop CIM, which has been used recently, as well as to the Jacobi successive over-relaxation method. By modifying the sequence control code for the calculation control module, other algorithms such as Simulated Bifurcation (SB) can also be implemented. Earlier research on large-scale FPGA implementations of SB and CIM used binary or ternary discrete values for connections, whereas the cyber CIM used FP32 values. Also, the cyber CIM utilized Zeeman terms that were represented as FP32, which were not present in other large-scale FPGA systems. Our implementation with continuous interaction realizes N=4096 on a single FPGA, comparable to the single-FPGA implementation of SB with binary interactions, with N=4096. The cyber CIM enables applications such as CDMA multi-user detector and L0 compressed sensing which were not possible with earlier FPGA systems, while enabling superior calculation speeds, more than ten times faster than a GPU implementation. The calculation speed can be further improved by increasing parallelism, such as through clustering.
Citations: 0
Reconsideration of optimization for reduction of traffic congestion
Pub Date : 2024-06-08 DOI: arxiv-2406.05448
Masayuki Ohzeki
One of the most impressive applications of a quantum annealer was optimizing a group of Volkswagen to reduce traffic congestion using a D-Wave system. A simple formulation of a quadratic term was proposed to reduce traffic congestion. This quadratic term was useful for determining the shortest routes among several candidates. The original formulation produced decreases in the total lengths of car tours and traffic congestion. In this study, we reformulated the cost function with the sole focus on reducing traffic congestion. We then found a unique cost function for expressing the quadratic function with a dead zone and an inequality constraint.
Citations: 0
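To make the "quadratic term" in the abstract above concrete, here is a small sketch under a stated assumption: the cost is taken to be the sum over road segments of the squared number of cars using each segment, which is how the original formulation is commonly described. This is not the reformulated cost function with a dead zone from the paper, whose exact form the abstract does not give. The route data and function names are hypothetical, and the search is a plain brute force rather than quantum annealing.

```python
from itertools import product


def congestion_cost(choice, routes):
    """Sum over road segments of (number of cars on that segment) squared.

    choice[i] is the index of the route picked by car i;
    routes[i][j] is the set of segments on candidate route j of car i.
    """
    load = {}
    for car, route_index in enumerate(choice):
        for segment in routes[car][route_index]:
            load[segment] = load.get(segment, 0) + 1
    return sum(n * n for n in load.values())


def best_assignment(routes):
    """Enumerate one route per car and return the lowest-cost assignment."""
    options = [range(len(candidates)) for candidates in routes]
    return min(product(*options), key=lambda c: congestion_cost(c, routes))


# Hypothetical instance: two cars, each choosing between the same two routes.
routes = [
    [{"a", "b"}, {"c", "d"}],
    [{"a", "b"}, {"c", "d"}],
]
best = best_assignment(routes)
```

Enumerating exactly one route per car enforces the one-route constraint by construction; in a QUBO for an annealer that constraint would instead appear as a penalty term.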
Unified one-parameter scaling function for Anderson localization transitions in non-reciprocal non-Hermitian systems
Pub Date : 2024-06-04 DOI: arxiv-2406.01984
C. Wang, Wenxue He, X. R. Wang, Hechen Ren
With dimensionless conductances as scaling variables, the conventional one-parameter scaling theory of localization fails for non-reciprocal non-Hermitian systems such as the Hatano-Nelson model. Here, we propose a one-parameter scaling function using the participation ratio as the scaling variable. Employing a highly accurate numerical procedure based on exact diagonalization, we demonstrate that this one-parameter scaling function can describe Anderson localization transitions of non-reciprocal non-Hermitian systems in one and two dimensions of symmetry classes AI and A. The critical exponents of correlation lengths depend on symmetries and dimensionality only, a typical feature of universality. Moreover, we derive a complex-gap equation based on the self-consistent Born approximation that can determine the disorder at which the point gap closes. The obtained disorders match perfectly the critical disorders of Anderson localization transitions from the one-parameter scaling function. Finally, we show that the one-parameter scaling function is also valid for Anderson localization transitions in reciprocal non-Hermitian systems such as two-dimensional class AII$^\dagger$ and can, thus, serve as a unified scaling function for disordered non-Hermitian systems.
Citations: 0
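The participation ratio used as the scaling variable above can be illustrated with the textbook definition PR = (Σ|ψ_i|²)² / Σ|ψ_i|⁴. The sketch below assumes that definition applied to a single eigenvector; the paper may use a different convention for non-Hermitian systems (for example, a biorthogonal one), and the function name is hypothetical.

```python
def participation_ratio(psi):
    """Return (sum |psi_i|^2)^2 / sum |psi_i|^4 for a list of amplitudes.

    The value is about 1 for a state localized on one site and about N
    for a state spread uniformly over N sites. Complex amplitudes are allowed.
    """
    norm2 = sum(abs(a) ** 2 for a in psi)
    norm4 = sum(abs(a) ** 4 for a in psi)
    if norm4 == 0:
        raise ValueError("state must have at least one nonzero amplitude")
    return norm2 ** 2 / norm4


extended = participation_ratio([1, 1, 1, 1, 1, 1, 1, 1])  # uniform over 8 sites
localized = participation_ratio([0, 0, 1, 0])             # single occupied site
```

Because the numerator and denominator scale the same way, the state does not need to be normalized before calling the function.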
Prototype Analysis in Hopfield Networks with Hebbian Learning
Pub Date : 2024-05-29 DOI: arxiv-2407.03342
Hayden McAlister, Anthony Robins, Lech Szymanski
We discuss prototype formation in the Hopfield network. Typically, Hebbian learning with highly correlated states leads to degraded memory performance. We show this type of learning can lead to prototype formation, where unlearned states emerge as representatives of large correlated subsets of states, alleviating capacity woes. This process has similarities to prototype learning in human cognition. We provide a substantial literature review of prototype learning in associative memories, covering contributions from psychology, statistical physics, and computer science. We analyze prototype formation from a theoretical perspective and derive a stability condition for these states based on the number of examples of the prototype presented for learning, the noise in those examples, and the number of non-example states presented. The stability condition is used to construct a probability of stability for a prototype state as the factors of stability change. We also note similarities to traditional network analysis, allowing us to find a prototype capacity. We corroborate these expectations of prototype formation with experiments using a simple Hopfield network with standard Hebbian learning. We extend our experiments to a Hopfield network trained on data with multiple prototypes and find the network is capable of stabilizing multiple prototypes concurrently. We measure the basins of attraction of the multiple prototype states, finding attractor strength grows with the number of examples and the agreement of examples. We link the stability and dominance of prototype states to the energy profile of these states, particularly when comparing the profile shape to target states or other spurious states.
Citations: 0
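Prototype formation as described in the abstract above can be seen in a toy setting. The sketch below trains a small Hopfield network with the standard Hebbian rule on noisy copies of one prototype and checks that the prototype, which is never stored itself, is a fixed point. The network size, the number of examples, and the deterministic way bits are flipped are illustrative assumptions, not the authors' experimental setup.

```python
def sign(x):
    """Threshold activation; ties are mapped to +1."""
    return 1 if x >= 0 else -1


def hebbian_weights(patterns):
    """Standard Hebbian rule: w_ij = (1/n) * sum_mu x_i x_j, zero diagonal."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / n
    return w


def update(w, state):
    """One synchronous update of every neuron."""
    return [sign(sum(wij * sj for wij, sj in zip(row, state))) for row in w]


def noisy_examples(prototype, count, flips_per_example):
    """Copies of the prototype, each with a distinct block of bits flipped."""
    n = len(prototype)
    examples = []
    for mu in range(count):
        example = list(prototype)
        for k in range(flips_per_example):
            idx = (mu * flips_per_example + k) % n
            example[idx] = -example[idx]
        examples.append(example)
    return examples


N = 64
prototype = [1 if i % 2 == 0 else -1 for i in range(N)]
examples = noisy_examples(prototype, count=20, flips_per_example=3)
W = hebbian_weights(examples)  # the prototype itself is never presented

prototype_is_stable = update(W, prototype) == prototype
recalled = update(W, examples[0])  # a learned example relaxes to the prototype
```

In this toy case the learned examples are not attractors themselves: one update moves each of them onto the unlearned prototype, which is the qualitative effect the abstract refers to.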
Towards a theory of how the structure of language is acquired by deep neural networks
Pub Date : 2024-05-28 DOI: arxiv-2406.00048
Francesco Cagnetta, Matthieu Wyart
How much data is required to learn the structure of a language via next-token prediction? We study this question for synthetic datasets generated via a Probabilistic Context-Free Grammar (PCFG) -- a hierarchical generative model that captures the tree-like structure of natural languages. We determine token-token correlations analytically in our model and show that they can be used to build a representation of the grammar's hidden variables, the longer the range the deeper the variable. In addition, a finite training set limits the resolution of correlations to an effective range, whose size grows with that of the training set. As a result, a Language Model trained with increasingly many examples can build a deeper representation of the grammar's structure, thus reaching good performance despite the high dimensionality of the problem. We conjecture that the relationship between training set size and effective range of correlations holds beyond our synthetic datasets. In particular, our conjecture predicts how the scaling law for the test loss behaviour with training set size depends on the length of the context window, which we confirm empirically for a collection of lines from Shakespeare's plays.
Citations: 0
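For readers unfamiliar with PCFGs, the sketch below samples a sentence from a tiny probabilistic context-free grammar by recursively expanding nonterminals. The grammar, its probabilities, and the vocabulary are hypothetical and much simpler than the synthetic model studied in the paper; the point is only to show how hidden variables (nonterminals) at different depths generate the observed tokens.

```python
import random

# Hypothetical toy grammar: nonterminal -> list of (probability, right-hand side).
# Any symbol that is not a key of GRAMMAR is treated as a terminal token.
GRAMMAR = {
    "S": [(1.0, ["NP", "VP"])],
    "NP": [(0.7, ["the", "N"]), (0.3, ["a", "N"])],
    "VP": [(0.6, ["V", "N"]), (0.4, ["V"])],
    "N": [(0.5, ["cat"]), (0.5, ["dog"])],
    "V": [(0.5, ["sees"]), (0.5, ["hears"])],
}


def expand(symbol, rng):
    """Recursively expand a symbol into a list of terminal tokens."""
    if symbol not in GRAMMAR:
        return [symbol]
    rules = GRAMMAR[symbol]
    weights = [probability for probability, _ in rules]
    _, rhs = rng.choices(rules, weights=weights, k=1)[0]
    tokens = []
    for child in rhs:
        tokens.extend(expand(child, rng))
    return tokens


rng = random.Random(0)  # fixed seed so repeated runs give the same sample
sentence = expand("S", rng)
```

Tokens produced under the same nonterminal share a hidden ancestor, which is the source of the token-token correlations the abstract refers to.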
Fundamental limits of weak learnability in high-dimensional multi-index models
Pub Date : 2024-05-24 DOI: arxiv-2405.15480
Emanuele Troiani, Yatin Dandi, Leonardo Defilippis, Lenka Zdeborová, Bruno Loureiro, Florent Krzakala
Multi-index models -- functions which only depend on the covariates through a non-linear transformation of their projection on a subspace -- are a useful benchmark for investigating feature learning with neural networks. This paper examines the theoretical boundaries of learnability in this hypothesis class, focusing particularly on the minimum sample complexity required for weakly recovering their low-dimensional structure with first-order iterative algorithms, in the high-dimensional regime where the number of samples $n=\alpha d$ is proportional to the covariate dimension $d$. Our findings unfold in three parts: (i) first, we identify under which conditions a trivial subspace can be learned with a single step of a first-order algorithm for any $\alpha>0$; (ii) second, in the case where the trivial subspace is empty, we provide necessary and sufficient conditions for the existence of an easy subspace consisting of directions that can be learned only above a certain sample complexity $\alpha>\alpha_c$. The critical threshold $\alpha_c$ marks the presence of a computational phase transition, in the sense that no efficient iterative algorithm can succeed for $\alpha<\alpha_c$. In a limited but interesting set of really hard directions -- akin to the parity problem -- $\alpha_c$ is found to diverge. Finally, (iii) we demonstrate that interactions between different directions can result in an intricate hierarchical learning phenomenon, where some directions can be learned sequentially when coupled to easier ones. Our analytical approach is built on the optimality of approximate message-passing algorithms among first-order iterative methods, delineating the fundamental learnability limit across a broad spectrum of algorithms, including neural networks trained with gradient descent.
Citations: 0
Hybrid scaling theory of localization transition in a non-Hermitian disorder Aubry-André model
Pub Date : 2024-05-24 DOI: arxiv-2405.15220
Yue-Mei Sun, Xin-Yu Wang, Zi-Kang Wang, Liang-Jun Zhai
In this paper, we study the critical behaviors in the non-Hermitian disorder Aubry-André (DAA) model, and we assume the non-Hermiticity is introduced by the nonreciprocal hopping. We employ the localization length $\xi$, the inverse participation ratio (IPR), and the real part of the energy gap between the first excited state and the ground state $\Delta E$ as the characteristic quantities to describe the critical properties of the localization transition. By performing the scaling analysis, the critical exponents of the non-Hermitian Anderson model and the non-Hermitian DAA model are obtained, and these critical exponents are different from their Hermitian counterparts, indicating the Hermitian and non-Hermitian disorder and DAA models belong to different universality classes. The critical exponents of the non-Hermitian DAA model are remarkably different from both the pure non-Hermitian AA model and the non-Hermitian Anderson model, showing that disorder is an independent relevant direction at the non-Hermitian AA model. We further propose a hybrid scaling theory to describe the critical behavior in the overlapping critical region constituted by the critical regions of the non-Hermitian DAA model and the non-Hermitian Anderson localization transition.
Citations: 0
Quantum criticality of generalized Aubry-André models with exact mobility edges using fidelity susceptibility
Pub Date : 2024-05-22 DOI: arxiv-2405.13282
Yu-Bin Liu, Wen-Yi Zhang, Tian-Cheng Yi, Liangsheng Li, Maoxin Liu, Wen-Long You
In this study, we explore the quantum critical phenomena in generalized Aubry-André models, with a particular focus on the scaling behavior at various filling states. Our approach involves using the quantum fidelity susceptibility to precisely identify the mobility edges in these systems. Through a finite-size scaling analysis of the fidelity susceptibility, we are able to determine both the correlation-length critical exponent and the dynamical critical exponent at the critical point of the generalized Aubry-André model. Based on the Diophantine equation conjecture, we can determine the number of subsequences of the Fibonacci sequence and the corresponding scaling functions for a specific filling fraction, as well as the universality class. Our findings demonstrate the effectiveness of employing the generalized fidelity susceptibility for the analysis of unconventional quantum criticality and the associated universal information of quasiperiodic systems in cutting-edge quantum simulation experiments.
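For readers unfamiliar with the quantity, the ordinary ground-state fidelity susceptibility is usually defined through the overlap $F = |\langle\psi_0(\lambda - \delta\lambda/2)|\psi_0(\lambda + \delta\lambda/2)\rangle|$ as $\chi_F \approx 2(1 - F)/\delta\lambda^2$ for small $\delta\lambda$. The sketch below evaluates this by finite differences for any Hermitian matrix family. It is an illustration only: it does not implement the generalized fidelity susceptibility or the models of the paper, and the function names and the two-level test Hamiltonian `two_level` are hypothetical.

```python
import numpy as np

def ground_state(H):
    """Lowest-energy eigenvector of a Hermitian matrix."""
    _, vecs = np.linalg.eigh(H)  # eigenvalues are returned in ascending order
    return vecs[:, 0]

def fidelity_susceptibility(hamiltonian, lam, dlam=1e-3):
    """Finite-difference estimate chi_F ~ 2 * (1 - F) / dlam**2.

    `hamiltonian` is any callable that maps the parameter value to a
    Hermitian matrix.
    """
    psi_minus = ground_state(hamiltonian(lam - dlam / 2.0))
    psi_plus = ground_state(hamiltonian(lam + dlam / 2.0))
    # abs() removes the arbitrary overall sign/phase of each eigenvector
    fidelity = abs(np.vdot(psi_minus, psi_plus))
    return 2.0 * (1.0 - fidelity) / dlam ** 2

def two_level(lam):
    """Toy Hamiltonian sigma_z + lam * sigma_x, used only to check the routine."""
    return np.array([[1.0, lam], [lam, -1.0]])

if __name__ == "__main__":
    # Analytic value for this toy model: 1 / (4 * (1 + lam**2)**2)
    print(fidelity_susceptibility(two_level, 0.0))
    print(fidelity_susceptibility(two_level, 1.0))
```

In a quasiperiodic chain one would pass a function that builds the lattice Hamiltonian for a given potential strength and then look for the peak of $\chi_F$ as the system size grows. That usage is only suggested here and is not taken from the paper.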
Citations: 0
Non-Hermitian diluted banded random matrices: Scaling of eigenfunction and spectral properties
Pub Date : 2024-05-21 DOI: arxiv-2406.15426
M. Hernández-Sánchez, G. Tapia-Labra, J. A. Mendez-Bermudez
Here we introduce the non-Hermitian diluted banded random matrix (nHdBRM) ensemble as the set of $N\times N$ real non-symmetric matrices whose entries are independent Gaussian random variables with zero mean and variance one if $|i-j|<b$ and zero otherwise. Moreover, off-diagonal matrix elements within the bandwidth $b$ are randomly set to zero such that the sparsity $\alpha$ is defined as the fraction of the $N(b-1)/2$ independent non-vanishing off-diagonal matrix elements. By means of a detailed numerical study, we demonstrate that the eigenfunction and spectral properties of the nHdBRM ensemble scale with the parameter $x=\gamma[(b\alpha)^2/N]^\delta$, where $\gamma,\delta\sim 1$. Moreover, the normalized localization length $\beta$ of the eigenfunctions follows a simple scaling law: $\beta = x/(1 + x)$. For comparison purposes, we also report eigenfunction and spectral properties of the Hermitian diluted banded random matrix ensemble.
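The ensemble and scaling law quoted in this abstract can be sketched in a few lines. The version below is a reading of the abstract, not the authors' code. In particular, keeping each in-band off-diagonal entry independently with probability `alpha` is an assumed way of realizing the sparsity, the defaults `gamma=1.0` and `delta=1.0` stand in for fit parameters that the abstract only describes as of order one, and all function names are hypothetical.

```python
import numpy as np

def sample_nhdbrm(N, b, alpha, rng):
    """Draw one real non-symmetric diluted banded matrix.

    Entries with |i - j| < b are standard Gaussian and all others are zero.
    ASSUMPTION: each in-band off-diagonal entry is kept independently with
    probability alpha; diagonal entries are always kept.
    """
    i, j = np.indices((N, N))
    in_band = np.abs(i - j) < b
    on_diagonal = i == j
    keep = rng.random((N, N)) < alpha
    mask = in_band & (on_diagonal | keep)
    return rng.standard_normal((N, N)) * mask

def scaling_parameter(N, b, alpha, gamma=1.0, delta=1.0):
    """x = gamma * [(b * alpha)^2 / N]^delta, as quoted in the abstract."""
    return gamma * ((b * alpha) ** 2 / N) ** delta

def normalized_localization_length(x):
    """Scaling law beta = x / (1 + x) from the abstract."""
    return x / (1.0 + x)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M = sample_nhdbrm(N=100, b=10, alpha=0.5, rng=rng)
    x = scaling_parameter(N=100, b=10, alpha=0.5)
    print(M.shape, x, normalized_localization_length(x))
```

With these placeholder values of `gamma` and `delta`, $N=100$, $b=10$ and $\alpha=0.5$ give $x=0.25$ and $\beta=0.2$. The limits behave as expected from the formula: $\beta\to 0$ for $x\ll 1$ and $\beta\to 1$ for $x\gg 1$.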
Citations: 0