Silvio Kalaj, Clarissa Lauditi, Gabriele Perugini, Carlo Lucibello, Enrico M. Malatesta, Matteo Negri
It has recently been shown that a learning transition occurs when a Hopfield Network stores examples generated as superpositions of random features: new attractors corresponding to those features appear in the model. In this work we show that the network also develops attractors corresponding to previously unseen examples generated from the same set of features. We explain this surprising behaviour in terms of spurious states of the learned features: we argue that, as the number of stored examples increases beyond the learning transition, the model also learns to mix the features to represent both stored and previously unseen examples. We support this claim by computing the phase diagram of the model.
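The setup described in this abstract can be sketched numerically. The following is a minimal illustration, not the authors' code: it assumes binary features, examples formed as signs of Gaussian-weighted superpositions of those features, a Hebbian coupling matrix, and zero-temperature synchronous updates. The names `make_examples`, `hebbian_couplings`, `relax` and `overlap`, and the sizes `N`, `D`, `P`, are hypothetical choices made for the example.

```python
import numpy as np

def make_examples(features, n_examples, rng):
    """Examples as signs of random superpositions of the features (assumed form)."""
    n_features = features.shape[0]
    coeffs = rng.standard_normal((n_examples, n_features))
    return np.where(coeffs @ features >= 0, 1, -1)

def hebbian_couplings(examples):
    """Hebbian rule J = X^T X / N with zero diagonal (assumed storage rule)."""
    n_spins = examples.shape[1]
    J = examples.T @ examples / n_spins
    np.fill_diagonal(J, 0.0)
    return J

def relax(J, state, max_sweeps=50):
    """Zero-temperature synchronous updates until a fixed point or max_sweeps."""
    s = state.copy()
    for _ in range(max_sweeps):
        new = np.where(J @ s >= 0, 1, -1)
        if np.array_equal(new, s):
            break
        s = new
    return s

def overlap(a, b):
    """Normalized overlap between two +/-1 configurations, in [-1, 1]."""
    return float(a @ b) / a.size

rng = np.random.default_rng(0)
N, D, P = 400, 5, 2000            # spins, features, stored examples (illustrative sizes)
features = rng.choice([-1, 1], size=(D, N))
stored = make_examples(features, P, rng)
unseen = make_examples(features, 1, rng)[0]   # never shown to the network

J = hebbian_couplings(stored)
final_from_feature = relax(J, features[0])
final_from_unseen = relax(J, unseen)
print("overlap with feature after relaxation:", overlap(final_from_feature, features[0]))
print("overlap with unseen example after relaxation:", overlap(final_from_unseen, unseen))
```

An overlap near 1 after relaxation would indicate that the starting configuration lies at or near an attractor. Whether that happens depends on the ratios of `P`, `D` and `N`, which is what the phase diagram in the paper characterizes.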
Random Features Hopfield Networks generalize retrieval to previously unseen examples. Silvio Kalaj, Clarissa Lauditi, Gabriele Perugini, Carlo Lucibello, Enrico M. Malatesta, Matteo Negri. arXiv - PHYS - Disordered Systems and Neural Networks, 2024-07-08. https://doi.org/arxiv-2407.05658
Shan-Zhong Li, Enhong Cheng, Shi-Liang Zhu, Zhi Li
The Lyapunov exponent, serving as an indicator of the localized state, is commonly used to identify localization transitions in disordered systems. In non-Hermitian quasicrystals, the non-Hermitian effect induced by non-reciprocal hopping can lead to two distinct Lyapunov exponents on opposite sides of the localization center. Building on this observation, we introduce a comprehensive approach for examining the localization characteristics and mobility edges of non-reciprocal quasicrystals, referred to as asymmetric transfer matrix analysis. We demonstrate the application of this method to three specific scenarios: the non-reciprocal Aubry-André model, the non-reciprocal off-diagonal Aubry-André model, and non-reciprocal mosaic quasicrystals. This work may contribute valuable insights to the investigation of non-Hermitian quasicrystals and disordered systems.
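As background for the transfer-matrix idea, the sketch below estimates the Lyapunov exponent of the ordinary (Hermitian) Aubry-André chain by iterating the recursion ψ_{n+1} = (E − V_n)ψ_n − ψ_{n−1} with V_n = 2λ cos(2πβn + θ) and unit hopping. It is not the asymmetric method of the paper. The helper `asymmetric_exponents` encodes only the standard imaginary-gauge argument for hoppings e^{±g}, under which the two sides of a localized state decay at rates γ − g and γ + g; which side takes which sign depends on the hopping convention and is an assumption here. All function and parameter names are hypothetical.

```python
import math

def lyapunov_exponent(energy, lam, n_sites=20000,
                      beta=(math.sqrt(5) - 1) / 2, theta=0.0):
    """Growth rate of the transfer-matrix recursion for the Hermitian
    Aubry-Andre chain (unit hopping, on-site potential 2*lam*cos(...))."""
    a, b = 1.0, 0.0          # (psi_n, psi_{n-1}), arbitrary starting vector
    log_growth = 0.0
    for n in range(1, n_sites + 1):
        v = 2.0 * lam * math.cos(2.0 * math.pi * beta * n + theta)
        a, b = (energy - v) * a - b, a
        norm = math.hypot(a, b)
        a, b = a / norm, b / norm   # renormalize to avoid overflow
        log_growth += math.log(norm)
    return log_growth / n_sites

def asymmetric_exponents(gamma, g):
    """Assumed convention: decay rates on the two sides of the localization
    center when a non-reciprocity parameter g is gauged away."""
    return gamma - g, gamma + g

gamma_localized = lyapunov_exponent(0.0, 2.0)   # known infinite-chain value: ln(2)
gamma_extended = lyapunov_exponent(0.0, 0.5)    # known infinite-chain value: 0
print(gamma_localized, gamma_extended)
print(asymmetric_exponents(gamma_localized, 0.3))
```

For the Hermitian model the exponent is known to equal ln λ for λ > 1 and to vanish for λ < 1 at energies in the spectrum, so the two printed estimates give a quick consistency check on the recursion.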
Asymmetric transfer matrix analysis of Lyapunov exponents in one-dimensional non-reciprocal quasicrystals. Shan-Zhong Li, Enhong Cheng, Shi-Liang Zhu, Zhi Li. arXiv - PHYS - Disordered Systems and Neural Networks, 2024-07-01. https://doi.org/arxiv-2407.01372
After the groundbreaking work on the Erdős-Rényi random graph, the study of random networks has made great progress in recent years. One eye-catching model is the time-varying random network, which can encode an instantaneous-time description of the network dynamics. To further describe the random duration for which nodes are inactive, we propose a dinner party anomalous random network model and derive the analytical solution for the probability density function of a node being active at a given time. Moreover, we investigate gift delivery and viral transmission in dinner party random networks. This work provides new quantitative insights into the description of random networks and could help model other phenomena involving uncertainty in real networks.
Anomalous random networks. Hong Zhang, Guohua Li. arXiv - PHYS - Disordered Systems and Neural Networks, 2024-06-27. https://doi.org/arxiv-2406.18882
In this paper we propose an open anomalous semi-Markovian random neural network model with negative and positive signals and arbitrary random waiting times. We investigate the signal flow process in anomalous random neural networks based on a renewal process, and obtain the corresponding master equation for the time evolution of the probability of the neurons' potential. As examples, we discuss the special cases of exponential and power-law waiting times, and find a fractional memory effect of the probability of the system state on its history. In addition, a closed random neural network model is introduced and the corresponding rate equation is given.
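The contrast between exponential and power-law waiting times can be illustrated with a bare renewal process, independent of the neural network model itself. The sketch below is an assumption-laden toy: it counts renewal events up to a time horizon for exponential waiting times (rate 1) and for Pareto waiting times with shape α = 0.5, whose mean is infinite. The function name `count_renewals` and all parameter values are hypothetical.

```python
import random

def count_renewals(draw_waiting_time, t_max):
    """Number of renewal events in [0, t_max] for i.i.d. waiting times."""
    t, n = 0.0, 0
    while True:
        t += draw_waiting_time()
        if t > t_max:
            return n
        n += 1

rng = random.Random(1)
T_MAX = 10_000.0

# Markovian case: exponential waiting times with unit rate.
n_exponential = count_renewals(lambda: rng.expovariate(1.0), T_MAX)

# Anomalous case: Pareto waiting times (minimum 1, shape 0.5), infinite mean.
n_power_law = count_renewals(lambda: rng.paretovariate(0.5), T_MAX)

print("events with exponential waiting times:", n_exponential)
print("events with power-law waiting times:", n_power_law)
```

With a finite mean waiting time the event count grows linearly in the horizon, while for a heavy tail with shape α < 1 it grows sublinearly, roughly as T^α. That slowdown is the kind of long-memory behaviour that fractional master equations are typically used to capture.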
Anomalous Random Neural Networks: a Special Renewal Process. Hong Zhang, Guohua Li. arXiv - PHYS - Disordered Systems and Neural Networks, 2024-06-27. https://doi.org/arxiv-2406.18877
Glass has long been considered a nonequilibrium material. The primary reason is its history-dependent properties: the obtained properties are not uniquely determined by two state variables alone, namely, temperature and volume, but are affected by the process parameters, such as cooling rates. However, closer observation shows that this history dependence is common in solids; in crystal growth, the properties of an obtained crystal are affected by the preparation conditions through defect structures and metallurgical structures. The problem with the previous reasoning about history dependence lies in the lack of an appropriate specification of state variables. Without knowledge of the latter, describing thermodynamic states is impossible. The guiding principle for finding state variables is provided by the first law of thermodynamics. The state variables of solids have been sought by requiring that the internal energy $U$ be a state function. Detailed information about the abovementioned microstructures is needed to describe the state function $U$. This can be accomplished by specifying the time-averaged positions $R_{j}$ of all atoms comprising the solid. Therefore, $R_{j}$ is a state variable for solids. Defect states, being metastable states, represent equilibrium states within a finite time (the relaxation time). However, eternal equilibrium is nonexistent: the perfect crystal is thermodynamically unstable. Equilibrium states can only be considered at the local level. Glass is thus in equilibrium as long as its structure does not change. The relaxation time is controlled by the energy barriers that sustain a structure, and this time restriction is intimately related to the definition of state variables. The most important property of state variables is their invariance under time averaging. The time-averaged quantity $R_{j}$ meets this invariance property.
Revisiting nonequilibrium characterization of glass: History dependence in solids. Koun Shirai. arXiv - PHYS - Disordered Systems and Neural Networks, 2024-06-22. https://doi.org/arxiv-2406.15726
Hugo Tabanelli, Claudio Castelnovo, Antonio Štrkalj
We investigate the localisation properties of quasiperiodic tight-binding chains with hopping terms modulated by the interpolating Aubry-André-Fibonacci (IAAF) function. This off-diagonal IAAF model allows for a smooth and controllable interpolation between two paradigmatic quasiperiodic models: the Aubry-André and the Fibonacci model. Our analysis shows that the spectrum of this model can be divided into three principal bands, namely, two molecular bands at the edge of the spectrum and one atomic band in the middle, for all values of the interpolating parameter. We reveal that the states in the molecular bands undergo multiple reentrant localisation transitions, a behaviour previously reported in the diagonal IAAF model. We link the emergence of these reentrant phenomena to symmetry points of the quasiperiodic modulation and, with that, explain the main ground state properties of the system. The atomic states in the middle band show no traces of localised phases and remain either extended or critical for any value of the interpolating parameter. Using a renormalisation group approach, adapted from the Fibonacci model, we explain the extended nature of the middle band. These findings expand our knowledge of phase transitions within quasiperiodic systems and highlight the interplay between extended, critical, and localised states.
Reentrant localisation transitions and anomalous spectral properties in off-diagonal quasiperiodic systems. Hugo Tabanelli, Claudio Castelnovo, Antonio Štrkalj. arXiv - PHYS - Disordered Systems and Neural Networks, 2024-06-20. https://doi.org/arxiv-2406.14193
Jonas F. Karcher, Sarang Gopalakrishnan, Mikael C. Rechtsman
The properties of semiconductors, insulators, and photonic crystals are defined by their electronic or photonic bands, and the gaps between them. When the material is disordered, Lifshitz tails appear: these are localized states that bifurcate from the band edge and act to effectively close the band gap. While Lifshitz tails are well understood when the disorder is spatially uncorrelated, there has been recent interest in the case of hyperuniform disorder, i.e., when the disorder fluctuations are highly correlated and approach zero at long length scales. In this paper, we analytically solve the Lifshitz tail problem for hyperuniform systems using a path integral and instanton approach. We find the functional form of the density-of-states as a function of the energy difference from the band edge. We also examine the effect of hyperuniform disorder on the density of states of Weyl semimetals, which do not have a band gap.
The effect of hyperuniform disorder on band gaps. Jonas F. Karcher, Sarang Gopalakrishnan, Mikael C. Rechtsman. arXiv - PHYS - Disordered Systems and Neural Networks, 2024-06-17. https://doi.org/arxiv-2406.11710
We propose a class of general non-Hermitian quasiperiodic lattice models with exponential hoppings and analytically determine the genuine complex mobility edges by solving the dual counterpart exactly using Avila's global theory. Our analytical formula reveals that the complex mobility edges usually form a loop structure in the complex energy plane. By shifting the eigenenergy by a constant $t$, the complex mobility edges of the family of models with different hopping parameter $t$ can be described by a unified formula, formally independent of $t$. By scanning the hopping parameter, we demonstrate the existence of an intriguing type of flagellate-like spectrum in the complex energy plane, in which the localized and extended states are well separated by the complex mobility edges. Our result provides firm ground for understanding the complex mobility edges in non-Hermitian quasiperiodic lattices.
Exact complex mobility edges and flagellate spectra for non-Hermitian quasicrystals with exponential hoppings. Li Wang, Jiaqi Liu, Zhenbo Wang, Shu Chen. arXiv - PHYS - Disordered Systems and Neural Networks, 2024-06-16. https://doi.org/arxiv-2406.10769
We study the many-body localization problem in the non-abelian SU(2)-invariant random anti-ferromagnetic exchange model in 1D. Exact and sparse matrix diagonalization methods are used to calculate eigenvalues and eigenvectors of the Hamiltonian matrix. We investigate the behaviour of the energy level gap-ratio statistic, participation ratio, entanglement entropy and the entanglement spectral parameter as a function of disorder strengths. Different distributions of random couplings are considered. We find, up to L = 18, a clear distinction between our non-abelian model and the more often studied random field Heisenberg model: the regime of seemingly localized behaviour is much less pronounced in the random exchange model than in the field model case.
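The energy level gap-ratio statistic mentioned above has a simple standard definition: with consecutive gaps δ_n = E_{n+1} − E_n, the ratio is r_n = min(δ_n, δ_{n+1}) / max(δ_n, δ_{n+1}). The sketch below computes its mean for an arbitrary list of levels and compares two textbook reference cases, uncorrelated (Poisson) levels and a random real symmetric (GOE-like) matrix. It does not build the random exchange Hamiltonian of the paper; the function name `mean_gap_ratio` and the sample sizes are hypothetical.

```python
import numpy as np

def mean_gap_ratio(energies):
    """Mean of r_n = min(gap_n, gap_{n+1}) / max(gap_n, gap_{n+1})."""
    e = np.sort(np.asarray(energies, dtype=float))
    gaps = np.diff(e)
    small = np.minimum(gaps[:-1], gaps[1:])
    large = np.maximum(gaps[:-1], gaps[1:])
    valid = large > 0            # skip exact degeneracies
    return float(np.mean(small[valid] / large[valid]))

rng = np.random.default_rng(0)

# Tiny hand-checkable case: gaps 1, 2, 3 give ratios 1/2 and 2/3.
r_small = mean_gap_ratio([0.0, 1.0, 3.0, 6.0])

# Uncorrelated levels (Poisson statistics, typical of a localized regime).
r_poisson = mean_gap_ratio(rng.uniform(0.0, 1.0, size=100_000))

# Random real symmetric matrix (GOE-like statistics, typical of an ergodic regime).
a = rng.standard_normal((500, 500))
r_goe = mean_gap_ratio(np.linalg.eigvalsh((a + a.T) / 2.0))

print(r_small, r_poisson, r_goe)
```

The well-known reference values are 2 ln 2 − 1 ≈ 0.386 for Poisson statistics and approximately 0.53 for the Gaussian orthogonal ensemble. In practice the statistic is evaluated within a fixed symmetry sector, which matters for an SU(2)-invariant model.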
Spectral and Entanglement Properties of the Random Exchange Heisenberg Chain. Yilun Gao, Rudolf A. Römer. arXiv - PHYS - Disordered Systems and Neural Networks, 2024-06-14. https://doi.org/arxiv-2406.09985
Transformer-based models have demonstrated exceptional performance across diverse domains, becoming the state-of-the-art solution for sequential machine learning problems. Even though we have a general understanding of the fundamental components of the transformer architecture, little is known about how they operate or what their expected dynamics are. Recently, there has been increasing interest in exploring the relationship between attention mechanisms and Hopfield networks, promising to shed light on the statistical physics of transformer networks. However, to date, the dynamical regimes of transformer-like models have not been studied in depth. In this paper, we address this gap by using methods for the study of asymmetric Hopfield networks in nonequilibrium regimes, namely path integral methods over generating functionals, yielding dynamics governed by concurrent mean-field variables. Assuming 1-bit tokens and weights, we derive analytical approximations for the behavior of large self-attention neural networks coupled to a softmax output, which become exact in the large-size limit. Our findings reveal nontrivial dynamical phenomena, including nonequilibrium phase transitions associated with chaotic bifurcations, even for very simple configurations with a few encoded features and a very short context window. Finally, we discuss the potential of our analytic approach to improve our understanding of the inner workings of transformer models, potentially reducing computational training costs and enhancing model interpretability.
Dynamical Mean-Field Theory of Self-Attention Neural Networks. Ángel Poc-López, Miguel Aguilera. arXiv - PHYS - Disordered Systems and Neural Networks, 2024-06-11. https://doi.org/arxiv-2406.07247