Stochastic weight matrix dynamics during learning and Dyson Brownian motion
Gert Aarts, Biagio Lucini, Chanju Park
arXiv:2407.16427 (arXiv - PHYS - High Energy Physics - Lattice), 2024-07-23
Abstract
We demonstrate that the update of weight matrices in learning algorithms can
be described in the framework of Dyson Brownian motion, thereby inheriting many
features of random matrix theory. We relate the level of stochasticity to the
ratio of the learning rate to the mini-batch size, providing more robust
evidence for a previously conjectured scaling relationship. We discuss universal
and non-universal features in the resulting Coulomb gas distribution and
identify the Wigner surmise and Wigner semicircle explicitly in a
teacher-student model and in the (near-)solvable case of the Gaussian
restricted Boltzmann machine.
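
As a rough illustration of the Wigner surmise mentioned above, the sketch below samples 2x2 real symmetric Gaussian (GOE) matrices and compares the normalised spacing between their two eigenvalues with the surmise P(s) = (pi/2) s exp(-pi s^2 / 4). This is a standard random-matrix exercise, not the teacher-student model or the Gaussian restricted Boltzmann machine studied in the paper. The function names, sample size, and binning are assumptions made for this example only.

```python
import math
import random


def goe_2x2_spacings(n_samples, seed=0):
    """Eigenvalue spacings of random 2x2 GOE matrices, rescaled to unit mean.

    Each matrix is [[a, b], [b, c]] with a, c ~ N(0, 1) and b ~ N(0, 1/2)
    (an illustrative normalisation). Its two eigenvalues differ by
    sqrt((a - c)^2 + 4 b^2).
    """
    rng = random.Random(seed)
    spacings = []
    for _ in range(n_samples):
        a = rng.gauss(0.0, 1.0)
        c = rng.gauss(0.0, 1.0)
        b = rng.gauss(0.0, math.sqrt(0.5))
        spacings.append(math.sqrt((a - c) ** 2 + 4.0 * b ** 2))
    mean = sum(spacings) / len(spacings)
    return [s / mean for s in spacings]


def wigner_surmise(s):
    """Wigner surmise for the orthogonal class (Dyson index beta = 1)."""
    return 0.5 * math.pi * s * math.exp(-0.25 * math.pi * s * s)


def histogram(samples, n_bins, s_max):
    """Return (bin midpoint, estimated density) pairs on [0, s_max)."""
    width = s_max / n_bins
    counts = [0] * n_bins
    for s in samples:
        k = int(s / width)
        if k < n_bins:
            counts[k] += 1
    norm = len(samples) * width
    return [((k + 0.5) * width, counts[k] / norm) for k in range(n_bins)]


if __name__ == "__main__":
    samples = goe_2x2_spacings(20000, seed=1)
    for s_mid, density in histogram(samples, 15, 3.0):
        print(f"s = {s_mid:4.2f}  sampled = {density:5.3f}  "
              f"surmise = {wigner_surmise(s_mid):5.3f}")
```

The sampled density goes to zero as s approaches 0, which is the level repulsion that the Coulomb gas description encodes. In this 2x2 Gaussian case the spacing follows the surmise exactly, so any mismatch in the printed table is sampling and binning error.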