Unification of symmetries inside neural networks: transformer, feedforward and neural ODE

Machine Learning Science and Technology, published 2024-06-26. DOI: 10.1088/2632-2153/ad5927. Impact factor 4.6; Zone 2 (Physics and Astrophysics); JCR Q1 (Computer Science, Artificial Intelligence).

Koji Hashimoto, Yuji Hirono and Akiyoshi Sannai


Citations: 0

Abstract

Understanding the inner workings of neural networks, including transformers, remains one of the most challenging puzzles in machine learning. This study introduces a novel approach by applying the principles of gauge symmetries, a key concept in physics, to neural network architectures. By regarding model functions as physical observables, we find that parametric redundancies of various machine learning models can be interpreted as gauge symmetries. We mathematically formulate the parametric redundancies in neural ODEs, and find that their gauge symmetries are given by spacetime diffeomorphisms, which play a fundamental role in Einstein’s theory of gravity. Viewing neural ODEs as a continuum version of feedforward neural networks, we show that the parametric redundancies in feedforward neural networks are indeed lifted to diffeomorphisms in neural ODEs. We further extend our analysis to transformer models, finding natural correspondences with neural ODEs and their gauge symmetries. The concept of gauge symmetries sheds light on the complex behavior of deep learning models through physics and provides us with a unifying perspective for analyzing various machine learning architectures.
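The abstract's central claim is that different parameter settings can realize exactly the same model function, and that these redundancies behave like gauge symmetries. The sketch below checks three textbook redundancies numerically, one per architecture named in the abstract. These are illustrative choices, not the paper's own constructions: positive rescaling plus permutation of hidden units in a one-hidden-layer ReLU network; reparametrization of the neural ODE time variable, t = φ(s) with φ(0) = 0, φ(1) = 1 and φ' > 0 (only the simplest special case of a diffeomorphism); and the substitution W_Q → W_Q A, W_K → W_K A^{-T} for an invertible A, which leaves the attention logits Q K^T unchanged. All function names, the vector field f, the map φ, and the layer sizes are hypothetical.

```python
# Illustrative sketch (hypothetical names); not the paper's code.
import math
import random

# 1. Feedforward: per-unit positive rescaling plus permutation of hidden units.
def relu(z):
    return max(0.0, z)

def mlp(x, W1, b1, w2, b2):
    """One-hidden-layer ReLU network: w2 . relu(W1 x + b1) + b2 (scalar output)."""
    hidden = [relu(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    return sum(v * h for v, h in zip(w2, hidden)) + b2

def transform_mlp(W1, b1, w2, scales, perm):
    """Multiply unit i's incoming weights and bias by c_i > 0, divide its outgoing
    weight by c_i, then reorder units by perm. Uses relu(c * z) == c * relu(z), c > 0."""
    W1s = [[c * w for w in row] for row, c in zip(W1, scales)]
    b1s = [c * b for b, c in zip(b1, scales)]
    w2s = [v / c for v, c in zip(w2, scales)]
    return [W1s[p] for p in perm], [b1s[p] for p in perm], [w2s[p] for p in perm]

# 2. Neural ODE: reparametrize time t = phi(s), phi(0) = 0, phi(1) = 1, phi' > 0.
def rk4(f, x0, t0, t1, steps):
    """Classical fixed-step Runge-Kutta integration of dx/dt = f(x, t), scalar x."""
    h = (t1 - t0) / steps
    x = x0
    for i in range(steps):
        t = t0 + i * h
        k1 = f(x, t)
        k2 = f(x + 0.5 * h * k1, t + 0.5 * h)
        k3 = f(x + 0.5 * h * k2, t + 0.5 * h)
        k4 = f(x + h * k3, t + h)
        x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return x

def f(x, t):
    return math.tanh(1.5 * x - t)  # an arbitrary smooth vector field (hypothetical)

def phi(s):
    return 0.5 * (s + s * s)

def dphi(s):
    return 0.5 + s

def g(x, s):
    # Same trajectory in the new time variable: dx/ds = phi'(s) * f(x, phi(s)).
    return dphi(s) * f(x, phi(s))

# 3. Attention logits: (W_Q, W_K) -> (W_Q A, W_K A^{-T}) leaves Q K^T unchanged.
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def inv2(A):
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def attention_logits(X, WQ, WK):
    return matmul(matmul(X, WQ), transpose(matmul(X, WK)))

def redundancy_residuals(seed=0):
    """Max |output difference| before vs. after each parameter transformation."""
    rng = random.Random(seed)

    def r():
        return rng.uniform(-1.0, 1.0)

    # Feedforward: input dim 2, three hidden units.
    W1 = [[r(), r()] for _ in range(3)]
    b1, w2, b2 = [r() for _ in range(3)], [r() for _ in range(3)], r()
    W1t, b1t, w2t = transform_mlp(W1, b1, w2, scales=[0.5, 2.0, 3.0], perm=[2, 0, 1])
    xs = [[r(), r()] for _ in range(20)]
    ffn = max(abs(mlp(x, W1, b1, w2, b2) - mlp(x, W1t, b1t, w2t, b2)) for x in xs)
    # Neural ODE: endpoint map x(0) -> x(1), integrated in t and in s.
    ode = max(abs(rk4(f, x0, 0.0, 1.0, 1000) - rk4(g, x0, 0.0, 1.0, 1000))
              for x0 in (-1.0, 0.3, 2.0))
    # Attention: 4 tokens, embedding dim 3, head dim 2.
    X = [[r() for _ in range(3)] for _ in range(4)]
    WQ = [[r(), r()] for _ in range(3)]
    WK = [[r(), r()] for _ in range(3)]
    A = [[2.0, 1.0], [0.5, 1.5]]  # any invertible 2x2 matrix
    S = attention_logits(X, WQ, WK)
    St = attention_logits(X, matmul(WQ, A), matmul(WK, transpose(inv2(A))))
    attn = max(abs(a - b) for row, rowt in zip(S, St) for a, b in zip(row, rowt))
    return ffn, ode, attn

print(redundancy_residuals())  # each residual should be tiny (round-off or RK4 error)
```

Using a negative scale, or a φ with φ(1) ≠ 1, generally breaks the corresponding invariance, which is a quick way to confirm that the small residuals are not accidental.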




Source journal: Machine Learning Science and Technology (Computer Science, Artificial Intelligence)

CiteScore: 9.10

Self-citation rate: 4.40%

Articles published: (not listed)

Review time: 5 weeks

Journal description: Machine Learning Science and Technology is a multidisciplinary open access journal that bridges applications of machine learning across the sciences with advances in machine learning methods and theory motivated by physical insight. Specifically, articles must either advance the state of machine-learning-driven applications in the sciences, or make conceptual, methodological or theoretical advances in machine learning that are applied to, inspired by, or motivated by scientific problems.