Pub Date: 2024-07-10 | DOI: 10.1088/2632-2153/ad5f17
Thang M Pham, Nam Do, Ha T T Pham, Hanh T Bui, Thang T Do and Manh V Hoang
Landslides, which can occur due to earthquakes and heavy rainfall, pose significant challenges across large areas. To effectively manage these disasters, it is crucial to have fast and reliable automatic detection methods for mapping landslides. In recent years, deep learning methods, particularly convolutional neural networks and fully convolutional networks, have been successfully applied to various fields, including landslide detection, with remarkable accuracy and high reliability. However, most of these models achieved high detection performance only when using high-resolution satellite images. In this research, we introduce a modified Residual U-Net combined with the Convolutional Block Attention Module (CBAM) for automatic landslide mapping. The proposed method is trained and assessed using freely available data sets acquired from Sentinel-2 sensors, digital elevation models, and slope data from ALOS PALSAR with a spatial resolution of 10 m. Compared to the original ResU-Net model, the proposed architecture achieved higher accuracy, with the F1-score improving by 9.1% for the landslide class. Additionally, it offers a lower computational cost, requiring 1.38 giga multiply-accumulate operations (GMACs) to execute the model compared to 2.68 GMACs for the original model. The source code is available at https://github.com/manhhv87/LandSlideMapping.git.
Title: CResU-Net: a method for landslide mapping using deep learning
Journal: Machine Learning: Science and Technology (IF 6.8)
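As a rough illustration of the attention mechanism named above, here is a minimal NumPy sketch of CBAM-style gating. It is not the paper's implementation: the shapes, the random weights `w1`, `w2`, `w`, and the 1x1 mix replacing CBAM's usual 7x7 spatial convolution are all toy assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w1, w2):
    """CBAM-style channel gate: a shared two-layer MLP on avg- and max-pooled
    channel descriptors, summed and squashed. x has shape (C, H, W)."""
    avg = x.mean(axis=(1, 2))                       # (C,) global average pool
    mx = x.max(axis=(1, 2))                         # (C,) global max pool
    att = sigmoid(w2 @ np.maximum(w1 @ avg, 0.0)
                  + w2 @ np.maximum(w1 @ mx, 0.0))  # (C,) gate in (0, 1)
    return x * att[:, None, None]

def spatial_attention(x, w):
    """CBAM-style spatial gate: pool over channels, mix the two maps with a
    1x1 combination (toy stand-in for the usual 7x7 convolution)."""
    avg = x.mean(axis=0)                            # (H, W)
    mx = x.max(axis=0)                              # (H, W)
    att = sigmoid(w[0] * avg + w[1] * mx)           # (H, W) gate in (0, 1)
    return x * att[None, :, :]

rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C))
w2 = rng.standard_normal((C, C // r))
y = spatial_attention(channel_attention(x, w1, w2), rng.standard_normal(2))
print(y.shape)  # (8, 4, 4): gating preserves the feature-map shape
```

Channel attention reweights feature maps per channel; spatial attention then gates every location. This kind of lightweight refinement is what the modified ResU-Net adds on top of its residual blocks.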
Pub Date: 2024-07-09 | DOI: 10.1088/2632-2153/ad563c
F Vaselli, F Cattafesta, P Asenov, A Rizzi
The simulation of high-energy physics collision events is a key element for data analysis at present and future particle accelerators. The comparison of simulation predictions to data allows searching for rare deviations that can be due to new phenomena not previously observed. We show that novel machine learning algorithms, specifically Normalizing Flows and Flow Matching, can be used to replicate accurate simulations from traditional approaches with several orders of magnitude of speed-up. The classical simulation chain starts from a physics process of interest, computes energy deposits of particles and the electronics response, and finally employs the same reconstruction algorithms used for data. Eventually, the data are reduced to some high-level analysis format. Instead, we propose an end-to-end approach, simulating the final data format directly from physical generator inputs, skipping any intermediate steps. We use particle-jet simulation as a benchmark for comparing both discrete and continuous Normalizing Flow models. The models are validated across a variety of metrics to identify the most accurate. We discuss the scaling of performance with the increase in training data, as well as the generalization power of these models on physical processes different from the training one. We investigate sampling multiple times from the same physical generator inputs, a procedure we name oversampling, and we show that it can effectively reduce the statistical uncertainties of a dataset. This class of ML algorithms is found to be capable of learning the expected detector response independently of the physical input process. The speed and accuracy of the models, coupled with the stability of the training procedure, make them a compelling tool for the needs of current and future experiments.
Title: End-to-end simulation of particle physics events with flow matching and generator oversampling
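The Flow Matching objective mentioned in the abstract can be sketched in a few lines. This toy assumes the straight-line probability path and a one-parameter linear vector field fitted by grid search; a real model would use a neural network trained with stochastic gradients.

```python
import numpy as np

rng = np.random.default_rng(1)

def fm_loss(theta, x0, x1, t):
    """Conditional flow-matching loss for a toy linear vector field
    v_theta(x, t) = theta * x. Along the straight-line path
    x_t = (1 - t) x0 + t x1, the regression target is the velocity x1 - x0."""
    xt = (1.0 - t)[:, None] * x0 + t[:, None] * x1
    return np.mean((theta * xt - (x1 - x0)) ** 2)

x0 = rng.standard_normal((256, 2))          # base (noise) samples
x1 = rng.standard_normal((256, 2)) + 3.0    # "data" samples
t = rng.uniform(size=256)                   # random times in [0, 1]

# crude grid search over the single scalar parameter (stand-in for training)
thetas = np.linspace(-2.0, 2.0, 401)
best = min(thetas, key=lambda th: fm_loss(th, x0, x1, t))
print(fm_loss(best, x0, x1, t) < fm_loss(0.0, x0, x1, t))  # True
```

Once fitted, the vector field is integrated from noise to produce samples, which is where the quoted orders-of-magnitude speed-up over the classical chain comes from.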
Pub Date: 2024-07-04 | DOI: 10.1088/2632-2153/ad594a
Matthias Kellner and Michele Ceriotti
Statistical learning algorithms provide a generally applicable framework to sidestep time-consuming experiments, or accurate physics-based modeling, but they introduce a further source of error on top of the intrinsic limitations of the experimental or theoretical setup. Uncertainty estimation is essential to quantify this error, and to make applications of data-centric approaches more trustworthy. To ensure that uncertainty quantification is used widely, one should aim for algorithms that are accurate, but also easy to implement and apply. In particular, including uncertainty quantification on top of an existing architecture should be straightforward, and add minimal computational overhead. Furthermore, it should be easy to manipulate or combine multiple machine-learning predictions, propagating uncertainty over further modeling steps. We compare several well-established uncertainty quantification frameworks against these requirements, and propose a practical approach, which we dub direct propagation of shallow ensembles, that provides a good compromise between ease of use and accuracy. We present benchmarks for generic datasets, and an in-depth study of applications to the field of atomistic machine learning for chemistry and materials. These examples underscore the importance of using a formulation that allows propagating errors without making strong assumptions on the correlations between different predictions of the model.
Title: Uncertainty quantification by direct propagation of shallow ensembles
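The "direct propagation" idea can be illustrated with a toy ensemble: instead of collapsing the ensemble into a mean and an error bar before further modeling, each member is pushed through the downstream step and statistics are taken only at the end. The numbers and the downstream function below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# M last-layer heads predicting the same quantity (stand-ins for the
# shallow-ensemble members of the paper); their spread encodes uncertainty
members = rng.normal(loc=2.0, scale=0.1, size=16)

def downstream(y):
    """A further modeling step applied to the prediction."""
    return np.exp(-y)

# direct propagation: push every member through, take statistics at the end,
# so no assumption about correlations between predictions is ever needed
z_members = downstream(members)
z_mean, z_std = z_members.mean(), z_members.std()

# the usual alternative: propagate only the mean with a linearized error bar
y_mean, y_std = members.mean(), members.std()
z_lin = downstream(y_mean)
z_lin_std = abs(-np.exp(-y_mean)) * y_std

print(z_mean, z_std, z_lin, z_lin_std)
```

For this mildly nonlinear step the two agree; for strongly nonlinear or multi-input steps the member-wise propagation keeps correlations automatically, which is the point the abstract makes.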
Pub Date: 2024-07-04 | DOI: 10.1088/2632-2153/ad5bbf
Carlo Abate, Sergio Decherchi and Andrea Cavalli
Drug design is both a time-consuming and expensive endeavour. Computational strategies offer viable options to address this task; deep learning approaches in particular are gaining traction for their capability of dealing with chemical structures. A straightforward way to represent such structures is via their molecular graph, which in turn can be naturally processed by graph neural networks. This paper introduces AMCG, a dual atomic-molecular, conditional, latent-space, generative model built around graph processing layers able to support both unconditional and conditional molecular graph generation. Among other features, AMCG is a one-shot model allowing for fast sampling, explicit atomic-type histogram assignment and property optimization via gradient ascent. The model was trained on the Quantum Machines 9 (QM9) and ZINC datasets, achieving state-of-the-art performance. Together with classic benchmarks, AMCG was also tested by generating large-scale sampled sets, showing robustness in terms of sustainable throughput of valid, novel and unique molecules.
Title: AMCG: a graph dual atomic-molecular conditional molecular generator
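Property optimization via gradient ascent, one of the features listed above, reduces to a simple loop once a differentiable property predictor is available on the latent space. The quadratic `property_score` below is a hypothetical stand-in for such a predictor, and the decoder back to a molecular graph is omitted.

```python
import numpy as np

def property_score(z):
    """Hypothetical differentiable property head on the latent space;
    its maximum sits at z = 1."""
    return -np.sum((z - 1.0) ** 2)

def grad_score(z):
    """Analytic gradient of the toy score."""
    return -2.0 * (z - 1.0)

# latent-space property optimization by gradient ascent
z = np.zeros(4)
for _ in range(100):
    z = z + 0.1 * grad_score(z)
print(np.round(z, 3))  # converges to the optimum at z = 1
```

In the model described above, the optimized latent code would then be decoded in one shot into a molecular graph with the desired property value.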
Pub Date: 2024-07-03 | DOI: 10.1088/2632-2153/ad56f9
Shailesh Lal, Suvajit Majumder and Evgeny Sobko
We provide a novel neural network architecture that can: (i) output the R-matrix for a given quantum integrable spin chain, (ii) search for an integrable Hamiltonian and the corresponding R-matrix under assumptions of certain symmetries or other restrictions, (iii) explore the space of Hamiltonians around already learned models and reconstruct the family of integrable spin chains to which they belong. The neural network is trained by minimizing loss functions encoding the Yang–Baxter equation, regularity and other model-specific restrictions such as hermiticity. Holomorphy is implemented via the choice of activation functions. We demonstrate the work of our neural network on spin chains of difference form with a two-dimensional local space. In particular, we reconstruct the R-matrices for all 14 classes. We also demonstrate its utility as an Explorer, scanning a certain subspace of Hamiltonians and identifying integrable classes after clusterisation. The last strategy can be used in the future to carve out the map of integrable spin chains with higher-dimensional local space and in more general settings where no analytical methods are available.
Title: The R-mAtrIx Net
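The loss the network minimizes can be made concrete for the two-dimensional local space discussed above. The sketch below builds the Yang–Baxter residual for candidate R-matrices of difference form and checks that it vanishes for the rational (XXX) R-matrix R(u) = u·Id + P; a defect of exactly this kind (plus regularity terms) is what the training would push to zero over a parametrized family.

```python
import numpy as np

I2 = np.eye(2)
P = np.array([[1., 0., 0., 0.],
              [0., 0., 1., 0.],
              [0., 1., 0., 0.],
              [0., 0., 0., 1.]])  # permutation operator on C^2 (x) C^2

def R(u):
    """Rational (XXX) R-matrix in difference form: R(u) = u*Id + P."""
    return u * np.eye(4) + P

def lift(M, sites):
    """Embed a two-site operator M into the three-site space (C^2)^3."""
    if sites == (1, 2):
        return np.kron(M, I2)
    if sites == (2, 3):
        return np.kron(I2, M)
    # sites (1, 3): conjugate the (1, 2) embedding by the swap of sites 2, 3
    P23 = np.kron(I2, P)
    return P23 @ np.kron(M, I2) @ P23

def ybe_residual(Rfun, u, v):
    """Norm of the Yang-Baxter defect
    R12(u-v) R13(u) R23(v) - R23(v) R13(u) R12(u-v)."""
    lhs = lift(Rfun(u - v), (1, 2)) @ lift(Rfun(u), (1, 3)) @ lift(Rfun(v), (2, 3))
    rhs = lift(Rfun(v), (2, 3)) @ lift(Rfun(u), (1, 3)) @ lift(Rfun(u - v), (1, 2))
    return np.linalg.norm(lhs - rhs)

print(ybe_residual(R, 0.7, 0.3))  # ~0 (floating-point level): XXX solves the YBE
```

A generic deformation of `R` gives a nonzero residual, which a training loop over the candidate's parameters would minimize back toward zero.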
Pub Date: 2024-07-02 | DOI: 10.1088/2632-2153/ad52e7
Giles C Strong, Maxime Lagrange, Aitor Orio, Anna Bordignon, Florian Bury, Tommaso Dorigo, Andrea Giammanco, Mariam Heikal, Jan Kieseler, Max Lamparth, Pablo Martínez Ruíz del Árbol, Federico Nardi, Pietro Vischia and Haitham Zaraket
We describe a software package, TomOpt, developed to optimise the geometrical layout and specifications of detectors designed for tomography by scattering of cosmic-ray muons. The software exploits differentiable programming for the modeling of muon interactions with detectors and scanned volumes, the inference of volume properties, and the optimisation cycle performing the loss minimisation. In doing so, we provide the first demonstration of end-to-end-differentiable and inference-aware optimisation of particle physics instruments. We study the performance of the software on a relevant benchmark scenario and discuss its potential applications. Our code is available on GitHub (Strong et al 2024, available at: https://github.com/GilesStrong/tomopt).
Title: TomOpt: differential optimisation for task- and constraint-aware design of particle detectors in the context of muon tomography
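The optimisation cycle can be caricatured with a one-parameter toy: a hypothetical inference-aware loss trading track acceptance against detector cost, minimised by gradient descent. The loss, its constants, and the finite-difference gradient (standing in for the automatic differentiation TomOpt relies on) are all assumptions for illustration, not the package's API.

```python
def inference_loss(span):
    """Toy stand-in for an inference-aware objective: a wider detector panel
    accepts more muon tracks (better volume inference) but costs more."""
    acceptance = min(span, 2.0) / 2.0          # fraction of tracks crossing
    resolution = 1.0 / max(acceptance, 1e-3)   # fewer tracks -> worse inference
    cost = 2.0 * span                          # material / budget penalty
    return resolution + cost

# optimise the geometry parameter by gradient descent, with a central
# finite difference playing the role of automatic differentiation
span, lr, eps = 0.5, 0.05, 1e-4
for _ in range(200):
    g = (inference_loss(span + eps) - inference_loss(span - eps)) / (2 * eps)
    span = min(max(span - lr * g, 0.1), 2.0)   # keep within physical bounds
print(round(span, 2))  # settles at the acceptance/cost trade-off, span = 1.0
```

The point of differentiable, inference-aware design is exactly this loop, only with the full simulation-and-reconstruction chain inside the loss.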
Pub Date: 2024-07-01 | DOI: 10.1088/2632-2153/ad5926
Paul Hagemann, Johannes Hertrich, Maren Casfor, Sebastian Heidenreich and Gabriele Steidl
We develop an algorithm for jointly estimating the posterior and the noise parameters in Bayesian inverse problems, which is motivated by indirect measurements and applications from nanometrology with a mixed noise model. We propose to solve the problem by an expectation maximization (EM) algorithm. Based on the current noise parameters, we learn in the E-step a conditional normalizing flow that approximates the posterior. In the M-step, we propose to find the noise parameter updates again by an EM algorithm, which has analytical formulas. We compare the training of the conditional normalizing flow with the forward and reverse Kullback–Leibler divergence, and show that our model is able to incorporate information from many measurements, unlike previous approaches.
Title: Mixed noise and posterior estimation with conditional deepGEM
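The EM structure described above can be sketched on a fully Gaussian toy problem, where the E-step posterior is available in closed form (the paper approximates this step with a conditional normalizing flow) and the M-step noise update is analytic. The model, prior, and noise scale below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)

# forward model y = x + eps with prior x ~ N(0, 1) and unknown noise scale
true_sigma = 0.5
x = rng.standard_normal(5000)
y = x + true_sigma * rng.standard_normal(5000)

sigma2 = 4.0                                    # deliberately poor initial guess
for _ in range(500):
    # E-step: posterior of x given y, exact for this Gaussian toy
    post_mean = y / (1.0 + sigma2)
    post_var = sigma2 / (1.0 + sigma2)
    # M-step: analytic noise-parameter update sigma^2 = E[(y - x)^2 | y]
    sigma2 = np.mean((y - post_mean) ** 2 + post_var)

print(float(np.sqrt(sigma2)))  # close to the true noise scale 0.5
```

The alternation is the same in the paper's setting; only the E-step posterior is intractable there, which is where the learned conditional flow comes in.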
Pub Date: 2024-06-27 | DOI: 10.1088/2632-2153/ad5784
Atsarina Larasati Anindya, Torbjörn Nur Olsson, Maja Jensen, Maria-Jose Garcia-Bonete, Sally P Wheatley, Maria I Bokarewa, Stefano A Mezzasalma and Gergely Katona
In the realm of atomic physics and chemistry, composition emerges as the most powerful means of describing matter. Mendeleev’s periodic table and chemical formulas, while not entirely free from ambiguities, provide robust approximations for comprehending the properties of atoms, chemicals, and their collective behaviours, which stem from the dynamic interplay of their constituents. Our study illustrates that protein-protein interactions follow a similar paradigm, wherein the composition of peptides plays a pivotal role in predicting their interactions with the protein survivin, using an elegantly simple model. An analysis of these predictions within the context of the human proteome not only confirms the known cellular locations of survivin and its interaction partners, but also introduces novel insights into biological functionality. It becomes evident that electrostatic- and primary structure-based descriptions fall short in predictive power, leading us to speculate that protein interactions are orchestrated by the collective dynamics of functional groups.
Title: Deciphering peptide-protein interactions via composition-based prediction: a case study with survivin/BIRC5
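A composition-based predictor of the kind the study describes can be sketched very compactly: represent each peptide by its amino-acid composition vector and fit a linear classifier on top. The peptides and "binder" labels below are invented and only show the shape of the pipeline, not the survivin data.

```python
import numpy as np

AA = "ACDEFGHIKLMNPQRSTVWY"

def composition(peptide):
    """Order-free amino-acid composition vector, the kind of simple
    descriptor the study finds predictive."""
    v = np.zeros(len(AA))
    for ch in peptide:
        v[AA.index(ch)] += 1.0
    return v / len(peptide)

# invented toy data: call a peptide a "binder" (1) if rich in D/E
peps = ["DEEDK", "KRKRA", "EDEDE", "GGGGG", "DDKEE", "RKKKR"]
y = np.array([1.0, 0.0, 1.0, 0.0, 1.0, 0.0])
X = np.array([composition(p) for p in peps])

# minimal logistic regression trained by full-batch gradient descent
w = np.zeros(X.shape[1])
for _ in range(5000):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    w -= 0.5 * X.T @ (p - y) / len(y)

pred = (1.0 / (1.0 + np.exp(-X @ w)) > 0.5).astype(int)
print(pred.tolist())  # recovers the labels [1, 0, 1, 0, 1, 0]
```

Note that the descriptor deliberately discards sequence order, mirroring the study's finding that composition alone carries much of the predictive signal.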
Pub Date: 2024-06-26 | DOI: 10.1088/2632-2153/ad5a5f
Enrico Ventura, Simona Cocco, Rémi Monasson and Francesco Zamponi
Boltzmann machines (BMs) are graphical models with interconnected binary units, employed for the unsupervised modeling of data distributions. When trained on real data, BMs tend to behave like critical systems, displaying a high susceptibility of the model under a small rescaling of the inferred parameters. This behavior is inconvenient for generating data, because it slows down the sampling process and induces the model to overfit the training data. In this study, we introduce a regularization method for BMs that improves the robustness of the model under rescaling of the parameters. The new technique shares formal similarities with the unlearning algorithm, an iterative procedure used to improve memory associativity in Hopfield-like neural networks. We test our unlearning regularization on synthetic data generated by two simple models, the Curie–Weiss ferromagnetic model and the Sherrington–Kirkpatrick spin glass model. We show that it outperforms Lp-norm schemes and discuss the role of parameter initialization. Finally, the method is applied to learn the activity of real neuronal cells, confirming its efficacy at shifting the inferred model away from criticality, and making it a strong candidate for actual scientific implementations.
Title: Unlearning regularization for Boltzmann machines
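The unlearning procedure the regularizer mirrors can be sketched in its original Hopfield setting: relax the network from random states and weakly subtract the configurations it settles into, which suppresses spurious attractors while (for mild rates) preserving the stored memories. The sizes, the unlearning rate `eps`, and the number of unlearning steps below are toy choices.

```python
import numpy as np

rng = np.random.default_rng(5)
N, P = 100, 3

# Hebbian storage of P random binary patterns (classical Hopfield network)
xi = rng.choice([-1.0, 1.0], size=(P, N))
J = xi.T @ xi / N
np.fill_diagonal(J, 0.0)

def relax(s, J, steps=100):
    """Zero-temperature dynamics s -> sign(J s) until a fixed point (or cap)."""
    for _ in range(steps):
        s_new = np.sign(J @ s)
        s_new[s_new == 0.0] = 1.0
        if np.array_equal(s_new, s):
            break
        s = s_new
    return s

# unlearning: relax from random states and weakly subtract whatever the
# network settles into -- the iterative procedure the regularizer mirrors
eps = 0.01
for _ in range(30):
    a = relax(rng.choice([-1.0, 1.0], size=N), J)
    J -= eps * np.outer(a, a) / N
    np.fill_diagonal(J, 0.0)

# a stored pattern should still be recalled from a corrupted cue
cue = xi[0].copy()
cue[:5] *= -1.0
overlap = abs(float(relax(cue, J) @ xi[0])) / N
print(overlap)  # close to 1: the memory survives unlearning
```

In the paper this subtraction is recast as a regularization term during BM training, pushing the inferred model away from the critical, overfitted regime.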
Pub Date: 2024-06-26 | DOI: 10.1088/2632-2153/ad5927
Koji Hashimoto, Yuji Hirono and Akiyoshi Sannai
Understanding the inner workings of neural networks, including transformers, remains one of the most challenging puzzles in machine learning. This study introduces a novel approach by applying the principles of gauge symmetries, a key concept in physics, to neural network architectures. By regarding model functions as physical observables, we find that parametric redundancies of various machine learning models can be interpreted as gauge symmetries. We mathematically formulate the parametric redundancies in neural ODEs, and find that their gauge symmetries are given by spacetime diffeomorphisms, which play a fundamental role in Einstein’s theory of gravity. Viewing neural ODEs as a continuum version of feedforward neural networks, we show that the parametric redundancies in feedforward neural networks are indeed lifted to diffeomorphisms in neural ODEs. We further extend our analysis to transformer models, finding natural correspondences with neural ODEs and their gauge symmetries. The concept of gauge symmetries sheds light on the complex behavior of deep learning models through physics and provides us with a unifying perspective for analyzing various machine learning architectures.
Title: Unification of symmetries inside neural networks: transformer, feedforward and neural ODE
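The parametric redundancy the paper interprets as a gauge symmetry is easy to exhibit in the simplest case: a two-layer linear network, where any invertible transformation of the hidden layer leaves the model function (the "observable") unchanged.

```python
import numpy as np

rng = np.random.default_rng(6)

# two-layer linear network: model function ("observable") f(x) = W2 W1 x
W1 = rng.standard_normal((5, 3))
W2 = rng.standard_normal((2, 5))

# gauge transformation: any invertible G acting on the hidden layer
G = rng.standard_normal((5, 5)) + 5.0 * np.eye(5)  # kept well-conditioned
W1_g = G @ W1
W2_g = W2 @ np.linalg.inv(G)

x = rng.standard_normal(3)
print(np.allclose(W2 @ W1 @ x, W2_g @ W1_g @ x))  # True: f is gauge invariant
```

The parameters change while every output is identical, which is the finite-dimensional shadow of the spacetime diffeomorphisms the paper finds in the neural-ODE continuum limit.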