Machine Learning Science and Technology: Latest Articles, Page 8

Unlearning regularization for Boltzmann machines

IF 6.8 Zone 2 (Physics and Astrophysics) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Machine Learning Science and Technology

Pub Date : 2024-06-26 DOI: 10.1088/2632-2153/ad5a5f

Enrico Ventura, Simona Cocco, Rémi Monasson and Francesco Zamponi

Boltzmann machines (BMs) are graphical models with interconnected binary units, employed for the unsupervised modeling of data distributions. When trained on real data, BMs tend to behave like critical systems, displaying a high susceptibility of the model under a small rescaling of the inferred parameters. This behavior is inconvenient for generating data, because it slows down the sampling process and induces the model to overfit the training data. In this study, we introduce a regularization method for BMs to improve the robustness of the model under rescaling of the parameters. The new technique shares formal similarities with the unlearning algorithm, an iterative procedure used to improve memory associativity in Hopfield-like neural networks. We test our unlearning regularization on synthetic data generated by two simple models, the Curie–Weiss ferromagnetic model and the Sherrington–Kirkpatrick spin glass model. We show that it outperforms Lp-norm schemes and discuss the role of parameter initialization. Finally, the method is applied to learn the activity of real neuronal cells, confirming its efficacy at shifting the inferred model away from criticality and establishing it as a strong candidate for practical scientific applications.
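To make "susceptibility under a small rescaling of the parameters" concrete, the following is a minimal sketch, not the authors' method: it enumerates all states of a small Boltzmann machine exactly and tracks how the mean absolute magnetization changes when couplings and fields are multiplied by a common factor. The energy convention, the Curie-Weiss-like couplings `J`, the size `n = 10`, and the function name `magnetization_vs_scale` are all assumptions made for illustration.

```python
import itertools
import numpy as np

def magnetization_vs_scale(J, h, scales):
    """Exact mean |magnetization| of a small +/-1 Boltzmann machine
    for each global rescaling factor applied to (J, h)."""
    n = len(h)
    # All 2^n spin configurations (only feasible for small n).
    states = np.array(list(itertools.product([-1, 1], repeat=n)))
    # Assumed convention: -E(s) = 1/2 s^T J s + h^T s, J symmetric, zero diagonal.
    pair = 0.5 * np.einsum("ki,ij,kj->k", states, J, states)
    field = states @ h
    abs_m = np.abs(states.mean(axis=1))
    result = []
    for lam in scales:
        log_w = lam * (pair + field)
        log_w -= log_w.max()          # stabilize the exponentials
        p = np.exp(log_w)
        p /= p.sum()                  # normalized Boltzmann distribution
        result.append(float(p @ abs_m))
    return np.array(result)

n = 10
J = (np.ones((n, n)) - np.eye(n)) / n   # uniform ferromagnetic couplings (illustrative)
h = np.zeros(n)
scales = np.linspace(0.5, 2.0, 7)
m = magnetization_vs_scale(J, h, scales)
```

A steep change of `m` over a narrow range of `scales` is the kind of sensitivity the abstract describes; a model that is robust under rescaling would show a flatter curve around a factor of 1.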


Citations: 0

Unification of symmetries inside neural networks: transformer, feedforward and neural ODE

IF 6.8 Zone 2 (Physics and Astrophysics) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Machine Learning Science and Technology

Pub Date : 2024-06-26 DOI: 10.1088/2632-2153/ad5927

Koji Hashimoto, Yuji Hirono and Akiyoshi Sannai

Understanding the inner workings of neural networks, including transformers, remains one of the most challenging puzzles in machine learning. This study introduces a novel approach by applying the principles of gauge symmetries, a key concept in physics, to neural network architectures. By regarding model functions as physical observables, we find that parametric redundancies of various machine learning models can be interpreted as gauge symmetries. We mathematically formulate the parametric redundancies in neural ODEs, and find that their gauge symmetries are given by spacetime diffeomorphisms, which play a fundamental role in Einstein’s theory of gravity. Viewing neural ODEs as a continuum version of feedforward neural networks, we show that the parametric redundancies in feedforward neural networks are indeed lifted to diffeomorphisms in neural ODEs. We further extend our analysis to transformer models, finding natural correspondences with neural ODEs and their gauge symmetries. The concept of gauge symmetries sheds light on the complex behavior of deep learning models through physics and provides us with a unifying perspective for analyzing various machine learning architectures.
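One familiar example of a parametric redundancy in a feedforward network is the positive rescaling of ReLU neurons: scaling a hidden unit's incoming weights and bias by a factor and its outgoing weights by the inverse leaves the model function unchanged. The sketch below checks this numerically. It is an illustration only; whether this is exactly the redundancy analyzed in the paper is an assumption, and the network shape and the name `mlp` are hypothetical.

```python
import numpy as np

def mlp(x, W1, b1, W2, b2):
    """Two-layer ReLU network: the 'observable' model function."""
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)
x = rng.normal(size=3)

# Transformation of the parameters: one positive scale per hidden neuron.
a = rng.uniform(0.5, 2.0, size=4)
W1g, b1g = a[:, None] * W1, a * b1     # scale incoming weights and bias
W2g = W2 / a[None, :]                  # undo the scale on outgoing weights

y = mlp(x, W1, b1, W2, b2)
y_g = mlp(x, W1g, b1g, W2g, b2)        # same output, different parameters
```

The two parameter sets differ but `y` and `y_g` agree to floating-point precision, which is the sense in which the transformation acts on parameters without changing any observable.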


Citations: 0

Performance deterioration of deep learning models after clinical deployment: a case study with auto-segmentation for definitive prostate cancer radiotherapy

IF 6.8 Zone 2 (Physics and Astrophysics) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Machine Learning Science and Technology

Pub Date : 2024-06-24 DOI: 10.1088/2632-2153/ad580f

Biling Wang, Michael Dohopolski, Ti Bai, Junjie Wu, Raquibul Hannan, Neil Desai, Aurelie Garant, Daniel Yang, Dan Nguyen, Mu-Han Lin, Robert Timmerman, Xinlei Wang and Steve B Jiang

Our study aims to explore the long-term performance patterns of deep learning (DL) models deployed in the clinic and to investigate their efficacy in relation to evolving clinical practices. We conducted a retrospective study simulating the clinical implementation of our DL model involving 1328 prostate cancer patients treated between January 2006 and August 2022. We trained and validated a U-Net-based auto-segmentation model on data obtained from 2006 to 2011 and tested it on data from 2012 to 2022, simulating the model’s clinical deployment starting in 2012. We visualized the trends in model performance using exponentially weighted moving average (EMA) curves. Additionally, we performed the Wilcoxon rank-sum test and multiple linear regression to investigate Dice similarity coefficient (DSC) variations across distinct periods and the impact of clinical factors, respectively. Initially, from 2012 to 2014, the model showed high performance in segmenting the prostate, rectum, and bladder. Post-2015, a notable decline in EMA DSC was observed for the prostate and rectum, while bladder contours remained stable. Key factors impacting the prostate contour quality included physician contouring styles, the use of various hydrogel spacers, CT scan slice thickness, MRI-guided contouring, and intravenous (IV) contrast (p < 0.0001, p < 0.0001, p = 0.0085, p = 0.0012, p < 0.0001, respectively). Rectum contour quality was notably influenced by factors such as slice thickness, physician contouring styles, and the use of various hydrogel spacers. The quality of the bladder contour was primarily affected by IV contrast. The deployed DL model exhibited a substantial decline in performance over time, aligning with the evolving clinical settings.
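The two quantities tracked in this abstract, the Dice similarity coefficient and its exponentially weighted moving average, can be sketched as follows. This is a generic illustration, not the study's code: the smoothing factor `alpha`, the convention for two empty masks, and the function names are assumptions.

```python
import numpy as np

def dice(mask_a, mask_b):
    """Dice similarity coefficient 2|A∩B| / (|A| + |B|) for binary masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return 2.0 * np.logical_and(a, b).sum() / total

def ema(values, alpha=0.3):
    """Exponentially weighted moving average, seeded with the first value."""
    smoothed, s = [], None
    for v in values:
        s = v if s is None else alpha * v + (1.0 - alpha) * s
        smoothed.append(s)
    return smoothed

# Per-patient DSC values in chronological order would be smoothed like this:
dsc_example = dice([1, 1, 0, 0], [1, 0, 0, 0])
trend_example = ema([1.0, 0.0, 0.0], alpha=0.5)
```

A smaller `alpha` gives a smoother curve that reacts more slowly to a drop in per-patient DSC; the value used in the study is not stated in the abstract.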


Citations: 0

Finetuning foundation models for joint analysis optimization in High Energy Physics

IF 6.8 Zone 2 (Physics and Astrophysics) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Machine Learning Science and Technology

Pub Date : 2024-06-20 DOI: 10.1088/2632-2153/ad55a3

Matthias Vigl, Nicole Hartman and Lukas Heinrich

In this work we demonstrate that significant gains in performance and data efficiency can be achieved in High Energy Physics (HEP) by moving beyond the standard paradigm of sequential optimization of reconstruction and analysis components. We conceptually connect HEP reconstruction and analysis to modern machine learning workflows such as pretraining, finetuning, domain adaptation and high-dimensional embedding spaces, and quantify the gains in the example use case of searches for heavy resonances decaying via an intermediate di-Higgs system to four b-jets. To our knowledge, this is the first example of a low-level feature extraction network finetuned for a downstream HEP analysis objective.


Citations: 0

Sparse autoregressive neural networks for classical spin systems

IF 6.8 Zone 2 (Physics and Astrophysics) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Machine Learning Science and Technology

Pub Date : 2024-06-20 DOI: 10.1088/2632-2153/ad5783

Indaco Biazzo, Dian Wu and Giuseppe Carleo

Efficient sampling and approximation of Boltzmann distributions involving large sets of binary variables, or spins, are pivotal in diverse scientific fields even beyond physics. Recent advances in generative neural networks have significantly impacted this domain. However, these neural networks are often treated as black boxes, with architectures primarily influenced by data-driven problems in computational science. Addressing this gap, we introduce a novel autoregressive neural network architecture named TwoBo, specifically designed for sparse two-body interacting spin systems. We directly incorporate the Boltzmann distribution into its architecture and parameters, resulting in enhanced convergence speed, superior free energy accuracy, and reduced trainable parameters. We perform numerical experiments on disordered, frustrated systems with more than 1000 spins on grids and random graphs, and demonstrate its advantages compared to previous autoregressive and recurrent architectures. Our findings validate a physically informed approach and suggest potential extensions to multivalued variables and many-body interaction systems, paving the way for broader applications in scientific research.


Citations: 0

Merging automatic differentiation and the adjoint method for photonic inverse design

IF 6.8 Zone 2 (Physics and Astrophysics) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Machine Learning Science and Technology

Pub Date : 2024-06-20 DOI: 10.1088/2632-2153/ad5411

Alexander Luce, Rasoul Alaee, Fabian Knorr and Florian Marquardt

Optimizing the shapes and topology of physical devices is crucial for both scientific and technological advancements, given their wide-ranging implications across numerous industries and research areas. Innovations in shape and topology optimization have been observed across a wide range of fields, notably structural mechanics, fluid mechanics, and more recently, photonics. Gradient-based inverse design techniques have been particularly successful for photonic and optical problems, resulting in integrated, miniaturized hardware that has set new standards in device performance. To calculate the gradients, there are typically two approaches: implementing specialized solvers using automatic differentiation (AD), or deriving analytical solutions for gradient calculation and adjoint sources by hand. In this work, we propose a middle ground and present a hybrid approach that leverages the benefits of AD for handling gradient derivation while using existing, proven but black-box photonic solvers for numerical solutions. Utilizing the adjoint method, we make existing numerical solvers differentiable and seamlessly integrate them into an AD framework. Further, this enables users to integrate the optimization environment seamlessly with other autodifferentiable components such as machine learning, geometry generation, or intricate post-processing, which could lead to better photonic design workflows. We illustrate the approach through two distinct photonic optimization problems: optimizing the Purcell factor of a magnetic dipole in the vicinity of an optical nanocavity and enhancing the light extraction efficiency of a µLED.
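The core idea of the adjoint method, obtaining the gradient with respect to all design parameters from one extra solve with a black-box solver, can be sketched on a toy linear problem. This is a generic illustration and not the paper's photonic setup: the system A(p) x = b with A(p) = A0 + diag(p), the objective f = c·x, and all names below are assumptions. With A^T λ = c, the gradient is df/dp_i = -λ_i x_i.

```python
import numpy as np

def objective_and_gradient(p, A0, b, c):
    """f(p) = c.x with (A0 + diag(p)) x = b, plus df/dp via the adjoint method."""
    A = A0 + np.diag(p)
    x = np.linalg.solve(A, b)        # forward solve (stand-in for a black-box solver)
    lam = np.linalg.solve(A.T, c)    # single adjoint solve
    grad = -lam * x                  # dA/dp_i = e_i e_i^T  =>  df/dp_i = -lam_i * x_i
    return float(c @ x), grad

def finite_difference_gradient(p, A0, b, c, eps=1e-6):
    """Central differences: needs 2 * len(p) forward solves, used here only as a check."""
    grad = np.zeros_like(p)
    for i in range(len(p)):
        step = np.zeros_like(p)
        step[i] = eps
        f_plus, _ = objective_and_gradient(p + step, A0, b, c)
        f_minus, _ = objective_and_gradient(p - step, A0, b, c)
        grad[i] = (f_plus - f_minus) / (2.0 * eps)
    return grad

rng = np.random.default_rng(1)
A0 = rng.normal(size=(5, 5)) + 5.0 * np.eye(5)   # well-conditioned toy operator
b, c = rng.normal(size=5), rng.normal(size=5)
p = rng.uniform(1.0, 2.0, size=5)

f_val, grad_adjoint = objective_and_gradient(p, A0, b, c)
grad_fd = finite_difference_gradient(p, A0, b, c)
```

The adjoint gradient costs two solves regardless of the number of parameters, whereas the finite-difference check scales linearly with it. In an AD framework, the pair (forward solve, adjoint solve) would typically be registered as a custom differentiation rule so that the solver composes with other differentiable components.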


Citations: 0

Deep learning methods for Hamiltonian parameter estimation and magnetic domain image generation in twisted van der Waals magnets

IF 6.8 Zone 2 (Physics and Astrophysics) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Machine Learning Science and Technology

Pub Date : 2024-06-19 DOI: 10.1088/2632-2153/ad56fa

Woo Seok Lee, Taegeun Song and Kyoung-Min Kim

The application of twist engineering in van der Waals magnets has opened new frontiers in the field of two-dimensional magnetism, yielding distinctive magnetic domain structures. Despite the introduction of numerous theoretical methods, limitations persist in terms of accuracy or efficiency due to the complex nature of the magnetic Hamiltonians pertinent to these systems. In this study, we introduce a deep-learning approach to tackle these challenges. Utilizing customized, fully connected networks, we develop two deep-neural-network kernels that facilitate efficient and reliable analysis of twisted van der Waals magnets. Our regression model is adept at estimating the magnetic Hamiltonian parameters of twisted bilayer CrI3 from its magnetic domain images generated through atomistic spin simulations. The ‘generative model’ excels in producing precise magnetic domain images from the provided magnetic parameters. The trained networks for these models undergo thorough validation, including statistical error analysis and assessment of robustness against noisy injections. These advancements not only extend the applicability of deep-learning methods to twisted van der Waals magnets but also streamline future investigations into these captivating yet poorly understood systems.


Citations: 0

Machine learning meets Kepler: inverting Kepler’s equation for All vs All conjunction analysis

IF 6.8 Zone 2 (Physics and Astrophysics) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Machine Learning Science and Technology

Pub Date : 2024-06-13 DOI: 10.1088/2632-2153/ad51cc

Kevin Otto, Simon Burgis, Kristian Kersting, Reinhold Bertrand and Devendra Singh Dhami

The number of satellites in orbit around Earth is increasing rapidly, with the risk of collision rising accordingly. Trends of the global population of satellites need to be analyzed to test the viability and impact of proposed rules and laws affecting the satellite population and collision avoidance strategies. This requires large-scale simulations of satellites that are propagated on long timescales to compute the large number of actionable close encounters (called conjunctions), which could lead to collisions. Rigorously checking for conjunctions by computing future states of orbits is computationally expensive due to the large number of objects involved, and conjunction filters are thus used to remove non-conjuncting orbit pairs from the list of possible conjunctions. In this work, we explore the possibility of machine learning (ML) based conjunction filters using several algorithms such as eXtreme Gradient Boosting, TabNet and (physics-informed) neural networks and deep operator networks. To show the viability and the potential of ML-based filters, these algorithms are trained to predict the future state of orbits. For the physics-informed approaches, multiple partial differential equations are set up using the Kepler equation as a basis. The empirical results demonstrate that physics-informed deep operator networks are capable of predicting the future state of orbits using these equations (RMSE: 0.136) and outperform eXtreme Gradient Boosting (RMSE: 0.568) and TabNet (RMSE: 0.459). We also propose a filter based on the trained deep operator network, which is shown to outperform the filter capability of the commonly used perigee-apogee test and the orbit path filter on a synthetic dataset, while being on average 3.2 times faster to compute than a rigorous conjunction check.
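For reference, the classical (non-ML) ingredients mentioned here can be sketched in a few lines: a Newton iteration that inverts Kepler's equation M = E - e sin E for the eccentric anomaly, and the perigee-apogee test, which rejects an orbit pair when the larger perigee exceeds the smaller apogee by more than a distance threshold. This is a textbook-style illustration, not the paper's learned filter; the starting guess, tolerances, and function names are assumptions.

```python
import math

def eccentric_anomaly(M, e, tol=1e-12, max_iter=50):
    """Solve Kepler's equation M = E - e*sin(E) for E by Newton's method.
    Assumes an elliptic orbit (0 <= e < 1) and M in radians within [0, 2*pi)."""
    E = M if e < 0.8 else math.pi          # common choice of starting guess
    for _ in range(max_iter):
        step = (E - e * math.sin(E) - M) / (1.0 - e * math.cos(E))
        E -= step
        if abs(step) < tol:
            break
    return E

def perigee_apogee_may_conjunct(a1, e1, a2, e2, threshold):
    """Perigee-apogee test: keep the pair only if the radial shells can come
    within `threshold` of each other (same length units as the semi-major axes)."""
    larger_perigee = max(a1 * (1.0 - e1), a2 * (1.0 - e2))
    smaller_apogee = min(a1 * (1.0 + e1), a2 * (1.0 + e2))
    return larger_perigee - smaller_apogee <= threshold

E = eccentric_anomaly(1.0, 0.3)
leo_vs_geo = perigee_apogee_may_conjunct(7000.0, 0.0, 42164.0, 0.0, 10.0)
leo_vs_leo = perigee_apogee_may_conjunct(7000.0, 0.01, 7050.0, 0.01, 10.0)
```

The filter is cheap but coarse: it only compares radial ranges, so pairs it keeps (such as the two overlapping low orbits above) still need a more detailed check.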


Citations: 0

STG-MTL: scalable task grouping for multi-task learning using data maps

IF 6.8 Zone 2 (Physics and Astrophysics) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Machine Learning Science and Technology

Pub Date : 2024-06-13 DOI: 10.1088/2632-2153/ad4e04

Ammar Sherif, Abubakar Abid, Mustafa Elattar and Mohamed ElHelw

Multi-Task Learning (MTL) is a powerful technique that has gained popularity because it improves on the performance of traditional Single-Task Learning (STL). However, MTL is often challenging: the number of possible task groupings is exponential, and choosing the best one is difficult because some groupings degrade performance through negative interference between tasks. As a result, existing solutions suffer from severe scalability issues that limit practical application. In our paper, we propose a new data-driven method that addresses these challenges and provides a scalable and modular solution for classification task grouping based on re-proposed data-driven features, Data Maps, which capture the training dynamics of each classification task during MTL training. Through a theoretical comparison with other techniques, we show that our approach has superior scalability. Our experiments show better performance and verify the method's effectiveness, even on an unprecedented number of tasks (up to 100 tasks on CIFAR100). Being the first to work on such a number of tasks, we compare the resulting groupings with those mentioned in the dataset, CIFAR100, and find them similar. Finally, we provide a modular implementation for easier integration and testing, with examples from multiple datasets and tasks.
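To make the idea of a data map concrete, the sketch below computes the two statistics commonly associated with data maps, per-sample confidence and variability of the gold-label probability across training epochs, and compares two tasks by the cosine similarity of the resulting feature vectors. This is an illustrative assumption, not the STG-MTL implementation: the abstract does not state which statistics or which grouping procedure the authors use, and all function names here are hypothetical.

```python
import math
from statistics import fmean, pstdev


def data_map(gold_probs_per_epoch):
    """Summarize training dynamics for one task.

    gold_probs_per_epoch[e][i] is the probability the model assigned to the
    correct label of sample i at epoch e. Returns two lists of length
    n_samples: mean probability (confidence) and its population standard
    deviation across epochs (variability).
    """
    per_sample = list(zip(*gold_probs_per_epoch))  # transpose to samples x epochs
    confidence = [fmean(probs) for probs in per_sample]
    variability = [pstdev(probs) for probs in per_sample]
    return confidence, variability


def task_vector(gold_probs_per_epoch):
    """Concatenate confidence and variability into one feature vector per task."""
    confidence, variability = data_map(gold_probs_per_epoch)
    return confidence + variability


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

Under this assumption, tasks whose vectors are highly similar on the same samples would be candidates for the same group; any standard clustering method could then be applied to the pairwise similarities.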



Citations: 0

Synergizing human expertise and AI efficiency with language model for microscopy operation and automated experiment design *

IF 6.8 Zone 2 (Physics and Astrophysics) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Machine Learning Science and Technology

Pub Date : 2024-06-12 DOI: 10.1088/2632-2153/ad52e9

Yongtao Liu, Marti Checa and Rama K Vasudevan

With the advent of large language models (LLMs), in both the open source and proprietary domains, attention is turning to how to exploit such artificial intelligence (AI) systems in assisting complex scientific tasks, such as material synthesis, characterization, analysis and discovery. Here, we explore the utility of LLMs, particularly ChatGPT4, in combination with application program interfaces (APIs) in tasks of experimental design, programming workflows, and data analysis in scanning probe microscopy, using both in-house developed APIs and APIs provided by a commercial vendor for instrument control. We find that the LLM can be especially useful in converting ideas for experimental workflows into executable code on microscope APIs. Beyond code generation, we find that GPT4 is capable of analyzing microscopy images in a generic sense. At the same time, we find that GPT4 is unable to extend beyond basic analyses to more in-depth technical experimental design. We argue that an LLM specifically fine-tuned for individual scientific domains can potentially be a better language interface for converting scientific ideas from human experts into executable workflows. Such a synergy between human expertise and LLM efficiency in experimentation can open new doors for accelerating scientific research, enabling effective sharing of experimental protocols in the scientific community.



Citations: 0