Pub Date : 2024-06-27DOI: 10.1088/2632-2153/ad5784
Atsarina Larasati Anindya, Torbjörn Nur Olsson, Maja Jensen, Maria-Jose Garcia-Bonete, Sally P Wheatley, Maria I Bokarewa, Stefano A Mezzasalma and Gergely Katona
In the realm of atomic physics and chemistry, composition emerges as the most powerful means of describing matter. Mendeleev’s periodic table and chemical formulas, while not entirely free from ambiguities, provide robust approximations for comprehending the properties of atoms, chemicals, and their collective behaviours, which stem from the dynamic interplay of their constituents. Our study illustrates that protein-protein interactions follow a similar paradigm, wherein the composition of peptides plays a pivotal role in predicting their interactions with the protein survivin, using an elegantly simple model. An analysis of these predictions within the context of the human proteome not only confirms the known cellular locations of survivin and its interaction partners, but also introduces novel insights into biological functionality. It becomes evident that electrostatic- and primary structure-based descriptions fall short in predictive power, leading us to speculate that protein interactions are orchestrated by the collective dynamics of functional groups.
{"title":"Deciphering peptide-protein interactions via composition-based prediction: a case study with survivin/BIRC5","authors":"Atsarina Larasati Anindya, Torbjörn Nur Olsson, Maja Jensen, Maria-Jose Garcia-Bonete, Sally P Wheatley, Maria I Bokarewa, Stefano A Mezzasalma and Gergely Katona","doi":"10.1088/2632-2153/ad5784","DOIUrl":"https://doi.org/10.1088/2632-2153/ad5784","url":null,"abstract":"In the realm of atomic physics and chemistry, composition emerges as the most powerful means of describing matter. Mendeleev’s periodic table and chemical formulas, while not entirely free from ambiguities, provide robust approximations for comprehending the properties of atoms, chemicals, and their collective behaviours, which stem from the dynamic interplay of their constituents. Our study illustrates that protein-protein interactions follow a similar paradigm, wherein the composition of peptides plays a pivotal role in predicting their interactions with the protein survivin, using an elegantly simple model. An analysis of these predictions within the context of the human proteome not only confirms the known cellular locations of survivin and its interaction partners, but also introduces novel insights into biological functionality. It becomes evident that electrostatic- and primary structure-based descriptions fall short in predictive power, leading us to speculate that protein interactions are orchestrated by the collective dynamics of functional groups.","PeriodicalId":33757,"journal":{"name":"Machine Learning Science and Technology","volume":"236 1","pages":""},"PeriodicalIF":6.8,"publicationDate":"2024-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141519229","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-06-26DOI: 10.1088/2632-2153/ad5a5f
Enrico Ventura, Simona Cocco, Rémi Monasson and Francesco Zamponi
Boltzmann machines (BMs) are graphical models with interconnected binary units, employed for the unsupervised modeling of data distributions. When trained on real data, BMs show the tendency to behave like critical systems, displaying a high susceptibility of the model under a small rescaling of the inferred parameters. This behavior is not convenient for the purpose of generating data, because it slows down the sampling process, and induces the model to overfit the training-data. In this study, we introduce a regularization method for BMs to improve the robustness of the model under rescaling of the parameters. The new technique shares formal similarities with the unlearning algorithm, an iterative procedure used to improve memory associativity in Hopfield-like neural networks. We test our unlearning regularization on synthetic data generated by two simple models, the Curie–Weiss ferromagnetic model and the Sherrington–Kirkpatrick spin glass model. We show that it outperforms Lp-norm schemes and discuss the role of parameter initialization. Eventually, the method is applied to learn the activity of real neuronal cells, confirming its efficacy at shifting the inferred model away from criticality and coming out as a powerful candidate for actual scientific implementations.
{"title":"Unlearning regularization for Boltzmann machines","authors":"Enrico Ventura, Simona Cocco, Rémi Monasson and Francesco Zamponi","doi":"10.1088/2632-2153/ad5a5f","DOIUrl":"https://doi.org/10.1088/2632-2153/ad5a5f","url":null,"abstract":"Boltzmann machines (BMs) are graphical models with interconnected binary units, employed for the unsupervised modeling of data distributions. When trained on real data, BMs show the tendency to behave like critical systems, displaying a high susceptibility of the model under a small rescaling of the inferred parameters. This behavior is not convenient for the purpose of generating data, because it slows down the sampling process, and induces the model to overfit the training-data. In this study, we introduce a regularization method for BMs to improve the robustness of the model under rescaling of the parameters. The new technique shares formal similarities with the unlearning algorithm, an iterative procedure used to improve memory associativity in Hopfield-like neural networks. We test our unlearning regularization on synthetic data generated by two simple models, the Curie–Weiss ferromagnetic model and the Sherrington–Kirkpatrick spin glass model. We show that it outperforms Lp-norm schemes and discuss the role of parameter initialization. Eventually, the method is applied to learn the activity of real neuronal cells, confirming its efficacy at shifting the inferred model away from criticality and coming out as a powerful candidate for actual scientific implementations.","PeriodicalId":33757,"journal":{"name":"Machine Learning Science and Technology","volume":"9 1","pages":""},"PeriodicalIF":6.8,"publicationDate":"2024-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141532643","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-06-26DOI: 10.1088/2632-2153/ad5927
Koji Hashimoto, Yuji Hirono and Akiyoshi Sannai
Understanding the inner workings of neural networks, including transformers, remains one of the most challenging puzzles in machine learning. This study introduces a novel approach by applying the principles of gauge symmetries, a key concept in physics, to neural network architectures. By regarding model functions as physical observables, we find that parametric redundancies of various machine learning models can be interpreted as gauge symmetries. We mathematically formulate the parametric redundancies in neural ODEs, and find that their gauge symmetries are given by spacetime diffeomorphisms, which play a fundamental role in Einstein’s theory of gravity. Viewing neural ODEs as a continuum version of feedforward neural networks, we show that the parametric redundancies in feedforward neural networks are indeed lifted to diffeomorphisms in neural ODEs. We further extend our analysis to transformer models, finding natural correspondences with neural ODEs and their gauge symmetries. The concept of gauge symmetries sheds light on the complex behavior of deep learning models through physics and provides us with a unifying perspective for analyzing various machine learning architectures.
{"title":"Unification of symmetries inside neural networks: transformer, feedforward and neural ODE","authors":"Koji Hashimoto, Yuji Hirono and Akiyoshi Sannai","doi":"10.1088/2632-2153/ad5927","DOIUrl":"https://doi.org/10.1088/2632-2153/ad5927","url":null,"abstract":"Understanding the inner workings of neural networks, including transformers, remains one of the most challenging puzzles in machine learning. This study introduces a novel approach by applying the principles of gauge symmetries, a key concept in physics, to neural network architectures. By regarding model functions as physical observables, we find that parametric redundancies of various machine learning models can be interpreted as gauge symmetries. We mathematically formulate the parametric redundancies in neural ODEs, and find that their gauge symmetries are given by spacetime diffeomorphisms, which play a fundamental role in Einstein’s theory of gravity. Viewing neural ODEs as a continuum version of feedforward neural networks, we show that the parametric redundancies in feedforward neural networks are indeed lifted to diffeomorphisms in neural ODEs. We further extend our analysis to transformer models, finding natural correspondences with neural ODEs and their gauge symmetries. The concept of gauge symmetries sheds light on the complex behavior of deep learning models through physics and provides us with a unifying perspective for analyzing various machine learning architectures.","PeriodicalId":33757,"journal":{"name":"Machine Learning Science and Technology","volume":"2016 1","pages":""},"PeriodicalIF":6.8,"publicationDate":"2024-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141504571","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-06-24DOI: 10.1088/2632-2153/ad580f
Biling Wang, Michael Dohopolski, Ti Bai, Junjie Wu, Raquibul Hannan, Neil Desai, Aurelie Garant, Daniel Yang, Dan Nguyen, Mu-Han Lin, Robert Timmerman, Xinlei Wang and Steve B Jiang
Our study aims to explore the long-term performance patterns for deep learning (DL) models deployed in clinic and to investigate their efficacy in relation to evolving clinical practices. We conducted a retrospective study simulating the clinical implementation of our DL model involving 1328 prostate cancer patients treated between January 2006 and August 2022. We trained and validated a U-Net-based auto-segmentation model on data obtained from 2006 to 2011 and tested on data from 2012 to 2022, simulating the model’s clinical deployment starting in 2012. We visualized the trends of the model performance using exponentially weighted moving average (EMA) curves. Additionally, we performed Wilcoxon Rank Sum Test and multiple linear regression to investigate Dice similarity coefficient (DSC) variations across distinct periods and the impact of clinical factors, respectively. Initially, from 2012 to 2014, the model showed high performance in segmenting the prostate, rectum, and bladder. Post-2015, a notable decline in EMA DSC was observed for the prostate and rectum, while bladder contours remained stable. Key factors impacting the prostate contour quality included physician contouring styles, using various hydrogel spacers, CT scan slice thickness, MRI-guided contouring, and intravenous (IV) contrast (p < 0.0001, p < 0.0001, p = 0.0085, p = 0.0012, p < 0.0001, respectively). Rectum contour quality was notably influenced by factors such as slice thickness, physician contouring styles, and the use of various hydrogel spacers. The quality of the bladder contour was primarily affected by IV contrast. The deployed DL model exhibited a substantial decline in performance over time, aligning with the evolving clinical settings.
{"title":"Performance deterioration of deep learning models after clinical deployment: a case study with auto-segmentation for definitive prostate cancer radiotherapy","authors":"Biling Wang, Michael Dohopolski, Ti Bai, Junjie Wu, Raquibul Hannan, Neil Desai, Aurelie Garant, Daniel Yang, Dan Nguyen, Mu-Han Lin, Robert Timmerman, Xinlei Wang and Steve B Jiang","doi":"10.1088/2632-2153/ad580f","DOIUrl":"https://doi.org/10.1088/2632-2153/ad580f","url":null,"abstract":"Our study aims to explore the long-term performance patterns for deep learning (DL) models deployed in clinic and to investigate their efficacy in relation to evolving clinical practices. We conducted a retrospective study simulating the clinical implementation of our DL model involving 1328 prostate cancer patients treated between January 2006 and August 2022. We trained and validated a U-Net-based auto-segmentation model on data obtained from 2006 to 2011 and tested on data from 2012 to 2022, simulating the model’s clinical deployment starting in 2012. We visualized the trends of the model performance using exponentially weighted moving average (EMA) curves. Additionally, we performed Wilcoxon Rank Sum Test and multiple linear regression to investigate Dice similarity coefficient (DSC) variations across distinct periods and the impact of clinical factors, respectively. Initially, from 2012 to 2014, the model showed high performance in segmenting the prostate, rectum, and bladder. Post-2015, a notable decline in EMA DSC was observed for the prostate and rectum, while bladder contours remained stable. Key factors impacting the prostate contour quality included physician contouring styles, using various hydrogel spacers, CT scan slice thickness, MRI-guided contouring, and intravenous (IV) contrast (p < 0.0001, p < 0.0001, p = 0.0085, p = 0.0012, p < 0.0001, respectively). Rectum contour quality was notably influenced by factors such as slice thickness, physician contouring styles, and the use of various hydrogel spacers. The quality of the bladder contour was primarily affected by IV contrast. The deployed DL model exhibited a substantial decline in performance over time, aligning with the evolving clinical settings.","PeriodicalId":33757,"journal":{"name":"Machine Learning Science and Technology","volume":"159 1","pages":""},"PeriodicalIF":6.8,"publicationDate":"2024-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141504572","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-06-20DOI: 10.1088/2632-2153/ad55a3
Matthias Vigl, Nicole Hartman and Lukas Heinrich
In this work we demonstrate that significant gains in performance and data efficiency can be achieved in High Energy Physics (HEP) by moving beyond the standard paradigm of sequential optimization or reconstruction and analysis components. We conceptually connect HEP reconstruction and analysis to modern machine learning workflows such as pretraining, finetuning, domain adaptation and high-dimensional embedding spaces and quantify the gains in the example usecase of searches of heavy resonances decaying via an intermediate di-Higgs system to four b-jets. To our knowledge this is the first example of a low-level feature extraction network finetuned for a downstream HEP analysis objective.
在这项工作中,我们证明了在高能物理(HEP)中,通过超越顺序优化或重建和分析组件的标准范式,可以显著提高性能和数据效率。我们从概念上将高能物理重构和分析与预训练、微调、域适应和高维嵌入空间等现代机器学习工作流程联系起来,并量化了在通过中间二希格斯系统衰变到四个 b 喷射的重共振搜索示例用例中的收益。据我们所知,这是第一个针对下游 HEP 分析目标对低级特征提取网络进行微调的例子。
{"title":"Finetuning foundation models for joint analysis optimization in High Energy Physics","authors":"Matthias Vigl, Nicole Hartman and Lukas Heinrich","doi":"10.1088/2632-2153/ad55a3","DOIUrl":"https://doi.org/10.1088/2632-2153/ad55a3","url":null,"abstract":"In this work we demonstrate that significant gains in performance and data efficiency can be achieved in High Energy Physics (HEP) by moving beyond the standard paradigm of sequential optimization or reconstruction and analysis components. We conceptually connect HEP reconstruction and analysis to modern machine learning workflows such as pretraining, finetuning, domain adaptation and high-dimensional embedding spaces and quantify the gains in the example usecase of searches of heavy resonances decaying via an intermediate di-Higgs system to four b-jets. To our knowledge this is the first example of a low-level feature extraction network finetuned for a downstream HEP analysis objective.","PeriodicalId":33757,"journal":{"name":"Machine Learning Science and Technology","volume":"21 1","pages":""},"PeriodicalIF":6.8,"publicationDate":"2024-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141519230","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-06-20DOI: 10.1088/2632-2153/ad5783
Indaco Biazzo, Dian Wu and Giuseppe Carleo
Efficient sampling and approximation of Boltzmann distributions involving large sets of binary variables, or spins, are pivotal in diverse scientific fields even beyond physics. Recent advances in generative neural networks have significantly impacted this domain. However, these neural networks are often treated as black boxes, with architectures primarily influenced by data-driven problems in computational science. Addressing this gap, we introduce a novel autoregressive neural network architecture named TwoBo, specifically designed for sparse two-body interacting spin systems. We directly incorporate the Boltzmann distribution into its architecture and parameters, resulting in enhanced convergence speed, superior free energy accuracy, and reduced trainable parameters. We perform numerical experiments on disordered, frustrated systems with more than 1000 spins on grids and random graphs, and demonstrate its advantages compared to previous autoregressive and recurrent architectures. Our findings validate a physically informed approach and suggest potential extensions to multivalued variables and many-body interaction systems, paving the way for broader applications in scientific research.
{"title":"Sparse autoregressive neural networks for classical spin systems","authors":"Indaco Biazzo, Dian Wu and Giuseppe Carleo","doi":"10.1088/2632-2153/ad5783","DOIUrl":"https://doi.org/10.1088/2632-2153/ad5783","url":null,"abstract":"Efficient sampling and approximation of Boltzmann distributions involving large sets of binary variables, or spins, are pivotal in diverse scientific fields even beyond physics. Recent advances in generative neural networks have significantly impacted this domain. However, these neural networks are often treated as black boxes, with architectures primarily influenced by data-driven problems in computational science. Addressing this gap, we introduce a novel autoregressive neural network architecture named TwoBo, specifically designed for sparse two-body interacting spin systems. We directly incorporate the Boltzmann distribution into its architecture and parameters, resulting in enhanced convergence speed, superior free energy accuracy, and reduced trainable parameters. We perform numerical experiments on disordered, frustrated systems with more than 1000 spins on grids and random graphs, and demonstrate its advantages compared to previous autoregressive and recurrent architectures. Our findings validate a physically informed approach and suggest potential extensions to multivalued variables and many-body interaction systems, paving the way for broader applications in scientific research.","PeriodicalId":33757,"journal":{"name":"Machine Learning Science and Technology","volume":"46 1","pages":""},"PeriodicalIF":6.8,"publicationDate":"2024-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141531276","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-06-20DOI: 10.1088/2632-2153/ad5411
Alexander Luce, Rasoul Alaee, Fabian Knorr and Florian Marquardt
Optimizing the shapes and topology of physical devices is crucial for both scientific and technological advancements, given their wide-ranging implications across numerous industries and research areas. Innovations in shape and topology optimization have been observed across a wide range of fields, notably structural mechanics, fluid mechanics, and more recently, photonics. Gradient-based inverse design techniques have been particularly successful for photonic and optical problems, resulting in integrated, miniaturized hardware that has set new standards in device performance. To calculate the gradients, there are typically two approaches: namely, either by implementing specialized solvers using automatic differentiation (AD) or by deriving analytical solutions for gradient calculation and adjoint sources by hand. In this work, we propose a middle ground and present a hybrid approach that leverages and enables the benefits of AD for handling gradient derivation while using existing, proven but black-box photonic solvers for numerical solutions. Utilizing the adjoint method, we make existing numerical solvers differentiable and seamlessly integrate them into an AD framework. Further, this enables users to integrate the optimization environment seamlessly with other autodifferentiable components such as machine learning, geometry generation, or intricate post-processing which could lead to better photonic design workflows. We illustrate the approach through two distinct photonic optimization problems: optimizing the Purcell factor of a magnetic dipole in the vicinity of an optical nanocavity and enhancing the light extraction efficiency of a µLED.
优化物理设备的形状和拓扑结构对科学和技术进步至关重要,因为这对众多行业和研究领域都有广泛影响。形状和拓扑优化方面的创新已经遍及各个领域,特别是结构力学、流体力学和最近的光子学。基于梯度的逆向设计技术在解决光子和光学问题方面尤为成功,其集成化、微型化硬件为设备性能设定了新标准。要计算梯度,通常有两种方法:一是使用自动微分(AD)实现专门的求解器,二是手工推导梯度计算和邻接源的解析解。在这项工作中,我们提出了一个中间方案,并提出了一种混合方法,即利用自动微分的优势处理梯度推导,同时使用现有的、经过验证的黑盒光子求解器进行数值求解。利用邻接法,我们使现有的数值求解器可微分,并将其无缝集成到 AD 框架中。此外,这还能让用户将优化环境与机器学习、几何生成或复杂的后处理等其他自动可微分组件无缝集成,从而实现更好的光子设计工作流程。我们通过两个不同的光子优化问题来说明这种方法:优化光学纳米腔附近磁偶极子的珀塞尔因子和提高 µLED 的光提取效率。
{"title":"Merging automatic differentiation and the adjoint method for photonic inverse design","authors":"Alexander Luce, Rasoul Alaee, Fabian Knorr and Florian Marquardt","doi":"10.1088/2632-2153/ad5411","DOIUrl":"https://doi.org/10.1088/2632-2153/ad5411","url":null,"abstract":"Optimizing the shapes and topology of physical devices is crucial for both scientific and technological advancements, given their wide-ranging implications across numerous industries and research areas. Innovations in shape and topology optimization have been observed across a wide range of fields, notably structural mechanics, fluid mechanics, and more recently, photonics. Gradient-based inverse design techniques have been particularly successful for photonic and optical problems, resulting in integrated, miniaturized hardware that has set new standards in device performance. To calculate the gradients, there are typically two approaches: namely, either by implementing specialized solvers using automatic differentiation (AD) or by deriving analytical solutions for gradient calculation and adjoint sources by hand. In this work, we propose a middle ground and present a hybrid approach that leverages and enables the benefits of AD for handling gradient derivation while using existing, proven but black-box photonic solvers for numerical solutions. Utilizing the adjoint method, we make existing numerical solvers differentiable and seamlessly integrate them into an AD framework. Further, this enables users to integrate the optimization environment seamlessly with other autodifferentiable components such as machine learning, geometry generation, or intricate post-processing which could lead to better photonic design workflows. We illustrate the approach through two distinct photonic optimization problems: optimizing the Purcell factor of a magnetic dipole in the vicinity of an optical nanocavity and enhancing the light extraction efficiency of a µLED.","PeriodicalId":33757,"journal":{"name":"Machine Learning Science and Technology","volume":"12 1","pages":""},"PeriodicalIF":6.8,"publicationDate":"2024-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141504573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-06-19DOI: 10.1088/2632-2153/ad56fa
Woo Seok Lee, Taegeun Song and Kyoung-Min Kim
The application of twist engineering in van der Waals magnets has opened new frontiers in the field of two-dimensional magnetism, yielding distinctive magnetic domain structures. Despite the introduction of numerous theoretical methods, limitations persist in terms of accuracy or efficiency due to the complex nature of the magnetic Hamiltonians pertinent to these systems. In this study, we introduce a deep-learning approach to tackle these challenges. Utilizing customized, fully connected networks, we develop two deep-neural-network kernels that facilitate efficient and reliable analysis of twisted van der Waals magnets. Our regression model is adept at estimating the magnetic Hamiltonian parameters of twisted bilayer CrI3 from its magnetic domain images generated through atomistic spin simulations. The ‘generative model’ excels in producing precise magnetic domain images from the provided magnetic parameters. The trained networks for these models undergo thorough validation, including statistical error analysis and assessment of robustness against noisy injections. These advancements not only extend the applicability of deep-learning methods to twisted van der Waals magnets but also streamline future investigations into these captivating yet poorly understood systems.
{"title":"Deep learning methods for Hamiltonian parameter estimation and magnetic domain image generation in twisted van der Waals magnets","authors":"Woo Seok Lee, Taegeun Song and Kyoung-Min Kim","doi":"10.1088/2632-2153/ad56fa","DOIUrl":"https://doi.org/10.1088/2632-2153/ad56fa","url":null,"abstract":"The application of twist engineering in van der Waals magnets has opened new frontiers in the field of two-dimensional magnetism, yielding distinctive magnetic domain structures. Despite the introduction of numerous theoretical methods, limitations persist in terms of accuracy or efficiency due to the complex nature of the magnetic Hamiltonians pertinent to these systems. In this study, we introduce a deep-learning approach to tackle these challenges. Utilizing customized, fully connected networks, we develop two deep-neural-network kernels that facilitate efficient and reliable analysis of twisted van der Waals magnets. Our regression model is adept at estimating the magnetic Hamiltonian parameters of twisted bilayer CrI3 from its magnetic domain images generated through atomistic spin simulations. The ‘generative model’ excels in producing precise magnetic domain images from the provided magnetic parameters. The trained networks for these models undergo thorough validation, including statistical error analysis and assessment of robustness against noisy injections. These advancements not only extend the applicability of deep-learning methods to twisted van der Waals magnets but also streamline future investigations into these captivating yet poorly understood systems.","PeriodicalId":33757,"journal":{"name":"Machine Learning Science and Technology","volume":"86 1","pages":""},"PeriodicalIF":6.8,"publicationDate":"2024-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141519231","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-06-13DOI: 10.1088/2632-2153/ad51cc
Kevin Otto, Simon Burgis, Kristian Kersting, Reinhold Bertrand and Devendra Singh Dhami
The number of satellites in orbit around Earth is increasing rapidly, with the risk of collision rising accordingly. Trends of the global population of satellites need to be analyzed to test the viability and impact of proposed rules and laws affecting the satellite population and collision avoidance strategies. This requires large scale simulations of satellites that are propagated on long timescales to compute the large amounts of actionable close encounters (called conjunctions), which could lead to collisions. Rigorously checking for conjunctions by computing future states of orbits is computationally expensive due to the large amount of objects involved and conjunction filters are thus used to remove non-conjuncting orbit pairs from the list of possible conjunctions. In this work, we explore the possibility of machine learning (ML) based conjunction filters using several algorithms such as eXtreme Gradient Boosting, TabNet and (physics-informed) neural networks and deep operator networks. To show the viability and the potential of ML based filters, these algorithms are trained to predict the future state of orbits. For the physics-informed approaches, multiple partial differential equations are set up using the Kepler equation as a basis. The empirical results demonstrate that physics-informed deep operator networks are capable of predicting the future state of orbits using these equations (RMSE: 0.136) and outperform eXtreme Gradient Boosting (RMSE: 0.568) and TabNet (RMSE: 0.459). We also propose a filter based on the trained deep operator network which is shown to outperforms the filter capability of the commonly used perigee-apogee test and the orbit path filter on a synthetic dataset, while being on average 3.2 times faster to compute than a rigorous conjunction check.
环绕地球轨道的卫星数量正在迅速增加,碰撞风险也随之上升。需要对全球卫星数量的趋势进行分析,以检验影响卫星数量和避免碰撞战略的拟议规则和法律的可行性和影响。这就需要对卫星进行大规模模拟,在长时间尺度上进行传播,以计算可能导致碰撞的大量可操作的近距离相遇(称为会合)。由于涉及大量物体,通过计算轨道的未来状态来严格检查会合情况的计算成本很高,因此会合过滤器被用来从可能的会合列表中剔除非会合轨道对。在这项工作中,我们探索了基于机器学习(ML)的会合过滤器的可能性,使用了几种算法,如极端梯度提升、TabNet 和(物理信息)神经网络以及深度算子网络。为了展示基于 ML 的滤波器的可行性和潜力,对这些算法进行了预测轨道未来状态的训练。对于物理信息方法,以开普勒方程为基础建立了多个偏微分方程。实证结果表明,基于物理信息的深度算子网络能够利用这些方程预测轨道的未来状态(RMSE:0.136),并优于极梯度提升(RMSE:0.568)和 TabNet(RMSE:0.459)。我们还提出了一种基于训练有素的深度算子网络的滤波器,结果表明该滤波器的滤波能力优于常用的近地点-远地点测试和合成数据集上的轨道路径滤波器,同时计算速度平均比严格的会合检查快 3.2 倍。
{"title":"Machine learning meets Kepler: inverting Kepler’s equation for All vs All conjunction analysis","authors":"Kevin Otto, Simon Burgis, Kristian Kersting, Reinhold Bertrand and Devendra Singh Dhami","doi":"10.1088/2632-2153/ad51cc","DOIUrl":"https://doi.org/10.1088/2632-2153/ad51cc","url":null,"abstract":"The number of satellites in orbit around Earth is increasing rapidly, with the risk of collision rising accordingly. Trends of the global population of satellites need to be analyzed to test the viability and impact of proposed rules and laws affecting the satellite population and collision avoidance strategies. This requires large scale simulations of satellites that are propagated on long timescales to compute the large amounts of actionable close encounters (called conjunctions), which could lead to collisions. Rigorously checking for conjunctions by computing future states of orbits is computationally expensive due to the large amount of objects involved and conjunction filters are thus used to remove non-conjuncting orbit pairs from the list of possible conjunctions. In this work, we explore the possibility of machine learning (ML) based conjunction filters using several algorithms such as eXtreme Gradient Boosting, TabNet and (physics-informed) neural networks and deep operator networks. To show the viability and the potential of ML based filters, these algorithms are trained to predict the future state of orbits. For the physics-informed approaches, multiple partial differential equations are set up using the Kepler equation as a basis. The empirical results demonstrate that physics-informed deep operator networks are capable of predicting the future state of orbits using these equations (RMSE: 0.136) and outperform eXtreme Gradient Boosting (RMSE: 0.568) and TabNet (RMSE: 0.459). We also propose a filter based on the trained deep operator network which is shown to outperforms the filter capability of the commonly used perigee-apogee test and the orbit path filter on a synthetic dataset, while being on average 3.2 times faster to compute than a rigorous conjunction check.","PeriodicalId":33757,"journal":{"name":"Machine Learning Science and Technology","volume":"34 1","pages":""},"PeriodicalIF":6.8,"publicationDate":"2024-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141519160","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-06-13DOI: 10.1088/2632-2153/ad4e04
Ammar Sherif, Abubakar Abid, Mustafa Elattar and Mohamed ElHelw
Multi-Task Learning (MTL) is a powerful technique that has gained popularity due to its performance improvement over traditional Single-Task Learning (STL). However, MTL is often challenging because there is an exponential number of possible task groupings, which can make it difficult to choose the best one because some groupings might produce performance degradation due to negative interference between tasks. That is why existing solutions are severely suffering from scalability issues, limiting any practical application. In our paper, we propose a new data-driven method that addresses these challenges and provides a scalable and modular solution for classification task grouping based on a re-proposed data-driven features, Data Maps, which capture the training dynamics for each classification task during the MTL training. Through a theoretical comparison with other techniques, we manage to show that our approach has the superior scalability. Our experiments show a better performance and verify the method’s effectiveness, even on an unprecedented number of tasks (up to 100 tasks on CIFAR100). Being the first to work on such number of tasks, our comparisons on the resulting grouping shows similar grouping to the mentioned in the dataset, CIFAR100. Finally, we provide a modular implementation3for easier integration and testing, with examples from multiple datasets and tasks.
{"title":"STG-MTL: scalable task grouping for multi-task learning using data maps","authors":"Ammar Sherif, Abubakar Abid, Mustafa Elattar and Mohamed ElHelw","doi":"10.1088/2632-2153/ad4e04","DOIUrl":"https://doi.org/10.1088/2632-2153/ad4e04","url":null,"abstract":"Multi-Task Learning (MTL) is a powerful technique that has gained popularity due to its performance improvement over traditional Single-Task Learning (STL). However, MTL is often challenging because there is an exponential number of possible task groupings, which can make it difficult to choose the best one because some groupings might produce performance degradation due to negative interference between tasks. That is why existing solutions are severely suffering from scalability issues, limiting any practical application. In our paper, we propose a new data-driven method that addresses these challenges and provides a scalable and modular solution for classification task grouping based on a re-proposed data-driven features, Data Maps, which capture the training dynamics for each classification task during the MTL training. Through a theoretical comparison with other techniques, we manage to show that our approach has the superior scalability. Our experiments show a better performance and verify the method’s effectiveness, even on an unprecedented number of tasks (up to 100 tasks on CIFAR100). Being the first to work on such number of tasks, our comparisons on the resulting grouping shows similar grouping to the mentioned in the dataset, CIFAR100. Finally, we provide a modular implementation3for easier integration and testing, with examples from multiple datasets and tasks.","PeriodicalId":33757,"journal":{"name":"Machine Learning Science and Technology","volume":"10 1","pages":""},"PeriodicalIF":6.8,"publicationDate":"2024-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141519159","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}