Pub Date: 2024-08-11
DOI: 10.1088/2632-2153/ad65b5
M Paiano, S Martina, C Giannelli and F Caruso
Data availability is limited in some fields, especially for object detection tasks, which require correctly labeled bounding boxes around each object. A notable example of such data scarcity is marine biology, where methods that automatically detect underwater species are useful for environmental monitoring. State-of-the-art machine learning strategies address this limitation with two main approaches. The first pretrains models on existing datasets before generalizing to the specific domain of interest. The second creates synthetic datasets tailored to the target domain, using methods such as copy-paste techniques or ad hoc simulators. The first strategy often faces a significant domain shift, while the second demands custom solutions crafted for the specific task. In response to these challenges, we propose a transfer learning framework that is valid for a generic scenario. In this framework, generated images help to improve the performance of an object detector when only a few real images are available. This is achieved through a diffusion-based generative model pretrained on large generic datasets. In contrast to the state of the art, we find that it is not necessary to fine-tune the generative model on the specific domain of interest. We believe that this is an important advance because it mitigates the labor-intensive task of manually labeling images for object detection. We validate our approach on fish in an underwater environment and on the more common domain of cars in an urban setting. Our method achieves detection performance comparable to models trained on thousands of images while using only a few hundred input samples. Our results pave the way for new generative AI-based protocols for machine learning applications in various domains, ranging from geophysics to biology and medicine.
Title: Transfer learning with generative models for object detection on limited datasets
Journal: Machine Learning Science and Technology
Pub Date: 2024-08-05
DOI: 10.1088/2632-2153/ad6830
André Sequeira, Luis Paulo Santos and Luis Soares Barbosa
This research explores the trainability of Parameterized Quantum Circuit-based policies in Reinforcement Learning, an area that has recently seen a surge in empirical exploration. While some studies suggest improved sample complexity using quantum gradient estimation, the efficient trainability of these policies remains an open question. Our findings reveal significant challenges, including standard Barren Plateaus with exponentially small gradients and gradient explosion. These phenomena depend on the type of basis-state partitioning and the mapping of these partitions onto actions. For a polynomial number of actions, a trainable window can be ensured with a polynomial number of measurements if a contiguous-like partitioning of basis-states is employed. These results are empirically validated in a multi-armed bandit environment.
Title: Trainability issues in quantum policy gradients
Journal: Machine Learning Science and Technology
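To make the role of basis-state partitioning concrete, here is a minimal Python sketch under stated assumptions. It assumes that the policy gives each action the total measurement probability of the basis states in its partition, and it reads "contiguous-like" as splitting the integer-ordered basis states into consecutive blocks. The function names (`contiguous_partition`, `modular_partition`, `policy_from_probs`, `estimate_policy`) are hypothetical and are not taken from the paper, whose exact definitions may differ.

```python
import numpy as np

def contiguous_partition(n_qubits, n_actions):
    """Assumed 'contiguous-like' map: consecutive blocks of basis states share an action."""
    states = np.arange(2 ** n_qubits)
    return (states * n_actions) // (2 ** n_qubits)

def modular_partition(n_qubits, n_actions):
    """Contrasting (non-contiguous) map: basis state b goes to action b mod n_actions."""
    return np.arange(2 ** n_qubits) % n_actions

def policy_from_probs(probs, assignment, n_actions):
    """Exact action probabilities: sum basis-state probabilities within each partition."""
    return np.bincount(assignment, weights=probs, minlength=n_actions)

def estimate_policy(probs, assignment, n_actions, shots, rng):
    """Finite-shot estimate: sample basis states, then count the actions they map to."""
    samples = rng.choice(len(probs), size=shots, p=probs)
    counts = np.bincount(assignment[samples], minlength=n_actions)
    return counts / shots

# Toy usage: a uniform distribution over 3-qubit basis states and 2 actions.
uniform = np.full(8, 1.0 / 8.0)
blocks = contiguous_partition(3, 2)
print(blocks, policy_from_probs(uniform, blocks, 2))
print(estimate_policy(uniform, blocks, 2, shots=1000, rng=np.random.default_rng(0)))
```

The finite-shot estimator is included because the abstract ties trainability to the number of measurements; the sketch only shows where the partition enters, not the gradient analysis itself.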
Pub Date: 2024-08-05
DOI: 10.1088/2632-2153/ad652c
Khaled Kahouli, Stefaan Simon Pierre Hessmann, Klaus-Robert Müller, Shinichi Nakajima, Stefan Gugler and Niklas Wolf Andreas Gebauer
Molecular relaxation, finding the equilibrium state of a non-equilibrium structure, is an essential component of computational chemistry to understand reactivity. Classical force field (FF) methods often rely on insufficient local energy minimization, while neural network FF models require large labeled datasets encompassing both equilibrium and non-equilibrium structures. As a remedy, we propose MoreRed, molecular relaxation by reverse diffusion, a conceptually novel and purely statistical approach where non-equilibrium structures are treated as noisy instances of their corresponding equilibrium states. To enable the denoising of arbitrarily noisy inputs via a generative diffusion model, we further introduce a novel diffusion time step predictor. Notably, MoreRed learns a simpler pseudo potential energy surface (PES) instead of the complex physical PES. It is trained on a significantly smaller, and thus computationally cheaper, dataset consisting of solely unlabeled equilibrium structures, avoiding the computation of non-equilibrium structures altogether. We compare MoreRed to classical FFs, equivariant neural network FFs trained on a large dataset of equilibrium and non-equilibrium data, as well as a semi-empirical tight-binding model. To assess this quantitatively, we evaluate the root-mean-square deviation between the found equilibrium structures and the reference equilibrium structures as well as their energies.
Title: Molecular relaxation by reverse diffusion with time step prediction
Journal: Machine Learning Science and Technology
Pub Date: 2024-08-04
DOI: 10.1088/2632-2153/ad66af
Xiaobo Zhu, Yan Wu, Jin Che, Chao Wang, Liying Wang and Zhanheng Chen
Representation learning over graph networks has recently gained popularity, with many models showing promising results. However, several challenges remain: (1) most methods are designed for static or discrete-time dynamic graphs; (2) existing continuous-time dynamic graph algorithms focus on a single evolving perspective; and (3) many continuous-time dynamic graph approaches require numerous temporal neighbors to capture long-term dependencies. In response, this paper introduces a Multi-Perspective Feedback-Attention Coupling (MPFA) model. MPFA incorporates information from both evolving and original perspectives to learn the complex dynamics of graph evolution. The evolving perspective considers the current state of the historical interaction events of nodes and uses a temporal attention module to aggregate current state information. This perspective also makes it possible to capture long-term dependencies of nodes using a small number of temporal neighbors. Meanwhile, the original perspective uses a feedback attention module with growth characteristic coefficients to aggregate the original state information of node interactions. Experimental results on one dataset that we compiled ourselves and on seven public datasets validate the effectiveness and competitiveness of the proposed model.
Title: Multi-perspective feedback-attention coupling model for continuous-time dynamic graphs
Journal: Machine Learning Science and Technology
Pub Date: 2024-08-04
DOI: 10.1088/2632-2153/ad6390
Tracey Oellerich and Maria Emelianenko
Conservation laws are an inherent feature of many systems that model real-world phenomena, in particular biological and chemical systems. If the form of the underlying dynamical system is known, linear algebra and algebraic geometry methods can be used to identify the conservation laws. Our work focuses on using data-driven methods to identify the conservation law(s) without knowledge of the system dynamics. We develop a robust data-driven computational framework that automates the process of identifying the number and type of the conservation law(s) while keeping the amount of required data to a minimum. We demonstrate that, owing to the relative stability of singular vectors to noise, we are able to reconstruct the correct conservation laws without excessive parameter tuning. While we focus primarily on biological examples, the framework proposed herein is suitable for a variety of data science applications and can be coupled with other machine learning approaches.
Title: Towards robust data-driven automated recovery of symbolic conservation laws from limited data
Journal: Machine Learning Science and Technology
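The abstract attributes the method's robustness to the stability of singular vectors. As a rough illustration of that idea only, the sketch below recovers a linear conservation law from trajectory data: if c·x(t) is constant, then c is orthogonal to every mean-centered sample, so it appears as a right singular vector with a near-zero singular value. The toy reaction system (A ⇌ B plus an independent decay of C), the function names, and the tolerance `rel_tol` are assumptions made for illustration; the paper's framework also handles the type of law and noisy data, which this sketch does not attempt.

```python
import numpy as np

def simulate(n_steps=400, dt=0.01, k1=2.0, k2=1.0, k3=0.5):
    """Forward-Euler trajectory of A <-> B with an unrelated decay of C (toy system)."""
    x = np.array([1.0, 0.2, 0.8])
    traj = [x.copy()]
    for _ in range(n_steps):
        a, b, c = x
        dx = np.array([-k1 * a + k2 * b, k1 * a - k2 * b, -k3 * c])
        x = x + dt * dx
        traj.append(x.copy())
    return np.array(traj)

def linear_conservation_laws(data, rel_tol=1e-3):
    """Return right singular vectors whose singular values are negligible.

    Each returned row c satisfies c . (x(t) - mean) ~ 0 for all samples,
    i.e. c . x(t) is (approximately) constant along the trajectory.
    """
    centered = data - data.mean(axis=0)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    mask = s < rel_tol * s[0]
    return vt[mask], s

data = simulate()
laws, singular_values = linear_conservation_laws(data)
print("singular values:", singular_values)
print("recovered law(s), up to sign and scale:", laws)
```

In this toy system the only conserved quantity is A + B, so the recovered vector should be proportional to (1, 1, 0). The number of near-zero singular values gives the number of independent linear laws.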
Pub Date: 2024-08-04
DOI: 10.1088/2632-2153/ad64a6
Ryan Humble, Zhe Zhang, Finn O’Shea, Eric Darve and Daniel Ratner
Anomaly detection is an important task for complex scientific experiments and other complex systems (e.g. industrial facilities, manufacturing), where failures in a sub-system can lead to lost data, poor performance, or even damage to components. While scientific facilities generate a wealth of data, labeled anomalies may be rare (or even nonexistent) and expensive to acquire. Unsupervised approaches are therefore common and typically search for anomalies either by distance or by density of examples in the input feature space (or some associated low-dimensional representation). This paper presents a novel approach called coincident learning for anomaly detection (CoAD), which is specifically designed for multi-modal tasks and identifies anomalies based on coincident behavior across two different slices of the feature space. We define an unsupervised metric by analogy to the supervised classification Fβ statistic. CoAD uses this metric to train an anomaly detection algorithm on unlabeled data, based on the expectation that anomalous behavior in one feature slice is coincident with anomalous behavior in the other. The method is illustrated using a synthetic outlier data set and an MNIST-based image data set, and is compared to the prior state of the art on two real-world tasks: a metal milling data set and our motivating task of identifying RF station anomalies in a particle accelerator.
Title: Coincident learning for unsupervised anomaly detection of scientific instruments
Journal: Machine Learning Science and Technology
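For reference, the supervised Fβ statistic on which the abstract's unsupervised metric is modeled can be computed from confusion counts as below. This is the standard definition, Fβ = (1 + β²)·P·R / (β²·P + R) with precision P and recall R. The unsupervised variant used by CoAD is defined in the paper and is not reproduced here; the function name `f_beta` is just an illustrative choice.

```python
def f_beta(tp, fp, fn, beta=1.0):
    """Supervised F-beta score from true-positive, false-positive and false-negative counts."""
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# beta > 1 weights recall more heavily, beta < 1 weights precision more heavily.
print(f_beta(tp=8, fp=2, fn=2))
print(f_beta(tp=8, fp=2, fn=8, beta=2.0))
```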
Pub Date: 2024-08-01
DOI: 10.1088/2632-2153/ad66ad
Joschka Birk, Anna Hallin and Gregor Kasieczka
Foundation models are multi-dataset and multi-task machine learning methods that, once pre-trained, can be fine-tuned for a large variety of downstream applications. The successful development of such general-purpose models for physics data would be a major breakthrough, as they could improve the achievable physics performance while drastically reducing the required amount of training time and data. We report significant progress on this challenge on several fronts. First, a comprehensive set of evaluation methods is introduced to judge the quality of an encoding from physics data into a representation suitable for the autoregressive generation of particle jets with transformer architectures (the common backbone of foundation models). These measures motivate the choice of a higher-fidelity tokenization compared to previous works. Finally, we demonstrate transfer learning between an unsupervised problem (jet generation) and a classic supervised task (jet tagging) with our new OmniJet-α model. This is the first successful transfer between two different and actively studied classes of tasks and constitutes a major step in the building of foundation models for particle physics.
Title: OmniJet-α: the first cross-task foundation model for particle physics
Journal: Machine Learning Science and Technology
Pub Date: 2024-07-22
DOI: 10.1088/2632-2153/ad5f74
Xiaofei Guan, Xintong Wang, Hao Wu, Zihao Yang and Peng Yu
This paper presents an innovative approach to tackle Bayesian inverse problems using physics-informed invertible neural networks (PI-INN). Serving as a neural operator model, PI-INN employs an invertible neural network (INN) to elucidate the relationship between the parameter field and the solution function in latent variable spaces. Specifically, the INN decomposes the latent variable of the parameter field into two distinct components: the expansion coefficients that represent the solution to the forward problem, and the noise that captures the inherent uncertainty associated with the inverse problem. Through precise estimation of the forward mapping and preservation of statistical independence between expansion coefficients and latent noise, PI-INN offers an accurate and efficient generative model for resolving Bayesian inverse problems, even in the absence of labeled data. For a given solution function, PI-INN can provide tractable and accurate estimates of the posterior distribution of the underlying parameter field. Moreover, capitalizing on the INN’s characteristics, we propose a novel independent loss function to effectively ensure the independence of the INN’s decomposition results. The efficacy and precision of the proposed PI-INN are demonstrated through a series of numerical experiments.
Title: Efficient Bayesian inference using physics-informed invertible neural networks for inverse problems
Journal: Machine Learning Science and Technology
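The abstract does not say which invertible building block PI-INN uses. As a hedged illustration of what "invertible neural network" means in practice, the sketch below implements one affine coupling layer, a common INN component, and checks that its inverse is exact. The class name `AffineCoupling`, the tiny random-weight networks, and the layer sizes are assumptions for illustration and are not the paper's architecture.

```python
import numpy as np

class AffineCoupling:
    """One affine coupling layer: the first half of the input passes through unchanged
    and parameterizes an elementwise scale and shift applied to the second half."""

    def __init__(self, dim, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        self.split = dim // 2
        rest = dim - self.split
        # Small fixed random weights stand in for trained parameters.
        self.w1 = 0.1 * rng.normal(size=(self.split, hidden))
        self.w_s = 0.1 * rng.normal(size=(hidden, rest))
        self.w_t = 0.1 * rng.normal(size=(hidden, rest))

    def _scale_shift(self, x1):
        h = np.tanh(x1 @ self.w1)
        return np.tanh(h @ self.w_s), h @ self.w_t  # bounded log-scale, free shift

    def forward(self, x):
        x1, x2 = x[:, :self.split], x[:, self.split:]
        s, t = self._scale_shift(x1)
        y = np.concatenate([x1, x2 * np.exp(s) + t], axis=1)
        return y, s.sum(axis=1)  # output and log|det Jacobian| per sample

    def inverse(self, y):
        y1, y2 = y[:, :self.split], y[:, self.split:]
        s, t = self._scale_shift(y1)
        return np.concatenate([y1, (y2 - t) * np.exp(-s)], axis=1)

layer = AffineCoupling(dim=6)
x = np.random.default_rng(1).normal(size=(5, 6))
y, log_det = layer.forward(x)
print("max reconstruction error:", np.abs(layer.inverse(y) - x).max())
```

Because the layer is exactly invertible with a cheap log-determinant, a stack of such layers can map between a latent variable and a pair of output blocks, which loosely matches the abstract's split into expansion coefficients and noise.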
Pub Date: 2024-07-21
DOI: 10.1088/2632-2153/ad622f
Alessandro Bombini, Fernando García-Avello Bofías, Caterina Bracci, Michele Ginolfi and Chiara Ruberto
Extended vision techniques are ubiquitous in physics. However, the data cubes stemming from such analyses are often challenging to interpret, because it is intrinsically difficult to discern the relevant information in the spectra that compose the data cube. Furthermore, the huge dimensionality of data cube spectra makes their statistical interpretation a complex task; nevertheless, this complexity contains a massive amount of statistical information that can be exploited in an unsupervised manner to outline some essential properties of the case study at hand. For example, it is possible to obtain an image segmentation via (deep) clustering of the data cube's spectra, performed in a suitably defined low-dimensional embedding space. To tackle this topic, we explore the possibility of applying unsupervised clustering methods in encoded space, i.e. performing deep clustering on the spectral properties of data cube pixels. A statistical dimensional reduction is performed by an ad hoc trained (variational) AutoEncoder, in charge of mapping spectra into lower-dimensional metric spaces, while the clustering process is performed by a (learnable) iterative K-means clustering algorithm. We apply this technique to two use cases of different physical origins: a set of macro mapping x-ray fluorescence (MA-XRF) synthetic data on pictorial artworks, and a dataset of simulated astrophysical observations.
Title: Datacube segmentation via deep spectral clustering
Journal: Machine Learning Science and Technology
Pub Date : 2024-07-18 DOI: 10.1088/2632-2153/ad5a60
Kai-Hendrik Cohrs, Gherardo Varando, Nuno Carvalhais, Markus Reichstein and Gustau Camps-Valls
Hybrid modeling integrates machine learning with scientific knowledge to enhance interpretability, generalization, and adherence to natural laws. Nevertheless, equifinality and regularization biases make these goals difficult to achieve in hybrid modeling. This paper introduces a novel approach to estimating hybrid models via a causal inference framework, specifically employing double machine learning (DML) to estimate causal effects. We showcase its use for the Earth sciences on two problems related to carbon dioxide fluxes. In the Q10 model, we demonstrate that DML-based hybrid modeling is superior to end-to-end deep neural network approaches in estimating causal parameters, showing efficiency and robustness to bias from regularization methods, and circumventing equifinality. Our approach, applied to carbon flux partitioning, exhibits flexibility in accommodating heterogeneous causal effects. The study emphasizes the necessity of explicitly defining causal graphs and relationships, advocating for this as a general best practice. We encourage the continued exploration of causality in hybrid models for more interpretable and trustworthy results in knowledge-guided machine learning.
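The double machine learning idea named in this abstract can be sketched on a generic partially linear model, Y = θ·T + g(X) + ε with T = m(X) + ν. This is only an illustration under stated assumptions, not the paper's Q10 or flux-partitioning setup: the nuisance learner (binned means on a one-dimensional X), the simulated data, and the names `fit_bin_means`, `dml_theta` and `simulate` are all hypothetical. The sketch uses cross-fitting: nuisance functions are fitted on one fold, residuals are formed on the other, and θ is estimated by regressing the outcome residuals on the treatment residuals.

```python
import math
import random


def fit_bin_means(xs, ys, n_bins):
    """Nuisance learner (assumption): piecewise-constant regression for x in [0, 1)."""
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for x, y in zip(xs, ys):
        b = min(int(x * n_bins), n_bins - 1)
        sums[b] += y
        counts[b] += 1
    overall = sum(ys) / len(ys)  # fallback for empty bins
    means = [s / c if c else overall for s, c in zip(sums, counts)]
    return lambda x: means[min(int(x * n_bins), n_bins - 1)]


def dml_theta(xs, ts, ys, n_folds=2, n_bins=20):
    """Cross-fitted partialling-out estimate of theta in Y = theta*T + g(X) + noise."""
    n = len(xs)
    num = den = 0.0
    for fold in range(n_folds):
        train = [i for i in range(n) if i % n_folds != fold]
        test = [i for i in range(n) if i % n_folds == fold]
        # Fit E[T | X] and E[Y | X] on the training fold only.
        m_hat = fit_bin_means([xs[i] for i in train], [ts[i] for i in train], n_bins)
        l_hat = fit_bin_means([xs[i] for i in train], [ys[i] for i in train], n_bins)
        # Residualize on the held-out fold and accumulate the least-squares terms.
        for i in test:
            t_res = ts[i] - m_hat(xs[i])
            y_res = ys[i] - l_hat(xs[i])
            num += t_res * y_res
            den += t_res * t_res
    return num / den


def simulate(n=2000, theta=1.5, seed=0):
    """Hypothetical data: X confounds both the treatment T and the outcome Y."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    ts = [x * x + rng.gauss(0.0, 0.5) for x in xs]
    ys = [theta * t + math.sin(2 * math.pi * x) + rng.gauss(0.0, 0.1)
          for x, t in zip(xs, ts)]
    return xs, ts, ys


xs, ts, ys = simulate()
theta_hat = dml_theta(xs, ts, ys)
```

With these simulated data the estimate `theta_hat` should land close to the true value of 1.5 used in `simulate`. In practice the binned-mean learner would be replaced by a flexible regressor, and X would usually be multivariate.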
Causal hybrid modeling with double machine learning—applications in carbon flux modeling. Kai-Hendrik Cohrs, Gherardo Varando, Nuno Carvalhais, Markus Reichstein and Gustau Camps-Valls. Machine Learning Science and Technology, published 2024-07-18. DOI: 10.1088/2632-2153/ad5a60 (https://doi.org/10.1088/2632-2153/ad5a60)