
Machine Learning Science and Technology: latest articles

Transfer learning with generative models for object detection on limited datasets
IF 6.8; Zone 2 (Physics and Astrophysics); Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2024-08-11; DOI: 10.1088/2632-2153/ad65b5
M Paiano, S Martina, C Giannelli and F Caruso
The availability of data is limited in some fields, especially for object detection tasks, where each object must be enclosed in a correctly labeled bounding box. A notable example of such data scarcity is found in marine biology, where methods to automatically detect underwater species are useful for environmental monitoring. To address this data limitation, state-of-the-art machine learning strategies follow two main approaches. The first pretrains models on existing datasets before generalizing to the specific domain of interest. The second creates synthetic datasets tailored to the target domain, using methods such as copy-paste techniques or ad hoc simulators. The first strategy often faces a significant domain shift, while the second demands custom solutions crafted for the specific task. In response to these challenges, we propose a transfer learning framework that is valid for a generic scenario. In this framework, generated images help improve the performance of an object detector in a regime with few real data. This is achieved through a diffusion-based generative model pretrained on large generic datasets. In contrast to the state of the art, we find that it is not necessary to fine-tune the generative model on the specific domain of interest. We believe this is an important advance because it reduces the labor-intensive manual labeling of images in object detection tasks. We validate our approach on fish in an underwater environment and on the more common domain of cars in an urban setting. Using only a few hundred input images, our method achieves detection performance comparable to models trained on thousands of images. Our results pave the way for new generative-AI-based protocols for machine learning applications in various domains, ranging from geophysics to biology and medicine.
Citations: 0
Trainability issues in quantum policy gradients
IF 6.8; Zone 2 (Physics and Astrophysics); Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2024-08-05; DOI: 10.1088/2632-2153/ad6830
André Sequeira, Luis Paulo Santos and Luis Soares Barbosa
This research explores the trainability of parameterized quantum circuit-based policies in reinforcement learning, an area that has recently seen a surge in empirical exploration. While some studies suggest that quantum gradient estimation improves sample complexity, the efficient trainability of these policies remains an open question. Our findings reveal significant challenges, including standard barren plateaus with exponentially small gradients, as well as gradient explosion. These phenomena depend on the type of basis-state partitioning and on how these partitions are mapped onto actions. For a polynomial number of actions, a trainable window can be ensured with a polynomial number of measurements if a contiguous-like partitioning of basis states is employed. These results are validated empirically in a multi-armed bandit environment.
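To make the partitioning idea concrete, here is a minimal sketch (not the authors' code) of how measurement probabilities over the 2^n computational basis states could be turned into action probabilities. The contiguous scheme sums equal-sized blocks of consecutive basis indices; the parity scheme is included only as a hypothetical non-contiguous contrast. The probability vector is a made-up stand-in for the output of a parameterized quantum circuit.

```python
import numpy as np


def contiguous_policy(probs: np.ndarray, n_actions: int) -> np.ndarray:
    """Map basis-state probabilities to action probabilities by summing
    contiguous, (nearly) equal-sized blocks of basis-state indices."""
    blocks = np.array_split(probs, n_actions)  # consecutive index ranges
    return np.array([block.sum() for block in blocks])


def parity_policy(probs: np.ndarray) -> np.ndarray:
    """Two-action alternative: the action is the parity (popcount mod 2)
    of the basis-state index, so each action's states are interleaved."""
    parity = np.array([bin(i).count("1") % 2 for i in range(len(probs))])
    return np.array([probs[parity == 0].sum(), probs[parity == 1].sum()])


# Hypothetical 3-qubit measurement distribution (8 basis states).
probs = np.array([0.4, 0.1, 0.1, 0.1, 0.1, 0.1, 0.05, 0.05])
print(contiguous_policy(probs, 2))  # contiguous blocks {0..3}, {4..7}
print(parity_policy(probs))         # interleaved blocks by parity
```

Both mappings produce valid distributions over actions; they differ only in which basis states are grouped together, which is the design choice the abstract links to trainability.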
Citations: 0
Molecular relaxation by reverse diffusion with time step prediction
IF 6.8; Zone 2 (Physics and Astrophysics); Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2024-08-05; DOI: 10.1088/2632-2153/ad652c
Khaled Kahouli, Stefaan Simon Pierre Hessmann, Klaus-Robert Müller, Shinichi Nakajima, Stefan Gugler and Niklas Wolf Andreas Gebauer
Molecular relaxation, i.e. finding the equilibrium state of a non-equilibrium structure, is an essential component of computational chemistry for understanding reactivity. Classical force field (FF) methods often rely on insufficient local energy minimization, while neural network FF models require large labeled datasets encompassing both equilibrium and non-equilibrium structures. As a remedy, we propose MoreRed (molecular relaxation by reverse diffusion), a conceptually novel and purely statistical approach in which non-equilibrium structures are treated as noisy instances of their corresponding equilibrium states. To enable the denoising of arbitrarily noisy inputs with a generative diffusion model, we further introduce a novel diffusion time step predictor. Notably, MoreRed learns a simpler pseudo potential energy surface (PES) instead of the complex physical PES. It is trained on a significantly smaller, and thus computationally cheaper, dataset consisting solely of unlabeled equilibrium structures, avoiding the computation of non-equilibrium structures altogether. We compare MoreRed to classical FFs, to equivariant neural network FFs trained on a large dataset of equilibrium and non-equilibrium data, and to a semi-empirical tight-binding model. To assess performance quantitatively, we evaluate the root-mean-square deviation between the found and the reference equilibrium structures, and we compare their energies.
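The idea of treating a non-equilibrium structure as a noisy equilibrium structure can be written in the notation of a standard denoising diffusion model. This is an assumption for illustration only: the abstract does not state MoreRed's noise schedule or parameterization. Here $\mathbf{x}_0$ denotes equilibrium coordinates, $\beta_s$ a variance schedule, and $f_\phi$ is a hypothetical name for the time step predictor.

```latex
q(\mathbf{x}_t \mid \mathbf{x}_0)
  = \mathcal{N}\!\left(\mathbf{x}_t;\ \sqrt{\bar\alpha_t}\,\mathbf{x}_0,\ (1-\bar\alpha_t)\,\mathbf{I}\right),
\qquad
\bar\alpha_t = \prod_{s=1}^{t} (1-\beta_s)

\hat{t} = f_\phi(\mathbf{x})
  \quad \text{(estimated diffusion step for an arbitrary input structure } \mathbf{x}\text{)}
```

Under this reading, an input structure $\mathbf{x}$ is handled as if it were a sample $\mathbf{x}_{\hat t}$, and reverse diffusion is run from step $\hat t$ down to $0$, so the amount of denoising adapts to how far the input is from equilibrium.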
Citations: 0
Multi-perspective feedback-attention coupling model for continuous-time dynamic graphs
IF 6.8; Zone 2 (Physics and Astrophysics); Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2024-08-04; DOI: 10.1088/2632-2153/ad66af
Xiaobo Zhu, Yan Wu, Jin Che, Chao Wang, Liying Wang and Zhanheng Chen
Representation learning on graph networks has recently gained popularity, with many models showing promising results. However, several challenges remain: (1) most methods are designed for static or discrete-time dynamic graphs; (2) existing continuous-time dynamic graph algorithms focus on a single evolving perspective; and (3) many continuous-time dynamic graph approaches require numerous temporal neighbors to capture long-term dependencies. In response, this paper introduces a Multi-Perspective Feedback-Attention Coupling (MPFA) model. MPFA incorporates information from both an evolving and an original perspective to learn the complex dynamics of dynamic graph evolution processes. The evolving perspective considers the current state of the nodes' historical interaction events and uses a temporal attention module to aggregate current-state information. This perspective also makes it possible to capture long-term dependencies of nodes using a small number of temporal neighbors. Meanwhile, the original perspective uses a feedback attention module with growth characteristic coefficients to aggregate the original state information of node interactions. Experimental results on one dataset we compiled ourselves and on seven public datasets validate the effectiveness and competitiveness of the proposed model.
Citations: 0
Towards robust data-driven automated recovery of symbolic conservation laws from limited data
IF 6.8; Zone 2 (Physics and Astrophysics); Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2024-08-04; DOI: 10.1088/2632-2153/ad6390
Tracey Oellerich and Maria Emelianenko
Conservation laws are an inherent feature of many systems that model real-world phenomena, in particular biological and chemical systems. If the form of the underlying dynamical system is known, linear algebra and algebraic geometry methods can be used to identify the conservation laws. Our work focuses on using data-driven methods to identify the conservation law(s) without knowledge of the system dynamics. We develop a robust data-driven computational framework that automates the identification of the number and type of the conservation law(s) while keeping the amount of required data to a minimum. We demonstrate that, owing to the relative stability of singular vectors with respect to noise, we are able to reconstruct the correct conservation laws without excessive parameter tuning. While we focus primarily on biological examples, the framework proposed herein is suitable for a variety of data science applications and can be coupled with other machine learning approaches.
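The role of singular vectors can be illustrated with a minimal sketch restricted to *linear* conservation laws; it is not the authors' framework, which also identifies the type of law. A linear law is a vector c with c · x(t) constant along a trajectory, so c must annihilate every difference x(t_i) − x(t_0); right singular vectors with (near-)zero singular values of that difference matrix give such vectors. The relative tolerance `tol` is a hypothetical parameter and would need to be loosened for noisy data.

```python
import numpy as np


def linear_conservation_laws(X: np.ndarray, tol: float = 1e-8) -> np.ndarray:
    """Return rows c (orthonormal) such that c @ x(t) is constant over the data.

    X has shape (n_times, n_species). A linear conservation law c satisfies
    c @ (x(t_i) - x(t_0)) = 0 for every sample, so c lies in the numerical
    null space of the difference matrix D.
    """
    D = X[1:] - X[0]
    _, s, Vt = np.linalg.svd(D)  # full_matrices=True: Vt is n_species x n_species
    # Pad with zeros: if there are fewer rows than species, the missing
    # singular values are exactly zero.
    s_full = np.zeros(Vt.shape[0])
    s_full[: len(s)] = s
    scale = s_full.max() if s_full.max() > 0 else 1.0
    return Vt[s_full <= tol * scale]


# Toy reversible reaction A <-> B with A + B = 1 (synthetic data).
t = np.linspace(0.0, 3.0, 50)
a = 0.5 + 0.5 * np.exp(-2.0 * t)
X = np.column_stack([a, 1.0 - a])
laws = linear_conservation_laws(X)
print(laws)  # one row proportional to [1, 1] (up to sign)
```

The number of rows returned is the estimated number of independent linear conservation laws; the abstract's point about noise stability corresponds to these singular vectors changing little when small noise is added to `X`.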
Citations: 0
Coincident learning for unsupervised anomaly detection of scientific instruments
IF 6.8; Zone 2 (Physics and Astrophysics); Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2024-08-04; DOI: 10.1088/2632-2153/ad64a6
Ryan Humble, Zhe Zhang, Finn O’Shea, Eric Darve and Daniel Ratner
Anomaly detection is an important task for complex scientific experiments and other complex systems (e.g. industrial facilities, manufacturing), where failures in a subsystem can lead to lost data, poor performance, or even damaged components. While scientific facilities generate a wealth of data, labeled anomalies may be rare (or even nonexistent) and expensive to acquire. Unsupervised approaches are therefore common; they typically search for anomalies using the distance or density of examples in the input feature space (or some associated low-dimensional representation). This paper presents a novel approach called coincident learning for anomaly detection (CoAD), which is designed specifically for multi-modal tasks and identifies anomalies based on coincident behavior across two different slices of the feature space. We define an unsupervised metric by analogy with the supervised classification Fβ statistic. CoAD uses this metric to train an anomaly detection algorithm on unlabeled data, based on the expectation that anomalous behavior in one feature slice coincides with anomalous behavior in the other. The method is illustrated on a synthetic outlier dataset and an MNIST-based image dataset, and is compared with prior state-of-the-art methods on two real-world tasks: a metal milling dataset and our motivating task of identifying RF station anomalies in a particle accelerator.
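For reference, the *supervised* Fβ statistic that CoAD's metric is modeled on is the weighted harmonic mean of precision and recall. The sketch below computes only this standard supervised quantity from labeled flags; the unsupervised analogue that CoAD builds from coincident detections is defined in the paper and is not reproduced here. Function names are illustrative.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall from binary ground-truth labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Supervised F-beta score.

    beta > 1 weights recall more heavily; beta < 1 weights precision.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


p, r = precision_recall([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
print(p, r, f_beta(p, r))  # 2 true positives, 1 false positive, 1 false negative
```

The β parameter is what makes the statistic useful for rare anomalies: choosing β > 1 penalizes missed anomalies more than false alarms.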
Citations: 0
OmniJet-α: the first cross-task foundation model for particle physics
IF 6.8; Zone 2 (Physics and Astrophysics); Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2024-08-01; DOI: 10.1088/2632-2153/ad66ad
Joschka Birk, Anna Hallin and Gregor Kasieczka
Foundation models are multi-dataset, multi-task machine learning methods that, once pre-trained, can be fine-tuned for a wide variety of downstream applications. The successful development of such general-purpose models for physics data would be a major breakthrough, as they could improve the achievable physics performance while drastically reducing the required amount of training time and data. We report significant progress on this challenge on several fronts. First, we introduce a comprehensive set of evaluation methods to judge the quality of an encoding of physics data into a representation suitable for the autoregressive generation of particle jets with transformer architectures (the common backbone of foundation models). These measures motivate the choice of a higher-fidelity tokenization than in previous works. Finally, we demonstrate transfer learning between an unsupervised problem (jet generation) and a classic supervised task (jet tagging) with our new OmniJet-α model. This is the first successful transfer between two different and actively studied classes of tasks, and it constitutes a major step towards building foundation models for particle physics.
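Tokenization here means turning continuous jet-constituent features into discrete symbols a transformer can generate autoregressively. The abstract does not say how the tokens are formed; one common approach, assumed in this minimal sketch, is vector quantization: each feature vector is replaced by the index of its nearest entry in a codebook. The codebook and features below are made up for illustration.

```python
import numpy as np


def tokenize(features: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Assign each feature vector (a row of `features`) the index of its
    nearest codebook vector under squared Euclidean distance."""
    # Pairwise squared distances, shape (n_constituents, codebook_size).
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)


def detokenize(tokens: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Map token indices back to codebook vectors (a lossy reconstruction)."""
    return codebook[tokens]


# Hypothetical 2D constituent features and a 3-entry codebook.
codebook = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
features = np.array([[0.1, 0.05], [0.9, 0.2], [0.2, 0.8]])
tokens = tokenize(features, codebook)
reco = detokenize(tokens, codebook)
print(tokens, np.abs(features - reco).max())
```

In this picture, "fidelity" is how small the reconstruction error `features - reco` is; a larger or better-trained codebook reduces it at the cost of a bigger token vocabulary.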
Citations: 0
Efficient Bayesian inference using physics-informed invertible neural networks for inverse problems
IF 6.8; Zone 2 (Physics and Astrophysics); Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; Pub Date: 2024-07-22; DOI: 10.1088/2632-2153/ad5f74
Xiaofei Guan, Xintong Wang, Hao Wu, Zihao Yang and Peng Yu
This paper presents an innovative approach to Bayesian inverse problems using physics-informed invertible neural networks (PI-INN). Serving as a neural operator model, PI-INN employs an invertible neural network (INN) to describe the relationship between the parameter field and the solution function in latent variable spaces. Specifically, the INN decomposes the latent variable of the parameter field into two distinct components: expansion coefficients that represent the solution of the forward problem, and noise that captures the inherent uncertainty of the inverse problem. By accurately estimating the forward mapping and preserving statistical independence between the expansion coefficients and the latent noise, PI-INN offers an accurate and efficient generative model for Bayesian inverse problems, even in the absence of labeled data. For a given solution function, PI-INN can provide tractable and accurate estimates of the posterior distribution of the underlying parameter field. Moreover, exploiting the characteristics of the INN, we propose a novel independence loss function that effectively ensures the independence of the INN's decomposition results. The efficacy and precision of the proposed PI-INN are demonstrated through a series of numerical experiments.
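What makes an INN usable for posterior sampling is that every layer can be inverted exactly. The abstract does not describe PI-INN's layers, so this minimal sketch shows a generic RealNVP-style affine coupling layer, a standard INN building block, with small random (untrained) weights and no physics-informed or independence loss. It demonstrates only the exact forward/inverse round trip and the tractable log-determinant.

```python
import numpy as np


class AffineCoupling:
    """RealNVP-style affine coupling layer, exactly invertible by construction.

    The input is split into halves (x1, x2). x1 passes through unchanged and
    parameterizes a log-scale s(x1) and shift t(x1) applied to x2.
    """

    def __init__(self, dim: int, hidden: int = 16, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.d1 = dim // 2
        d2 = dim - self.d1
        # Random weights stand in for trained parameters (illustration only).
        self.W1 = rng.normal(scale=0.5, size=(self.d1, hidden))
        self.Ws = rng.normal(scale=0.5, size=(hidden, d2))
        self.Wt = rng.normal(scale=0.5, size=(hidden, d2))

    def _s_t(self, x1: np.ndarray):
        h = np.tanh(x1 @ self.W1)
        return np.tanh(h @ self.Ws), h @ self.Wt  # bounded log-scale, shift

    def forward(self, x: np.ndarray):
        x1, x2 = x[..., : self.d1], x[..., self.d1 :]
        s, t = self._s_t(x1)
        y2 = x2 * np.exp(s) + t
        log_det = s.sum(axis=-1)  # log|det Jacobian|, needed for densities
        return np.concatenate([x1, y2], axis=-1), log_det

    def inverse(self, y: np.ndarray) -> np.ndarray:
        y1, y2 = y[..., : self.d1], y[..., self.d1 :]
        s, t = self._s_t(y1)  # y1 == x1, so s and t are recomputed exactly
        x2 = (y2 - t) * np.exp(-s)
        return np.concatenate([y1, x2], axis=-1)


layer = AffineCoupling(dim=4)
x = np.random.default_rng(1).normal(size=(5, 4))
y, log_det = layer.forward(x)
print(np.abs(layer.inverse(y) - x).max())  # round-trip error at float precision
```

Stacking several such layers (permuting which half is passed through) gives an expressive map whose inverse is as cheap as its forward pass.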
Citations: 0
Datacube segmentation via deep spectral clustering
IF 6.8 | Zone 2, Physics and Astrophysics | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-07-21 | DOI: 10.1088/2632-2153/ad622f
Alessandro Bombini, Fernando García-Avello Bofías, Caterina Bracci, Michele Ginolfi and Chiara Ruberto
Extended vision techniques are ubiquitous in physics. However, the datacubes stemming from such analyses are often hard to interpret, because it is intrinsically difficult to discern the relevant information in the spectra that compose the datacube. Furthermore, the high dimensionality of datacube spectra makes their statistical interpretation complex. Nevertheless, this complexity carries a large amount of statistical information. That information can be exploited in an unsupervised manner to outline some essential properties of the case study at hand. For example, an image segmentation can be obtained via (deep) clustering of the datacube's spectra, performed in a suitably defined low-dimensional embedding space. To tackle this problem, we explore applying unsupervised clustering methods in encoded space, i.e. performing deep clustering on the spectral properties of datacube pixels. Statistical dimensionality reduction is performed by an ad hoc trained (variational) autoencoder, which maps spectra into lower-dimensional metric spaces. The clustering itself is performed by a (learnable) iterative K-means algorithm. We apply this technique to two use cases of different physical origin: a set of synthetic macro mapping x-ray fluorescence (MA-XRF) data on pictorial artworks, and a dataset of simulated astrophysical observations.
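The pipeline described above can be sketched in three steps: encode each pixel spectrum into a low-dimensional space, cluster in that space, then reshape the cluster labels into a segmentation map. This is a minimal sketch under loudly stated assumptions:

- PCA via power iteration stands in for the trained (variational) autoencoder.
- Plain Lloyd's K-means stands in for the learnable iterative K-means.
- The 20×20×16 cube, built from two Gaussian emission peaks, is hypothetical synthetic data, not MA-XRF or astrophysical data.
- The function names are illustrative and do not come from the paper.

```python
import math
import random

def synthetic_datacube(h=20, w=20, channels=16, seed=1):
    """Hypothetical toy cube: left and right halves hold two 'materials' whose
    spectra are Gaussian peaks at different channels, plus pixel noise."""
    rng = random.Random(seed)
    def peak(center):
        return [math.exp(-((ch - center) ** 2) / 4.0) for ch in range(channels)]
    material_a, material_b = peak(4), peak(11)
    spectra, truth = [], []
    for _ in range(h):
        for j in range(w):
            base, label = (material_a, 0) if j < w // 2 else (material_b, 1)
            spectra.append([v + rng.gauss(0, 0.05) for v in base])
            truth.append(label)
    return spectra, truth  # pixels flattened in row-major order

def pca_encoder(spectra, n_components=2, n_iter=100, seed=0):
    """Linear stand-in for a trained (V)AE encoder: project each spectrum onto
    the top principal components, found by power iteration with deflation."""
    rng = random.Random(seed)
    n, c = len(spectra), len(spectra[0])
    mean = [sum(s[j] for s in spectra) / n for j in range(c)]
    centered = [[s[j] - mean[j] for j in range(c)] for s in spectra]
    cov = [[sum(r[i] * r[j] for r in centered) / n for j in range(c)] for i in range(c)]
    components = []
    for _ in range(n_components):
        v = [rng.gauss(0, 1) for _ in range(c)]
        for _ in range(n_iter):
            w = [sum(cov[i][j] * v[j] for j in range(c)) for i in range(c)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(c) for j in range(c))
        # Deflate so the next power iteration finds the next component.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(c)] for i in range(c)]
        components.append(v)
    return [[sum(r[j] * comp[j] for j in range(c)) for comp in components] for r in centered]

def kmeans(points, k, n_iter=50):
    """Plain Lloyd's K-means with deterministic farthest-point initialisation."""
    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

H, W = 20, 20
spectra, truth = synthetic_datacube(H, W)
labels = kmeans(pca_encoder(spectra), k=2)
segmentation = [labels[r * W:(r + 1) * W] for r in range(H)]  # H x W label map
```

Cluster indices are arbitrary, so any comparison with ground truth has to allow for a permutation of the labels. Swapping PCA for a nonlinear encoder changes only `pca_encoder`; the clustering and reshaping steps stay the same.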
Citations: 0
Causal hybrid modeling with double machine learning—applications in carbon flux modeling
IF 6.8 | Zone 2, Physics and Astrophysics | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-07-18 | DOI: 10.1088/2632-2153/ad5a60
Kai-Hendrik Cohrs, Gherardo Varando, Nuno Carvalhais, Markus Reichstein and Gustau Camps-Valls
Hybrid modeling integrates machine learning with scientific knowledge to enhance interpretability, generalization, and adherence to natural laws. However, equifinality and regularization bias make these goals difficult to achieve in hybrid modeling. This paper introduces a novel approach to estimating hybrid models within a causal inference framework, specifically employing double machine learning (DML) to estimate causal effects. We showcase its use in the Earth sciences on two problems related to carbon dioxide fluxes. In the Q10 model, we demonstrate that DML-based hybrid modeling outperforms end-to-end deep neural network approaches at estimating causal parameters. It shows efficiency, robustness to regularization bias, and the ability to circumvent equifinality. Applied to carbon flux partitioning, our approach exhibits flexibility in accommodating heterogeneous causal effects. The study emphasizes the necessity of explicitly defining causal graphs and relationships, and advocates this as a general best practice. We encourage continued exploration of causality in hybrid models to obtain more interpretable and trustworthy results in knowledge-guided machine learning.
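The sketch below illustrates the generic partialling-out DML estimator with cross-fitting on a toy version of the Q10 problem. It is not the authors' implementation, and it rests on the following assumptions:

- Respiration follows the common form R = R_b · Q10^((T − T_ref)/10). Then log R is linear in d = (T − T_ref)/10, with slope log Q10.
- A hypothetical confounder x drives both temperature and R_b.
- Simple histogram regressors stand in for the machine-learning nuisance models.
- All data are synthetic.

```python
import math
import random

def bin_mean_regressor(xs, ys, n_bins=20):
    """Histogram regressor on x in [0, 1): predicts the training mean of y in x's bin."""
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for x, y in zip(xs, ys):
        b = min(int(x * n_bins), n_bins - 1)
        sums[b] += y
        counts[b] += 1
    overall = sum(ys) / len(ys)
    means = [s / c if c else overall for s, c in zip(sums, counts)]
    return lambda x: means[min(int(x * n_bins), n_bins - 1)]

def dml_partialling_out(x, d, y, n_folds=2, seed=0):
    """Cross-fitted DML estimate of theta in the partially linear model
    y = theta * d + g(x) + noise, with d = m(x) + v."""
    n = len(y)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    d_res, y_res = [0.0] * n, [0.0] * n
    for k, test in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        m_hat = bin_mean_regressor([x[i] for i in train], [d[i] for i in train])
        l_hat = bin_mean_regressor([x[i] for i in train], [y[i] for i in train])
        for i in test:  # residuals are computed only on held-out data
            d_res[i] = d[i] - m_hat(x[i])
            y_res[i] = y[i] - l_hat(x[i])
    # Final stage: regress the outcome residuals on the treatment residuals.
    return sum(a * b for a, b in zip(d_res, y_res)) / sum(a * a for a in d_res)

def ols_slope(d, y):
    """Naive regression of y on d, ignoring the confounder x."""
    md, my = sum(d) / len(d), sum(y) / len(y)
    return (sum((a - md) * (b - my) for a, b in zip(d, y))
            / sum((a - md) ** 2 for a in d))

def simulate_q10(n=4000, q10=1.5, t_ref=15.0, seed=1):
    """Hypothetical synthetic data: a confounder x drives both temperature and R_b."""
    rng = random.Random(seed)
    x, d, y = [], [], []
    for _ in range(n):
        xi = rng.random()
        temp = 10.0 + 20.0 * xi + rng.gauss(0, 3)
        rb = 2.0 + math.sin(2 * math.pi * xi)
        resp = rb * q10 ** ((temp - t_ref) / 10) * math.exp(rng.gauss(0, 0.05))
        x.append(xi)
        d.append((temp - t_ref) / 10)
        y.append(math.log(resp))
    return x, d, y

x, d, y = simulate_q10()
q10_dml = math.exp(dml_partialling_out(x, d, y))
q10_naive = math.exp(ols_slope(d, y))
```

In this toy setup x shifts both temperature and log R_b, so the naive regression of log R on d is confounded. Residualizing both d and log R on x, with nuisance models fit on the other fold, removes that dependence. Cross-fitting keeps each residual from being computed with a model that has already seen that data point.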
Citations: 0