arXiv - PHYS - Data Analysis, Statistics and Probability: Latest Articles

Machine learning from limited data: Predicting biological dynamics under a time-varying external input
Pub Date : 2024-08-15 DOI: arxiv-2408.07998
Hoony Kang, Keshav Srinivasan, Wolfgang Losert
Reservoir computing (RC) is known as a powerful machine learning approach for learning complex dynamics from limited data. Here, we use RC to predict highly stochastic dynamics of cell shapes. We find that RC is able to predict the steady state climate from very limited data. Furthermore, the RC learns the timescale of transients from only four observations. We find that these capabilities of the RC to act as a dynamic twin allow us to also infer important statistics of cell shape dynamics of unobserved conditions.
Adaptive Behavioral AI: Reinforcement Learning to Enhance Pharmacy Services
Pub Date : 2024-08-14 DOI: arxiv-2408.07647
Ana Fernández del Río, Michael Brennan Leong, Paulo Saraiva, Ivan Nazarov, Aditya Rastogi, Moiz Hassan, Dexian Tang, África Periáñez
Pharmacies are critical in healthcare systems, particularly in low- and middle-income countries. Providing pharmacists with the right behavioral interventions or nudges can enhance their skills, public health awareness, and pharmacy inventory management, ensuring access to essential medicines that ultimately benefit their patients. We introduce a reinforcement learning operational system to deliver personalized behavioral interventions through mobile health applications. We illustrate its potential by discussing a series of initial experiments run with SwipeRx, an all-in-one app for pharmacists, including B2B e-commerce, in Indonesia. The proposed method has broader applications extending beyond pharmacy operations to optimize healthcare delivery.
Bayesian analysis of nucleon-nucleon scattering data in pionless effective field theory
Pub Date : 2024-08-05 DOI: arxiv-2408.02480
J. M. Bub, M. Piarulli, R. J. Furnstahl, S. Pastore, D. R. Phillips
We perform Bayesian model calibration of two-nucleon ($NN$) low-energy constants (LECs) appearing in an $NN$ interaction based on pionless effective field theory (EFT). The calibration is carried out for potentials constructed using naive dimensional analysis in $NN$ relative momenta ($p$) up to next-to-leading order [NLO, $O(p^2)$] and next-to-next-to-next-to-leading order [N3LO, $O(p^4)$]. We consider two classes of pionless EFT potential: one that acts in all partial waves and another that is dominated by $s$-wave physics. The two classes produce broadly similar results for calibrations to $NN$ data up to $E_{\rm lab}=5$ MeV. Our analysis accounts for the correlated uncertainties that arise from the truncation of the pionless EFT. We simultaneously estimate both the EFT LECs and the parameters that quantify the truncation error. This permits the first quantitative estimates of the pionless EFT breakdown scale, $\Lambda_b$: the 95% intervals are $\Lambda_b \in [50.11, 63.03]$ MeV at NLO and $\Lambda_b \in [72.27, 88.54]$ MeV at N3LO. Invoking naive dimensional analysis for the $NN$ potential, therefore, does not lead to consistent results across orders in pionless EFT. This exemplifies the possible use of Bayesian tools to identify inconsistencies in a proposed EFT power counting.
KAN we improve on HEP classification tasks? Kolmogorov-Arnold Networks applied to an LHC physics example
Pub Date : 2024-08-05 DOI: arxiv-2408.02743
Johannes Erdmann, Florian Mausolf, Jan Lukas Späh
Recently, Kolmogorov-Arnold Networks (KANs) have been proposed as an alternative to multilayer perceptrons, suggesting advantages in performance and interpretability. We study a typical binary event classification task in high-energy physics including high-level features and comment on the performance and interpretability of KANs in this context. We find that the learned activation functions of a one-layer KAN resemble the log-likelihood ratio of the input features. In deeper KANs, the activations in the first KAN layer differ from those in the one-layer KAN, which indicates that the deeper KANs learn more complex representations of the data. We study KANs with different depths and widths and we compare them to multilayer perceptrons in terms of performance and number of trainable parameters. For the chosen classification task, we do not find that KANs are more parameter efficient. However, small KANs may offer advantages in terms of interpretability that come at the cost of only a moderate loss in performance.
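A possible way to read the log-likelihood-ratio finding: a one-layer KAN with a single output computes a sum of learned univariate functions of its inputs, and for statistically independent features the optimal classifier score is itself a sum of per-feature log-likelihood ratios. The sketch below is not taken from the paper. It assumes two independent Gaussian features with equal class priors, and the names `FEATURES`, `feature_llr`, `additive_score`, `signal_probability`, and `bayes_posterior` are illustrative.

```python
import math


def gaussian_pdf(x, mean, sigma):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


# Assumed toy setup: (signal mean, background mean, shared width) per feature.
FEATURES = [(1.0, 0.0, 1.0), (0.5, -0.5, 2.0)]


def feature_llr(x, mean_s, mean_b, sigma):
    """Univariate log-likelihood ratio log p(x | signal) / p(x | background)."""
    return math.log(gaussian_pdf(x, mean_s, sigma) / gaussian_pdf(x, mean_b, sigma))


def additive_score(event):
    """Sum of univariate functions, the functional form of a one-layer KAN output."""
    return sum(feature_llr(x, *params) for x, params in zip(event, FEATURES))


def signal_probability(event):
    """Sigmoid of the additive score (equal priors assumed)."""
    return 1.0 / (1.0 + math.exp(-additive_score(event)))


def bayes_posterior(event):
    """Reference: posterior from the full product likelihoods."""
    p_s = math.prod(gaussian_pdf(x, m_s, s) for x, (m_s, _, s) in zip(event, FEATURES))
    p_b = math.prod(gaussian_pdf(x, m_b, s) for x, (_, m_b, s) in zip(event, FEATURES))
    return p_s / (p_s + p_b)


event = (0.8, -0.3)
print(signal_probability(event), bayes_posterior(event))
```

The two printed numbers agree up to floating-point error, because the sigmoid of the summed log-likelihood ratios is the Bayes posterior when the features are independent. With correlated features this additive form is no longer exact, which is one plausible reason deeper KANs would learn different first-layer activations.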
On marginals and profiled posteriors for cosmological parameter estimation
Pub Date : 2024-08-04 DOI: arxiv-2408.02063
Martin Kerscher, Jochen Weller
With several examples and in an analysis of the Pantheon+ supernova sample we discuss the properties of the marginal posterior distribution versus the profiled posterior distribution (the profile likelihood in a Bayesian disguise). We investigate whether maximisation, as used for the profiling, or integration, as used for the marginalisation, is more appropriate. To report results we recommend the marginal posterior distribution.
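The difference between integrating and maximising over a nuisance parameter can be illustrated on a grid. The sketch below is a toy example, not one from the paper. It assumes an unnormalised two-parameter posterior in which the nuisance parameter `y` has a width `exp(x / 2)` that depends on the parameter of interest `x`; the names `posterior`, `marginal`, and `profiled` are illustrative.

```python
import math


def posterior(x, y):
    """Toy unnormalised posterior: unit Gaussian in x times a Gaussian in y of width exp(x/2)."""
    sigma_y = math.exp(x / 2.0)
    gauss_y = math.exp(-0.5 * (y / sigma_y) ** 2) / (sigma_y * math.sqrt(2.0 * math.pi))
    return math.exp(-0.5 * x * x) * gauss_y


# Regular grids; the spacing 1/20 resolves the narrowest y-width on the x range used.
xs = [-3.0 + i / 20.0 for i in range(121)]
ys = [-25.0 + j / 20.0 for j in range(1001)]
dy = 1.0 / 20.0

# Marginalisation: integrate the nuisance parameter out (Riemann sum).
marginal = [sum(posterior(x, y) for y in ys) * dy for x in xs]

# Profiling: maximise over the nuisance parameter instead.
profiled = [max(posterior(x, y) for y in ys) for x in xs]

x_marginal = xs[marginal.index(max(marginal))]
x_profiled = xs[profiled.index(max(profiled))]
print(x_marginal, x_profiled)
```

For this toy density the marginal is proportional to exp(-x^2/2) and peaks at x = 0, while the profile is proportional to exp(-x^2/2 - x/2) and peaks at x = -0.5. Profiling rewards the region where the nuisance parameter is tightly constrained, whereas marginalisation weights each x by the total posterior volume.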
TrackSorter: A Transformer-based sorting algorithm for track finding in High Energy Physics
Pub Date : 2024-07-31 DOI: arxiv-2407.21290
Yash Melkani, Xiangyang Ju
Track finding in particle data is a challenging pattern recognition problem in High Energy Physics. It takes as inputs a point cloud of space points and labels them so that space points created by the same particle have the same label. The list of space points with the same label is a track candidate. We argue that this pattern recognition problem can be formulated as a sorting problem, of which the inputs are a list of space points sorted by their distances away from the collision points and the outputs are the space points sorted by their labels. In this paper, we propose the TrackSorter algorithm: a Transformer-based algorithm for pattern recognition in particle data. TrackSorter uses a simple tokenization scheme to convert space points into discrete tokens. It then uses the tokenized space points as inputs and sorts the input tokens into track candidates. TrackSorter is a novel end-to-end track finding algorithm that leverages Transformer-based models to solve pattern recognition problems. It is evaluated on the TrackML dataset and has good track finding performance.
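The sorting formulation can be sketched by building one input/target sequence pair by hand. This is an illustration only: the abstract does not specify the tokenization scheme, so the grid-cell tokenizer `token` below is a hypothetical stand-in, the hit coordinates and labels are made up, and no Transformer is involved. The code only shows what a model of this kind would be asked to map from and to.

```python
import math

# Toy space points: (x, y, z, particle_label). Values are illustrative only.
hits = [
    (3.0, 0.0, 0.0, 7),
    (0.0, 1.5, 0.0, 4),
    (1.0, 0.0, 0.0, 7),
    (0.0, 2.5, 0.0, 4),
    (2.0, 0.0, 0.0, 7),
]


def token(hit, cell=1.0, n_cells=8):
    """Hypothetical tokenizer: flattened index of the cubic grid cell containing the hit."""
    x, y, z, _ = hit
    ix, iy, iz = (int(math.floor(c / cell)) + n_cells // 2 for c in (x, y, z))
    return (ix * n_cells + iy) * n_cells + iz


def radius(hit):
    """Distance of the hit from the collision point, taken to be the origin."""
    return math.sqrt(hit[0] ** 2 + hit[1] ** 2 + hit[2] ** 2)


# Input sequence: space points ordered by distance from the collision point.
model_input = sorted(hits, key=radius)
# Target sequence: the same points ordered by label (stable, so radius order is kept per track).
target = sorted(model_input, key=lambda h: h[3])

input_tokens = [token(h) for h in model_input]
target_tokens = [token(h) for h in target]

# Consecutive target tokens with the same label form one track candidate.
tracks = {}
for hit in target:
    tracks.setdefault(hit[3], []).append(token(hit))

print(input_tokens)
print(target_tokens)
print(tracks)
```

The target sequence is a permutation of the input sequence, which is what makes the task a sorting problem. In training, the labels would be used only to build the target; at inference a model would have to produce the grouped order from the tokens alone.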
Low dimensional fragment-based descriptors for property predictions in inorganic materials with machine learning
Pub Date : 2024-07-30 DOI: arxiv-2407.21146
Md Mohaiminul Islam
In recent times, the use of machine learning in materials design and discovery has helped to accelerate the discovery of innovative materials with extraordinary properties, which otherwise would have been driven by a laborious and time-consuming trial-and-error process. In this study, a simple yet powerful fragment-based descriptor, Low Dimensional Fragment Descriptors (LDFD), is proposed to work in conjunction with machine learning models to predict important properties of a wide range of inorganic materials such as perovskite oxides, metal halide perovskites, alloys, semiconductors, and other material systems, and can also be extended to work with interfaces. To predict properties, the generation of descriptors requires only the structural formula of the materials and, in the presence of identical structures in the dataset, additional system properties as input. The generation of descriptors involves few steps, encoding the formula in binary space and reduction of dimensionality, allowing easy implementation and prediction. To evaluate descriptor performance, six known datasets with up to eight components were compared. The method was applied to properties such as band gaps of perovskites and semiconductors, lattice constant of magnetic alloys, bulk/shear modulus of superhard alloys, critical temperature of superconductors, and formation enthalpy and energy above the convex hull of perovskite oxides. An advanced Python-based data mining tool, matminer, was utilized for the collection of data. The prediction accuracies are equivalent to the quality of the training data and show comparable effectiveness to previous studies. This method should be extendable to any inorganic material systems which can be subdivided into layers or crystal structures with more than one atom site, and with the progress of data mining the performance should get better with larger and unbiased datasets.
Automated Review Generation Method Based on Large Language Models
Pub Date : 2024-07-30 DOI: arxiv-2407.20906
Shican Wu, Xiao Ma, Dehui Luo, Lulu Li, Xiangcheng Shi, Xin Chang, Xiaoyun Lin, Ran Luo, Chunlei Pei, Zhi-Jian Zhao, Jinlong Gong
Literature research, vital for scientific advancement, is overwhelmed by the vast ocean of available information. Addressing this, we propose an automated review generation method based on Large Language Models (LLMs) to streamline literature processing and reduce cognitive load. In a case study on propane dehydrogenation (PDH) catalysts, our method swiftly generated comprehensive reviews from 343 articles, averaging seconds per article per LLM account. Extended analysis of 1041 articles provided deep insights into catalysts' composition, structure, and performance. Recognizing LLMs' hallucinations, we employed a multi-layered quality control strategy, ensuring our method's reliability and effective hallucination mitigation. Expert verification confirms the accuracy and citation integrity of generated reviews, demonstrating LLM hallucination risks reduced to below 0.5% with over 95% confidence. A released Windows application enables one-click review generation, aiding researchers in tracking advancements and recommending literature. This approach showcases LLMs' role in enhancing scientific research productivity and sets the stage for further exploration.
Bayesian technique to combine independently-trained Machine-Learning models applied to direct dark matter detection
Pub Date : 2024-07-30 DOI: arxiv-2407.21008
David Cerdeno, Martin de los Rios, Andres D. Perez
We carry out a Bayesian analysis of dark matter (DM) direct detection data to determine particle model parameters using the Truncated Marginal Neural Ratio Estimation (TMNRE) machine learning technique. TMNRE avoids an explicit calculation of the likelihood, which instead is estimated from simulated data, unlike in traditional Markov Chain Monte Carlo (MCMC) algorithms. This considerably speeds up, by several orders of magnitude, the computation of the posterior distributions, which allows us to perform the Bayesian analysis of an otherwise computationally prohibitive number of benchmark points. In this article we demonstrate that, in the TMNRE framework, it is possible to include, combine, and remove different datasets in a modular fashion, which is fast and simple as there is no need to re-train the machine learning algorithm or to define a combined likelihood. In order to assess the performance of this method, we consider the case of WIMP DM with spin-dependent and spin-independent interactions with protons and neutrons in a xenon experiment. After validating our results with MCMC, we employ the TMNRE procedure to determine the regions where the DM parameters can be reconstructed. Finally, we present CADDENA, a Python package that implements the modular Bayesian analysis of direct detection experiments described in this work.
我们对暗物质(DM)直接探测数据进行了贝叶斯分析,利用截断边际神经比估计(TMNRE)机器学习技术确定粒子模型参数。与传统的马尔可夫链蒙特卡洛(MCMC)算法不同,TMNRE 避免了对似然的明确计算,而是通过模拟数据进行估计。这大大加快了后验分布的计算速度,使贝叶斯分析得以对大量基准点进行,否则计算量将大得令人望而却步。在本文中,我们证明了在 TMNRE 框架中,可以以模块化方式包含、组合和移除不同的数据集,由于无需重新训练机器学习算法或定义组合似然,因此既快速又简单。为了评估这种方法的性能,我们考虑了在氙实验中与质子和中子发生自旋依赖和独立相互作用的 WIMP DM 的情况。在用 MCMC 验证了我们的结果之后,我们采用 TMNRE 程序来确定可以重建 DM 参数的区域。最后,我们介绍了 CADDENA,它是一个 Python 软件包,用于实现本文所述的直接探测实验的模块化贝叶斯分析。
{"title":"Bayesian technique to combine independently-trained Machine-Learning models applied to direct dark matter detection","authors":"David Cerdeno, Martin de los Rios, Andres D. Perez","doi":"arxiv-2407.21008","DOIUrl":"https://doi.org/arxiv-2407.21008","url":null,"abstract":"We carry out a Bayesian analysis of dark matter (DM) direct detection data to\u0000determine particle model parameters using the Truncated Marginal Neural Ratio\u0000Estimation (TMNRE) machine learning technique. TMNRE avoids an explicit\u0000calculation of the likelihood, which instead is estimated from simulated data,\u0000unlike in traditional Markov Chain Monte Carlo (MCMC) algorithms. This\u0000considerably speeds up, by several orders of magnitude, the computation of the\u0000posterior distributions, which allows to perform the Bayesian analysis of an\u0000otherwise computationally prohibitive number of benchmark points. In this\u0000article we demonstrate that, in the TMNRE framework, it is possible to include,\u0000combine, and remove different datasets in a modular fashion, which is fast and\u0000simple as there is no need to re-train the machine learning algorithm or to\u0000define a combined likelihood. In order to assess the performance of this\u0000method, we consider the case of WIMP DM with spin-dependent and independent\u0000interactions with protons and neutrons in a xenon experiment. After validating\u0000our results with MCMC, we employ the TMNRE procedure to determine the regions\u0000where the DM parameters can be reconstructed. 
Finally, we present CADDENA, a\u0000Python package that implements the modular Bayesian analysis of direct\u0000detection experiments described in this work.","PeriodicalId":501065,"journal":{"name":"arXiv - PHYS - Data Analysis, Statistics and Probability","volume":"11 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141865166","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Universal New Physics Latent Space
Pub Date : 2024-07-29 DOI: arxiv-2407.20315
Anna Hallin, Gregor Kasieczka, Sabine Kraml, André Lessa, Louis Moureaux, Tore von Schwartz, David Shih
We develop a machine learning method for mapping data originating from both Standard Model processes and various theories beyond the Standard Model into a unified representation (latent) space while conserving information about the relationship between the underlying theories. We apply our method to three examples of new physics at the LHC of increasing complexity, showing that models can be clustered according to their LHC phenomenology: different models are mapped to distinct regions in latent space, while indistinguishable models are mapped to the same region. This opens interesting new avenues on several fronts, such as model discrimination, selection of representative benchmark scenarios, and identifying gaps in the coverage of model space.
Citations: 0