cp3-bench: a tool for benchmarking symbolic regression algorithms demonstrated with cosmology

Journal of Cosmology and Astroparticle Physics (IF 5.3; Zone 2, Physics and Astrophysics; JCR Q1, Astronomy & Astrophysics). Pub Date: 2025-01-09. DOI: 10.1088/1475-7516/2025/01/040

M.E. Thing and S.M. Koksbang


Citations: 0

Abstract

We introduce cp3-bench, a tool for comparing/benching symbolic regression algorithms, which we make publicly available at https://github.com/CP3-Origins/cp3-bench. In its current format, cp3-bench includes 12 different symbolic regression algorithms which can be automatically installed as part of cp3-bench. The philosophy behind cp3-bench is that it should be as user-friendly as possible, available in a ready-to-use format, and allow for easy addition of new algorithms and datasets. Our hope is that users of symbolic regression algorithms can use cp3-bench to easily install and compare/bench an array of symbolic regression algorithms to better decide which algorithms to use for their specific tasks. To introduce and motivate the use of cp3-bench, we present a small benchmark of 12 symbolic regression algorithms applied to 28 datasets representing six different cosmological and astroparticle physics setups. Overall, we find that most of the benched algorithms do rather poorly in the benchmark, and we suggest possible ways to proceed with developing algorithms that will be better at identifying ground truth expressions for cosmological and astroparticle physics datasets. Our demonstration benchmark specifically studies the significance of the dimensionality of the feature space and the precision of datasets. We find both to be highly important for symbolic regression tasks to be successful. On the other hand, we find no indication that inter-dependence of features in datasets is particularly important, meaning that it is not in general a hindrance for symbolic regression algorithms if datasets contain, e.g., both z and H(z) as features. Lastly, we note that we find no indication that the performance of algorithms on standardized datasets is a good indicator of performance on particular cosmological and astrophysical datasets. This suggests that it is not necessarily prudent to choose symbolic regression algorithms based on their performance on standardized data. Instead, a more robust approach is to consider a variety of algorithms, chosen based on the particular task to which one wishes to apply symbolic regression.

Source journal: Journal of Cosmology and Astroparticle Physics (Earth Sciences and Astronomy: Astronomy and Astrophysics)

CiteScore: 10.20

Self-citation rate: 23.40%

Articles published: 632

Review time: 1 month

About the journal: Journal of Cosmology and Astroparticle Physics (JCAP) encompasses theoretical, observational and experimental areas as well as computation and simulation. The journal covers the latest developments in the theory of all fundamental interactions and their cosmological implications (e.g. M-theory and cosmology, brane cosmology). JCAP's coverage also includes topics such as formation, dynamics and clustering of galaxies, pre-galactic star formation, x-ray astronomy, radio astronomy, gravitational lensing, active galactic nuclei, intergalactic and interstellar matter.
