SRBench++: Principled Benchmarking of Symbolic Regression With Domain-Expert Interpretation

F. O. de Franca; M. Virgolin; M. Kommenda; M. S. Majumder; M. Cranmer; G. Espada; L. Ingelse; A. Fonseca; M. Landajuela; B. Petersen; R. Glatt; N. Mundhenk; C. S. Lee; J. D. Hochhalter; D. L. Randall; P. Kamienny; H. Zhang; G. Dick; A. Simon; B. Burlacu; Jaan Kasak; Meera Machado; Casper Wilstrup; W. G. La Cava

IEEE Transactions on Evolutionary Computation, vol. 29, no. 4, pp. 1127-1137
DOI: 10.1109/TEVC.2024.3423681
Published: 2024-07-04
URL: https://ieeexplore.ieee.org/document/10586218/
Citations: 0
Abstract
Symbolic regression (SR) searches for analytic expressions that accurately describe studied phenomena. The main promise of this approach is that it may return an interpretable model that can be insightful to users, while maintaining high accuracy. The current standard for benchmarking these algorithms is SRBench, which evaluates methods on hundreds of datasets that are a mix of real-world and simulated processes spanning multiple domains. At present, the ability of SRBench to evaluate interpretability is limited to measuring the size of expressions on real-world data, and the exactness of model forms on synthetic data. In practice, model size is only one of many factors used by subject experts to determine how interpretable a model truly is. Furthermore, SRBench does not characterize algorithm performance on specific, challenging subtasks of regression, such as feature selection and evasion of local minima. In this work, we propose and evaluate an approach to benchmarking SR algorithms that addresses these limitations of SRBench by 1) incorporating expert evaluations of interpretability on a domain-specific task, and 2) evaluating algorithms over distinct properties of data science tasks. We evaluate 12 modern SR algorithms on these benchmarks, present an in-depth analysis of the results, discuss current challenges of SR algorithms, and highlight possible improvements for the benchmark itself.
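To make the task concrete: symbolic regression searches a space of analytic expressions for one that fits observed data, so that the recovered form itself (not just its predictions) can be inspected by a domain expert. The following is a minimal illustrative sketch of that idea, not any of the 12 benchmarked algorithms; the toy target `y = x**2 + x` and the tiny hand-built candidate pool are assumptions chosen purely for demonstration. Real SR systems search vastly larger expression spaces, typically via genetic programming.

```python
# Toy symbolic regression: score a small pool of candidate expressions
# against sample data and keep the one with the lowest mean squared error.
# Illustrative sketch only -- the candidate pool stands in for a search space.

# Hypothetical ground-truth process: y = x**2 + x
xs = [x / 10.0 for x in range(-20, 21)]
ys = [x * x + x for x in xs]

# Candidate model forms an SR search might consider.
candidates = {
    "x": lambda x: x,
    "x + 1": lambda x: x + 1,
    "x**2": lambda x: x * x,
    "x**2 + x": lambda x: x * x + x,
    "2*x": lambda x: 2 * x,
}

def mse(f):
    """Mean squared error of candidate f on the sample data."""
    return sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

best = min(candidates, key=lambda name: mse(candidates[name]))
print(best)  # -> x**2 + x  (the exact generating form is recovered)
```

Because the output is an explicit expression rather than an opaque predictor, properties such as expression size, exact recovery of a known generating form, and which input features appear in the model can all be measured, which is what SRBench-style benchmarks exploit.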
Journal Description
The IEEE Transactions on Evolutionary Computation is published by the IEEE Computational Intelligence Society on behalf of 13 societies: Circuits and Systems; Computer; Control Systems; Engineering in Medicine and Biology; Industrial Electronics; Industry Applications; Lasers and Electro-Optics; Oceanic Engineering; Power Engineering; Robotics and Automation; Signal Processing; Social Implications of Technology; and Systems, Man, and Cybernetics. The journal publishes original papers in evolutionary computation and related areas such as nature-inspired algorithms, population-based methods, optimization, and hybrid systems. It welcomes both purely theoretical papers and application papers that provide general insights into these areas of computation.