SRBench++: Principled Benchmarking of Symbolic Regression With Domain-Expert Interpretation

F. O. de Franca; M. Virgolin; M. Kommenda; M. S. Majumder; M. Cranmer; G. Espada; L. Ingelse; A. Fonseca; M. Landajuela; B. Petersen; R. Glatt; N. Mundhenk; C. S. Lee; J. D. Hochhalter; D. L. Randall; P. Kamienny; H. Zhang; G. Dick; A. Simon; B. Burlacu; Jaan Kasak; Meera Machado; Casper Wilstrup; W. G. La Cava

IEEE Transactions on Evolutionary Computation, vol. 29, no. 4, pp. 1127-1137
DOI: 10.1109/TEVC.2024.3423681
Published: 2024-07-04
URL: https://ieeexplore.ieee.org/document/10586218/
Citations: 0
Abstract
Symbolic regression (SR) searches for analytic expressions that accurately describe studied phenomena. The main promise of this approach is that it may return an interpretable model that can be insightful to users, while maintaining high accuracy. The current standard for benchmarking these algorithms is SRBench, which evaluates methods on hundreds of datasets that are a mix of real-world and simulated processes spanning multiple domains. At present, the ability of SRBench to evaluate interpretability is limited to measuring the size of expressions on real-world data, and the exactness of model forms on synthetic data. In practice, model size is only one of many factors used by subject experts to determine how interpretable a model truly is. Furthermore, SRBench does not characterize algorithm performance on specific, challenging subtasks of regression, such as feature selection and evasion of local minima. In this work, we propose and evaluate an approach to benchmarking SR algorithms that addresses these limitations of SRBench by 1) incorporating expert evaluations of interpretability on a domain-specific task, and 2) evaluating algorithms over distinct properties of data science tasks. We evaluate 12 modern SR algorithms on these benchmarks, present an in-depth analysis of the results, discuss current challenges of SR algorithms, and highlight possible improvements for the benchmark itself.
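To make the task concrete: symbolic regression searches a space of analytic expressions for one that fits observed data, so that the recovered form itself (not just its predictions) can be inspected by a domain expert. The following is a minimal illustrative sketch of that idea, not any of the 12 benchmarked algorithms; the toy target `y = x**2 + x` and the tiny hand-built candidate pool are assumptions chosen purely for demonstration. Real SR systems search vastly larger expression spaces, typically via genetic programming.

```python
# Toy symbolic regression: score a small pool of candidate expressions
# against sample data and keep the one with the lowest mean squared error.
# Illustrative sketch only -- the candidate pool stands in for a search space.

# Hypothetical ground-truth process: y = x**2 + x
xs = [x / 10.0 for x in range(-20, 21)]
ys = [x * x + x for x in xs]

# Candidate model forms an SR search might consider.
candidates = {
    "x": lambda x: x,
    "x + 1": lambda x: x + 1,
    "x**2": lambda x: x * x,
    "x**2 + x": lambda x: x * x + x,
    "2*x": lambda x: 2 * x,
}

def mse(f):
    """Mean squared error of candidate f on the sample data."""
    return sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

best = min(candidates, key=lambda name: mse(candidates[name]))
print(best)  # -> x**2 + x  (the exact generating form is recovered)
```

Because the output is an explicit expression rather than an opaque predictor, properties such as expression size, exact recovery of a known generating form, and which input features appear in the model can all be measured, which is what SRBench-style benchmarks exploit.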
Journal Description
The IEEE Transactions on Evolutionary Computation is published by the IEEE Computational Intelligence Society on behalf of 13 societies: Circuits and Systems; Computer; Control Systems; Engineering in Medicine and Biology; Industrial Electronics; Industry Applications; Lasers and Electro-Optics; Oceanic Engineering; Power Engineering; Robotics and Automation; Signal Processing; Social Implications of Technology; and Systems, Man, and Cybernetics. The journal publishes original papers in evolutionary computation and related areas such as nature-inspired algorithms, population-based methods, optimization, and hybrid systems. It welcomes both purely theoretical papers and application papers that provide general insights into these areas of computation.