Symbolic regression for the interpretation of quantitative structure-property relationships

Katsushi Takaki, Tomoyuki Miyao
Artificial intelligence in the life sciences, vol. 2, Article 100046. Published 2022-12-01.
DOI: 10.1016/j.ailsci.2022.100046
Full text: https://www.sciencedirect.com/science/article/pii/S2667318522000162
Citation count: 1

Abstract

The interpretation of quantitative structure–activity or structure–property relationships is important in the field of chemoinformatics. Although multivariate linear regression models are typically interpretable, they do not generally have high predictive abilities. Symbolic regression (SR) combined with genetic programming (GP) is a well-established technique for generating the mathematical expressions that describe the relationships within a dataset. However, SR sometimes produces complicated expressions that are hard for humans to interpret. This paper proposes a method for generating simpler expressions by incorporating three filters into GP-based SR. The filters are further combined with nonlinear least-squares optimization to give filter-introduced GP (FIGP), which improves the predictive ability of SR models while retaining simple expressions. As a proof-of-concept, the quantitative estimate of drug-likeness and the synthetic accessibility score are predicted based on the chemical structures of compounds. Overall, FIGP generates less-complicated expressions than previous SR methods. In terms of predictive ability, FIGP is better than GP, but is outperformed by a support vector machine with a radial basis function kernel. Furthermore, quantitative structure–activity relationship models are constructed for three matching molecular series with biological targets. In the case of one target, the activity prediction models given by FIGP exhibit better predictive ability than multivariate linear regression and support vector regression with the radial basis function kernel, whereas for the remaining cases, FIGP is slightly less accurate than multivariate linear regression.
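
The idea of pairing a fixed symbolic expression with nonlinear least-squares optimization of its constants can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the expression form a * exp(b * x), the synthetic data, the function names, and the use of Gauss-Newton with step halving as the optimizer. The abstract does not state which optimizer or expression forms FIGP uses, and the three filters are not shown here.

```python
import math


def expression(x, a, b):
    # Hypothetical expression structure; in SR this would be proposed by GP.
    return a * math.exp(b * x)


def sse(xs, ys, a, b):
    # Sum of squared errors of the expression on the data.
    return sum((y - expression(x, a, b)) ** 2 for x, y in zip(xs, ys))


def refine_constants(xs, ys, a, b, n_iter=100, tol=1e-12):
    """Refine constants (a, b) by Gauss-Newton with step halving."""
    for _ in range(n_iter):
        # Accumulate the 2x2 normal equations J^T J * step = J^T r.
        s_aa = s_ab = s_bb = g_a = g_b = 0.0
        for x, y in zip(xs, ys):
            e = math.exp(b * x)
            r = y - a * e              # residual
            d_a, d_b = e, a * x * e    # partial derivatives w.r.t. a and b
            s_aa += d_a * d_a
            s_ab += d_a * d_b
            s_bb += d_b * d_b
            g_a += d_a * r
            g_b += d_b * r
        det = s_aa * s_bb - s_ab * s_ab
        if abs(det) < 1e-300:
            break  # singular system; keep the current constants
        step_a = (s_bb * g_a - s_ab * g_b) / det
        step_b = (s_aa * g_b - s_ab * g_a) / det
        # Halve the step until the error no longer increases.
        current = sse(xs, ys, a, b)
        scale = 1.0
        while scale > 1e-8 and sse(xs, ys, a + scale * step_a, b + scale * step_b) > current:
            scale *= 0.5
        a += scale * step_a
        b += scale * step_b
        if abs(scale * step_a) + abs(scale * step_b) < tol:
            break
    return a, b


# Synthetic, noise-free data generated with a = 2.0 and b = 0.5.
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]
a_fit, b_fit = refine_constants(xs, ys, a=1.0, b=0.3)
print(round(a_fit, 4), round(b_fit, 4))
```

The structure of the expression stays fixed while only its numeric constants change, so the fitted model remains as readable as the candidate it started from.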

Source journal
Artificial intelligence in the life sciences
Subject areas: Pharmacology, Biochemistry, Genetics and Molecular Biology (General), Computer Science Applications, Health Informatics, Drug Discovery, Veterinary Science and Veterinary Medicine (General)
CiteScore: 5.00
Self-citation rate: 0.00%
Articles published: 0
Review time: 15 days