回归模型的适当评价测量

IF 0.4 Q4 BIOCHEMISTRY & MOLECULAR BIOLOGY Chem-Bio Informatics Journal Pub Date : 2021-09-15 DOI:10.1273/cbij.21.59
Tsuyoshi Esaki
{"title":"回归模型的适当评价测量","authors":"Tsuyoshi Esaki","doi":"10.1273/cbij.21.59","DOIUrl":null,"url":null,"abstract":"In recent years, accelerating the speed of finding seed compounds and reducing the cost of pharmaceutical research has become a necessity. The contribution of in silico drug discovery methods, which predict candidates as new drugs using physicochemical features and substructure fingerprints of compounds, is thus expected. Selecting the seed compounds without conducting experiments could enable us to reduce the time and cost required for drug development. However, estimating the characteristics of compounds in our body using a simple linear model alone is unsatisfactory because effects and distribution of compounds are determined by the environment in our body and their interactions with other molecules. Compared to simple models, more complex models have been prepared to estimate compound characteristics with high predictive accuracy. Thus, it is increasingly important to correctly evaluate the predictive performance when selecting the models appropriate for research purposes. The determinant coefficient, famous as R 2 , is one of the most famous statistical measures for evaluating regression models. However, this measure cannot be used to evaluate nonlinear models. In this paper, the difficulty of using the determinant coefficient is explained and the proper statistical measures were suggested under the following two conditions: mean squared error (MSE) for cross-validation, and MSE along with correlation coefficients for the observed and predicted values of test data. As understanding statistical measures and using them appropriately is necessary, the suggested measures will support the effective selection of promising seed compounds and accelerate drug discovery.","PeriodicalId":40659,"journal":{"name":"Chem-Bio Informatics Journal","volume":null,"pages":null},"PeriodicalIF":0.4000,"publicationDate":"2021-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Appropriate Evaluation Measurements for Regression Models\",\"authors\":\"Tsuyoshi Esaki\",\"doi\":\"10.1273/cbij.21.59\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In recent years, accelerating the speed of finding seed compounds and reducing the cost of pharmaceutical research has become a necessity. The contribution of in silico drug discovery methods, which predict candidates as new drugs using physicochemical features and substructure fingerprints of compounds, is thus expected. Selecting the seed compounds without conducting experiments could enable us to reduce the time and cost required for drug development. However, estimating the characteristics of compounds in our body using a simple linear model alone is unsatisfactory because effects and distribution of compounds are determined by the environment in our body and their interactions with other molecules. Compared to simple models, more complex models have been prepared to estimate compound characteristics with high predictive accuracy. Thus, it is increasingly important to correctly evaluate the predictive performance when selecting the models appropriate for research purposes. The determinant coefficient, famous as R 2 , is one of the most famous statistical measures for evaluating regression models. However, this measure cannot be used to evaluate nonlinear models. In this paper, the difficulty of using the determinant coefficient is explained and the proper statistical measures were suggested under the following two conditions: mean squared error (MSE) for cross-validation, and MSE along with correlation coefficients for the observed and predicted values of test data. As understanding statistical measures and using them appropriately is necessary, the suggested measures will support the effective selection of promising seed compounds and accelerate drug discovery.\",\"PeriodicalId\":40659,\"journal\":{\"name\":\"Chem-Bio Informatics Journal\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.4000,\"publicationDate\":\"2021-09-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Chem-Bio Informatics Journal\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1273/cbij.21.59\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"BIOCHEMISTRY & MOLECULAR BIOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Chem-Bio Informatics Journal","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1273/cbij.21.59","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"BIOCHEMISTRY & MOLECULAR BIOLOGY","Score":null,"Total":0}
引用次数: 1

摘要

近年来,加快寻找种子化合物的速度和降低药物研究的成本已成为一种必要。因此,我们期待计算机药物发现方法的贡献,即利用化合物的物理化学特征和亚结构指纹来预测候选新药。在不进行实验的情况下选择种子化合物可以使我们减少药物开发所需的时间和成本。然而,仅使用简单的线性模型来估计我们体内化合物的特性是不令人满意的,因为化合物的作用和分布是由我们体内的环境及其与其他分子的相互作用决定的。与简单的模型相比,更复杂的模型已经被用来估计具有较高预测精度的复合特性。因此,在选择适合研究目的的模型时,正确评估预测性能变得越来越重要。行列式系数,即众所周知的r2,是评价回归模型最著名的统计度量之一。然而,这种方法不能用于评价非线性模型。本文解释了使用决定系数的困难,并在以下两种情况下提出了适当的统计措施:交叉验证的均方误差(MSE),以及试验数据的观测值和预测值的MSE与相关系数。由于了解和正确使用统计方法是必要的,所建议的方法将支持有前途的种子化合物的有效选择和加速药物的发现。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Appropriate Evaluation Measurements for Regression Models
In recent years, accelerating the speed of finding seed compounds and reducing the cost of pharmaceutical research has become a necessity. The contribution of in silico drug discovery methods, which predict candidates as new drugs using physicochemical features and substructure fingerprints of compounds, is thus expected. Selecting the seed compounds without conducting experiments could enable us to reduce the time and cost required for drug development. However, estimating the characteristics of compounds in our body using a simple linear model alone is unsatisfactory because effects and distribution of compounds are determined by the environment in our body and their interactions with other molecules. Compared to simple models, more complex models have been prepared to estimate compound characteristics with high predictive accuracy. Thus, it is increasingly important to correctly evaluate the predictive performance when selecting the models appropriate for research purposes. The determinant coefficient, famous as R 2 , is one of the most famous statistical measures for evaluating regression models. However, this measure cannot be used to evaluate nonlinear models. In this paper, the difficulty of using the determinant coefficient is explained and the proper statistical measures were suggested under the following two conditions: mean squared error (MSE) for cross-validation, and MSE along with correlation coefficients for the observed and predicted values of test data. As understanding statistical measures and using them appropriately is necessary, the suggested measures will support the effective selection of promising seed compounds and accelerate drug discovery.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Chem-Bio Informatics Journal
Chem-Bio Informatics Journal BIOCHEMISTRY & MOLECULAR BIOLOGY-
CiteScore
0.60
自引率
0.00%
发文量
8
期刊最新文献
Structural Stability and Binding Ability of SARS-CoV-2 Main Protease with GC376: A Stereoisomeric Covalent Ligand Analysis by FMO calculation Enzyme Kinetics Based on the Concept of Flux Enzyme Kinetics Based on the Concept of Flux Application of Model Core Potentials to Zn- and Mg-containing Metalloproteins in the Fragment Molecular Orbital Method How Beneficial or Threatening is Artificial Intelligence?
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1