Appropriate Evaluation Measurements for Regression Models

Chem-Bio Informatics Journal | IF 0.4, Q4 (Biochemistry & Molecular Biology) | Pub Date: 2021-09-15 | DOI: 10.1273/cbij.21.59

Tsuyoshi Esaki


Citations: 1

Abstract


In recent years, it has become necessary to find seed compounds faster and to reduce the cost of pharmaceutical research. In silico drug discovery methods, which predict new drug candidates from the physicochemical features and substructure fingerprints of compounds, are therefore expected to contribute. Selecting seed compounds without conducting experiments could reduce the time and cost required for drug development. However, estimating the behavior of compounds in the body with a simple linear model alone is unsatisfactory, because the effects and distribution of compounds are determined by the environment in the body and by their interactions with other molecules. More complex models have therefore been developed to estimate compound characteristics with high predictive accuracy. As a result, it is increasingly important to evaluate predictive performance correctly when selecting models appropriate for a research purpose. The coefficient of determination, commonly written R², is one of the best-known statistical measures for evaluating regression models. However, this measure cannot be used to evaluate nonlinear models. This paper explains the difficulty of using the coefficient of determination and suggests appropriate statistical measures for two situations: mean squared error (MSE) for cross-validation, and MSE together with the correlation coefficient between observed and predicted values for test data. Because statistical measures must be understood and used appropriately, the suggested measures will support the effective selection of promising seed compounds and accelerate drug discovery.
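The two evaluation settings named in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the paper's code: it assumes NumPy, uses a cubic polynomial as a hypothetical stand-in for any nonlinear regressor, and all function names and the synthetic data are invented for the example.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between observed and predicted values."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])


def cross_validated_mse(x, y, fit, predict, k=5, seed=0):
    """Average validation MSE over k shuffled folds."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(x)), k)
    scores = []
    for i in range(k):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(x[train], y[train])
        scores.append(mse(y[val], predict(model, x[val])))
    return float(np.mean(scores))


# Hypothetical nonlinear model used only for illustration: a cubic polynomial.
def fit_poly(x, y, degree=3):
    return np.polyfit(x, y, degree)


def predict_poly(coeffs, x):
    return np.polyval(coeffs, x)


if __name__ == "__main__":
    # Synthetic data (an assumption for the demo, not from the paper).
    rng = np.random.default_rng(1)
    x = np.linspace(-2.0, 2.0, 60)
    y = x ** 3 - x + rng.normal(scale=0.2, size=x.size)

    # Setting 1: model selection by cross-validated MSE on the training data.
    x_train, y_train = x[::2], y[::2]
    print("CV MSE:", cross_validated_mse(x_train, y_train, fit_poly, predict_poly))

    # Setting 2: MSE together with the correlation coefficient on held-out test data.
    x_test, y_test = x[1::2], y[1::2]
    y_hat = predict_poly(fit_poly(x_train, y_train), x_test)
    print("Test MSE:", mse(y_test, y_hat))
    print("Test r:", pearson_r(y_test, y_hat))
```

Reporting both numbers on test data matters because the correlation coefficient ignores constant offsets and scaling: predictions that are all shifted by the same amount still give r = 1, while the MSE exposes the error.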

