Rethinking the applicability domain analysis in QSAR models

IF 4.3 3区 材料科学 Q1 ENGINEERING, ELECTRICAL & ELECTRONIC ACS Applied Electronic Materials Pub Date : 2024-02-14 DOI:10.1007/s10822-024-00550-8
Jose R. Mora, Edgar A. Marquez, Noel Pérez-Pérez, Ernesto Contreras-Torres, Yunierkis Perez-Castillo, Guillermin Agüero-Chapin, Felix Martinez-Rios, Yovani Marrero-Ponce, Stephen J. Barigye
{"title":"Rethinking the applicability domain analysis in QSAR models","authors":"Jose R. Mora,&nbsp;Edgar A. Marquez,&nbsp;Noel Pérez-Pérez,&nbsp;Ernesto Contreras-Torres,&nbsp;Yunierkis Perez-Castillo,&nbsp;Guillermin Agüero-Chapin,&nbsp;Felix Martinez-Rios,&nbsp;Yovani Marrero-Ponce,&nbsp;Stephen J. Barigye","doi":"10.1007/s10822-024-00550-8","DOIUrl":null,"url":null,"abstract":"<div><p>Notwithstanding the wide adoption of the OECD principles (or best practices) for QSAR modeling, disparities between <i>in silico</i> predictions and experimental results are frequent, suggesting that model predictions are often too optimistic. Of these OECD principles, the applicability domain (AD) estimation has been recognized in several reports in the literature to be one of the most challenging, implying that the actual reliability measures of model predictions are often unreliable. Applying tree-based error analysis workflows on 5 QSAR models reported in the literature and available in the QsarDB repository, i.e., androgen receptor bioactivity (agonists, antagonists, and binders, respectively) and membrane permeability (highest membrane permeability and the intrinsic permeability), we demonstrate that predictions erroneously tagged as reliable (AD prediction errors) overwhelmingly correspond to instances in subspaces (cohorts) with the highest prediction error rates, highlighting the inhomogeneity of the AD space. In this sense, we call for more stringent AD analysis guidelines which require the incorporation of model error analysis schemes, to provide critical insight on the reliability of underlying AD algorithms. Additionally, any selected AD method should be rigorously validated to demonstrate its suitability for the model space over which it is applied. These steps will ultimately contribute to more accurate estimations of the reliability of model predictions. Finally, error analysis may also be useful in “rational” model refinement in that data expansion efforts and model retraining are focused on cohorts with the highest error rates.</p></div>","PeriodicalId":3,"journal":{"name":"ACS Applied Electronic Materials","volume":null,"pages":null},"PeriodicalIF":4.3000,"publicationDate":"2024-02-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACS Applied Electronic Materials","FirstCategoryId":"99","ListUrlMain":"https://link.springer.com/article/10.1007/s10822-024-00550-8","RegionNum":3,"RegionCategory":"材料科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
引用次数: 0

Abstract

Notwithstanding the wide adoption of the OECD principles (or best practices) for QSAR modeling, disparities between in silico predictions and experimental results are frequent, suggesting that model predictions are often too optimistic. Of these OECD principles, the applicability domain (AD) estimation has been recognized in several reports in the literature to be one of the most challenging, implying that the actual reliability measures of model predictions are often unreliable. Applying tree-based error analysis workflows on 5 QSAR models reported in the literature and available in the QsarDB repository, i.e., androgen receptor bioactivity (agonists, antagonists, and binders, respectively) and membrane permeability (highest membrane permeability and the intrinsic permeability), we demonstrate that predictions erroneously tagged as reliable (AD prediction errors) overwhelmingly correspond to instances in subspaces (cohorts) with the highest prediction error rates, highlighting the inhomogeneity of the AD space. In this sense, we call for more stringent AD analysis guidelines which require the incorporation of model error analysis schemes, to provide critical insight on the reliability of underlying AD algorithms. Additionally, any selected AD method should be rigorously validated to demonstrate its suitability for the model space over which it is applied. These steps will ultimately contribute to more accurate estimations of the reliability of model predictions. Finally, error analysis may also be useful in “rational” model refinement in that data expansion efforts and model retraining are focused on cohorts with the highest error rates.

Abstract Image

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
重新思考 QSAR 模型中的适用域分析。
尽管 QSAR 建模广泛采用了 OECD 原则(或最佳实践),但硅学预测与实验结果之间经常出现差异,这表明模型预测往往过于乐观。在这些 OECD 原则中,适用域(AD)估算在一些文献报告中被认为是最具挑战性的原则之一,这意味着模型预测的实际可靠性度量往往不可靠。对文献中报道的、QsarDB 数据库中的 5 个 QSAR 模型(即雄激素受体生物活性(分别为激动剂、拮抗剂和粘合剂)和膜渗透性(最高膜渗透性和内在渗透性),我们证明了被错误地标记为可靠的预测(AD 预测错误)绝大多数对应于预测错误率最高的子空间(队列)中的实例,突出了 AD 空间的不均匀性。从这个意义上说,我们呼吁制定更严格的 AD 分析指南,要求纳入模型误差分析方案,以提供对基础 AD 算法可靠性的重要见解。此外,任何选定的 AD 方法都应经过严格验证,以证明其适用于所应用的模型空间。这些步骤最终将有助于更准确地估计模型预测的可靠性。最后,误差分析还有助于 "合理 "地完善模型,因为数据扩展工作和模型再训练都将重点放在误差率最高的队列上。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
CiteScore
7.20
自引率
4.30%
发文量
567
期刊最新文献
Vitamin B12: prevention of human beings from lethal diseases and its food application. Current status and obstacles of narrowing yield gaps of four major crops. Cold shock treatment alleviates pitting in sweet cherry fruit by enhancing antioxidant enzymes activity and regulating membrane lipid metabolism. Removal of proteins and lipids affects structure, in vitro digestion and physicochemical properties of rice flour modified by heat-moisture treatment. Investigating the impact of climate variables on the organic honey yield in Turkey using XGBoost machine learning.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1