UNIQUE: A Framework for Uncertainty Quantification Benchmarking.

IF 5.6 | Zone 2 (Chemistry) | Q1 CHEMISTRY, MEDICINAL | Journal of Chemical Information and Modeling | Pub Date: 2024-11-14 | DOI: 10.1021/acs.jcim.4c01578
Jessica Lanini, Minh Tam Davide Huynh, Gaetano Scebba, Nadine Schneider, Raquel Rodríguez-Pérez
Citations: 0

Abstract

Machine learning (ML) models have become key in decision-making for many disciplines, including drug discovery and medicinal chemistry. ML models are generally evaluated prior to their use in high-stakes decisions, such as compound synthesis or experimental testing. However, no ML model is robust or predictive in all real-world scenarios. Therefore, uncertainty quantification (UQ) in ML predictions has gained importance in recent years. Many investigations have focused on developing methodologies that provide accurate uncertainty estimates for ML-based predictions. Unfortunately, there is no UQ strategy that consistently provides robust estimates of a model's applicability to new samples. Depending on the dataset, prediction task, and algorithm, accurate uncertainty estimates might be infeasible to obtain. Moreover, the optimal UQ metric also varies across applications, and previous investigations have shown a lack of consistency across benchmarks. Herein, the UNIQUE (UNcertaInty QUantification bEnchmarking) framework is introduced to facilitate a comparison of UQ strategies in ML-based predictions. This Python library unifies the benchmarking of multiple UQ metrics, including the calculation of nonstandard UQ metrics (combining information from the dataset and model), and provides a comprehensive evaluation. In this framework, UQ metrics are evaluated for different application scenarios, e.g., eliminating the predictions with the lowest confidence or obtaining a reliable uncertainty estimate for an acquisition function. Taken together, this library will help to standardize UQ investigations and evaluate new methodologies.
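
One of the application scenarios named in the abstract, eliminating the predictions with the lowest confidence, can be illustrated with a small rejection curve. The sketch below is not the UNIQUE API: the function name `rejection_curve`, its parameters, and the toy data are all assumptions made for illustration, and mean absolute error is just one possible choice of error measure. A UQ metric that ranks predictions well should give a lower error on the retained predictions as more of the least confident ones are discarded.

```python
from statistics import mean


def rejection_curve(y_true, y_pred, uncertainty, fractions=(0.0, 0.25, 0.5)):
    """Return (rejected_fraction, MAE of retained predictions) pairs.

    Hypothetical helper, not part of UNIQUE. For each fraction, the
    predictions with the highest uncertainty are discarded and the mean
    absolute error (MAE) is computed on the remaining ones.
    """
    if not (len(y_true) == len(y_pred) == len(uncertainty)):
        raise ValueError("inputs must have the same length")
    abs_err = [abs(t - p) for t, p in zip(y_true, y_pred)]
    # Indices ordered from most to least confident (lowest uncertainty first).
    order = sorted(range(len(abs_err)), key=lambda i: uncertainty[i])
    curve = []
    for frac in fractions:
        # Always keep at least one prediction so the MAE is defined.
        n_keep = max(1, round(len(order) * (1.0 - frac)))
        kept = order[:n_keep]
        curve.append((frac, mean(abs_err[i] for i in kept)))
    return curve


# Toy data, made up for illustration: errors grow with the uncertainty value.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.5, 2.0, 6.0]
uncertainty = [0.1, 0.2, 0.5, 0.9]

curve = rejection_curve(y_true, y_pred, uncertainty)
for frac, mae in curve:
    print(f"rejected {frac:.0%}: MAE on retained predictions = {mae}")
```

In this toy case the error on the retained predictions falls as the rejected fraction grows, which is the behavior an informative uncertainty estimate is expected to show; an uninformative one would leave the curve roughly flat.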

Source journal
CiteScore: 9.80
Self-citation rate: 10.70%
Articles published: 529
Review time: 1.4 months
Journal description: The Journal of Chemical Information and Modeling publishes papers reporting new methodology and/or important applications in the fields of chemical informatics and molecular modeling. Specific topics include the representation and computer-based searching of chemical databases, molecular modeling, computer-aided molecular design of new materials, catalysts, or ligands, development of new computational methods or efficient algorithms for chemical software, and biopharmaceutical chemistry including analyses of biological activity and other issues related to drug discovery. Astute chemists, computer scientists, and information specialists look to this monthly's insightful research studies, programming innovations, and software reviews to keep current with advances in this integral, multidisciplinary field. As a subscriber you'll stay abreast of database search systems, use of graph theory in chemical problems, substructure search systems, pattern recognition and clustering, analysis of chemical and physical data, molecular modeling, graphics and natural language interfaces, bibliometric and citation analysis, and synthesis design and reactions databases.