A spectrum of explainable and interpretable machine learning approaches for genomic studies

IF 4.4 | Zone 2, Mathematics | Q1, STATISTICS & PROBABILITY | Wiley Interdisciplinary Reviews-Computational Statistics | Pub Date: 2023-05-04 | DOI: 10.1002/wics.1617
A. M. Conard, Alan DenAdel, Lorin Crawford
{"title":"用于基因组研究的一系列可解释和可解释的机器学习方法","authors":"A. M. Conard, Alan DenAdel, Lorin Crawford","doi":"10.1002/wics.1617","DOIUrl":null,"url":null,"abstract":"The advancement of high‐throughput genomic assays has led to enormous growth in the availability of large‐scale biological datasets. Over the last two decades, these increasingly complex data have required statistical approaches that are more sophisticated than traditional linear models. Machine learning methodologies such as neural networks have yielded state‐of‐the‐art performance for prediction‐based tasks in many biomedical applications. However, a notable downside of these machine learning models is that they typically do not reveal how or why accurate predictions are made. In many areas of biomedicine, this “black box” property can be less than desirable—particularly when there is a need to perform in silico hypothesis testing about a biological system, in addition to justifying model findings for downstream decision‐making, such as determining the best next experiment or treatment strategy. Explainable and interpretable machine learning approaches have emerged to overcome this issue. While explainable methods attempt to derive post hoc understanding of what a model has learned, interpretable models are designed to inherently provide an intelligible definition of their parameters and architecture. Here, we review the model transparency spectrum moving from black box and explainable, to interpretable machine learning methodology. Motivated by applications in genomics, we provide background on the advances across this spectrum, detailing specific approaches in both supervised and unsupervised learning. Importantly, we focus on the promise of incorporating existing biological knowledge when constructing interpretable machine learning methods for biomedical applications. We then close with considerations and opportunities for new development in this space.","PeriodicalId":47779,"journal":{"name":"Wiley Interdisciplinary Reviews-Computational Statistics","volume":" ","pages":""},"PeriodicalIF":4.4000,"publicationDate":"2023-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"A spectrum of explainable and interpretable machine learning approaches for genomic studies\",\"authors\":\"A. M. Conard, Alan DenAdel, Lorin Crawford\",\"doi\":\"10.1002/wics.1617\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The advancement of high‐throughput genomic assays has led to enormous growth in the availability of large‐scale biological datasets. Over the last two decades, these increasingly complex data have required statistical approaches that are more sophisticated than traditional linear models. Machine learning methodologies such as neural networks have yielded state‐of‐the‐art performance for prediction‐based tasks in many biomedical applications. However, a notable downside of these machine learning models is that they typically do not reveal how or why accurate predictions are made. In many areas of biomedicine, this “black box” property can be less than desirable—particularly when there is a need to perform in silico hypothesis testing about a biological system, in addition to justifying model findings for downstream decision‐making, such as determining the best next experiment or treatment strategy. Explainable and interpretable machine learning approaches have emerged to overcome this issue. 
While explainable methods attempt to derive post hoc understanding of what a model has learned, interpretable models are designed to inherently provide an intelligible definition of their parameters and architecture. Here, we review the model transparency spectrum moving from black box and explainable, to interpretable machine learning methodology. Motivated by applications in genomics, we provide background on the advances across this spectrum, detailing specific approaches in both supervised and unsupervised learning. Importantly, we focus on the promise of incorporating existing biological knowledge when constructing interpretable machine learning methods for biomedical applications. We then close with considerations and opportunities for new development in this space.\",\"PeriodicalId\":47779,\"journal\":{\"name\":\"Wiley Interdisciplinary Reviews-Computational Statistics\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":4.4000,\"publicationDate\":\"2023-05-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Wiley Interdisciplinary Reviews-Computational Statistics\",\"FirstCategoryId\":\"100\",\"ListUrlMain\":\"https://doi.org/10.1002/wics.1617\",\"RegionNum\":2,\"RegionCategory\":\"数学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"STATISTICS & PROBABILITY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Wiley Interdisciplinary Reviews-Computational Statistics","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1002/wics.1617","RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"STATISTICS & PROBABILITY","Score":null,"Total":0}
Citations: 4

Abstract

The advancement of high‐throughput genomic assays has led to enormous growth in the availability of large‐scale biological datasets. Over the last two decades, these increasingly complex data have required statistical approaches that are more sophisticated than traditional linear models. Machine learning methodologies such as neural networks have yielded state‐of‐the‐art performance for prediction‐based tasks in many biomedical applications. However, a notable downside of these machine learning models is that they typically do not reveal how or why accurate predictions are made. In many areas of biomedicine, this “black box” property can be less than desirable—particularly when there is a need to perform in silico hypothesis testing about a biological system, in addition to justifying model findings for downstream decision‐making, such as determining the best next experiment or treatment strategy. Explainable and interpretable machine learning approaches have emerged to overcome this issue. While explainable methods attempt to derive post hoc understanding of what a model has learned, interpretable models are designed to inherently provide an intelligible definition of their parameters and architecture. Here, we review the model transparency spectrum moving from black box and explainable, to interpretable machine learning methodology. Motivated by applications in genomics, we provide background on the advances across this spectrum, detailing specific approaches in both supervised and unsupervised learning. Importantly, we focus on the promise of incorporating existing biological knowledge when constructing interpretable machine learning methods for biomedical applications. We then close with considerations and opportunities for new development in this space.
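To make the explainable-versus-interpretable distinction concrete, the sketch below contrasts the two on simulated genotype data. It is illustrative only and not a method from the review: a random forest stands in for the black box and is explained post hoc with permutation importance, while a sparse linear (Lasso) model is interpretable by construction because its nonzero coefficients directly name the variants it uses and their effect sizes. The data simulation, variable names, and hyperparameters are all assumptions.

```python
# Minimal sketch (scikit-learn): post hoc explanation of a black-box model
# versus an inherently interpretable sparse linear model. The simulated
# genotypes, causal SNP indices, and effect sizes are illustrative
# assumptions, not the authors' setup.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n_samples, n_snps = 500, 200

# Simulate genotypes coded 0/1/2 and a phenotype driven by 3 causal SNPs.
X = rng.integers(0, 3, size=(n_samples, n_snps)).astype(float)
causal = [10, 50, 120]
y = X[:, causal] @ np.array([1.0, -0.8, 0.5]) + rng.normal(0, 1, n_samples)

# Black box + post hoc explanation: permutation importance measures how
# much predictive accuracy drops when each SNP's values are shuffled.
forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
explained = permutation_importance(forest, X, y, n_repeats=10, random_state=0)
top_explained = np.argsort(explained.importances_mean)[::-1][:5]

# Inherently interpretable model: the nonzero Lasso coefficients directly
# identify the SNPs used and their signed effect sizes.
lasso = Lasso(alpha=0.1).fit(X, y)
top_interpretable = np.flatnonzero(lasso.coef_)

print("Post hoc (permutation) top SNPs:", top_explained)
print("Interpretable (nonzero Lasso coefficients):", top_interpretable)
```

In this simulation both routes typically recover the causal SNPs, but only the Lasso coefficients carry signed effect sizes that can be read off without a second, post hoc estimation step, which is the sense in which the model is interpretable rather than merely explained.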
Source journal
CiteScore: 6.20
Self-citation rate: 0.00%
Articles published: 31