Variant Effect Prediction in the Age of Machine Learning

Cold Spring Harbor Perspectives in Biology (IF 6.9; Zone 2, Biology; JCR Q1, Cell Biology). Pub Date: 2024-04-15. DOI: 10.1101/cshperspect.a041467
Yana Bromberg, R. Prabakaran, Anowarul Kabir, Amarda Shehu
Citations: 0

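Abstract

Over the years, many computational methods have been created for the analysis of the impact of single amino acid substitutions resulting from single-nucleotide variants in genome coding regions. Historically, all methods have been supervised and thus limited by the inadequate sizes of experimentally curated data sets and by the lack of a standardized definition of variant effect. The emergence of unsupervised, deep learning (DL)-based methods raised an important question: Can machines learn the language of life from the unannotated protein sequence data well enough to identify significant errors in the protein “sentences”? Our analysis suggests that some unsupervised methods perform as well as or better than existing supervised methods. Unsupervised methods are also faster and can, thus, be useful in large-scale variant evaluations. For all other methods, however, their performance varies by both evaluation metrics and by the type of variant effect being predicted. We also note that the evaluation of method performance is still lacking on less-studied, nonhuman proteins where unsupervised methods hold the most promise.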
Source journal: Cold Spring Harbor Perspectives in Biology
CiteScore: 15.00
Self-citation rate: 1.40%
Articles published: 56
Review time: 3-8 weeks
Journal description: Cold Spring Harbor Perspectives in Biology offers a comprehensive platform in the molecular life sciences, featuring reviews that span molecular, cell, and developmental biology, genetics, neuroscience, immunology, cancer biology, and molecular pathology. This online publication provides in-depth insights into various topics, making it a valuable resource for those engaged in diverse aspects of biological research.