Protein structure alignment by reseek improves sensitivity to remote homologs

Robert C Edgar
Bioinformatics (Oxford, England). Published 2024-11-15. DOI: 10.1093/bioinformatics/btae687 (https://doi.org/10.1093/bioinformatics/btae687)

Abstract


Motivation: Recent breakthroughs in protein fold prediction from amino acid sequences have unleashed a deluge of new structures, presenting new opportunities and challenges to bioinformatics.

Results: Reseek is a novel protein structure alignment algorithm based on sequence alignment where each residue in the protein backbone is represented by a letter in a "mega-alphabet" of 85,899,345,920 (∼10^11) distinct states. Reseek achieves substantially improved sensitivity to remote homologs compared to state-of-the-art methods including DALI, TMalign and Foldseek, with comparable speed to Foldseek, the fastest previous method. Scaling to large databases of AI-predicted folds is analyzed. Foldseek E-values are shown to be under-estimated by several orders of magnitude, while Reseek E-values are in good agreement with measured error rates.
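
As a quick sanity check on the alphabet size quoted in the Results, the sketch below confirms that 85,899,345,920 is of order 10^11 and works out how many bits are needed to index one letter of an alphabet that large. The variable names are illustrative only, and nothing here reflects how Reseek itself derives or stores its alphabet.

```python
import math

# Alphabet size quoted in the abstract.
MEGA_ALPHABET_SIZE = 85_899_345_920

# Order of magnitude: log10 is about 10.93, i.e. roughly 10^11.
log10_size = math.log10(MEGA_ALPHABET_SIZE)

# Minimum whole number of bits needed to give every state a distinct index.
bits_per_letter = (MEGA_ALPHABET_SIZE - 1).bit_length()

# The quoted size factors exactly as 5 * 2**34.
factors_ok = MEGA_ALPHABET_SIZE == 5 * 2**34

print(f"log10(size) = {log10_size:.2f}")
print(f"bits per letter = {bits_per_letter}")
print(f"size == 5 * 2**34: {factors_ok}")
```

For comparison, the standard amino acid alphabet has 20 letters, so a single mega-alphabet letter distinguishes roughly ten orders of magnitude more states per residue.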

Availability: https://github.com/rcedgar/reseek.

Supplementary information: Supplementary data are available at Bioinformatics online.
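
The E-value comparison mentioned in the Results can be illustrated with a minimal sketch, assuming the usual reading of an E-value as the expected number of false-positive hits per query at that threshold. The function name, the `(evalue, is_true_positive)` hit format, and the toy hits are all hypothetical and made up for illustration; the abstract does not describe the benchmark or how hits were labeled.

```python
from typing import List, Tuple

Hit = Tuple[float, bool]  # (reported E-value, is_true_positive); hypothetical format


def false_positives_per_query(hits: List[Hit], num_queries: int, threshold: float) -> float:
    """Observed false-positive hits per query among hits with E-value <= threshold."""
    false_positives = sum(
        1 for evalue, is_true_positive in hits
        if evalue <= threshold and not is_true_positive
    )
    return false_positives / num_queries


# Toy, made-up hits pooled over 2 hypothetical queries (not real benchmark data).
toy_hits: List[Hit] = [
    (1e-8, True), (1e-4, True), (0.02, False), (0.5, True),
    (0.9, False), (3.0, False), (8.0, False),
]
num_queries = 2

# Well-calibrated E-values give an observed rate close to the threshold itself;
# under-estimated E-values give an observed rate far above the threshold.
for threshold in (0.1, 1.0, 10.0):
    observed = false_positives_per_query(toy_hits, num_queries, threshold)
    print(f"E <= {threshold}: observed false positives per query = {observed}")
```

Plotting the observed rate against the threshold on log-log axes is a common way to see at a glance whether reported E-values track the measured error rate.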
