A transformer-based semi-autoregressive framework for high-speed and accurate de novo peptide sequencing.

IF 5.1 | Zone 1, Biology | Q1 BIOLOGY | Communications Biology | Pub Date: 2025-02-14 | DOI: 10.1038/s42003-025-07584-0
Yang Zhao, Shuo Wang, Jinze Huang, Bo Meng, Dong An, Xiang Fang, Yaoguang Wei, Xinhua Dai
Communications Biology, volume 8, issue 1, 234 (2025). Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11825679/pdf/
Citations: 0

Abstract

De novo peptide sequencing identifies peptides directly from mass spectrometry data, without reliance on existing databases, and plays a critical role in discovering novel proteins and analyzing complex biological samples. To address challenges in both speed and accuracy, the transformer-based model TSARseqNovo incorporates two key innovations: a Semi-Autoregressive decoder that predicts multiple amino acids in parallel, and a Masking Refinement decoder that refines low-confidence predictions. These features significantly enhance sequencing efficiency and accuracy. Evaluations on the Nine-Species, Aggregated, and Glycoproteomic datasets demonstrate that TSARseqNovo outperforms state-of-the-art models, including CasaNovo, NovoB, InstaNovo+, and π-HelixNovo. Specifically, TSARseqNovo achieves up to a 2-fold speed increase over CasaNovo and π-HelixNovo, and approximately 10-fold over NovoB and InstaNovo+, while also showing substantial improvements in peptide prediction precision, especially for long peptides. These advancements position TSARseqNovo as a powerful tool for accelerating high-throughput proteomics research and addressing increasingly complex biological questions.
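
The abstract names the two decoders but does not describe their internals. The toy sketch below only illustrates the general idea: emit several residues per decoder call, then mask and re-predict the low-confidence ones. It is not TSARseqNovo's code. Every name in it (`semi_autoregressive_decode`, `mask_and_refine`, the toy predictors, the block size of 2, the 0.5 threshold, the example peptide) is an assumption made for illustration, and the "models" are hard-coded stand-ins rather than neural networks.

```python
from typing import Callable, List, Tuple

MASK = "?"  # hypothetical placeholder symbol for a masked residue

# A prediction is (amino-acid letter, confidence in [0, 1]).
Prediction = Tuple[str, float]


def semi_autoregressive_decode(
    predict_block: Callable[[str, int], List[Prediction]],
    length: int,
    block_size: int,
) -> Tuple[List[Prediction], int]:
    """Emit up to `block_size` residues per decoder call instead of one."""
    out: List[Prediction] = []
    steps = 0
    while len(out) < length:
        k = min(block_size, length - len(out))
        prefix = "".join(aa for aa, _ in out)
        out.extend(predict_block(prefix, k))
        steps += 1
    return out, steps


def mask_and_refine(
    preds: List[Prediction],
    refine: Callable[[str], List[Prediction]],
    threshold: float,
) -> str:
    """Mask residues below `threshold`, then re-predict only those positions."""
    masked = "".join(aa if conf >= threshold else MASK for aa, conf in preds)
    refined = refine(masked)
    return "".join(
        new_aa if old == MASK else old
        for old, (new_aa, _) in zip(masked, refined)
    )


# --- Toy stand-ins for the two decoders (assumptions, not real models) ---
TARGET = "PEPTIDEK"  # made-up peptide used only for this demonstration


def toy_predict_block(prefix: str, k: int) -> List[Prediction]:
    start = len(prefix)
    block: List[Prediction] = []
    for i in range(start, start + k):
        if i == 3:
            block.append(("A", 0.3))  # deliberately wrong, low confidence
        else:
            block.append((TARGET[i], 0.9))
    return block


def toy_refine(masked: str) -> List[Prediction]:
    # Pretend the refinement pass recovers the correct residue everywhere.
    return [(aa, 0.95) for aa in TARGET]


draft, steps = semi_autoregressive_decode(toy_predict_block, len(TARGET), block_size=2)
peptide = mask_and_refine(draft, toy_refine, threshold=0.5)
print(steps, "".join(aa for aa, _ in draft), peptide)
```

With a block size of 2, the eight-residue toy peptide needs four decoder calls rather than eight, and the one low-confidence residue in the draft is the only position the refinement pass overwrites. How the published model chooses its block size or confidence threshold is not stated in the abstract.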

Source journal
Communications Biology
Subject category: Medicine, Medicine (miscellaneous)
CiteScore: 8.60
Self-citation rate: 1.70%
Articles published: 1233
Review time: 13 weeks
About the journal: Communications Biology is an open access journal from Nature Research publishing high-quality research, reviews and commentary in all areas of the biological sciences. Research papers published by the journal represent significant advances bringing new biological insight to a specialized area of research.