Dear-PSM: A deep learning-based peptide search engine enables full database search for proteomics.

Smart medicine, volume 3, issue 3, e20240014. Pub Date: 2024-08-27. eCollection Date: 2024-09-01. DOI: 10.1002/SMMD.20240014
Qingzu He, Xiang Li, Jinjin Zhong, Gen Yang, Jiahuai Han, Jianwei Shuai
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11425048/pdf/

Abstract

Peptide spectrum matching is the process of linking mass spectrometry data with peptide sequences. An experimental spectrum can match thousands of candidate peptides, and variable modifications lead to an exponential increase in the number of candidates. Completing the search within a limited time is a key challenge. Traditional searches expedite the process by restricting peptide mass errors and variable modifications, but this limits interpretive capability. To address this challenge, we propose Dear-PSM, a peptide search engine that supports full database searching. Dear-PSM does not restrict peptide mass errors: it matches each spectrum to all peptides in the database and raises the number of variable modifications allowed per peptide from the conventional 3 to 20. Leveraging inverted index technology, Dear-PSM creates a high-performance index table of experimental spectra and uses deep learning algorithms for peptide validation. With these techniques, Dear-PSM runs 7 times faster than mainstream search engines on a regular desktop computer, with a 240-fold reduction in memory consumption. Benchmark results show that Dear-PSM, in full database search mode, reproduces over 90% of the results obtained by mainstream search engines on complex mass spectrometry data collected from different species using various instruments. Furthermore, it uncovers a substantial number of new peptides and proteins. Dear-PSM has been publicly released on GitHub at https://github.com/jianweishuai/Dear-PSM.
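
The inverted-index idea mentioned in the abstract can be illustrated with a minimal Python sketch. This is not Dear-PSM's implementation: the bin width, the function names (`mz_bin`, `build_index`, `count_shared_peaks`), and the shared-peak count used as a score are all assumptions chosen for illustration. The sketch indexes the peaks of every experimental spectrum by m/z bin, so that the theoretical fragments of one peptide can be compared against all spectra in a single pass, with no precursor mass filter.

```python
from collections import defaultdict

# Assumed fragment m/z bin width; an illustrative value, not taken from the paper.
BIN_WIDTH = 0.02


def mz_bin(mz):
    """Map an m/z value to the index of its nearest bin."""
    return round(mz / BIN_WIDTH)


def build_index(spectra):
    """Build an inverted index: m/z bin -> set of ids of spectra with a peak in that bin.

    `spectra` maps a spectrum id to a list of peak m/z values.
    """
    index = defaultdict(set)
    for spectrum_id, peaks in spectra.items():
        for mz in peaks:
            index[mz_bin(mz)].add(spectrum_id)
    return index


def count_shared_peaks(index, fragment_mzs):
    """Count, for every indexed spectrum, how many fragment bins it contains.

    Each distinct fragment bin is looked up once, so the cost depends on the
    number of fragments and hits rather than on the number of spectra.
    """
    counts = defaultdict(int)
    for b in {mz_bin(mz) for mz in fragment_mzs}:
        for spectrum_id in index.get(b, ()):
            counts[spectrum_id] += 1
    return dict(counts)


# Toy data (hypothetical): two spectra and the fragments of one candidate peptide.
spectra = {"s1": [100.0, 200.0, 300.0], "s2": [100.0, 250.0]}
index = build_index(spectra)
counts = count_shared_peaks(index, [100.001, 200.003, 400.0])
print(sorted(counts.items()))  # [('s1', 2), ('s2', 1)]
```

In this toy example the peptide shares two fragment bins with spectrum `s1` and one with `s2`. A real engine would then pass the top-ranked candidates to a validation step, which the abstract says Dear-PSM performs with deep learning.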
