Dear-PSM: A deep learning-based peptide search engine enables full database search for proteomics.

Smart medicine (IF 11.6). Pub Date: 2024-08-27. eCollection Date: 2024-09-01. DOI: 10.1002/SMMD.20240014

Qingzu He, Xiang Li, Jinjin Zhong, Gen Yang, Jiahuai Han, Jianwei Shuai

Smart medicine, volume 3, issue 3, e20240014. Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11425048/pdf/

Citations: 0

Abstract

Peptide spectrum matching links mass spectrometry data to peptide sequences. A single experimental spectrum can match thousands of candidate peptides, and variable modifications increase the number of candidates exponentially, so completing the search within a limited time is a key challenge. Traditional search engines speed up the process by restricting peptide mass errors and the number of variable modifications, but this limits interpretive capability. To address this challenge, we propose Dear-PSM, a peptide search engine that supports full database searching. Dear-PSM does not restrict peptide mass errors: it matches each spectrum to all peptides in the database and raises the number of variable modifications per peptide from the conventional 3 to 20. Leveraging inverted index technology, Dear-PSM builds a high-performance index table of experimental spectra and uses deep learning algorithms for peptide validation. With these techniques, Dear-PSM runs 7 times faster than mainstream search engines on a regular desktop computer and reduces memory consumption 240-fold. Benchmark results show that Dear-PSM, in full database search mode, reproduces over 90% of the results obtained by mainstream search engines on complex mass spectrometry data collected from different species using various instruments. It also uncovers a substantial number of new peptides and proteins. Dear-PSM is publicly available on GitHub at https://github.com/jianweishuai/Dear-PSM.
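
The abstract mentions an inverted index built over the experimental spectra. As a rough illustration of that general idea only, and not Dear-PSM's actual implementation, the Python sketch below bins fragment peaks by m/z and maps each bin to the spectra that contain a peak there. A peptide's theoretical fragment ions can then be looked up against all spectra at once, with no precursor mass window. The bin resolution, the function names `build_index` and `count_shared_peaks`, and the toy peak lists are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict

# Assumed resolution: 10 bins per dalton (0.1 Da bins). Hypothetical value.
BINS_PER_DA = 10


def mz_bin(mz):
    """Map an m/z value to an integer bin number."""
    return math.floor(mz * BINS_PER_DA)


def build_index(spectra):
    """Build an inverted index: m/z bin -> set of spectrum ids with a peak there.

    `spectra` maps a spectrum id to a list of fragment peak m/z values.
    """
    index = defaultdict(set)
    for spec_id, peaks in spectra.items():
        for mz in peaks:
            index[mz_bin(mz)].add(spec_id)
    return index


def count_shared_peaks(index, fragment_mzs):
    """Count, per spectrum, how many theoretical fragment bins it contains.

    Every spectrum is considered, regardless of precursor mass.
    """
    counts = Counter()
    # Deduplicate bins so two fragments in the same bin count only once.
    for b in {mz_bin(mz) for mz in fragment_mzs}:
        for spec_id in index.get(b, ()):
            counts[spec_id] += 1
    return counts


if __name__ == "__main__":
    # Toy peak lists, invented for illustration.
    spectra = {
        "s1": [147.11, 204.13, 333.18],
        "s2": [147.11, 500.25],
        "s3": [610.42],
    }
    index = build_index(spectra)
    # Hypothetical theoretical fragment m/z values for one candidate peptide.
    counts = count_shared_peaks(index, [147.113, 204.134, 333.177])
    print(counts.most_common())  # [('s1', 3), ('s2', 1)]
```

In this sketch the shared-peak count is only a crude first-pass score. In a real engine a validation step would follow, which the abstract says Dear-PSM performs with deep learning.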