Genetic Programming for Preprocessing Tandem Mass Spectra to Improve the Reliability of Peptide Identification

Samaneh Azari, Mengjie Zhang, Bing Xue, Lifeng Peng
2018 IEEE Congress on Evolutionary Computation (CEC), published 2018-07-01. DOI: 10.1109/CEC.2018.8477810 (https://doi.org/10.1109/CEC.2018.8477810)
Citations: 3

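Abstract

Tandem mass spectrometry (MS/MS) is currently the most commonly used technology in proteomics for identifying proteins in complex biological samples. Mass spectrometers can produce a large number of MS/MS spectra, each of which has hundreds of peaks. These peaks normally contain background noise, so a preprocessing step that filters out the noise peaks can improve the accuracy and reliability of peptide identification. This paper proposes to preprocess the data by classifying peaks as noise peaks or signal peaks, a highly imbalanced binary classification task, and uses genetic programming (GP) to address it. The aim is to increase the reliability of peptide identification. Six other types of classification algorithms are also applied at various imbalance ratios and evaluated in terms of average accuracy and recall. On a benchmark dataset containing 1,674 MS/MS spectra, the GP method appears to be the best at retaining signal peaks. To further evaluate the effectiveness of the GP method, the preprocessed spectral data are submitted to a benchmark de novo sequencing software package, PEAKS, to identify the peptides. The results show that the proposed method improves the reliability of peptide identification compared with the original unpreprocessed data and with intensity-based thresholding methods.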
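As an illustration of the intensity-based thresholding baseline and the two evaluation measures mentioned in the abstract, the following Python sketch filters a toy spectrum and scores the result. It is not the authors' implementation: the `Peak` class, the `threshold_filter` and `recall_and_average_accuracy` helpers, the 5% cutoff, and the toy peak values are all hypothetical, and "average accuracy" is assumed here to mean the mean of the signal-class and noise-class accuracies, which the abstract does not define.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peak:
    mz: float         # mass-to-charge ratio
    intensity: float  # raw peak intensity
    is_signal: bool   # hypothetical ground-truth label, used only for scoring


def threshold_filter(peaks, fraction=0.05):
    """Predict 'signal' for peaks at or above `fraction` of the base peak.

    The 5% default is an illustrative assumption, not a value from the paper.
    """
    if not peaks:
        return []
    cutoff = fraction * max(p.intensity for p in peaks)
    return [p.intensity >= cutoff for p in peaks]


def recall_and_average_accuracy(truth, predicted):
    """Return (recall on the signal class, mean of the two per-class accuracies)."""
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return recall, 0.5 * (recall + specificity)


# Toy spectrum with made-up values: three signal peaks, five noise peaks.
spectrum = [
    Peak(175.1, 900.0, True),
    Peak(262.1, 40.0, True),    # weak signal peak
    Peak(300.2, 30.0, False),
    Peak(401.3, 20.0, False),
    Peak(512.3, 60.0, False),   # relatively intense noise peak
    Peak(613.4, 1000.0, True),
    Peak(700.0, 10.0, False),
    Peak(750.5, 5.0, False),
]

predicted = threshold_filter(spectrum, fraction=0.05)
truth = [p.is_signal for p in spectrum]
recall, avg_acc = recall_and_average_accuracy(truth, predicted)
print(f"recall={recall:.3f} average_accuracy={avg_acc:.3f}")
```

On this toy spectrum the fixed cutoff discards the weak signal peak at m/z 262.1 and keeps the noise peak at m/z 512.3, the kind of error that a learned peak classifier is intended to reduce.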