Genetic Programming for Preprocessing Tandem Mass Spectra to Improve the Reliability of Peptide Identification

2018 IEEE Congress on Evolutionary Computation (CEC). Pub Date: 2018-07-01. DOI: 10.1109/CEC.2018.8477810

Samaneh Azari, Mengjie Zhang, Bing Xue, Lifeng Peng

Citations: 3

Abstract

Tandem mass spectrometry (MS/MS) is currently the most commonly used technology in proteomics for identifying proteins in complex biological samples. Mass spectrometers can produce a large number of MS/MS spectra, each of which has hundreds of peaks. These peaks normally include background noise, so a preprocessing step that filters out the noise peaks can improve the accuracy and reliability of peptide identification. This paper proposes to preprocess the data by classifying each peak as a noise peak or a signal peak, which is a highly imbalanced binary classification task, and uses genetic programming (GP) to address it. The aim is to increase the reliability of peptide identification. Six other types of classification algorithms are also applied at various imbalance ratios, and all methods are evaluated in terms of average accuracy and recall. On a benchmark dataset containing 1,674 MS/MS spectra, the GP method appears to be the best at retaining signal peaks. To further evaluate the effectiveness of the GP method, the preprocessed spectral data is submitted to PEAKS, a benchmark de novo sequencing software package, to identify the peptides. The results show that the proposed method improves the reliability of peptide identification compared with both the original unpreprocessed data and intensity-based thresholding methods.
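
To make the evaluation setting concrete, the following is a minimal sketch of an intensity-based thresholding baseline scored on recall and average accuracy. It is not the authors' GP method or their experimental code. The `Peak` type, the function names, the base-peak fraction rule, and the toy spectrum are all hypothetical, and "average accuracy" is assumed here to mean the mean of the per-class accuracies (signal and noise), which the abstract does not define.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Peak:
    mz: float         # mass-to-charge ratio
    intensity: float  # raw peak intensity
    is_signal: bool   # ground-truth label (hypothetical, for illustration only)


def threshold_filter(peaks: List[Peak], fraction: float) -> List[bool]:
    """Predict 'signal' for peaks at or above `fraction` of the base-peak intensity."""
    if not peaks:
        return []
    base = max(p.intensity for p in peaks)
    return [p.intensity >= fraction * base for p in peaks]


def recall_and_average_accuracy(
    truth: List[bool], predicted: List[bool]
) -> Tuple[float, float]:
    """Return (signal recall, mean of signal and noise class accuracies)."""
    tp = sum(t and p for t, p in zip(truth, predicted))
    tn = sum((not t) and (not p) for t, p in zip(truth, predicted))
    pos = sum(truth)
    neg = len(truth) - pos
    recall = tp / pos if pos else 0.0
    specificity = tn / neg if neg else 0.0
    # Assumption: "average accuracy" = average of the two per-class accuracies.
    return recall, (recall + specificity) / 2


def demo_spectrum() -> List[Peak]:
    """A made-up imbalanced spectrum: 2 signal peaks and 8 noise peaks."""
    signal = [Peak(200.1, 100.0, True), Peak(350.2, 30.0, True)]
    noise_intensities = [5.0, 8.0, 12.0, 40.0, 3.0, 7.0, 9.0, 6.0]
    noise = [Peak(100.0 + 10 * i, v, False) for i, v in enumerate(noise_intensities)]
    return signal + noise


if __name__ == "__main__":
    peaks = demo_spectrum()
    predicted = threshold_filter(peaks, fraction=0.35)
    recall, avg_acc = recall_and_average_accuracy(
        [p.is_signal for p in peaks], predicted
    )
    print(f"recall={recall:.3f} average_accuracy={avg_acc:.4f}")
```

In this toy spectrum the threshold drops the weaker of the two signal peaks and keeps one strong noise peak. Losing low-intensity signal peaks in this way is the kind of failure that makes recall on the signal class a useful measure alongside average accuracy when the classes are highly imbalanced.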
