Availability of MudPIT data for classification of biological samples.

Journal of Clinical Bioinformatics. Pub Date: 2013-01-14. DOI: 10.1186/2043-9113-3-1

Dario Di Silvestre, Italo Zoppis, Francesca Brambilla, Valeria Bellettato, Giancarlo Mauri, Pierluigi Mauri


Citations: 17

Abstract


Background: Mass spectrometry is an important analytical tool for clinical proteomics. Primarily employed for biomarker discovery, it is increasingly used to develop methods that may help provide an unambiguous diagnosis of biological samples. In this context, we investigated the classification of phenotypes by applying a support vector machine (SVM) to experimental data obtained with the MudPIT approach. In particular, we compared the performance of the SVM on two independent collections of complex samples and on different data types: mass spectra (m/z), peptides, and proteins.

Results: Overall, protein and peptide data carried more discriminant information than the experimental mass spectra (overall accuracy higher than 87% in both collection 1 and collection 2). These results indicate that sequencing of peptides and proteins reduces the experimental noise affecting the raw mass spectra and allows more informative features to be extracted for effective classification of samples. In addition, the protein and peptide features selected by the SVM showed an 80% match with the differentially expressed proteins identified by the MAProMa software.

Conclusions: These findings confirm the applicability of most label-free quantitative methods based on processing of spectral count and SEQUEST-based SCORE values. They also underline the usefulness of MudPIT data for correctly grouping sample phenotypes with both supervised and unsupervised learning algorithms. This capacity permits the evaluation of actual samples and is a good starting point for translating proteomic methodology to clinical application.
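
The workflow summarized in the abstract (train an SVM on per-sample feature values, then compare the features it relies on with an independently derived list) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the protein names, the spectral-count values, and the reference list are invented, the linear SVM is trained with plain batch subgradient descent rather than whatever solver the authors used, and ranking features by absolute weight is only one common selection criterion, since the abstract does not state how features were selected. The accuracy computed is on the training samples, not a cross-validated estimate.

```python
# Toy spectral-count matrix: rows are samples, columns are proteins.
# All names and numbers are invented for illustration only.
PROTEINS = ["prot_A", "prot_B", "prot_C", "prot_D"]
X_TRAIN = [
    [10, 9, 1, 5], [12, 8, 0, 5], [11, 10, 2, 4],   # phenotype -1
    [2, 1, 9, 5], [1, 0, 11, 4], [0, 2, 10, 5],     # phenotype +1
]
Y_TRAIN = [-1, -1, -1, 1, 1, 1]


def score(w, b, x):
    """Decision value w.x + b of a linear classifier."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, lam=0.01, lr=0.01, epochs=200):
    """Minimise lam/2 * ||w||^2 + mean hinge loss by batch subgradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        # Samples with y * f(x) < 1 lie inside the margin and drive the update.
        violators = [i for i in range(n) if y[i] * score(w, b, X[i]) < 1]
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for i in violators:
            for j in range(d):
                grad_w[j] -= y[i] * X[i][j] / n
            grad_b -= y[i] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Predicted phenotype label (+1 or -1)."""
    return 1 if score(w, b, x) >= 0 else -1


def top_features(w, names, k):
    """Names of the k features with the largest absolute weight."""
    ranked = sorted(zip(names, w), key=lambda nw: abs(nw[1]), reverse=True)
    return [name for name, _ in ranked[:k]]


def overlap_fraction(selected, reference):
    """Share of selected features that also appear in the reference list."""
    return len(set(selected) & set(reference)) / len(selected)


w, b = train_linear_svm(X_TRAIN, Y_TRAIN)
train_accuracy = sum(
    predict(w, b, x) == t for x, t in zip(X_TRAIN, Y_TRAIN)
) / len(Y_TRAIN)
selected = top_features(w, PROTEINS, k=3)
# Hypothetical list standing in for an external differential-expression result.
reference = ["prot_A", "prot_C"]
print(train_accuracy, selected, overlap_fraction(selected, reference))
```

In a real analysis the feature matrix would hold spectral counts or score values per protein or peptide, and accuracy would be estimated on held-out samples, for example by cross-validation.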


