Qianzhou Wei, Jiamin Li, Qing-Yu He, Yang Chen* and Gong Zhang*,
{"title":"Identifying PE2 and PE5 Proteins from Existing Mass Spectrometry Data Using pFind","authors":"Qianzhou Wei, Jiamin Li, Qing-Yu He, Yang Chen* and Gong Zhang*, ","doi":"10.1021/acs.jproteome.3c00674","DOIUrl":null,"url":null,"abstract":"<p >The Chromosome-Centric Human Proteome Project (C-HPP) aims to identify all proteins encoded by the human genome. Currently, the human proteome still contains approximately 2000 PE2–PE5 proteins, referring to annotated coding genes that lack sufficient protein-level evidence. During the past 10 years, it has been increasingly difficult to identify PE2–PE5 proteins in C-HPP approaches due to the limited occurrence. Therefore, we proposed that reanalyzing massive MS data sets in repository with newly developed algorithms may increase the occurrence of the peptides of these proteins. In this study, we downloaded 1000 MS data sets via the ProteomeXchange database. Using pFind software, we identified peptides referring to 1788 PE2–PE5 proteins. Among them, 11 PE2 and 16 PE5 proteins were identified with at least 2 peptides, and 12 of them were identified using 2 peptides in a single data set, following the criteria of the HPP guidelines. We found translation evidence for 16 of the 11 PE2 and 16 PE5 proteins in our RNC-seq data, supporting their existence. The properties of the PE2 and PE5 proteins were similar to those of the PE1 proteins. Our approach demonstrated that mining PE2 and PE5 proteins in massive data repository is still worthy, and multidata set peptide identifications may support the presence of PE2 and PE5 proteins or at least prompt additional studies for validation. Extremely high throughput could be a solution to finding more PE2 and PE5 proteins.</p>","PeriodicalId":48,"journal":{"name":"Journal of Proteome Research","volume":null,"pages":null},"PeriodicalIF":3.8000,"publicationDate":"2024-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Proteome Research","FirstCategoryId":"99","ListUrlMain":"https://pubs.acs.org/doi/10.1021/acs.jproteome.3c00674","RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}
引用次数: 0
Abstract
The Chromosome-Centric Human Proteome Project (C-HPP) aims to identify all proteins encoded by the human genome. Currently, the human proteome still contains approximately 2000 PE2–PE5 proteins, referring to annotated coding genes that lack sufficient protein-level evidence. During the past 10 years, it has been increasingly difficult to identify PE2–PE5 proteins in C-HPP approaches due to the limited occurrence. Therefore, we proposed that reanalyzing massive MS data sets in repository with newly developed algorithms may increase the occurrence of the peptides of these proteins. In this study, we downloaded 1000 MS data sets via the ProteomeXchange database. Using pFind software, we identified peptides referring to 1788 PE2–PE5 proteins. Among them, 11 PE2 and 16 PE5 proteins were identified with at least 2 peptides, and 12 of them were identified using 2 peptides in a single data set, following the criteria of the HPP guidelines. We found translation evidence for 16 of the 11 PE2 and 16 PE5 proteins in our RNC-seq data, supporting their existence. The properties of the PE2 and PE5 proteins were similar to those of the PE1 proteins. Our approach demonstrated that mining PE2 and PE5 proteins in massive data repository is still worthy, and multidata set peptide identifications may support the presence of PE2 and PE5 proteins or at least prompt additional studies for validation. Extremely high throughput could be a solution to finding more PE2 and PE5 proteins.
期刊介绍:
Journal of Proteome Research publishes content encompassing all aspects of global protein analysis and function, including the dynamic aspects of genomics, spatio-temporal proteomics, metabonomics and metabolomics, clinical and agricultural proteomics, as well as advances in methodology including bioinformatics. The theme and emphasis is on a multidisciplinary approach to the life sciences through the synergy between the different types of "omics".