Identifying PE2 and PE5 Proteins from Existing Mass Spectrometry Data Using pFind

Journal of Proteome Research | IF 3.6 | Zone 2 (Biology) | JCR Q1 (Biochemical Research Methods) | Pub Date: 2024-06-12 | DOI: 10.1021/acs.jproteome.3c00674

Qianzhou Wei, Jiamin Li, Qing-Yu He, Yang Chen*, and Gong Zhang*

Journal of Proteome Research 2024, 23 (7), 2323–2331. https://pubs.acs.org/doi/10.1021/acs.jproteome.3c00674

Abstract

The Chromosome-Centric Human Proteome Project (C-HPP) aims to identify all proteins encoded by the human genome. The human proteome still contains approximately 2000 PE2–PE5 proteins, that is, annotated coding genes that lack sufficient protein-level evidence. Over the past 10 years, identifying PE2–PE5 proteins with C-HPP approaches has become increasingly difficult because of their limited occurrence. We therefore proposed that reanalyzing massive MS data sets from public repositories with newly developed algorithms may increase the chance of observing peptides from these proteins. In this study, we downloaded 1000 MS data sets via the ProteomeXchange database. Using pFind software, we identified peptides mapping to 1788 PE2–PE5 proteins. Among them, 11 PE2 and 16 PE5 proteins were identified with at least 2 peptides, and 12 of these were identified with 2 peptides in a single data set, following the criteria of the HPP guidelines. We found translation evidence for 16 of these 27 proteins (11 PE2 and 16 PE5) in our RNC-seq data, supporting their existence. The properties of the PE2 and PE5 proteins were similar to those of the PE1 proteins. Our approach demonstrates that mining massive data repositories for PE2 and PE5 proteins is still worthwhile, and that peptide identifications across multiple data sets may support the presence of PE2 and PE5 proteins or at least prompt additional validation studies. Extremely high throughput could be a way to find more PE2 and PE5 proteins.
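
The peptide-count criterion described in the abstract can be illustrated with a minimal sketch. It assumes the search results have already been reduced to (data set, protein, peptide) records; the function name `classify_proteins`, the record layout, and the sample identifiers are all hypothetical and do not reflect pFind's actual output format. The sketch covers only the "at least 2 peptides" count, both across data sets and within a single data set, and does not implement any other requirement of the HPP guidelines.

```python
from collections import defaultdict


def classify_proteins(identifications, min_peptides=2):
    """Group peptide identifications by protein and by (protein, data set).

    identifications: iterable of (dataset_id, protein_id, peptide_sequence)
    tuples. This layout is an assumption made for illustration only.
    Returns two sets of protein IDs:
      - proteins with >= min_peptides distinct peptides over all data sets
      - proteins with >= min_peptides distinct peptides in one data set
    """
    peptides_by_protein = defaultdict(set)
    peptides_by_protein_dataset = defaultdict(set)

    for dataset_id, protein_id, peptide in identifications:
        peptides_by_protein[protein_id].add(peptide)
        peptides_by_protein_dataset[(protein_id, dataset_id)].add(peptide)

    across_datasets = {
        protein
        for protein, peptides in peptides_by_protein.items()
        if len(peptides) >= min_peptides
    }
    within_one_dataset = {
        protein
        for (protein, _dataset), peptides in peptides_by_protein_dataset.items()
        if len(peptides) >= min_peptides
    }
    return across_datasets, within_one_dataset


# Hypothetical records: data set IDs, protein IDs, and sequences are made up.
records = [
    ("DATASET_A", "PROT1", "PEPTIDEAAAK"),
    ("DATASET_A", "PROT1", "PEPTIDEBBBK"),  # 2 peptides in one data set
    ("DATASET_A", "PROT2", "PEPTIDECCCK"),
    ("DATASET_B", "PROT2", "PEPTIDEDDDK"),  # 2 peptides, but in 2 data sets
    ("DATASET_A", "PROT3", "PEPTIDEEEEK"),
    ("DATASET_B", "PROT3", "PEPTIDEEEEK"),  # same peptide seen twice
]

across, single = classify_proteins(records)
print(sorted(across))  # ['PROT1', 'PROT2']
print(sorted(single))  # ['PROT1']
```

Using sets of distinct sequences means a peptide observed repeatedly counts once, which is why the hypothetical `PROT3` passes neither filter.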

Source journal

Journal of Proteome Research (Biology: Biochemical Research Methods)

CiteScore: 9.00
Self-citation rate: 4.50%
Articles published: 251
Review time: 3 months

Journal description: Journal of Proteome Research publishes content encompassing all aspects of global protein analysis and function, including the dynamic aspects of genomics, spatio-temporal proteomics, metabonomics and metabolomics, clinical and agricultural proteomics, as well as advances in methodology including bioinformatics. The theme and emphasis is on a multidisciplinary approach to the life sciences through the synergy between the different types of "omics".