Pub Date: 2024-12-06; Epub Date: 2024-10-25; DOI: 10.1021/acs.jproteome.4c00665
Muzaffer Arıkan, Başak Atabay
In metaproteomics studies, constructing a reference protein sequence database that is both comprehensive and not overly large is critical for the peptide identification step. Therefore, the availability of well-curated reference databases and tools for custom database construction is essential to enhance the performance of metaproteomics analyses. In this review, we first provide an overview of metaproteomics by presenting a concise historical background, outlining a typical experimental and bioinformatics workflow, and emphasizing the crucial step of constructing a protein sequence database for metaproteomics. We then examine the current tools available for building such databases, highlighting their individual approaches, utility, advantages, and limitations. Next, we examine existing protein sequence databases, detailing their scope and relevance in metaproteomics research. Then, we provide practical recommendations for constructing protein sequence databases for metaproteomics, along with an overview of the current challenges in this area. We conclude with a discussion of anticipated advancements, emerging trends, and future directions in the construction of protein sequence databases for metaproteomics.
Construction of Protein Sequence Databases for Metaproteomics: A Review of the Current Tools and Databases. Journal of Proteome Research, pp 5250-5262.
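The requirement that a database be comprehensive without being overly large can be illustrated with a minimal sketch. It assumes the sources are plain FASTA text and that exact-duplicate sequences are the only redundancy being removed; the function names `parse_fasta` and `merge_nonredundant` and the example records are hypothetical, and this is not a method taken from the review.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def merge_nonredundant(*fasta_texts):
    """Merge several FASTA sources, keeping the first copy of each exact sequence."""
    seen = set()
    merged = []
    for text in fasta_texts:
        for header, seq in parse_fasta(text):
            if seq and seq not in seen:
                seen.add(seq)
                merged.append((header, seq))
    return merged


# Hypothetical inputs: protC duplicates protA's sequence and is dropped.
source_a = ">protA\nMKT\nAYI\n>protB\nMSTN\n"
source_b = ">protC\nMKTAYI\n>protD\nMQQL\n"
merged = merge_nonredundant(source_a, source_b)
print(len(merged), "non-redundant entries")
```

Exact-match deduplication is only the simplest size control; real pipelines may also cluster near-identical sequences or filter by taxonomy, which this sketch does not attempt.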
Pub Date: 2024-12-06; Epub Date: 2024-11-24; DOI: 10.1021/acs.jproteome.4c00820
Amie M Solosky, Iliana M Claudio, Jessie R Chappel, Kaylie I Kirkwood-Donelson, Michael G Janech, Alison M Bland, Frances M D Gulland, Benjamin A Neely, Erin S Baker
Domoic acid is a neurotoxin secreted by the marine diatom genus Pseudo-nitzschia during toxic algal bloom events. California sea lions (Zalophus californianus) are exposed to domoic acid through the ingestion of fish that feed on toxic diatoms, resulting in domoic acid toxicosis (DAT), which can vary from mild to fatal. Sea lions with mild disease can be treated if toxicosis is detected early after exposure. Therefore, rapid diagnosis of DAT is essential but also challenging. In this work, we performed multiomics analyses, specifically proteomic and lipidomic, on blood samples from 31 California sea lions. Fourteen sea lions were diagnosed with DAT based on clinical signs and post-mortem histological examination of brain tissue, and 17 had no evidence of DAT. Proteomic analyses revealed 31 statistically significant proteins in the DAT individuals compared to the non-DAT individuals (adjusted p < 0.05). Of these proteins, 19 were decreased in the DAT group of which three were apolipoproteins that are known to transport lipids in the blood, prompting lipidomic analyses. In the lipidomic analyses, 331 lipid species were detected with high confidence and multidimensional separations, and 29 were found to be statistically significant (adjusted p < 0.05 and log2(FC) < -1 or >1) in the DAT versus non-DAT comparison. Of these, 28 were lower in the DAT individuals, while only 1 was higher. Furthermore, 15 of the 28 lower concentration lipids were triglycerides, illustrating their putative connection with the perturbed apolipoproteins and potential use in rapid DAT diagnoses.
Proteomic and Lipidomic Plasma Evaluations Reveal Biomarkers for Domoic Acid Toxicosis in California Sea Lions. Journal of Proteome Research, pp 5577-5585.
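The lipidomics significance criterion stated in the abstract (adjusted p < 0.05 and log2(FC) < -1 or > 1) can be written as a small filter. The sketch below is illustrative only: the `LipidResult` record, the function names, and the example values are hypothetical and are not data from the study.

```python
from dataclasses import dataclass


@dataclass
class LipidResult:
    name: str       # hypothetical lipid identifier
    adj_p: float    # multiple-testing-adjusted p-value
    log2_fc: float  # log2 fold change, DAT versus non-DAT


def is_significant(result, alpha=0.05, min_abs_log2_fc=1.0):
    """Apply both thresholds: adjusted p below alpha and |log2(FC)| above 1."""
    return result.adj_p < alpha and abs(result.log2_fc) > min_abs_log2_fc


def split_by_direction(results):
    """Return names of significant lipids that are lower and higher in DAT."""
    significant = [r for r in results if is_significant(r)]
    lower = [r.name for r in significant if r.log2_fc < 0]
    higher = [r.name for r in significant if r.log2_fc > 0]
    return lower, higher


# Made-up values chosen to exercise each branch of the filter.
example = [
    LipidResult("lipid_A", 0.01, -1.8),  # passes, lower in DAT
    LipidResult("lipid_B", 0.03, 1.2),   # passes, higher in DAT
    LipidResult("lipid_C", 0.20, -2.5),  # fails the p-value threshold
    LipidResult("lipid_D", 0.01, -0.4),  # fails the fold-change threshold
]
lower, higher = split_by_direction(example)
print(lower, higher)
```

Requiring both thresholds keeps lipids whose change is statistically supported and at least twofold in either direction.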
Pub Date: 2024-12-06; Epub Date: 2024-10-30; DOI: 10.1021/acs.jproteome.4c00667
Julián Candia, Giovanna Fantoni, Francheska Delgado-Peraza, Nader Shehadeh, Toshiko Tanaka, Ruin Moaddel, Keenan A Walker, Luigi Ferrucci
SomaScan is an aptamer-based proteomics assay designed for the simultaneous measurement of thousands of human proteins with a broad range of endogenous concentrations. The 7K SomaScan assay has recently been expanded into the new 11K version. Following up on our previous assessment of the 7K assay, here, we expand our work on technical replicates from donors enrolled in the Baltimore Longitudinal Study of Aging. By generating SomaScan data from a second batch of technical replicates in the 7K version as well as additional intra- and interplate replicate measurements in the new 11K version using the same donor samples, this work provides useful precision benchmarks for the SomaScan user community. Beyond updating our previous technical assessment of the 7K assay with increased statistics, here, we estimate interbatch variability, assess inter- and intraplate variability in the new 11K assay, compare the observed variability between the 7K and 11K assays (leveraging the use of overlapping pairs of technical replicates), and explore the potential effects of sample storage time (ranging from 2 to 30 years) on the assays' precision.
Variability of 7K and 11K SomaScan Plasma Proteomics Assays. Journal of Proteome Research, pp 5531-5539. Open access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11629374/pdf/
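Precision across technical replicates is commonly summarized as a coefficient of variation (CV). The abstract does not state which metric the authors used, so the sketch below assumes CV purely for illustration; the function name and the replicate values are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the CV in percent: sample standard deviation divided by the mean."""
    if len(values) < 2:
        raise ValueError("need at least two replicate measurements")
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100.0 * stdev(values) / m


# Made-up signal values for one analyte measured in three technical replicates.
replicates = [100.0, 110.0, 90.0]
cv = coefficient_of_variation(replicates)
print(f"CV = {cv:.1f}%")
```

Computing the same quantity separately for replicates on one plate, across plates, and across batches would give intraplate, interplate, and interbatch figures respectively.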
Plasma levels of prothrombin fragment 1 + 2 (PF1 + 2) are a crucial indicator for assessing anticoagulant action in individuals receiving anticoagulant treatment. Owing to its molecular size, PF1 + 2 is also present in urine. Hence, the present study aims to measure urinary prothrombin fragment 1 + 2 (uPF1 + 2) in patients taking anticoagulants in order to identify a noninvasive surrogate marker for PT-INR in blood coagulopathy. A total of 205 people participated in the study: 104 patients on acenocoumarol (AC) and 101 healthy controls (HC). Clinical parameters, including PT-INR and urinary creatinine, were measured in all subjects. uPF1 + 2 was evaluated in samples using MALDI-TOF-MS, Western blot analysis, and ELISA. The MALDI-TOF-MS results showed the presence of uPF1 + 2 in both AC and HC urine samples. Western blot, ELISA, and unpaired t test results showed that patients on AC had significantly increased levels of uPF1 + 2 compared to HC. Regression analysis showed a strong positive relationship between blood-based PT-INR and uPF1 + 2. ROC validation also supported the clinical efficacy of uPF1 + 2. For monitoring anticoagulant medication, the present study highlights PF1 + 2, which describes the overall hemostatic capacity and might be used in addition to or instead of PT-INR.
Role of Noninvasive Urinary Prothrombin Fragment 1 + 2 to Measure Blood Coagulation Indices and Dose of Acenocoumarol. Ashish Gupta, Deepak Kumar, Niharika Bharti, Sudeep Kumar, Shantanu Pande, Vikas Agarwal. Journal of Proteome Research, 2024-12-06, pp 5342-5351. DOI: 10.1021/acs.jproteome.4c00462
The hospitalization and mortality rates of patients gradually increase following the onset and progression of liver cirrhosis (LC). We aimed to help define clinical stage and better target interventions by detecting the expression of specific metabolites in patients with different stages of LC via Q Exactive hybrid quadrupole orbitrap mass spectrometry (UPLC-Q-Exactive) technology. This noninterventional, observational case-control study involved 139 patients with LC or acute-on-chronic liver failure (ACLF) in a Chinese hospital between October 2022 and April 2023. Serum specimens were analyzed for multiple metabolite levels using UPLC-Q-Exactive. Data were processed to screen for differentially accumulated metabolites (DAMs). Short time-series expression miner (STEM) analysis and enrichment analysis were performed to assess cirrhosis progression biomarkers. Following univariate and multivariate analyses, a Venn diagram indicated nine significant DAMs in common among groups. STEM analysis showed 8'-hydroxyabscisic acid, HDCA, pyruvate-3-phosphate, indospicine, eplerenone, and DEHP as significant; their levels first rose, peaking at Child-Turcotte-Pugh (CTP) class B, and then decreased as CTP grade worsened. Significant differences in 8'-hydroxyabscisic acid, eplerenone, and DEHP levels were observed across LC comorbidities and between subgroups. Therefore, serum levels of these six DAMs may characterize metabolomic changes, determine the severity of LC, and predict the development of ACLF.
Metabolic Differences among Patients with Cirrhosis Using Q Exactive Hybrid Quadrupole Orbitrap Mass Spectrometry Technology. Ying Xiao, Jie Lu, Suyan Xu, Zhinian Wu, Wei Wang, Ru Ji, Tingyu Guo, Zeqiang Qi, Hua Tong, Yadong Wang, Caiyan Zhao. Journal of Proteome Research, 2024-12-06, pp 5352-5359. DOI: 10.1021/acs.jproteome.4c00437
Pub Date: 2024-12-06; Epub Date: 2024-11-08; DOI: 10.1021/acs.jproteome.4c00776
Gilbert S Omenn, Sandra Orchard, Lydie Lane, Cecilia Lindskog, Charles Pineau, Christopher M Overall, Bogdan Budnik, Jonathan M Mudge, Nicolle H Packer, Susan T Weintraub, Michael H A Roehrl, Edouard Nice, Tiannan Guo, Jennifer E Van Eyk, Uwe Völker, Gong Zhang, Nuno Bandeira, Ruedi Aebersold, Robert L Moritz, Eric W Deutsch
The Human Proteome Project (HPP), the flagship initiative of the Human Proteome Organization (HUPO), has pursued two goals: (1) to credibly identify at least one isoform of every protein-coding gene and (2) to make proteomics an integral part of multiomics studies of human health and disease. The past year has seen major transitions for the HPP. neXtProt was retired as the official HPP knowledge base, UniProtKB became the reference proteome knowledge base, and Ensembl-GENCODE provides the reference protein target list. A function evidence FE1-5 scoring system has been developed for functional annotation of proteins, parallel to the PE1-5 UniProtKB/neXtProt scheme for evidence of protein expression. This report includes updates from neXtProt (version 2023-09) and UniProtKB release 2024_04, with protein expression detected (PE1) for 18138 of the 19411 GENCODE protein-coding genes (93%). The number of non-PE1 proteins ("missing proteins") is now 1273. The transition to GENCODE is a net reduction of 367 proteins (19,411 PE1-5 instead of 19,778 PE1-4 last year in neXtProt). We include reports from the Biology and Disease-driven HPP, the Human Protein Atlas, and the HPP Grand Challenge Project. We expect the new Functional Evidence FE1-5 scheme to energize the Grand Challenge Project for functional annotation of human proteins throughout the global proteomics community, including π-HuB in China.
The 2024 Report on the Human Proteome from the HUPO Human Proteome Project. Journal of Proteome Research, pp 5296-5311.
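The counts quoted in the HPP abstract are mutually consistent, which a few lines of arithmetic confirm. The three input numbers come from the abstract; the variable names are illustrative.

```python
# Figures quoted in the abstract.
gencode_protein_coding_genes = 19411  # PE1-5 entries under GENCODE
pe1_proteins = 18138                  # protein expression detected (PE1)
nextprot_entries_last_year = 19778    # PE1-4 entries in neXtProt last year

# Non-PE1 ("missing") proteins are the remainder of the GENCODE list.
missing_proteins = gencode_protein_coding_genes - pe1_proteins

# Share of GENCODE protein-coding genes with PE1 evidence, in percent.
pe1_percent = 100.0 * pe1_proteins / gencode_protein_coding_genes

# Net change in the reference list size after moving to GENCODE.
net_reduction = nextprot_entries_last_year - gencode_protein_coding_genes

print(missing_proteins, round(pe1_percent, 1), net_reduction)
```

The results match the abstract's 1273 missing proteins, 93% PE1 coverage after rounding, and net reduction of 367 proteins.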
Pub Date: 2024-12-06; Epub Date: 2024-11-11; DOI: 10.1021/acs.jproteome.4c00535
David J Degnan, Clayton W Strauch, Moses Y Obiri, Erik D VonKaenel, Grace S Kim, James D Kershaw, David L Novelli, Karl Tl Pazdernik, Lisa M Bramer
The study of protein-protein interactions (PPIs) provides insight into various biological mechanisms, including the binding of antibodies to antigens, enzymes to inhibitors or promoters, and receptors to ligands. Recent studies of PPIs have led to significant biological breakthroughs. For example, the study of PPIs involved in the human:SARS-CoV-2 viral infection mechanism aided in the development of SARS-CoV-2 vaccines. Though several databases exist for the manual curation of PPI networks, text mining methods have been routinely demonstrated as useful alternatives for newly studied or understudied species, where databases are incomplete. Here, the relationship extraction performance of several open-source classical text processing, machine learning (ML)-based natural language processing (NLP), and large language model (LLM)-based NLP tools was compared. Overall, our results indicated that networks derived from classical methods tend to have high true positive rates at the expense of having overconnected networks, ML-based NLP methods have lower true positive rates but networks with the closest structures to the target network, and LLM-based NLP methods tend to exist between the two other approaches, with variable performances. The selection of a specific NLP approach should be tied to the needs of a study and text availability, as models varied in performance due to the amount of text provided.
Protein-Protein Interaction Networks Derived from Classical and Machine Learning-Based Natural Language Processing Tools. Journal of Proteome Research, pp 5395-5404.
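The trade-off described in the PPI abstract, a high true positive rate at the cost of an overconnected network, can be made concrete by comparing edge sets. This is a minimal sketch assuming undirected interactions stored as protein pairs; the function names and the toy networks are hypothetical and do not reproduce the authors' evaluation.

```python
def normalize(edges):
    """Treat each interaction as undirected so (A, B) equals (B, A)."""
    return {frozenset(edge) for edge in edges}


def true_positive_rate(predicted, target):
    """Fraction of target edges that the predicted network recovers."""
    target_set = normalize(target)
    if not target_set:
        raise ValueError("target network has no edges")
    return len(normalize(predicted) & target_set) / len(target_set)


def extra_edges(predicted, target):
    """Number of predicted edges absent from the target (overconnection)."""
    return len(normalize(predicted) - normalize(target))


# Toy networks: a curated target and two hypothetical extraction outputs.
target = [("A", "B"), ("B", "C"), ("C", "D")]
dense_prediction = [("A", "B"), ("C", "B"), ("C", "D"),
                    ("A", "C"), ("A", "D"), ("B", "D")]
sparse_prediction = [("A", "B"), ("B", "C")]

for name, net in [("dense", dense_prediction), ("sparse", sparse_prediction)]:
    print(name, true_positive_rate(net, target), extra_edges(net, target))
```

In this toy case the dense prediction recovers every target edge but adds three spurious ones, while the sparse prediction misses one edge and adds none, which is the pattern the abstract describes for classical versus ML-based methods.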
Pub Date: 2024-12-06; DOI: 10.1021/acs.jproteome.4c00829
Rosalee McMahon, Natasha Lucas, Cameron Hill, Dana Pascovici, Ben Herbert, Elisabeth Karsten
Diagnosis of non-small cell lung cancer (NSCLC) currently relies on imaging; however, these methods are not effective for detecting early stage disease. Investigating blood-based protein biomarkers aims to simplify the diagnostic process and identify disease-associated changes before they can be seen by using imaging techniques. In this study, plasma and frozen whole blood cell pellets from NSCLC patients and healthy controls were processed using both classical and novel techniques to produce a unique set of four sample types from a single blood draw. These samples were analyzed using 12 immunoassays and liquid chromatography-mass spectrometry to collectively screen 3974 proteins. Analysis of all fractions produced a set of 522 differentially expressed proteins, with conventional blood analysis (proteomic analysis of plasma) accounting for only 7 of the total. Boosted regression tree analysis of the differentially expressed proteins produced a panel of 13 proteins that were able to discriminate between controls and NSCLC patients, with an area under the ROC curve (AUC) of 0.864 for the set. Our rapid and reproducible (<10% CV for technical replicates) blood preparation and analysis methods enabled the production of high-quality data from only 30 μL of complex samples that typically require significant fractionation prior to proteomic analysis. With our methods, almost 4000 proteins were identified from a single fraction over a 62.5 min gradient by LC-MS/MS.
Investigating the Use of Novel Blood Processing Methods to Boost the Identification of Biomarkers for Non-Small Cell Lung Cancer: A Proof of Concept. Journal of Proteome Research.
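The area under the ROC curve reported for the 13-protein panel has a direct interpretation: it is the probability that a randomly chosen patient scores higher than a randomly chosen control. The sketch below computes AUC from that definition; the function name and the scores are made up for illustration and are unrelated to the study's data or its boosted regression tree model.

```python
def roc_auc(scores_positive, scores_negative):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair, matching the Mann-Whitney U statistic.
    """
    if not scores_positive or not scores_negative:
        raise ValueError("both classes need at least one score")
    correct = 0.0
    for pos in scores_positive:
        for neg in scores_negative:
            if pos > neg:
                correct += 1.0
            elif pos == neg:
                correct += 0.5
    return correct / (len(scores_positive) * len(scores_negative))


# Hypothetical classifier scores for patients and controls.
patient_scores = [0.9, 0.8, 0.4]
control_scores = [0.7, 0.3, 0.2]
auc = roc_auc(patient_scores, control_scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, so the reported 0.864 indicates that most patient-control pairs are ordered correctly by the panel.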
Pub Date : 2024-12-06 DOI: 10.1021/acs.jproteome.4c00293
Abdul Rehman Basharat, Xingzhao Xiong, Tian Xu, Yong Zang, Liangliang Sun, Xiaowen Liu
Top-down mass spectrometry is widely used for proteoform identification, characterization, and quantification owing to its ability to analyze intact proteoforms. In the past decade, top-down proteomics has been dominated by top-down data-dependent acquisition mass spectrometry (TD-DDA-MS), and top-down data-independent acquisition mass spectrometry (TD-DIA-MS) has not been well studied. While TD-DIA-MS produces complex multiplexed tandem mass spectrometry (MS/MS) spectra, which are challenging to confidently identify, it selects more precursor ions for MS/MS analysis and has the potential to increase proteoform identifications compared with TD-DDA-MS. Here we present TopDIA, the first software tool for proteoform identification by TD-DIA-MS. It generates demultiplexed pseudo MS/MS spectra from TD-DIA-MS data and then searches the pseudo MS/MS spectra against a protein sequence database for proteoform identification. We compared the performance of TD-DDA-MS and TD-DIA-MS using Escherichia coli K-12 MG1655 cells and demonstrated that TD-DIA-MS with TopDIA increased proteoform and protein identifications compared with TD-DDA-MS.
{"title":"TopDIA: A Software Tool for Top-Down Data-Independent Acquisition Proteomics.","authors":"Abdul Rehman Basharat, Xingzhao Xiong, Tian Xu, Yong Zang, Liangliang Sun, Xiaowen Liu","doi":"10.1021/acs.jproteome.4c00293","DOIUrl":"10.1021/acs.jproteome.4c00293","url":null,"abstract":"<p><p>Top-down mass spectrometry is widely used for proteoform identification, characterization, and quantification owing to its ability to analyze intact proteoforms. In the past decade, top-down proteomics has been dominated by top-down data-dependent acquisition mass spectrometry (TD-DDA-MS), and top-down data-independent acquisition mass spectrometry (TD-DIA-MS) has not been well studied. While TD-DIA-MS produces complex multiplexed tandem mass spectrometry (MS/MS) spectra, which are challenging to confidently identify, it selects more precursor ions for MS/MS analysis and has the potential to increase proteoform identifications compared with TD-DDA-MS. Here we present TopDIA, the first software tool for proteoform identification by TD-DIA-MS. It generates demultiplexed pseudo MS/MS spectra from TD-DIA-MS data and then searches the pseudo MS/MS spectra against a protein sequence database for proteoform identification. 
We compared the performance of TD-DDA-MS and TD-DIA-MS using <i>Escherichia coli</i> K-12 MG1655 cells and demonstrated that TD-DIA-MS with TopDIA increased proteoform and protein identifications compared with TD-DDA-MS.</p>","PeriodicalId":48,"journal":{"name":"Journal of Proteome Research","volume":" ","pages":""},"PeriodicalIF":3.8,"publicationDate":"2024-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142783345","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-12-06 Epub Date: 2024-10-31 DOI: 10.1021/acs.jproteome.4c00586
Philipp E Geyer, Daniel Hornburg, Maria Pernemalm, Stefanie M Hauck, Krishnan K Palaniappan, Vincent Albrecht, Laura F Dagley, Robert L Moritz, Xiaobo Yu, Fredrik Edfors, Yves Vandenbrouck, Johannes B Mueller-Reif, Zhi Sun, Virginie Brun, Sara Ahadi, Gilbert S Omenn, Eric W Deutsch, Jochen M Schwenk
Recent improvements in proteomics technologies have fundamentally altered our capacities to characterize human biology. There is an ever-growing interest in using these novel methods for studying the circulating proteome, as blood offers an accessible window into human health. However, every methodological innovation and analytical progress calls for reassessing our existing approaches and routines to ensure that the new data will add value to the greater biomedical research community and avoid previous errors. As representatives of HUPO's Human Plasma Proteome Project (HPPP), we present our 2024 survey of the current progress in our community, including the latest build of the Human Plasma Proteome PeptideAtlas that now comprises 4608 proteins detected in 113 data sets. We then discuss the updates of established proteomics methods, emerging technologies, and investigations of proteoforms, protein networks, extracellular vesicles, circulating antibodies and microsamples. Finally, we provide a prospective view of using the current and emerging proteomics tools in studies of circulating proteins.
{"title":"The Circulating Proteome─Technological Developments, Current Challenges, and Future Trends.","authors":"Philipp E Geyer, Daniel Hornburg, Maria Pernemalm, Stefanie M Hauck, Krishnan K Palaniappan, Vincent Albrecht, Laura F Dagley, Robert L Moritz, Xiaobo Yu, Fredrik Edfors, Yves Vandenbrouck, Johannes B Mueller-Reif, Zhi Sun, Virginie Brun, Sara Ahadi, Gilbert S Omenn, Eric W Deutsch, Jochen M Schwenk","doi":"10.1021/acs.jproteome.4c00586","DOIUrl":"10.1021/acs.jproteome.4c00586","url":null,"abstract":"<p><p>Recent improvements in proteomics technologies have fundamentally altered our capacities to characterize human biology. There is an ever-growing interest in using these novel methods for studying the circulating proteome, as blood offers an accessible window into human health. However, every methodological innovation and analytical progress calls for reassessing our existing approaches and routines to ensure that the new data will add value to the greater biomedical research community and avoid previous errors. As representatives of HUPO's Human Plasma Proteome Project (HPPP), we present our 2024 survey of the current progress in our community, including the latest build of the Human Plasma Proteome PeptideAtlas that now comprises 4608 proteins detected in 113 data sets. We then discuss the updates of established proteomics methods, emerging technologies, and investigations of proteoforms, protein networks, extracellular vesicles, circulating antibodies and microsamples. 
Finally, we provide a prospective view of using the current and emerging proteomics tools in studies of circulating proteins.</p>","PeriodicalId":48,"journal":{"name":"Journal of Proteome Research","volume":" ","pages":"5279-5295"},"PeriodicalIF":3.8,"publicationDate":"2024-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11629384/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142542880","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}