Jiří Pospíšil, Alice Sax, Martin Hubálek, Libor Krásný, Jiří Vohradský
In this study, we present a high-resolution dataset and bioinformatic analysis of the proteome of Bacillus subtilis 168 trp+ (BSB1) during germination and spore outgrowth. Samples were collected in three biological replicates at 14 time points, ranging from 0 to 130 min after spore inoculation into germination medium. A total of 2191 proteins were identified and categorized by their expression kinetics into four distinct clusters, which were analyzed for functional categories and KEGG pathway annotations. Comparing newly synthesized proteins between successive time points revealed significant changes, particularly within the first 50 min. The dataset provides an information base that can be used for modeling and to inspire the design of new experiments.
Title: Whole proteome analysis of germinating and outgrowing Bacillus subtilis 168. Proteomics 24(17); publication date 2024-07-23. DOI: 10.1002/pmic.202400031. Full text: https://onlinelibrary.wiley.com/doi/epdf/10.1002/pmic.202400031
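The abstract states that proteins were categorized by expression kinetics into four clusters but does not name the clustering method. As a minimal sketch, assuming z-score normalization of each protein's time profile followed by k-means with k = 4 (a swapped-in, commonly used choice, not necessarily the authors' method), the grouping could look like this. The function names are hypothetical, and the example uses five time points instead of the study's 14 for brevity.

```python
import math

def zscore(profile):
    """Standardize one protein's abundance profile across time points."""
    n = len(profile)
    mean = sum(profile) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in profile) / n)
    if sd == 0:
        return [0.0] * n  # flat profile: no kinetic shape to cluster on
    return [(x - mean) / sd for x in profile]

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cluster_kinetics(profiles, k=4, max_iter=100):
    """Lloyd's k-means on z-scored profiles with deterministic farthest-point seeding.
    Hypothetical helper; returns one cluster label (0..k-1) per protein."""
    data = [zscore(p) for p in profiles]
    # Seeding: start from the first profile, then repeatedly add the profile
    # farthest from all centroids chosen so far.
    centroids = [list(data[0])]
    while len(centroids) < k:
        far = max(data, key=lambda x: min(sq_dist(x, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(x, centroids[c])) for x in data]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for c in range(k):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

Z-scoring lets proteins that share a kinetic shape but differ in absolute abundance fall into the same cluster; the deterministic seeding avoids run-to-run variation but, like any k-means variant, is sensitive to outlying profiles.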
Fei Fang, Guangyao Gao, Qianyi Wang, Qianjie Wang, Liangliang Sun
Mass spectrometry (MS)-based top-down proteomics (TDP) analysis of histone proteoforms provides critical information about combinatorial post-translational modifications (PTMs), which is vital for understanding the epigenetic regulation of gene expression. Such analysis requires high-resolution separation of histone proteoforms before MS and tandem MS (MS/MS). In this work, we combined, for the first time, SDS-PAGE-based protein fractionation (passively eluting proteins from polyacrylamide gels as intact species for mass spectrometry, PEPPI-MS) with capillary zone electrophoresis (CZE)-MS/MS for high-resolution characterization of histone proteoforms. We systematically evaluated histone proteoform extraction from the SDS-PAGE gel, the subsequent cleanup, and the CZE-MS/MS conditions to determine an optimal procedure, which yielded reproducible, high-resolution separation and characterization of histone proteoforms. SDS-PAGE separated the histone proteins (H1, H2, H3, and H4) by molecular weight, and CZE further separated the proteoforms of each histone by electrophoretic mobility, which is affected by PTMs such as acetylation and phosphorylation. Using this technique, we reproducibly identified over 200 histone proteoforms from a commercial calf thymus histone sample. The orthogonal, high-resolution separations provided by SDS-PAGE and CZE make the technique attractive for delineating histone proteoforms extracted from complex biological systems.
Title: Combining SDS-PAGE to capillary zone electrophoresis-tandem mass spectrometry for high-resolution top-down proteomics analysis of intact histone proteoforms. Proteomics 24(17); publication date 2024-07-17. DOI: 10.1002/pmic.202300650. Full text: https://onlinelibrary.wiley.com/doi/epdf/10.1002/pmic.202300650
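CZE separates each histone's proteoforms by electrophoretic mobility, and PTMs shift that mobility. A rough way to see why is sketched below, assuming an acidic background electrolyte and the Offord-type semi-empirical relation (mobility proportional to net charge divided by mass to the 2/3 power). The charge model is a deliberate simplification, the sequence is hypothetical, and the function names are illustrative rather than part of the authors' workflow.

```python
# Monoisotopic mass shifts (Da) of the two PTMs named in the abstract.
ACETYL_DA = 42.010565
PHOSPHO_DA = 79.966331

def net_charge_acidic(sequence, n_acetyl=0, n_phospho=0):
    """Crude net charge in an acidic background electrolyte (assumption):
    K, R, H and the free N-terminus each carry +1, carboxyl groups are neutral,
    each acetylated Lys loses its +1, and each phosphate contributes about -1."""
    basic_sites = sum(sequence.count(aa) for aa in "KRH") + 1  # +1 for the N-terminus
    return basic_sites - n_acetyl - n_phospho

def relative_mobility(sequence, unmodified_mass, n_acetyl=0, n_phospho=0):
    """Offord-type semi-empirical estimate: mobility proportional to z / M**(2/3).
    Returns a unitless number that is only meaningful for ranking proteoforms."""
    z = net_charge_acidic(sequence, n_acetyl, n_phospho)
    mass = unmodified_mass + n_acetyl * ACETYL_DA + n_phospho * PHOSPHO_DA
    return z / mass ** (2 / 3)
```

Under these assumptions, each Lys acetylation both removes a positive charge and adds about 42 Da, so calling `relative_mobility(seq, mass, n_acetyl=k)` for k = 0, 1, 2, ... gives a decreasing series: successively acetylated proteoforms are predicted to migrate progressively more slowly, consistent with the abstract's statement that acetylation affects mobility.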
Aaron O. Bailey, Kenneth R. Durbin, Matthew T. Robey, Lee K. Palmer, William K. Russell
Liquid chromatography–mass spectrometry (LC-MS) intact mass analysis and LC-MS/MS peptide mapping are key decision-making assays in the development of biological drugs and other commercial protein products. Certain PTM types, such as truncation and oxidation, make precise proteoform characterization difficult because of inherent limitations of peptide-level and intact-protein analyses. Top-down MS (TDMS) can resolve this ambiguity by fragmenting specific proteoforms. We leveraged the strengths of flow-programmed (fp) denaturing online buffer exchange (dOBE) chromatography, including robust automation, relatively high ESI sensitivity, and a long MS/MS time window, to support a TDMS platform for industrial protein characterization. We tested data-dependent acquisition (DDA) and targeted strategies using 14 different MS/MS scan types featuring combinations of collisional and electron-based fragmentation as well as proton transfer charge reduction. This large, focused dataset was processed with a new software platform, TDAcquireX, that improves proteoform characterization by aggregating TDMS data. A DDA-based workflow provided objective identification of αLac truncation proteoforms using a two-termini clipping search. A targeted TDMS workflow facilitated the characterization of αLac oxidation positional isomers. This strategy relied on sliding window-based fragment ion deconvolution to generate composite proteoform spectral match (cPrSM) results amenable to fragment noise filtering, a fundamental enhancement relevant to TDMS applications in general.
Title: Filling the gaps in peptide maps with a platform assay for top-down characterization of purified protein samples. Proteomics 24(21-22); publication date 2024-07-14. DOI: 10.1002/pmic.202400036.
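The abstract names a two-termini clipping search for αLac truncation proteoforms without describing it further. The sketch below illustrates the general idea under stated assumptions: enumerate every contiguous subsequence of a protein sequence (clipped from either or both termini) and keep those whose unmodified monoisotopic mass matches an observed intact mass within a ppm tolerance. It ignores disulfide bonds and all other PTMs, and `clipping_search` is a hypothetical helper, not a TDAcquireX function.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per intact chain

def mono_mass(seq):
    """Unmodified monoisotopic mass of a linear sequence."""
    return sum(RESIDUE[aa] for aa in seq) + WATER

def clipping_search(sequence, observed_mass, tol_ppm=10.0, min_len=1):
    """Return (start, end, mass) for every subsequence sequence[start:end]
    whose mass lies within tol_ppm of observed_mass."""
    # Prefix sums give each subsequence's residue mass in O(1).
    prefix = [0.0]
    for aa in sequence:
        prefix.append(prefix[-1] + RESIDUE[aa])
    hits = []
    n = len(sequence)
    for start in range(n):  # N-terminal clip length
        for end in range(start + min_len, n + 1):  # C-terminal clip is n - end
            mass = prefix[end] - prefix[start] + WATER
            if abs(mass - observed_mass) / observed_mass * 1e6 <= tol_ppm:
                hits.append((start, end, round(mass, 5)))
    return hits
```

The search is quadratic in sequence length, which is cheap for a single purified protein; a practical tool would also need to add expected modifications (for example disulfides) before comparing masses.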
Predicting protein function from protein sequence, structure, interactions, and other relevant information is important for generating hypotheses for biological experiments and for studying biological systems, and it has therefore been a major challenge in protein bioinformatics. Numerous computational methods have been developed over the last two decades to gradually advance protein function prediction. In recent years in particular, a growing number of deep learning methods that leverage revolutionary advances in artificial intelligence (AI) have been improving protein function prediction at a faster pace. Here, we provide an in-depth review of recent developments in deep learning methods for protein function prediction. We summarize the significant advances in the field, identify several major remaining challenges, and suggest potential directions to explore. The data sources and evaluation metrics widely used in protein function prediction are also discussed to help the machine learning, AI, and bioinformatics communities develop more cutting-edge methods for protein function prediction.
Title: Deep learning methods for protein function prediction. Authors: Frimpong Boadu, Ahhyun Lee, Jianlin Cheng. Proteomics 25(1-2); publication date 2024-07-12. DOI: 10.1002/pmic.202300471. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11735672/pdf/
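The review discusses the evaluation metrics used in protein function prediction without listing them in the abstract. One widely used metric is the protein-centric Fmax from the CAFA challenges. A minimal sketch follows, assuming GO annotations have already been propagated to their ancestor terms and that thresholds are scanned in steps of 0.01; the input structures and names are illustrative.

```python
def fmax(predictions, truth, thresholds=None):
    """Protein-centric Fmax in the style of CAFA.
    predictions: {protein: {term: score in [0, 1]}}
    truth: {protein: set of true (propagated) terms} for the benchmark proteins."""
    if thresholds is None:
        thresholds = [t / 100 for t in range(1, 101)]
    best = 0.0
    for t in thresholds:
        precisions, recalls = [], []
        for prot, true_terms in truth.items():
            pred = {term for term, s in predictions.get(prot, {}).items() if s >= t}
            tp = len(pred & true_terms)
            # Recall is averaged over all benchmark proteins.
            recalls.append(tp / len(true_terms) if true_terms else 0.0)
            # Precision is averaged only over proteins with >= 1 prediction at t.
            if pred:
                precisions.append(tp / len(pred))
        if not precisions:
            continue  # no predictions survive this threshold
        pr = sum(precisions) / len(precisions)
        rc = sum(recalls) / len(recalls)
        if pr + rc > 0:
            best = max(best, 2 * pr * rc / (pr + rc))
    return best
```

Averaging precision only over proteins that receive predictions, while averaging recall over all benchmark proteins, keeps a method from being penalized on precision for proteins it abstains on, whereas their missing annotations still lower recall.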
Colin W. Combe, Lars Kolbowski, Lutz Fischer, Ville Koskinen, Joshua Klein, Alexander Leitner, Andrew R. Jones, Juan Antonio Vizcaíno, Juri Rappsilber
The mzIdentML data format, originally developed by the Proteomics Standards Initiative in 2011, is the open XML data standard for peptide and protein identification results from mass spectrometry. We present mzIdentML version 1.3.0, which introduces new functionality and supports additional use cases. First, a new mechanism for encoding identifications based on multiple spectra has been introduced. Second, the main mzIdentML specification document can now be supplemented by extension documents that provide further guidance on encoding specific use cases in different proteomics subfields. One extension document has been added, covering additional use cases for encoding crosslinked peptide identifications. Extension documents make it possible to keep the mzIdentML standard up to date with advances in the proteomics field without changing the main specification document. The crosslinking extension document further explains the crosslinking use cases already supported in mzIdentML version 1.2.0 and adds support for encoding additional scenarios that are critical for reflecting developments in the crosslinking field and facilitating its integration into structural biology. These are: (i) support for cleavable crosslinkers, (ii) support for internally linked peptides, (iii) support for noncovalently associated peptides, and (iv) improved support for encoding scores and the corresponding thresholds.
Title: mzIdentML 1.3.0 – Essential progress on the support of crosslinking and other identifications based on multiple spectra. Proteomics 24(17); publication date 2024-07-12. DOI: 10.1002/pmic.202300385. Full text: https://onlinelibrary.wiley.com/doi/epdf/10.1002/pmic.202300385
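The crosslinking extension builds on the encoding already supported in mzIdentML 1.2.0. As a hedged sketch, assuming the 1.2 convention in which the two SpectrumIdentificationItem elements that together describe one crosslinked match each carry a cvParam named "cross-link spectrum identification item" with a shared value, a reader could pair them as below. The XML is a simplified, hypothetical fragment (real files nest these elements inside SpectrumIdentificationList and usually declare a namespace, which the helper tolerates), and the new 1.3.0 multi-spectra mechanism is not modeled.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def local(tag):
    """Drop an XML namespace, e.g. '{http://...}cvParam' -> 'cvParam'."""
    return tag.rsplit("}", 1)[-1]

def crosslinked_pairs(xml_text, cv_name="cross-link spectrum identification item"):
    """Group SpectrumIdentificationItem ids that share a value of the named cvParam.
    Items without that cvParam (linear peptides) are ignored."""
    root = ET.fromstring(xml_text)
    groups = defaultdict(list)
    for elem in root.iter():
        if local(elem.tag) != "SpectrumIdentificationItem":
            continue
        for child in elem:
            if local(child.tag) == "cvParam" and child.get("name") == cv_name:
                groups[child.get("value")].append(elem.get("id"))
    return dict(groups)

# Simplified, hypothetical fragment for illustration only.
EXAMPLE = """<MzIdentML>
  <SpectrumIdentificationResult id="SIR_1" spectrumID="index=10">
    <SpectrumIdentificationItem id="SII_1_a">
      <cvParam name="cross-link spectrum identification item" value="1"/>
    </SpectrumIdentificationItem>
    <SpectrumIdentificationItem id="SII_1_b">
      <cvParam name="cross-link spectrum identification item" value="1"/>
    </SpectrumIdentificationItem>
    <SpectrumIdentificationItem id="SII_1_c"/>
  </SpectrumIdentificationResult>
</MzIdentML>"""
```

Running `crosslinked_pairs(EXAMPLE)` groups the two linked items under the shared value "1" and leaves out the unlinked item.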
Given the pivotal roles of metabolomics and microbiomics, numerous data mining approaches aim to uncover the intricate connections between them. However, the complex many-to-many associations between metabolome and microbiome profiles yield numerous statistically significant but biologically unvalidated candidates. To address this challenge, we introduce BiOFI, a strategic framework for identifying metabolome-microbiome correlation pairs (Bi-Omics). BiOFI employs a comprehensive scoring system that incorporates intergroup differences, effects on feature correlation networks, and organism abundance. It also provides a built-in database of relationships linking metabolites, microbes, and KEGG functional pathways. Furthermore, BiOFI can rank related feature pairs by combining importance scores with correlation strength. Validation on a dataset of infants delivered by cesarean section confirms the strategy's validity and interpretability. The BiOFI R package is freely available at https://github.com/chentianlu/BiOFI.
Title: A cross-omics data analysis strategy for metabolite-microbe pair identification. Authors: Tao Sun, Dongnan Sun, Junliang Kuang, Xiaowen Chao, Yihan Guo, Mengci Li, Tianlu Chen. Proteomics 24(21-22); publication date 2024-07-12. DOI: 10.1002/pmic.202400035.
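BiOFI ranks metabolite-microbe pairs by combining importance scores with correlation strength, but the abstract does not give the formula. A minimal sketch, assuming Spearman correlation across samples and a simple product of the two features' importance scores with |rho| (an assumed combination rule), is shown below. The importance scores are hypothetical inputs (BiOFI derives its own from intergroup differences, network effects, and abundance, which is not reproduced here), and these function names are not part of the BiOFI R package.

```python
import math

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def spearman(x, y):
    """Spearman rho = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def rank_pairs(metabolites, microbes, importance):
    """metabolites / microbes: {name: abundances over the same samples}.
    importance: {name: hypothetical importance score}.
    Returns (metabolite, microbe, rho, score) sorted by descending score."""
    scored = []
    for m, mx in metabolites.items():
        for b, bx in microbes.items():
            rho = spearman(mx, bx)
            score = importance.get(m, 0.0) * importance.get(b, 0.0) * abs(rho)
            scored.append((m, b, rho, score))
    return sorted(scored, key=lambda t: t[3], reverse=True)
```

Using |rho| lets strongly negative associations rank as highly as positive ones, while the sign is kept in the output so the direction of each association stays visible.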
Extracellular vesicles (EVs) are anucleate particles enclosed by a lipid bilayer that are released from cells via exocytosis or direct budding from the plasma membrane. They carry an array of important molecular cargo, such as proteins, nucleic acids, and lipids, and can transfer this cargo to recipient cells as a means of intercellular communication. One of the overarching paradigms in EV research is that EV cargo should reflect the biological state of the cell of origin. The true extent of this correlation is confounded by many factors, including the numerous ways EVs can be isolated or enriched, overlap in the biophysical properties of different classes of EVs, and analytical limitations. This complicates research aimed at detecting low-abundance EV-encapsulated nucleic acids or proteins in biofluids for biomarker discovery, and it underpins technical obstacles to confidently assessing the proteomic landscape of EVs, which may be affected by sample-type-specific or disease-associated proteoforms. Improving our understanding of EV biogenesis and cargo loading, together with developments in top-down proteomics, may guide us toward advanced approaches for selectively enriching EVs and their molecular cargo, which could aid EV diagnostics and therapeutics research.
Title: Playing pin-the-tail-on-the-protein in extracellular vesicle (EV) proteomics. Authors: Natalie P. Turner. Proteomics 24(18); publication date 2024-06-20. DOI: 10.1002/pmic.202400074. Full text: https://onlinelibrary.wiley.com/doi/epdf/10.1002/pmic.202400074