
Journal of Proteome Research: Latest Articles

Benzoyl Chloride Derivatization Advances the Quantification of Dissolved Polar Metabolites on Coral Reefs
IF 4.4 Zone 2 (Biology) Q1 Chemistry Pub Date: 2024-05-23 DOI: 10.1021/acs.jproteome.4c00049
Brianna M. Garcia, Cynthia C. Becker, Laura Weber, Gretchen J. Swarr, Melissa C. Kido Soule, Amy Apprill* and Elizabeth B. Kujawinski*

Extracellular chemical cues constitute much of the language of life among marine organisms, from microbes to mammals. Changes in this chemical pool serve as invisible signals of overall ecosystem health and disruption to this finely tuned equilibrium. In coral reefs, the scope and magnitude of the chemicals involved in maintaining reef equilibria are largely unknown. Processes involving small, polar molecules, which form the majority of labile dissolved organic carbon, are often poorly captured using traditional techniques. We employed chemical derivatization with mass spectrometry-based targeted exometabolomics to quantify polar dissolved-phase metabolites on five coral reefs in the U.S. Virgin Islands. We quantified 45 polar exometabolites, demonstrated their spatial variability, and contextualized these findings in terms of geographic and benthic cover differences. By comparing our results to previously published coral reef exometabolomes, we show the novel quantification of 23 metabolites, including central carbon metabolism compounds (e.g., glutamate) and novel metabolites such as homoserine betaine. We highlight the immense potential of chemical derivatization-based exometabolomics for quantifying labile chemical cues on coral reefs and measuring molecular-level responses to environmental stressors. Overall, improving our understanding of the composition and dynamics of reef exometabolites is vital for effective ecosystem monitoring and management strategies.

Proteomic Barcoding Platform for Macromolecular Screening and Delivery
IF 4.4 Zone 2 (Biology) Q1 Chemistry Pub Date: 2024-05-22 DOI: 10.1021/acs.jproteome.4c00068
Ning Wang, Nicole A. Mcneer, Elliot Eton, Josh Fass and Alex Kentsis*

Engineered macromolecules offer compelling means for the therapy of conventionally undruggable interactions in human disease. However, their efficacy is limited by barriers to tissue and intracellular delivery. Inspired by recent advances in molecular barcoding and evolution, we developed BarcodeBabel, a generalized method for the design of libraries of peptide barcodes suitable for high-throughput mass spectrometry proteomics. Combined with PeptideBabel, a Monte Carlo sampling algorithm for the design of peptides with evolvable physicochemical properties and sequence complexity, we developed a barcoded library of cell-penetrating peptides (CPPs) with distinct physicochemical features. Using quantitative targeted mass spectrometry, we identified CPPs with improved nuclear and cytoplasmic delivery exceeding hundreds of millions of molecules per human cell while maintaining minimal membrane disruption and negligible toxicity in vitro. These studies provide a proof of concept for peptide barcoding as a homogeneous high-throughput method for macromolecular screening and delivery. BarcodeBabel and PeptideBabel are available open-source from https://github.com/kentsisresearchgroup/.

Distinctive Lipid Characteristics of Colorectal Cancer Revealed through Non-targeted Lipidomics Analysis of Tongue Coating
IF 4.4 Zone 2 (Biology) Q1 Chemistry Pub Date: 2024-05-22 DOI: 10.1021/acs.jproteome.4c00063
Qubo Chen*, Fengye Lin, Wanhua Li, Xiangyu Gu, Ying Chen, Hairong Su, Lu Zhang, Wen Zheng, Xuan Zeng, Xinyi Lu, Chuyang Wang, Weicheng Chen, Beiping Zhang, Haiyan Zhang and Meng Gong*

The metabolites and microbiota in tongue coating display distinct characteristics in certain digestive disorders, yet their relationship with colorectal cancer (CRC) remains unexplored. Here, we employed liquid chromatography coupled with tandem mass spectrometry to analyze the lipid composition of tongue coating using a nontargeted approach in 30 individuals with colorectal adenomas (CRA), 32 with CRC, and 30 healthy controls (HC). We identified 21 tongue coating lipids that effectively distinguished CRC from HC (AUC = 0.89), and 9 lipids that differentiated CRC from CRA (AUC = 0.9). Furthermore, we observed significant alterations in the tongue coating lipid composition in the CRC group compared to HC/CRA groups. As the adenoma-cancer sequence progressed, there was an increase in long-chain unsaturated triglycerides (TG) levels and a decrease in phosphatidylethanolamine plasmalogen (PE-P) levels. Furthermore, we noted a positive correlation between N-acyl ornithine (NAOrn), sphingomyelin (SM), and ceramide phosphoethanolamine (PE-Cer), potentially produced by members of the Bacteroidetes phylum. The levels of inflammatory lipid metabolite 12-HETE showed a decreasing trend with colorectal tumor progression, indicating the potential involvement of tongue coating microbiota and tumor immune regulation in early CRC development. Our findings highlight the potential utility of tongue coating lipid analysis as a noninvasive tool for CRC diagnosis.

ChIP-seq and RNA-seq Reveal the Involvement of Histone Lactylation Modification in Gestational Diabetes Mellitus
IF 4.4 Zone 2 (Biology) Q1 Chemistry Pub Date: 2024-05-22 DOI: 10.1021/acs.jproteome.3c00727
Xiaman Huang, KaCheuk Yip, Hanhui Nie, Ruiping Chen, Xiufang Wang, Yun Wang, Weizhao Lin and Ruiman Li*

Lactylation is a novel post-translational modification of proteins. Although histone lactylation has been reported to be involved in glucose metabolism, its role and molecular pathways in gestational diabetes mellitus (GDM) are still unclear. This study aims to elucidate the histone lactylation landscape of GDM patients and explore lactylation-related genes involved in GDM. We combined RNA-seq analysis and chromatin immunoprecipitation sequencing (ChIP-seq) analysis to identify upregulated differentially expressed genes (DEGs) with hyperhistone lactylation modification in GDM. We demonstrated that the levels of lactate and histone lactylation were significantly elevated in GDM patients. DEGs were involved in diabetes-related pathways, such as the PI3K-Akt signaling pathway, Jak-STAT signaling pathway, and mTOR signaling pathway. ChIP-seq analysis indicated that histone lactylation modification in the promoter regions of the GDM group was significantly changed. By integrating the results of RNA-seq and ChIP-seq analysis, we found that CACNA2D1 is a key gene for histone lactylation modification and is involved in the progression of GDM by promoting cell vitality and proliferation. In conclusion, we identified the key gene CACNA2D1, which was upregulated and exhibited hypermodification of histone lactylation in GDM. These findings establish a theoretical groundwork for the targeted therapy of GDM.

PIPI2: Sensitive Tag-Based Database Search to Identify Peptides with Multiple Post-translational Modifications
IF 4.4 Zone 2 (Biology) Q1 Chemistry Pub Date: 2024-05-21 DOI: 10.1021/acs.jproteome.3c00819
Shengzhi Lai, Peize Zhao, Chen Zhou, Ning Li* and Weichuan Yu*

Peptide identification is important in bottom-up proteomics. Post-translational modifications (PTMs) are crucial in regulating cellular activities. Many database search methods have been developed to identify peptides with PTMs and characterize the PTM patterns. However, PTMs on peptides lower both the peptide identification rate and the PTM characterization precision, especially for peptides with multiple PTMs. To address this issue, we present a sensitive open search engine, PIPI2, with much better performance on peptides with multiple PTMs than other methods. With a greedy approach, we simplify the PTM characterization problem into a linear one, which enables characterizing multiple PTMs on one peptide. On simulation data sets with up to four PTMs per peptide, PIPI2 identified over 90% of the spectra, at least 56% more than five other competitors. PIPI2 also characterized these PTM patterns with the highest precision of 77%, demonstrating a significant advantage in handling peptides with multiple PTMs. In real applications, PIPI2 identified 30% to 88% more peptides with PTMs than its competitors.

Secretome of Cancer-Associated Fibroblasts (CAFs) Influences Drug Sensitivity in Cancer Cells
IF 4.4 Zone 2 (Biology) Q1 Chemistry Pub Date: 2024-05-20 DOI: 10.1021/acs.jproteome.4c00112
Rachel Lau, Lu Yu, Theodoros I. Roumeliotis, Adam Stewart, Lisa Pickard, Jyoti S. Choudhary* and Udai Banerji*

Resistance is a major obstacle to effective cancer treatment. The stroma forms a significant portion of the tumor mass, but traditional drug screens involve cancer cells alone. Cancer-associated fibroblasts (CAFs) are a major component of the tumor stroma, and their secreted proteins may influence the function of cancer cells. The majority of secretome studies compare different cancer or CAF cell lines exclusively. Here, we present the direct characterization of the secreted protein profiles of CAFs and KRAS-mutant cancer cell lines from colorectal, lung, and pancreatic tissues using multiplexed mass spectrometry. A total of 2573 secreted proteins were annotated, and differential analysis highlighted understudied CAF-enriched secreted proteins, including Wnt family member 5B (WNT5B), in addition to established CAF markers, such as collagens. The functional role of CAF-secreted proteins was explored by assessing their effect on the response to 97 anticancer drugs, since stromal cells may cause a differing cancer drug response that may be missed in routine drug screening using cancer cells alone. CAF-secreted proteins caused specific effects on each of the cancer cell lines, which highlights the complexity and challenges of cancer treatment and the importance of considering stromal elements.

Offline Two-Dimensional Liquid Chromatography–Mass Spectrometry for Deep Annotation of the Fecal Metabolome Following Fecal Microbiota Transplantation
IF 4.4 Zone 2 (Biology) Q1 Chemistry Pub Date: 2024-05-16 DOI: 10.1021/acs.jproteome.4c00022
Brady G. Anderson, Alexander Raskind, Rylan Hissong, Michael K. Dougherty, Sarah K. McGill, Ajay S. Gulati, Casey M. Theriot, Robert T. Kennedy and Charles R. Evans*

Biological interpretation of untargeted LC-MS-based metabolomics data depends on accurate compound identification, but current techniques fall short of identifying most features that can be detected. The human fecal metabolome is complex, variable, incompletely annotated, and serves as an ideal matrix to evaluate novel compound identification methods. We devised an experimental strategy for compound annotation using multidimensional chromatography and semiautomated feature alignment and applied these methods to study the fecal metabolome in the context of fecal microbiota transplantation (FMT) for recurrent C. difficile infection. Pooled fecal samples were fractionated using semipreparative liquid chromatography and analyzed by an orthogonal LC-MS/MS method. The resulting spectra were searched against commercial, public, and local spectral libraries, and annotations were vetted using retention time alignment and prediction. Multidimensional chromatography yielded more than a 2-fold improvement in identified compounds compared to conventional LC-MS/MS and successfully identified several rare and previously unreported compounds, including novel fatty-acid conjugated bile acid species. Using an automated software-based feature alignment strategy, most metabolites identified by the new approach could be matched to features that were detected but not identified in single-dimensional LC-MS/MS data. Overall, our approach represents a powerful strategy to enhance compound identification and biological insight from untargeted metabolomics data.

Multigenerational Effect of Heat Stress on the Drosophila melanogaster Sperm Proteome
IF 4.4 Zone 2 (Biology) Q1 Chemistry Pub Date: 2024-05-14 DOI: 10.1021/acs.jproteome.4c00205
Shagufta Khan and Rakesh K. Mishra*

The effect of the parental environment on offspring through non-DNA sequence-based mechanisms, such as DNA methylation, chromatin modifications, noncoding RNAs, and proteins, could only be established after the conception of “epigenetics”. These effects are now broadly referred to as multigenerational epigenetic effects. Despite accumulating evidence of male gamete-mediated multigenerational epigenetic inheritance, little is known about the factors that underlie heat stress-induced multigenerational epigenetic inheritance via the male germline in Drosophila. In this study, we address this gap by utilizing an established heat stress paradigm in Drosophila and investigating its multigenerational effect on the sperm proteome. Our findings indicate that multigenerational heat stress during the early embryonic stage significantly influences proteins in the sperm associated with translation, chromatin organization, microtubule-based processes, and the generation of metabolites and energy. Assessment of life-history traits revealed that reproductive fitness and stress tolerance remained unaffected by multigenerational heat stress. Our study offers initial insights into the chromatin-based epigenetic mechanisms as a plausible means of transmitting heat stress memory through the male germline in Drosophila. Furthermore, it sheds light on the repercussions of early embryonic heat stress on male reproductive potential. The data sets from this study are available at the ProteomeXchange Consortium under the identifier PXD037488.

Citations: 0
Identification and Quantification of Human Relaxin Proteins by Immunoaffinity-Mass Spectrometry
IF 4.4 | Zone 2 (Biology) | Q1 Chemistry | Pub Date: 2024-05-13 | DOI: 10.1021/acs.jproteome.4c00027
Yasmine Rais and Andrei P. Drabovich*

The human relaxins belong to the Insulin/IGF/Relaxin superfamily of peptide hormones, and their physiological function is primarily associated with reproduction. In this study, we focused on a prostate tissue-specific relaxin RLN1 (REL1_HUMAN protein) and a broader tissue specificity RLN2 (REL2_HUMAN protein). Due to their structural similarity, REL1 and REL2 proteins were collectively named a ‘human relaxin protein’ in previous studies and were exclusively measured by immunoassays. We hypothesized that the highly selective and sensitive immunoaffinity-selected reaction monitoring (IA-SRM) assays would reveal the identity and abundance of the endogenous REL1 and REL2 in biological samples and facilitate the evaluation of these proteins for diagnostic applications. High levels of RLN1 and RLN2 transcripts were found in prostate and breast cancer cell lines by RT-PCR. However, no endogenous prorelaxin-1 or mature REL1 were detected by IA-SRM in cell lines, seminal plasma, or blood serum. The IA-SRM assay of REL2 demonstrated its undetectable levels (<9.4 pg/mL) in healthy control female and male sera and relatively high levels of REL2 in maternal sera across different gestational weeks (median 331 pg/mL; N = 120). IA-SRM assays uncovered potential cross-reactivity and nonspecific binding for relaxin immunoassays. The developed IA-SRM assays will facilitate the investigation of the physiological and pathological roles of REL1 and REL2 proteins.

Citations: 0
Vocabulary Matters: An Annotation Pipeline and Four Deep Learning Algorithms for Enzyme Named Entity Recognition
IF 4.4 | Zone 2 (Biology) | Q1 Chemistry | Pub Date: 2024-05-11 | DOI: 10.1021/acs.jproteome.3c00367
Meiqi Wang, Avish Vijayaraghavan, Tim Beck* and Joram M. Posma*

Enzymes are indispensable in many biological processes, and with biomedical literature growing exponentially, effective literature review becomes increasingly challenging. Natural language processing methods offer solutions to streamline this process. This study aims to develop an annotated enzyme corpus for training and evaluating enzyme named entity recognition (NER) models. A novel pipeline, combining dictionary matching and rule-based keyword searching, automatically annotated enzyme entities in >4800 full-text publications. Four deep learning NER models were created with different vocabularies (BioBERT/SciBERT) and architectures (BiLSTM/transformer) and evaluated on 526 manually annotated full-text publications. The annotation pipeline achieved an F1-score of 0.86 (precision = 1.00, recall = 0.76), surpassed by fine-tuned transformers for F1-score (BioBERT: 0.89, SciBERT: 0.88) and recall (0.86) with BiLSTM models having higher precision (0.94) than transformers (0.92). The annotation pipeline runs in seconds on standard laptops with almost perfect precision, but was outperformed by fine-tuned transformers in terms of F1-score and recall, demonstrating generalizability beyond the training data. In comparison, SciBERT-based models exhibited higher precision, and BioBERT-based models exhibited higher recall, highlighting the importance of vocabulary and architecture. These models, representing the first enzyme NER algorithms, enable more effective enzyme text mining and information extraction. Codes for automated annotation and model generation are available from https://github.com/omicsNLP/enzymeNER and https://zenodo.org/doi/10.5281/zenodo.10581586.
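
As a reading aid, the F1-score reported for the annotation pipeline can be checked against its precision and recall, assuming the standard definition of F1 as the harmonic mean of the two. The function name `f1_score` below is a hypothetical helper written for this illustration and is not taken from the authors' code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        # Avoid division by zero when both metrics are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract for the
# dictionary/rule-based annotation pipeline.
pipeline_f1 = f1_score(precision=1.00, recall=0.76)
print(f"{pipeline_f1:.2f}")  # prints 0.86
```

The result matches the reported 0.86 and shows why perfect precision alone does not yield a high F1: the harmonic mean is pulled toward the lower of the two values, here the recall of 0.76.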

Citations: 0