Deep Learning Enhances Precision of Citrullination Identification in Human and Plant Tissue Proteomes.

IF 6.1; Zone 2, Biology; JCR Q1, BIOCHEMICAL RESEARCH METHODS; Molecular & Cellular Proteomics; Pub Date: 2025-02-05; DOI: 10.1016/j.mcpro.2025.100924
Wassim Gabriel, Rebecca Meelker Gonzalez, Sophia Laposchan, Erik Riedel, Gönül Dündar, Brigitte Poppenberger, Mathias Wilhelm, Chien-Yun Lee
Citations: 0

Abstract

Citrullination is a critical yet understudied post-translational modification (PTM) implicated in various biological processes. Exploring its role in health and disease requires a comprehensive understanding of the prevalence of this PTM at a proteome-wide scale. Although mass spectrometry has enabled the identification of citrullination sites in complex biological samples, it faces significant challenges, including limited enrichment tools and a high rate of false positives, both because citrullination produces the same mass shift as deamidation (+0.9840 Da) and because of errors in monoisotopic ion selection. These issues often necessitate manual spectrum inspection, reducing throughput in large-scale studies. In this work, we present a novel data analysis pipeline that incorporates the deep learning model Prosit-Cit into the MS database search workflow to improve both the sensitivity and precision of citrullination site identification. Prosit-Cit, an extension of the existing Prosit model, has been trained on ∼53,000 spectra from ∼2,500 synthetic citrullinated peptides and provides precise predictions for chromatographic retention time and fragment ion intensities of both citrullinated and deamidated peptides. This enhances the accuracy of identification and reduces false positives. Our pipeline demonstrated high precision on the evaluation dataset, recovering the majority of known citrullination sites in human tissue proteomes and improving sensitivity by identifying up to 14 times more citrullinated sites. Sequence motif analysis revealed consistency with previously reported findings, validating the reliability of our approach. Furthermore, extending the pipeline to a tissue proteome dataset of the model plant Arabidopsis thaliana enabled the identification of ∼200 citrullination sites across 169 proteins from 30 tissues, representing the first large-scale citrullination mapping in plants. This pipeline can be seamlessly applied to existing proteomics datasets, offering a robust tool for advancing biological discoveries and deepening our understanding of protein citrullination across species.
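
The mass ambiguity mentioned in the abstract can be illustrated with a short calculation. The sketch below is not part of the authors' pipeline; it only uses standard monoisotopic atomic masses to show that citrullination and deamidation share one mass shift. It also assumes, as one plausible reading of "errors in monoisotopic ion selection", that a precursor assigned to its first 13C isotope peak appears about 1.0034 Da too heavy, which lies close to +0.9840 Da. The function name `ppm_gap` and the example peptide masses are hypothetical.

```python
# Standard monoisotopic atomic masses in Da.
MASS_H = 1.00782503207
MASS_N = 14.0030740048
MASS_O = 15.99491461956
MASS_C12 = 12.0
MASS_C13 = 13.0033548378

# Citrullination (Arg -> Cit) and deamidation (Asn/Gln -> Asp/Glu) both
# amount to a net loss of N and H and a gain of O, so their mass shifts
# are identical.
DELTA_CIT_DEAM = MASS_O - (MASS_N + MASS_H)

# Assumption: a monoisotopic selection error means the first 13C isotope
# peak was taken as the monoisotopic peak, adding one 13C-12C spacing.
ISOTOPE_SPACING = MASS_C13 - MASS_C12


def ppm_gap(peptide_mass: float) -> float:
    """Separation, in ppm of the peptide mass, between a +1 isotope error
    and a genuine +0.9840 Da modification."""
    return (ISOTOPE_SPACING - DELTA_CIT_DEAM) / peptide_mass * 1e6


if __name__ == "__main__":
    print(f"Shared mass shift: {DELTA_CIT_DEAM:.4f} Da")
    for mass in (1000.0, 2000.0, 3000.0):  # hypothetical peptide masses
        print(f"{mass:.0f} Da peptide: gap of {ppm_gap(mass):.1f} ppm")
```

Under these assumptions the gap shrinks as peptide mass grows, so a precursor mass tolerance of around 10 ppm may no longer separate an isotope error from a real modification for larger peptides. Precursor mass cannot separate citrullination from deamidation at all, which is consistent with the abstract's reliance on predicted retention times and fragment ion intensities.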

Source journal: Molecular & Cellular Proteomics (Biology: Biochemical Research Methods)
CiteScore: 11.50
Self-citation rate: 4.30%
Articles published: 131
Review time: 84 days
Journal description: The mission of MCP is to foster the development and applications of proteomics in both basic and translational research. MCP will publish manuscripts that report significant new biological or clinical discoveries underpinned by proteomic observations across all kingdoms of life. Manuscripts must define the biological roles played by the proteins investigated or their mechanisms of action. The journal also emphasizes articles that describe innovative new computational methods and technological advancements that will enable future discoveries. Manuscripts describing such approaches do not have to include a solution to a biological problem, but must demonstrate that the technology works as described, is reproducible and is appropriate to uncover yet unknown protein/proteome function or properties using relevant model systems or publicly available data.
Scope:
- Fundamental studies in biology, including integrative "omics" studies, that provide mechanistic insights
- Novel experimental and computational technologies
- Proteogenomic data integration and analysis that enable greater understanding of physiology and disease processes
- Pathway and network analyses of signaling that focus on the roles of post-translational modifications
- Studies of proteome dynamics and quality controls, and their roles in disease
- Studies of evolutionary processes affecting proteome dynamics, quality and regulation
- Chemical proteomics, including mechanisms of drug action
- Proteomics of the immune system and antigen presentation/recognition
- Microbiome proteomics, host-microbe and host-pathogen interactions, and their roles in health and disease
- Clinical and translational studies of human diseases
- Metabolomics to understand functional connections between genes, proteins and phenotypes