Katharina Waury, Dea Gogishvili, Rienk Nieuwland, Madhurima Chatterjee, Charlotte E. Teunissen, Sanne Abeln
{"title":"蛋白质组编码的蛋白质分选到细胞外囊泡的决定因素","authors":"Katharina Waury, Dea Gogishvili, Rienk Nieuwland, Madhurima Chatterjee, Charlotte E. Teunissen, Sanne Abeln","doi":"10.1002/jex2.120","DOIUrl":null,"url":null,"abstract":"<p>Extracellular vesicles (EVs) are membranous structures released by cells into the extracellular space and are thought to be involved in cell-to-cell communication. While EVs and their cargo are promising biomarker candidates, sorting mechanisms of proteins to EVs remain unclear. In this study, we ask if it is possible to determine EV association based on the protein sequence. Additionally, we ask what the most important determinants are for EV association. We answer these questions with explainable AI models, using human proteome data from EV databases to train and validate the model. It is essential to correct the datasets for contaminants introduced by coarse EV isolation workflows and for experimental bias caused by mass spectrometry. In this study, we show that it is indeed possible to predict EV association from the protein sequence: a simple sequence-based model for predicting EV proteins achieved an area under the curve of 0.77 ± 0.01, which increased further to 0.84 ± 0.00 when incorporating curated post-translational modification (PTM) annotations. Feature analysis shows that EV-associated proteins are stable, polar, and structured with low isoelectric point compared to non-EV proteins. PTM annotations emerged as the most important features for correct classification; specifically, palmitoylation is one of the most prevalent EV sorting mechanisms for unique proteins. Palmitoylation and nitrosylation sites are especially prevalent in EV proteins that are determined by very strict isolation protocols, indicating they could potentially serve as quality control criteria for future studies. This computational study offers an effective sequence-based predictor of EV associated proteins with extensive characterisation of the human EV proteome that can explain for individual proteins which factors contribute to their EV association.</p>","PeriodicalId":73747,"journal":{"name":"Journal of extracellular biology","volume":"3 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/jex2.120","citationCount":"0","resultStr":"{\"title\":\"Proteome encoded determinants of protein sorting into extracellular vesicles\",\"authors\":\"Katharina Waury, Dea Gogishvili, Rienk Nieuwland, Madhurima Chatterjee, Charlotte E. Teunissen, Sanne Abeln\",\"doi\":\"10.1002/jex2.120\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Extracellular vesicles (EVs) are membranous structures released by cells into the extracellular space and are thought to be involved in cell-to-cell communication. While EVs and their cargo are promising biomarker candidates, sorting mechanisms of proteins to EVs remain unclear. In this study, we ask if it is possible to determine EV association based on the protein sequence. Additionally, we ask what the most important determinants are for EV association. We answer these questions with explainable AI models, using human proteome data from EV databases to train and validate the model. It is essential to correct the datasets for contaminants introduced by coarse EV isolation workflows and for experimental bias caused by mass spectrometry. In this study, we show that it is indeed possible to predict EV association from the protein sequence: a simple sequence-based model for predicting EV proteins achieved an area under the curve of 0.77 ± 0.01, which increased further to 0.84 ± 0.00 when incorporating curated post-translational modification (PTM) annotations. Feature analysis shows that EV-associated proteins are stable, polar, and structured with low isoelectric point compared to non-EV proteins. PTM annotations emerged as the most important features for correct classification; specifically, palmitoylation is one of the most prevalent EV sorting mechanisms for unique proteins. Palmitoylation and nitrosylation sites are especially prevalent in EV proteins that are determined by very strict isolation protocols, indicating they could potentially serve as quality control criteria for future studies. This computational study offers an effective sequence-based predictor of EV associated proteins with extensive characterisation of the human EV proteome that can explain for individual proteins which factors contribute to their EV association.</p>\",\"PeriodicalId\":73747,\"journal\":{\"name\":\"Journal of extracellular biology\",\"volume\":\"3 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-01-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://onlinelibrary.wiley.com/doi/epdf/10.1002/jex2.120\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of extracellular biology\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1002/jex2.120\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of extracellular biology","FirstCategoryId":"1085","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/jex2.120","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
摘要
细胞外囊泡(EVs)是细胞释放到细胞外空间的膜结构,被认为参与了细胞间的通讯。尽管EVs及其货物是很有希望的候选生物标记物,但蛋白质向EVs的分拣机制仍不清楚。在这项研究中,我们询问是否有可能根据蛋白质序列确定 EV 关联。此外,我们还想知道 EV 关联最重要的决定因素是什么。我们用可解释的人工智能模型来回答这些问题,并使用来自 EV 数据库的人类蛋白质组数据来训练和验证模型。必须对数据集进行校正,以消除粗略的 EV 分离工作流程所带来的污染物以及质谱分析所造成的实验偏差。在这项研究中,我们发现确实有可能通过蛋白质序列预测 EV 关联性:基于序列的简单模型预测 EV 蛋白质的曲线下面积为 0.77 ± 0.01,当加入经策划的翻译后修饰 (PTM) 注释时,曲线下面积进一步增加到 0.84 ± 0.00。特征分析表明,与非EV蛋白相比,EV相关蛋白具有稳定性、极性和低等电点结构。PTM 注释是正确分类的最重要特征;特别是,棕榈酰化是独特蛋白质最普遍的 EV 分类机制之一。棕榈酰化和亚硝基化位点在通过非常严格的分离协议确定的 EV 蛋白中尤其普遍,这表明它们有可能成为未来研究的质量控制标准。这项计算研究提供了一种有效的基于序列的 EV 相关蛋白质预测方法,它对人类 EV 蛋白质体进行了广泛的表征,可以为单个蛋白质解释哪些因素促成了它们与 EV 的关联。
Proteome encoded determinants of protein sorting into extracellular vesicles
Extracellular vesicles (EVs) are membranous structures released by cells into the extracellular space and are thought to be involved in cell-to-cell communication. While EVs and their cargo are promising biomarker candidates, sorting mechanisms of proteins to EVs remain unclear. In this study, we ask if it is possible to determine EV association based on the protein sequence. Additionally, we ask what the most important determinants are for EV association. We answer these questions with explainable AI models, using human proteome data from EV databases to train and validate the model. It is essential to correct the datasets for contaminants introduced by coarse EV isolation workflows and for experimental bias caused by mass spectrometry. In this study, we show that it is indeed possible to predict EV association from the protein sequence: a simple sequence-based model for predicting EV proteins achieved an area under the curve of 0.77 ± 0.01, which increased further to 0.84 ± 0.00 when incorporating curated post-translational modification (PTM) annotations. Feature analysis shows that EV-associated proteins are stable, polar, and structured with low isoelectric point compared to non-EV proteins. PTM annotations emerged as the most important features for correct classification; specifically, palmitoylation is one of the most prevalent EV sorting mechanisms for unique proteins. Palmitoylation and nitrosylation sites are especially prevalent in EV proteins that are determined by very strict isolation protocols, indicating they could potentially serve as quality control criteria for future studies. This computational study offers an effective sequence-based predictor of EV associated proteins with extensive characterisation of the human EV proteome that can explain for individual proteins which factors contribute to their EV association.