Identification of RNA-dependent liquid-liquid phase separation proteins using an artificial intelligence strategy.

IF 3.4 | Zone 4, Biology | JCR Q2, BIOCHEMICAL RESEARCH METHODS | Proteomics | Pub Date: 2024-11-01 | Epub Date: 2024-06-02 | DOI: 10.1002/pmic.202400044
Zahoor Ahmed, Kiran Shahzadi, Yanting Jin, Rui Li, Biffon Manyura Momanyi, Hasan Zulfiqar, Lin Ning, Hao Lin
Citations: 0

Abstract


RNA-dependent liquid-liquid phase separation (LLPS) proteins play critical roles in cellular processes such as stress granule formation, DNA repair, RNA metabolism, germ cell development, and protein translation regulation. The abnormal behavior of these proteins is associated with various diseases, particularly neurodegenerative disorders like amyotrophic lateral sclerosis and frontotemporal dementia, making their identification crucial. However, conventional biochemistry-based methods for identifying these proteins are time-consuming and costly. Addressing this challenge, our study developed a robust computational model for their identification. We constructed a comprehensive dataset containing 137 RNA-dependent and 606 non-RNA-dependent LLPS protein sequences, which were then encoded using amino acid composition, composition of K-spaced amino acid pairs, Geary autocorrelation, and conjoined triad methods. Through a combination of correlation analysis, mutual information scoring, and incremental feature selection, we identified an optimal feature subset. This subset was used to train a random forest model, which achieved an accuracy of 90% when tested against an independent dataset. This study demonstrates the potential of computational methods as efficient alternatives for the identification of RNA-dependent LLPS proteins. To enhance the accessibility of the model, a user-centric web server has been established and can be accessed via the link: http://rpp.lin-group.cn.
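Three of the sequence encodings named in the abstract can be sketched in a few lines of plain Python. This is an illustrative sketch only, not the authors' implementation: the function names, the frequency normalization, the `max_gap` parameter, and the seven-class residue grouping used for the conjoined triad (the grouping commonly used for this descriptor) are all assumptions, since the abstract does not state these details. Geary autocorrelation is omitted because it needs physicochemical property tables that are not given here, and the feature selection and random forest steps are not covered.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

# Assumed seven-class grouping for the conjoined triad descriptor.
CT_GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
CT_CLASS = {res: str(i) for i, grp in enumerate(CT_GROUPS) for res in grp}


def aac(seq):
    """Amino acid composition: fraction of each standard residue (20 features)."""
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    counts = Counter(residues)
    n = len(residues)
    return [counts[a] / n if n else 0.0 for a in AMINO_ACIDS]


def cksaap(seq, k=0):
    """Composition of k-spaced amino acid pairs for a single gap k (400 features)."""
    seq = seq.upper()
    pairs = Counter()
    total = 0
    for i in range(len(seq) - k - 1):
        a, b = seq[i], seq[i + k + 1]  # residues separated by k positions
        if a in AMINO_ACIDS and b in AMINO_ACIDS:
            pairs[a + b] += 1
            total += 1
    return [pairs[a + b] / total if total else 0.0
            for a, b in product(AMINO_ACIDS, repeat=2)]


def conjoined_triad(seq):
    """Frequencies of three consecutive residue classes (7**3 = 343 features)."""
    classes = [CT_CLASS[r] for r in seq.upper() if r in CT_CLASS]
    triads = Counter("".join(classes[i:i + 3]) for i in range(len(classes) - 2))
    total = sum(triads.values())
    keys = ["".join(t) for t in product("0123456", repeat=3)]
    return [triads[key] / total if total else 0.0 for key in keys]


def encode(seq, max_gap=1):
    """Concatenate AAC, CKSAAP for gaps 0..max_gap, and conjoined triad features."""
    features = aac(seq)
    for k in range(max_gap + 1):
        features += cksaap(seq, k)
    features += conjoined_triad(seq)
    return features


if __name__ == "__main__":
    vector = encode("MSEYIRVTEDENDEPIEIPSEDDGTVLLS", max_gap=1)  # made-up sequence
    print(len(vector))  # 20 + 2 * 400 + 343 features
```

Each protein sequence becomes a fixed-length numeric vector regardless of its length, which is what allows a downstream feature selection step and a classifier such as a random forest to operate on sequences of different sizes.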

Source journal: Proteomics (Biology, Biochemical Research Methods)
CiteScore: 6.30
Self-citation rate: 5.90%
Articles published: 193
Review time: 3 months
About the journal: PROTEOMICS is the premier international source for information on all aspects of applications and technologies, including software, in proteomics and other "omics". The journal includes but is not limited to proteomics, genomics, transcriptomics, metabolomics and lipidomics, and systems biology approaches. Papers describing novel applications of proteomics and integration of multi-omics data and approaches are especially welcome.