Zahoor Ahmed, Kiran Shahzadi, Yanting Jin, Rui Li, Biffon Manyura Momanyi, Hasan Zulfiqar, Lin Ning, Hao Lin
{"title":"利用人工智能策略识别依赖于 RNA 的液-液相分离蛋白质。","authors":"Zahoor Ahmed, Kiran Shahzadi, Yanting Jin, Rui Li, Biffon Manyura Momanyi, Hasan Zulfiqar, Lin Ning, Hao Lin","doi":"10.1002/pmic.202400044","DOIUrl":null,"url":null,"abstract":"<p><p>RNA-dependent liquid-liquid phase separation (LLPS) proteins play critical roles in cellular processes such as stress granule formation, DNA repair, RNA metabolism, germ cell development, and protein translation regulation. The abnormal behavior of these proteins is associated with various diseases, particularly neurodegenerative disorders like amyotrophic lateral sclerosis and frontotemporal dementia, making their identification crucial. However, conventional biochemistry-based methods for identifying these proteins are time-consuming and costly. Addressing this challenge, our study developed a robust computational model for their identification. We constructed a comprehensive dataset containing 137 RNA-dependent and 606 non-RNA-dependent LLPS protein sequences, which were then encoded using amino acid composition, composition of K-spaced amino acid pairs, Geary autocorrelation, and conjoined triad methods. Through a combination of correlation analysis, mutual information scoring, and incremental feature selection, we identified an optimal feature subset. This subset was used to train a random forest model, which achieved an accuracy of 90% when tested against an independent dataset. This study demonstrates the potential of computational methods as efficient alternatives for the identification of RNA-dependent LLPS proteins. To enhance the accessibility of the model, a user-centric web server has been established and can be accessed via the link: http://rpp.lin-group.cn.</p>","PeriodicalId":224,"journal":{"name":"Proteomics","volume":" ","pages":"e2400044"},"PeriodicalIF":3.4000,"publicationDate":"2024-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Identification of RNA‐dependent liquid‐liquid phase separation proteins using an artificial intelligence strategy.\",\"authors\":\"Zahoor Ahmed, Kiran Shahzadi, Yanting Jin, Rui Li, Biffon Manyura Momanyi, Hasan Zulfiqar, Lin Ning, Hao Lin\",\"doi\":\"10.1002/pmic.202400044\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>RNA-dependent liquid-liquid phase separation (LLPS) proteins play critical roles in cellular processes such as stress granule formation, DNA repair, RNA metabolism, germ cell development, and protein translation regulation. The abnormal behavior of these proteins is associated with various diseases, particularly neurodegenerative disorders like amyotrophic lateral sclerosis and frontotemporal dementia, making their identification crucial. However, conventional biochemistry-based methods for identifying these proteins are time-consuming and costly. Addressing this challenge, our study developed a robust computational model for their identification. We constructed a comprehensive dataset containing 137 RNA-dependent and 606 non-RNA-dependent LLPS protein sequences, which were then encoded using amino acid composition, composition of K-spaced amino acid pairs, Geary autocorrelation, and conjoined triad methods. Through a combination of correlation analysis, mutual information scoring, and incremental feature selection, we identified an optimal feature subset. This subset was used to train a random forest model, which achieved an accuracy of 90% when tested against an independent dataset. This study demonstrates the potential of computational methods as efficient alternatives for the identification of RNA-dependent LLPS proteins. To enhance the accessibility of the model, a user-centric web server has been established and can be accessed via the link: http://rpp.lin-group.cn.</p>\",\"PeriodicalId\":224,\"journal\":{\"name\":\"Proteomics\",\"volume\":\" \",\"pages\":\"e2400044\"},\"PeriodicalIF\":3.4000,\"publicationDate\":\"2024-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proteomics\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://doi.org/10.1002/pmic.202400044\",\"RegionNum\":4,\"RegionCategory\":\"生物学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2024/6/2 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"Q2\",\"JCRName\":\"BIOCHEMICAL RESEARCH METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proteomics","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1002/pmic.202400044","RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/6/2 0:00:00","PubModel":"Epub","JCR":"Q2","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}
Identification of RNA‐dependent liquid‐liquid phase separation proteins using an artificial intelligence strategy.
RNA-dependent liquid-liquid phase separation (LLPS) proteins play critical roles in cellular processes such as stress granule formation, DNA repair, RNA metabolism, germ cell development, and protein translation regulation. The abnormal behavior of these proteins is associated with various diseases, particularly neurodegenerative disorders like amyotrophic lateral sclerosis and frontotemporal dementia, making their identification crucial. However, conventional biochemistry-based methods for identifying these proteins are time-consuming and costly. Addressing this challenge, our study developed a robust computational model for their identification. We constructed a comprehensive dataset containing 137 RNA-dependent and 606 non-RNA-dependent LLPS protein sequences, which were then encoded using amino acid composition, composition of K-spaced amino acid pairs, Geary autocorrelation, and conjoined triad methods. Through a combination of correlation analysis, mutual information scoring, and incremental feature selection, we identified an optimal feature subset. This subset was used to train a random forest model, which achieved an accuracy of 90% when tested against an independent dataset. This study demonstrates the potential of computational methods as efficient alternatives for the identification of RNA-dependent LLPS proteins. To enhance the accessibility of the model, a user-centric web server has been established and can be accessed via the link: http://rpp.lin-group.cn.
期刊介绍:
PROTEOMICS is the premier international source for information on all aspects of applications and technologies, including software, in proteomics and other "omics". The journal includes but is not limited to proteomics, genomics, transcriptomics, metabolomics and lipidomics, and systems biology approaches. Papers describing novel applications of proteomics and integration of multi-omics data and approaches are especially welcome.