Lina Zhang, Sizan Gao, Qinghao Yuan, Yao Fu, Runtao Yang
{"title":"结合多种特征表示策略的集成学习方法预测lncRNA亚细胞定位。","authors":"Lina Zhang, Sizan Gao, Qinghao Yuan, Yao Fu, Runtao Yang","doi":"10.1016/j.compbiolchem.2024.108336","DOIUrl":null,"url":null,"abstract":"<div><div>Long non-coding RNAs (lncRNAs) are strongly associated with cellular physiological mechanisms and implicated in the numerous diseases. By exploring the subcellular localizations of lncRNAs, we can not only gain crucial insights into the molecular mechanisms of lncRNA-related biological processes but also make valuable contributions towards the diagnosis, prevention, and treatment of various human diseases. However, conventional experimental techniques tend to be laborious and time-intensive. In this context, computational methods are in increased demand. The focus of this paper is the development of an innovative ensemble method that incorporates hybrid features to accurately predict the subcellular localizations of lncRNAs. To address the issue of incomplete reflection of inherent correlation with the intended target using singular source features, the utilization of heterogeneous multi-source features is implemented by introducing information on sequence composition, physicochemical properties, and structure. To address the issue of the imbalance classes in the benchmark dataset, the Synthetic Minority Over-sampling Technique (SMOTE) is employed. Finally, the resulting predictor termed lncSLPre is developed by integrating the outputs of the individual classifiers. Experimental findings suggest that the complementarity of multi-source heterogeneous features improves prediction performance. Additionally, it is demonstrated that the application of SMOTE is effective in mitigating the issue of the imbalanced dataset, while the feature selection approach is critical in eliminating extraneous and redundant features. Compared with existing advanced methods, lncSLPre achieves better performance with an overall accuracy improvement of 13.13%, 2.15%, and 3.23%, respectively, indicating that lncSLPre can effectively predict lncRNA subcellular localizations.</div></div>","PeriodicalId":10616,"journal":{"name":"Computational Biology and Chemistry","volume":"115 ","pages":"Article 108336"},"PeriodicalIF":2.6000,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"An ensemble learning method combined with multiple feature representation strategies to predict lncRNA subcellular localizations\",\"authors\":\"Lina Zhang, Sizan Gao, Qinghao Yuan, Yao Fu, Runtao Yang\",\"doi\":\"10.1016/j.compbiolchem.2024.108336\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Long non-coding RNAs (lncRNAs) are strongly associated with cellular physiological mechanisms and implicated in the numerous diseases. By exploring the subcellular localizations of lncRNAs, we can not only gain crucial insights into the molecular mechanisms of lncRNA-related biological processes but also make valuable contributions towards the diagnosis, prevention, and treatment of various human diseases. However, conventional experimental techniques tend to be laborious and time-intensive. In this context, computational methods are in increased demand. The focus of this paper is the development of an innovative ensemble method that incorporates hybrid features to accurately predict the subcellular localizations of lncRNAs. To address the issue of incomplete reflection of inherent correlation with the intended target using singular source features, the utilization of heterogeneous multi-source features is implemented by introducing information on sequence composition, physicochemical properties, and structure. To address the issue of the imbalance classes in the benchmark dataset, the Synthetic Minority Over-sampling Technique (SMOTE) is employed. Finally, the resulting predictor termed lncSLPre is developed by integrating the outputs of the individual classifiers. Experimental findings suggest that the complementarity of multi-source heterogeneous features improves prediction performance. Additionally, it is demonstrated that the application of SMOTE is effective in mitigating the issue of the imbalanced dataset, while the feature selection approach is critical in eliminating extraneous and redundant features. Compared with existing advanced methods, lncSLPre achieves better performance with an overall accuracy improvement of 13.13%, 2.15%, and 3.23%, respectively, indicating that lncSLPre can effectively predict lncRNA subcellular localizations.</div></div>\",\"PeriodicalId\":10616,\"journal\":{\"name\":\"Computational Biology and Chemistry\",\"volume\":\"115 \",\"pages\":\"Article 108336\"},\"PeriodicalIF\":2.6000,\"publicationDate\":\"2025-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computational Biology and Chemistry\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1476927124003244\",\"RegionNum\":4,\"RegionCategory\":\"生物学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"BIOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computational Biology and Chemistry","FirstCategoryId":"99","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1476927124003244","RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"BIOLOGY","Score":null,"Total":0}
An ensemble learning method combined with multiple feature representation strategies to predict lncRNA subcellular localizations
Long non-coding RNAs (lncRNAs) are strongly associated with cellular physiological mechanisms and implicated in the numerous diseases. By exploring the subcellular localizations of lncRNAs, we can not only gain crucial insights into the molecular mechanisms of lncRNA-related biological processes but also make valuable contributions towards the diagnosis, prevention, and treatment of various human diseases. However, conventional experimental techniques tend to be laborious and time-intensive. In this context, computational methods are in increased demand. The focus of this paper is the development of an innovative ensemble method that incorporates hybrid features to accurately predict the subcellular localizations of lncRNAs. To address the issue of incomplete reflection of inherent correlation with the intended target using singular source features, the utilization of heterogeneous multi-source features is implemented by introducing information on sequence composition, physicochemical properties, and structure. To address the issue of the imbalance classes in the benchmark dataset, the Synthetic Minority Over-sampling Technique (SMOTE) is employed. Finally, the resulting predictor termed lncSLPre is developed by integrating the outputs of the individual classifiers. Experimental findings suggest that the complementarity of multi-source heterogeneous features improves prediction performance. Additionally, it is demonstrated that the application of SMOTE is effective in mitigating the issue of the imbalanced dataset, while the feature selection approach is critical in eliminating extraneous and redundant features. Compared with existing advanced methods, lncSLPre achieves better performance with an overall accuracy improvement of 13.13%, 2.15%, and 3.23%, respectively, indicating that lncSLPre can effectively predict lncRNA subcellular localizations.
期刊介绍:
Computational Biology and Chemistry publishes original research papers and review articles in all areas of computational life sciences. High quality research contributions with a major computational component in the areas of nucleic acid and protein sequence research, molecular evolution, molecular genetics (functional genomics and proteomics), theory and practice of either biology-specific or chemical-biology-specific modeling, and structural biology of nucleic acids and proteins are particularly welcome. Exceptionally high quality research work in bioinformatics, systems biology, ecology, computational pharmacology, metabolism, biomedical engineering, epidemiology, and statistical genetics will also be considered.
Given their inherent uncertainty, protein modeling and molecular docking studies should be thoroughly validated. In the absence of experimental results for validation, the use of molecular dynamics simulations along with detailed free energy calculations, for example, should be used as complementary techniques to support the major conclusions. Submissions of premature modeling exercises without additional biological insights will not be considered.
Review articles will generally be commissioned by the editors and should not be submitted to the journal without explicit invitation. However prospective authors are welcome to send a brief (one to three pages) synopsis, which will be evaluated by the editors.