结合多种特征表示策略的集成学习方法预测lncRNA亚细胞定位。

IF 3.1 4区生物学 Q2 BIOLOGY Computational Biology and Chemistry Pub Date : 2025-04-01 Epub Date: 2025-01-01 DOI:10.1016/j.compbiolchem.2024.108336

Lina Zhang, Sizan Gao, Qinghao Yuan, Yao Fu, Runtao Yang

{"title":"结合多种特征表示策略的集成学习方法预测lncRNA亚细胞定位。","authors":"Lina Zhang, Sizan Gao, Qinghao Yuan, Yao Fu, Runtao Yang","doi":"10.1016/j.compbiolchem.2024.108336","DOIUrl":null,"url":null,"abstract":"<div><div>Long non-coding RNAs (lncRNAs) are strongly associated with cellular physiological mechanisms and implicated in the numerous diseases. By exploring the subcellular localizations of lncRNAs, we can not only gain crucial insights into the molecular mechanisms of lncRNA-related biological processes but also make valuable contributions towards the diagnosis, prevention, and treatment of various human diseases. However, conventional experimental techniques tend to be laborious and time-intensive. In this context, computational methods are in increased demand. The focus of this paper is the development of an innovative ensemble method that incorporates hybrid features to accurately predict the subcellular localizations of lncRNAs. To address the issue of incomplete reflection of inherent correlation with the intended target using singular source features, the utilization of heterogeneous multi-source features is implemented by introducing information on sequence composition, physicochemical properties, and structure. To address the issue of the imbalance classes in the benchmark dataset, the Synthetic Minority Over-sampling Technique (SMOTE) is employed. Finally, the resulting predictor termed lncSLPre is developed by integrating the outputs of the individual classifiers. Experimental findings suggest that the complementarity of multi-source heterogeneous features improves prediction performance. Additionally, it is demonstrated that the application of SMOTE is effective in mitigating the issue of the imbalanced dataset, while the feature selection approach is critical in eliminating extraneous and redundant features. Compared with existing advanced methods, lncSLPre achieves better performance with an overall accuracy improvement of 13.13%, 2.15%, and 3.23%, respectively, indicating that lncSLPre can effectively predict lncRNA subcellular localizations.</div></div>","PeriodicalId":10616,"journal":{"name":"Computational Biology and Chemistry","volume":"115 ","pages":"Article 108336"},"PeriodicalIF":3.1000,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"An ensemble learning method combined with multiple feature representation strategies to predict lncRNA subcellular localizations\",\"authors\":\"Lina Zhang, Sizan Gao, Qinghao Yuan, Yao Fu, Runtao Yang\",\"doi\":\"10.1016/j.compbiolchem.2024.108336\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Long non-coding RNAs (lncRNAs) are strongly associated with cellular physiological mechanisms and implicated in the numerous diseases. By exploring the subcellular localizations of lncRNAs, we can not only gain crucial insights into the molecular mechanisms of lncRNA-related biological processes but also make valuable contributions towards the diagnosis, prevention, and treatment of various human diseases. However, conventional experimental techniques tend to be laborious and time-intensive. In this context, computational methods are in increased demand. The focus of this paper is the development of an innovative ensemble method that incorporates hybrid features to accurately predict the subcellular localizations of lncRNAs. To address the issue of incomplete reflection of inherent correlation with the intended target using singular source features, the utilization of heterogeneous multi-source features is implemented by introducing information on sequence composition, physicochemical properties, and structure. To address the issue of the imbalance classes in the benchmark dataset, the Synthetic Minority Over-sampling Technique (SMOTE) is employed. Finally, the resulting predictor termed lncSLPre is developed by integrating the outputs of the individual classifiers. Experimental findings suggest that the complementarity of multi-source heterogeneous features improves prediction performance. Additionally, it is demonstrated that the application of SMOTE is effective in mitigating the issue of the imbalanced dataset, while the feature selection approach is critical in eliminating extraneous and redundant features. Compared with existing advanced methods, lncSLPre achieves better performance with an overall accuracy improvement of 13.13%, 2.15%, and 3.23%, respectively, indicating that lncSLPre can effectively predict lncRNA subcellular localizations.</div></div>\",\"PeriodicalId\":10616,\"journal\":{\"name\":\"Computational Biology and Chemistry\",\"volume\":\"115 \",\"pages\":\"Article 108336\"},\"PeriodicalIF\":3.1000,\"publicationDate\":\"2025-04-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computational Biology and Chemistry\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1476927124003244\",\"RegionNum\":4,\"RegionCategory\":\"生物学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2025/1/1 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"Q2\",\"JCRName\":\"BIOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computational Biology and Chemistry","FirstCategoryId":"99","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1476927124003244","RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/1 0:00:00","PubModel":"Epub","JCR":"Q2","JCRName":"BIOLOGY","Score":null,"Total":0}

引用次数: 0

摘要

长链非编码rna （lncRNAs）与细胞生理机制密切相关，并与许多疾病有关。通过探索lncrna的亚细胞定位，我们不仅可以对lncrna相关生物学过程的分子机制有重要的认识，而且可以对各种人类疾病的诊断、预防和治疗做出有价值的贡献。然而，传统的实验技术往往是费力和费时的。在这种情况下，计算方法的需求增加了。本文的重点是开发一种创新的集成方法，该方法结合混合特征来准确预测lncrna的亚细胞定位。为了解决单一源特征不能完全反映目标的内在相关性的问题，通过引入序列组成、理化性质和结构等信息，实现了异构多源特征的利用。为了解决基准数据集中的类不平衡问题，采用了合成少数派过采样技术（SMOTE）。最后，通过集成各个分类器的输出来开发称为lncSLPre的预测器。实验结果表明，多源异构特征的互补性提高了预测性能。此外，还证明了SMOTE在缓解数据集不平衡问题方面的应用是有效的，而特征选择方法对于消除多余和多余的特征至关重要。与现有的先进方法相比，lncSLPre取得了更好的性能，总体准确率分别提高了13.13%、2.15%和3.23%，表明lncSLPre可以有效地预测lncRNA亚细胞定位。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

摘要图片

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

An ensemble learning method combined with multiple feature representation strategies to predict lncRNA subcellular localizations

Long non-coding RNAs (lncRNAs) are strongly associated with cellular physiological mechanisms and implicated in the numerous diseases. By exploring the subcellular localizations of lncRNAs, we can not only gain crucial insights into the molecular mechanisms of lncRNA-related biological processes but also make valuable contributions towards the diagnosis, prevention, and treatment of various human diseases. However, conventional experimental techniques tend to be laborious and time-intensive. In this context, computational methods are in increased demand. The focus of this paper is the development of an innovative ensemble method that incorporates hybrid features to accurately predict the subcellular localizations of lncRNAs. To address the issue of incomplete reflection of inherent correlation with the intended target using singular source features, the utilization of heterogeneous multi-source features is implemented by introducing information on sequence composition, physicochemical properties, and structure. To address the issue of the imbalance classes in the benchmark dataset, the Synthetic Minority Over-sampling Technique (SMOTE) is employed. Finally, the resulting predictor termed lncSLPre is developed by integrating the outputs of the individual classifiers. Experimental findings suggest that the complementarity of multi-source heterogeneous features improves prediction performance. Additionally, it is demonstrated that the application of SMOTE is effective in mitigating the issue of the imbalanced dataset, while the feature selection approach is critical in eliminating extraneous and redundant features. Compared with existing advanced methods, lncSLPre achieves better performance with an overall accuracy improvement of 13.13%, 2.15%, and 3.23%, respectively, indicating that lncSLPre can effectively predict lncRNA subcellular localizations.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Computational Biology and Chemistry 生物-计算机：跨学科应用

CiteScore

6.10

自引率

3.20%

发文量

142

审稿时长

24 days

期刊介绍： Computational Biology and Chemistry publishes original research papers and review articles in all areas of computational life sciences. High quality research contributions with a major computational component in the areas of nucleic acid and protein sequence research, molecular evolution, molecular genetics (functional genomics and proteomics), theory and practice of either biology-specific or chemical-biology-specific modeling, and structural biology of nucleic acids and proteins are particularly welcome. Exceptionally high quality research work in bioinformatics, systems biology, ecology, computational pharmacology, metabolism, biomedical engineering, epidemiology, and statistical genetics will also be considered. Given their inherent uncertainty, protein modeling and molecular docking studies should be thoroughly validated. In the absence of experimental results for validation, the use of molecular dynamics simulations along with detailed free energy calculations, for example, should be used as complementary techniques to support the major conclusions. Submissions of premature modeling exercises without additional biological insights will not be considered. Review articles will generally be commissioned by the editors and should not be submitted to the journal without explicit invitation. However prospective authors are welcome to send a brief (one to three pages) synopsis, which will be evaluated by the editors.