eRFSVM: a hybrid classifier to predict enhancers - integrating random forests with support vector machines.

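Hereditas, vol. 153, no. 1, article 6. Pub Date: 2016-06-30; eCollection Date: 2016-01-01. DOI: 10.1186/s41065-016-0012-2. IF 2.1; CAS tier 3 (Biology); JCR Q3 (GENETICS & HEREDITY).
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5226099/pdf/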

Fang Huang, Jiawei Shen, Qingli Guo, Yongyong Shi



Abstract

Background: Enhancers are tissue-specific distal regulatory elements that play vital roles in gene regulation and expression. Predicting and identifying enhancers is an important but challenging problem in bioinformatics. Existing computational methods, mostly single classifiers, can predict only enhancers defined by the transcriptional coactivator EP300, and they generalize poorly.

Results: We built a hybrid classifier, eRFSVM, that uses random forests as base classifiers and a support vector machine as the main classifier. eRFSVM integrates two components, eRFSVM-ENCODE and eRFSVM-FANTOM5, which use different features and labels. Each base classifier was a random forest trained on data from a single tissue or cell type. The main classifier, a support vector machine, made the final decision using the base classifiers' predictions as its inputs. For eRFSVM-ENCODE, we trained on datasets from the cell lines Gm12878, Hep, H1-hesc and Huvec, using ChIP-Seq datasets as features and EP300-based enhancers as labels. Tested on the K562 dataset, eRFSVM-ENCODE achieved a prediction precision of 83.69 %, substantially higher than existing classifiers. For eRFSVM-FANTOM5, with enhancers identified by RNA in the FANTOM5 project as labels, eRFSVM achieved a precision, recall, F-score and accuracy of 86.17 %, 36.06 %, 50.84 % and 93.38 %, respectively. These are relative increases of 23.24 %, 97.05 %, 76.90 % and 4.69 % over the existing algorithm, which scored 69.92 %, 18.30 %, 28.74 % and 89.20 %, respectively.
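The base/main design described in the Results is a form of stacking: one classifier per tissue or cell type, and a second-level classifier that learns how to combine their outputs. Below is a minimal, stdlib-only Python sketch of that idea under stated assumptions. It is not the authors' implementation. The following are all illustrative choices, and every helper name is hypothetical: the synthetic data generator `make_cell_line`, the six features, the 50/50 split that gives the SVM rows the forests never saw, a deliberately shallow random forest (bagged decision stumps on random feature subsets), and a linear SVM trained with the Pegasos subgradient method. The cell-line names only label synthetic data. A real pipeline would more likely use library implementations such as scikit-learn's `RandomForestClassifier` and `SVC`.

```python
import random

# Hypothetical stand-in for one cell line's data: each row mimics a vector of
# ChIP-Seq signals, label 1 = enhancer. Synthetic, NOT real ENCODE data.
def make_cell_line(n, n_features, rng, shift=1.0, noise=0.8):
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < 0.5 else 0
        mean = shift if label else -shift
        X.append([rng.gauss(mean, noise) for _ in range(n_features)])
        y.append(label)
    return X, y

def fit_stump(X, y, features):
    """Depth-1 tree: pick (feature, threshold, polarity) with fewest training errors."""
    n, pos_total, best = len(y), sum(y), None
    for f in features:
        pairs = sorted(zip((row[f] for row in X), y))
        pos_left = 0
        for i in range(n - 1):
            pos_left += pairs[i][1]
            # Rule "predict 1 if x > t": errors = positives left of t + negatives right of t.
            err = pos_left + (n - i - 1) - (pos_total - pos_left)
            t = (pairs[i][0] + pairs[i + 1][0]) / 2
            for polarity, e in ((1, err), (-1, n - err)):
                if best is None or e < best[0]:
                    best = (e, f, t, polarity)
    return best[1:]  # (feature, threshold, polarity)

def stump_predict(stump, row):
    f, t, polarity = stump
    return 1 if (row[f] > t) == (polarity == 1) else 0

class StumpForest:
    """Deliberately shallow random forest: bagged stumps on random feature subsets."""
    def __init__(self, n_trees=25, max_features=2, seed=0):
        self.n_trees, self.max_features = n_trees, max_features
        self.rng, self.stumps = random.Random(seed), []

    def fit(self, X, y):
        n, d = len(X), len(X[0])
        for _ in range(self.n_trees):
            idx = [self.rng.randrange(n) for _ in range(n)]       # bootstrap sample
            feats = self.rng.sample(range(d), self.max_features)  # random feature subset
            self.stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
        return self

    def predict_proba(self, row):
        """Fraction of trees voting 'enhancer'."""
        return sum(stump_predict(s, row) for s in self.stumps) / len(self.stumps)

class LinearSVM:
    """Linear SVM (hinge loss + L2 penalty) trained with Pegasos SGD; a constant
    1.0 feature is appended so the bias is learned as an ordinary weight."""
    def __init__(self, lam=0.01, epochs=20, seed=0):
        self.lam, self.epochs, self.rng = lam, epochs, random.Random(seed)

    def fit(self, X, y):
        X = [row + [1.0] for row in X]
        self.w, t, order = [0.0] * len(X[0]), 0, list(range(len(X)))
        for _ in range(self.epochs):
            self.rng.shuffle(order)
            for i in order:
                t += 1
                eta, yi = 1.0 / (self.lam * t), (1 if y[i] == 1 else -1)
                margin = yi * sum(w * x for w, x in zip(self.w, X[i]))
                self.w = [(1 - eta * self.lam) * w for w in self.w]  # shrink (L2 step)
                if margin < 1:  # hinge loss active: move toward the sample
                    self.w = [w + eta * yi * x for w, x in zip(self.w, X[i])]
        return self

    def predict(self, row):
        return 1 if sum(w * x for w, x in zip(self.w, row + [1.0])) > 0 else 0

def meta_features(forests, row):
    # One input per base classifier: its enhancer probability, centred at 0.5.
    return [f.predict_proba(row) - 0.5 for f in forests]

def train_hybrid(cell_lines, seed=0):
    """Base layer: one forest per cell line. Main layer: an SVM over the forests' outputs."""
    forests, held_out = [], []
    for k, (X, y) in enumerate(cell_lines.values()):
        half = len(X) // 2  # assumption: first half fits the forest, second half the SVM
        forests.append(StumpForest(seed=seed + k).fit(X[:half], y[:half]))
        held_out.append((X[half:], y[half:]))
    meta_X, meta_y = [], []
    for X, y in held_out:  # rows that no forest was fitted on
        meta_X += [meta_features(forests, row) for row in X]
        meta_y += y
    return forests, LinearSVM(seed=seed).fit(meta_X, meta_y)

rng = random.Random(42)
train = {name: make_cell_line(200, 6, rng) for name in ("Gm12878", "Hep", "H1-hesc", "Huvec")}
X_test, y_test = make_cell_line(200, 6, rng)  # synthetic stand-in for a held-out K562 set
forests, svm = train_hybrid(train)
preds = [svm.predict(meta_features(forests, row)) for row in X_test]
accuracy = sum(p == t for p, t in zip(preds, y_test)) / len(y_test)
print(f"held-out accuracy: {accuracy:.3f}")
```

The SVM is fitted only on rows that none of the forests were trained on. Otherwise it would learn from over-confident in-sample forest outputs, which is the usual pitfall when stacking classifiers.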

Conclusions: All these results demonstrate that eRFSVM is a better classifier for predicting both EP300-based and FANTOM5 RNA-based enhancers.
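The FANTOM5 comparison in the Results pairs each percentage gain with the existing algorithm's score. The short Python check below reproduces the quoted gains from the reported values. It assumes two things: that each gain is a relative increase (a percentage of the existing algorithm's score, not a difference in percentage points), and that "F-score" means F1, the harmonic mean of precision and recall. The helper names are hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (assumed meaning of 'F-score')."""
    return 2 * precision * recall / (precision + recall)

def relative_increase(new, old):
    """Gain of `new` over `old`, expressed as a percentage of `old`."""
    return (new - old) / old * 100

# Values (in %) as reported in the abstract for eRFSVM-FANTOM5 and the existing algorithm.
erfsvm = {"precision": 86.17, "recall": 36.06, "F-score": 50.84, "accuracy": 93.38}
existing = {"precision": 69.92, "recall": 18.30, "F-score": 28.74, "accuracy": 89.20}

for metric in erfsvm:
    print(f"{metric}: +{relative_increase(erfsvm[metric], existing[metric]):.2f} %")
print(f"F1 from eRFSVM precision and recall: {f1_score(86.17, 36.06):.2f} %")
```

This prints gains of +23.24 %, +97.05 %, +76.90 % and +4.69 %, matching the abstract. It also prints an F1 of 50.84 % for eRFSVM, which agrees with the reported F-score.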




Source journal

Hereditas (Biology - Genetics)

CiteScore: 4.30
Self-citation rate: 3.70%
Review time: 6 weeks

Journal description: For almost a century, Hereditas has published original cutting-edge research and reviews. As the official journal of the Mendelian Society of Lund, the journal welcomes research from all areas of genetics and genomics. Topics of interest include human and medical genetics, animal and plant genetics, microbial genetics, agriculture and bioinformatics.