Comparing artificial neural networks, general linear models and support vector machines in building predictive models for small interfering RNAs.

IF 2.6 · Zone 3 (Multidisciplinary Journals) · JCR Q1 (Multidisciplinary Sciences) · PLoS ONE · Pub Date: 2009-10-22 · DOI: 10.1371/journal.pone.0007522
Kyle A McQuisten, Andrew S Peek
Citations: 18

Abstract

Background: Exogenous short interfering RNAs (siRNAs) induce a gene knockdown effect in cells by interacting with naturally occurring RNA processing machinery. However, not all siRNAs induce this effect equally. Several heterogeneous kinds of machine learning techniques and feature sets have been applied to modeling siRNAs and their ability to induce knockdown. There is growing agreement as to which techniques produce maximally predictive models, yet there is little consensus on methods for comparing predictive models. In addition, few comparative studies address the effect that the choice of learning technique, feature set, or cross-validation approach has on finding and discriminating among predictive models.

Principal findings: Three learning techniques were used to develop predictive models for effective siRNA sequences: Artificial Neural Networks (ANNs), General Linear Models (GLMs) and Support Vector Machines (SVMs). Five feature mapping methods were also used to generate models of siRNA activities. The two factors, learning technique and feature mapping, were evaluated by a complete 3x5 factorial ANOVA. Overall, both learning technique and feature mapping contributed significantly to the observed variance in predictive models, but to differing degrees for precision and accuracy, as well as across different kinds and levels of model cross-validation.
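As an illustration of the kind of analysis described above, the following is a minimal sketch of a balanced two-way factorial ANOVA over a 3x5 grid of learning technique by feature mapping. It is not the authors' code: the function name two_way_anova, the mapping labels map1 to map5, and every score in the toy grid are hypothetical, and the sketch reports F ratios only (p-values would need an F-distribution routine from a statistics library).

```python
from itertools import product
from statistics import mean


def two_way_anova(cells):
    """Balanced two-way factorial ANOVA with replication.

    `cells` maps (level_a, level_b) -> list of replicate scores, with the
    same number of replicates (at least 2) in every cell.
    Returns (sums of squares, degrees of freedom, F ratios).
    """
    levels_a = sorted({a for a, _ in cells})
    levels_b = sorted({b for _, b in cells})
    n_a, n_b = len(levels_a), len(levels_b)
    n = len(next(iter(cells.values())))  # replicates per cell
    if len(cells) != n_a * n_b:
        raise ValueError("design must be complete (every A x B cell present)")
    if n < 2 or any(len(v) != n for v in cells.values()):
        raise ValueError("design must be balanced with >= 2 replicates per cell")

    # Grand mean, marginal means for each factor, and per-cell means.
    grand = mean(x for v in cells.values() for x in v)
    mean_a = {a: mean(x for b in levels_b for x in cells[(a, b)]) for a in levels_a}
    mean_b = {b: mean(x for a in levels_a for x in cells[(a, b)]) for b in levels_b}
    mean_ab = {k: mean(v) for k, v in cells.items()}

    # Partition the total variation into main effects, interaction and error.
    ss = {
        "A": n * n_b * sum((mean_a[a] - grand) ** 2 for a in levels_a),
        "B": n * n_a * sum((mean_b[b] - grand) ** 2 for b in levels_b),
        "AxB": n * sum(
            (mean_ab[(a, b)] - mean_a[a] - mean_b[b] + grand) ** 2
            for a, b in product(levels_a, levels_b)
        ),
        "error": sum((x - mean_ab[k]) ** 2 for k, v in cells.items() for x in v),
    }
    df = {
        "A": n_a - 1,
        "B": n_b - 1,
        "AxB": (n_a - 1) * (n_b - 1),
        "error": n_a * n_b * (n - 1),
    }
    ms_error = ss["error"] / df["error"]
    f_ratio = {k: (ss[k] / df[k]) / ms_error for k in ("A", "B", "AxB")}
    return ss, df, f_ratio


# Synthetic scores for illustration only (NOT results from the paper):
# factor A = learning technique, factor B = hypothetical feature mapping label,
# two replicates per cell standing in for repeated cross-validation runs.
techniques = {"ANN": 0.50, "GLM": 0.48, "SVM": 0.55}
mappings = {"map1": 0.00, "map2": 0.02, "map3": -0.01, "map4": 0.03, "map5": 0.01}
toy = {
    (t, m): [base + shift - 0.01, base + shift + 0.01]
    for (t, base), (m, shift) in product(techniques.items(), mappings.items())
}

if __name__ == "__main__":
    ss, df, f_ratio = two_way_anova(toy)
    for term in ("A", "B", "AxB"):
        print(f"{term}: SS={ss[term]:.5f} df={df[term]} F={f_ratio[term]:.2f}")
```

In this layout, factor A corresponds to the learning technique and factor B to the feature mapping, so the AxB term captures whether the benefit of a feature mapping depends on the technique it is paired with.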

Conclusions: The methods presented here provide a robust statistical framework for comparing models developed under distinct learning techniques and feature sets for siRNAs. Further comparisons among current or future modeling approaches should apply these or other statistically equivalent methods to critically evaluate the performance of proposed models. ANN and GLM techniques tend to be more sensitive to the inclusion of noisy features, whereas the SVM technique is more robust under large numbers of features for measures of model precision and accuracy. Features found to result in maximally predictive models are not consistent across learning techniques, suggesting that care should be taken in interpreting feature relevance. In the models developed here, there are statistically distinguishable combinations of learning techniques and feature mapping methods in which the SVM technique, under a specific combination of features, significantly outperforms all of the best combinations of features within the ANN and GLM techniques.
