Effects of pooling samples on the performance of classification algorithms: a comparative study.

The Scientific World Journal (Q2, Environmental Science) Pub Date: 2012-01-01 Epub Date: 2012-04-30 DOI: 10.1100/2012/278352
Kanthida Kusonmano, Michael Netzer, Christian Baumgartner, Matthias Dehmer, Klaus R Liedl, Armin Graber
Citations: 15

Abstract

A pooling design can be a powerful strategy to compensate for a limited number of samples or high biological variation. In this paper, we perform a comparative study to model and quantify the effects of virtual pooling on the performance of five widely applied classifiers: support vector machines (SVMs), random forest (RF), k-nearest neighbors (k-NN), penalized logistic regression (PLR), and prediction analysis for microarrays (PAMs). We evaluate a variety of experimental designs using mock omics datasets with varying pool sizes, and we consider the effects of feature selection. Our results show that feature selection significantly improves classifier performance for both non-pooled and pooled data. All investigated classifiers yield lower misclassification rates with smaller pool sizes. RF mostly outperforms the other investigated algorithms, while accuracy levels are comparable among the remaining ones. Guidelines are derived to identify an optimal pooling scheme that provides adequate predictive power and, hence, to motivate a study design that best meets experimental objectives and budgetary conditions, including time constraints.
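To make the idea of virtual pooling concrete, the following is a minimal sketch, not the authors' procedure. It assumes that a pool can be modeled as the feature-wise arithmetic mean of randomly grouped samples from the same class, and that leftover samples that do not fill a pool are dropped; the abstract does not specify either detail. The function name `virtual_pool`, its parameters, and the mock data are hypothetical.

```python
import random
from statistics import mean


def virtual_pool(samples, labels, pool_size, seed=0):
    """Average random groups of `pool_size` same-class samples into pools.

    Assumption (not from the paper): a pool is the arithmetic mean of its
    members' feature vectors; samples that do not fill a pool are dropped.
    """
    rng = random.Random(seed)

    # Group feature vectors by class label so pools never mix classes.
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)

    pooled, pooled_labels = [], []
    for y, rows in by_class.items():
        rows = rows[:]  # copy so the caller's data is not shuffled
        rng.shuffle(rows)
        for start in range(0, len(rows) - pool_size + 1, pool_size):
            group = rows[start:start + pool_size]
            # Feature-wise mean across the pool members.
            pooled.append([mean(col) for col in zip(*group)])
            pooled_labels.append(y)
    return pooled, pooled_labels


# Hypothetical mock data: 2 classes, 12 samples each, 5 features.
data_rng = random.Random(42)
X = [[data_rng.gauss(0.0 if i < 12 else 1.0, 1.0) for _ in range(5)]
     for i in range(24)]
y = [0] * 12 + [1] * 12

for size in (1, 2, 3, 4):
    Xp, yp = virtual_pool(X, y, size)
    print(f"pool size {size}: {len(Xp)} pooled samples")
```

With 12 samples per class, pool sizes of 1, 2, 3, and 4 leave 24, 12, 8, and 6 pooled samples respectively. The shrinking training set is one plausible reason why larger pools could make classification harder, although averaging also reduces within-class variance; the sketch does not by itself reproduce the paper's results.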

Source journal
The Scientific World Journal (multidisciplinary journal)
CiteScore: 5.60
Self-citation rate: 0.00%
Articles published: 170
Review time: 3.7 months
Journal description: The Scientific World Journal is a peer-reviewed, Open Access journal that publishes original research, reviews, and clinical studies covering a wide range of subjects in science, technology, and medicine. The journal is divided into 81 subject areas.