基于树的特征选择齐次集成模型在糖尿病视网膜病变预测中的应用

Tamunopriye Ene Dagogo-George, H. A. Mojeed, A. Balogun, M. Mabayoje, S. A. Salihu
{"title":"基于树的特征选择齐次集成模型在糖尿病视网膜病变预测中的应用","authors":"Tamunopriye Ene Dagogo-George, H. A. Mojeed, A. Balogun, M. Mabayoje, S. A. Salihu","doi":"10.14710/jtsiskom.2020.13669","DOIUrl":null,"url":null,"abstract":"Diabetic Retinopathy (DR) is a condition that emerges from prolonged diabetes, causing severe damages to the eyes. Early diagnosis of this disease is highly imperative as late diagnosis may be fatal. Existing studies employed machine learning approaches with Support Vector Machines (SVM) having the highest performance on most analyses and Decision Trees (DT) having the lowest. However, SVM has been known to suffer from parameter and kernel selection problems, which undermine its predictive capability. Hence, this study presents homogenous ensemble classification methods with DT as the base classifier to optimize predictive performance. Boosting and Bagging ensemble methods with feature selection were employed, and experiments were carried out using Python Scikit Learn libraries on DR datasets extracted from UCI Machine Learning repository. Experimental results showed that Bagged and Boosted DT were better than SVM. Specifically, Bagged DT performed best with accuracy 65.38 %, f-score 0.664, and AUC 0.731, followed by Boosted DT with accuracy 65.42 %, f-score 0.655, and AUC 0.724 when compared to SVM (accuracy 65.16 %, f-score 0.652, and AUC 0.721). These results indicate that DT's predictive performance can be optimized by employing the homogeneous ensemble methods to outperform SVM in predicting DR.","PeriodicalId":56231,"journal":{"name":"Jurnal Teknologi dan Sistem Komputer","volume":"8 1","pages":"297-303"},"PeriodicalIF":0.0000,"publicationDate":"2020-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Tree-based homogeneous ensemble model with feature selection for diabetic retinopathy prediction\",\"authors\":\"Tamunopriye Ene Dagogo-George, H. A. Mojeed, A. Balogun, M. Mabayoje, S. A. Salihu\",\"doi\":\"10.14710/jtsiskom.2020.13669\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Diabetic Retinopathy (DR) is a condition that emerges from prolonged diabetes, causing severe damages to the eyes. Early diagnosis of this disease is highly imperative as late diagnosis may be fatal. Existing studies employed machine learning approaches with Support Vector Machines (SVM) having the highest performance on most analyses and Decision Trees (DT) having the lowest. However, SVM has been known to suffer from parameter and kernel selection problems, which undermine its predictive capability. Hence, this study presents homogenous ensemble classification methods with DT as the base classifier to optimize predictive performance. Boosting and Bagging ensemble methods with feature selection were employed, and experiments were carried out using Python Scikit Learn libraries on DR datasets extracted from UCI Machine Learning repository. Experimental results showed that Bagged and Boosted DT were better than SVM. Specifically, Bagged DT performed best with accuracy 65.38 %, f-score 0.664, and AUC 0.731, followed by Boosted DT with accuracy 65.42 %, f-score 0.655, and AUC 0.724 when compared to SVM (accuracy 65.16 %, f-score 0.652, and AUC 0.721). These results indicate that DT's predictive performance can be optimized by employing the homogeneous ensemble methods to outperform SVM in predicting DR.\",\"PeriodicalId\":56231,\"journal\":{\"name\":\"Jurnal Teknologi dan Sistem Komputer\",\"volume\":\"8 1\",\"pages\":\"297-303\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-10-31\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Jurnal Teknologi dan Sistem Komputer\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.14710/jtsiskom.2020.13669\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Jurnal Teknologi dan Sistem Komputer","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.14710/jtsiskom.2020.13669","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

摘要

糖尿病视网膜病变(DR)是一种由长期糖尿病引起的疾病,会对眼睛造成严重损伤。这种疾病的早期诊断是非常必要的,因为晚期诊断可能是致命的。现有的研究采用了机器学习方法,其中支持向量机(SVM)在大多数分析中具有最高的性能,而决策树(DT)具有最低的性能。然而,众所周知,支持向量机存在参数和核选择问题,这削弱了其预测能力。因此,本研究提出了以DT为基础分类器的同质集成分类方法,以优化预测性能。采用了具有特征选择的Boosting和Bagging集成方法,并使用Python Scikit Learn库在从UCI机器学习库中提取的DR数据集上进行了实验。实验结果表明,Bagged和Boosted DT均优于SVM。具体而言,与SVM相比,Bagged DT表现最好,准确率为65.38%,f-score为0.664,AUC为0.731,其次是Boosted DT,准确率分别为65.42%、0.655和0.724(准确率为65%、0.652和0.721)。这些结果表明,通过采用齐次集成方法可以优化DT的预测性能,使其在预测DR方面优于SVM。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Tree-based homogeneous ensemble model with feature selection for diabetic retinopathy prediction
Diabetic Retinopathy (DR) is a condition that emerges from prolonged diabetes, causing severe damages to the eyes. Early diagnosis of this disease is highly imperative as late diagnosis may be fatal. Existing studies employed machine learning approaches with Support Vector Machines (SVM) having the highest performance on most analyses and Decision Trees (DT) having the lowest. However, SVM has been known to suffer from parameter and kernel selection problems, which undermine its predictive capability. Hence, this study presents homogenous ensemble classification methods with DT as the base classifier to optimize predictive performance. Boosting and Bagging ensemble methods with feature selection were employed, and experiments were carried out using Python Scikit Learn libraries on DR datasets extracted from UCI Machine Learning repository. Experimental results showed that Bagged and Boosted DT were better than SVM. Specifically, Bagged DT performed best with accuracy 65.38 %, f-score 0.664, and AUC 0.731, followed by Boosted DT with accuracy 65.42 %, f-score 0.655, and AUC 0.724 when compared to SVM (accuracy 65.16 %, f-score 0.652, and AUC 0.721). These results indicate that DT's predictive performance can be optimized by employing the homogeneous ensemble methods to outperform SVM in predicting DR.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
6
审稿时长
6 weeks
期刊最新文献
TATOPSIS: A decision support system for selecting a major in university with a two-way approach and TOPSIS Regional clustering based on economic potential with a modified fuzzy k-prototypes algorithm for village developing target determination River water level measurement system using Sobel edge detection method Classification of beneficiaries for the rehabilitation of uninhabitable houses using the K-Nearest Neighbor algorithm Sequence-based prediction of protein-protein interaction using autocorrelation features and machine learning
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1