Early Thyroid Risk Prediction by Data Mining and Ensemble Classifiers

IF 4 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Machine learning and knowledge extraction Pub Date : 2023-09-18 DOI:10.3390/make5030061
Mohammad H. Alshayeji
{"title":"Early Thyroid Risk Prediction by Data Mining and Ensemble Classifiers","authors":"Mohammad H. Alshayeji","doi":"10.3390/make5030061","DOIUrl":null,"url":null,"abstract":"Thyroid disease is among the most prevalent endocrinopathies worldwide. As the thyroid gland controls human metabolism, thyroid illness is a matter of concern for human health. To save time and reduce error rates, an automatic, reliable, and accurate thyroid identification machine-learning (ML) system is essential. The proposed model aims to address existing work limitations such as the lack of detailed feature analysis, visualization, improvement in prediction accuracy, and reliability. Here, a public thyroid illness dataset containing 29 clinical features from the University of California, Irvine ML repository was used. The clinical features helped us to build an ML model that can predict thyroid illness by analyzing early symptoms and replacing the manual analysis of these attributes. Feature analysis and visualization facilitate an understanding of the role of features in thyroid prediction tasks. In addition, the overfitting problem was eliminated by 5-fold cross-validation and data balancing using the synthetic minority oversampling technique (SMOTE). Ensemble learning ensures prediction model reliability owing to the involvement of multiple classifiers in the prediction decisions. The proposed model achieved 99.5% accuracy, 99.39% sensitivity, and 99.59% specificity with the boosting method which is applicable to real-time computer-aided diagnosis (CAD) systems to ease diagnosis and promote early treatment.","PeriodicalId":93033,"journal":{"name":"Machine learning and knowledge extraction","volume":"12 1","pages":"0"},"PeriodicalIF":4.0000,"publicationDate":"2023-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Machine learning and knowledge extraction","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3390/make5030061","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

Abstract

Thyroid disease is among the most prevalent endocrinopathies worldwide. As the thyroid gland controls human metabolism, thyroid illness is a matter of concern for human health. To save time and reduce error rates, an automatic, reliable, and accurate thyroid identification machine-learning (ML) system is essential. The proposed model aims to address existing work limitations such as the lack of detailed feature analysis, visualization, improvement in prediction accuracy, and reliability. Here, a public thyroid illness dataset containing 29 clinical features from the University of California, Irvine ML repository was used. The clinical features helped us to build an ML model that can predict thyroid illness by analyzing early symptoms and replacing the manual analysis of these attributes. Feature analysis and visualization facilitate an understanding of the role of features in thyroid prediction tasks. In addition, the overfitting problem was eliminated by 5-fold cross-validation and data balancing using the synthetic minority oversampling technique (SMOTE). Ensemble learning ensures prediction model reliability owing to the involvement of multiple classifiers in the prediction decisions. The proposed model achieved 99.5% accuracy, 99.39% sensitivity, and 99.59% specificity with the boosting method which is applicable to real-time computer-aided diagnosis (CAD) systems to ease diagnosis and promote early treatment.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
基于数据挖掘和集成分类器的早期甲状腺风险预测
甲状腺疾病是世界上最常见的内分泌疾病之一。由于甲状腺控制着人体的代谢,甲状腺疾病是一个关注人类健康的问题。为了节省时间和降低错误率,一个自动、可靠、准确的甲状腺识别机器学习(ML)系统是必不可少的。该模型旨在解决现有工作的局限性,如缺乏详细的特征分析、可视化、预测精度和可靠性的提高。在这里,使用了一个公共甲状腺疾病数据集,其中包含来自加州大学欧文分校ML存储库的29个临床特征。临床特征帮助我们建立了一个ML模型,可以通过分析早期症状来预测甲状腺疾病,并取代手工分析这些属性。特征分析和可视化有助于理解特征在甲状腺预测任务中的作用。此外,使用合成少数过采样技术(SMOTE)通过5倍交叉验证和数据平衡消除了过拟合问题。由于在预测决策中涉及多个分类器,集成学习保证了预测模型的可靠性。该模型准确率为99.5%,灵敏度为99.39%,特异度为99.59%,可应用于实时计算机辅助诊断(CAD)系统,方便诊断,促进早期治疗。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
CiteScore
6.30
自引率
0.00%
发文量
0
审稿时长
7 weeks
期刊最新文献
Knowledge Graph Extraction of Business Interactions from News Text for Business Networking Analysis Machine Learning for an Enhanced Credit Risk Analysis: A Comparative Study of Loan Approval Prediction Models Integrating Mental Health Data A Data Mining Approach for Health Transport Demand Predicting Wind Comfort in an Urban Area: A Comparison of a Regression- with a Classification-CNN for General Wind Rose Statistics An Evaluative Baseline for Sentence-Level Semantic Division
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1