A comparative analysis of the effectiveness of feature engineering techniques on thyroid disease prediction

Aidah Mashonga, Leslie Kudzai Nyandoro, Kudakwashe Zvarevashe
Published in: 2022 1st Zimbabwe Conference of Information and Communication Technologies (ZCICT), 2022-11-09
DOI: 10.1109/ZCICT55726.2022.10045927 (https://doi.org/10.1109/ZCICT55726.2022.10045927)

Abstract

Thyroid disease arises when thyroid tissue proliferates abnormally at the edge of the gland. The two primary types of thyroid disorder are hypothyroidism and hyperthyroidism, which result when the gland releases too little or too much hormone, respectively. To identify and diagnose thyroid disease, this study proposes combining effective classifiers with feature selection strategies, judged on accuracy and other performance measures. It compares three classifiers: the support vector machine, logistic regression, and extreme gradient boosting. Each is paired with three feature selection strategies: recursive feature elimination, Pearson’s correlation, and the chi-squared statistic. The experiments use thyroid data from Kaggle. Performance is evaluated on accuracy, precision, and the area under the receiver operating characteristic curve (AUC). The results show that classifiers using feature selection achieve higher overall accuracy (extreme gradient boosting 98%, support vector machine 95%) than the support vector machine without feature selection (89%). Logistic regression is the exception: it performed slightly better without feature selection (95%) than with it (94%).
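
As an illustration of two of the filter-style selection strategies named in the abstract, the sketch below scores features by absolute Pearson correlation with the class label and by the classical chi-squared statistic of independence. It is a minimal, standard-library example under stated assumptions, not the authors' pipeline: the toy data, the column names (`hormone_level`, `noise`, `on_medication`, `sex`) and the helper `select_top_k` are all hypothetical, and the abstract does not say which implementation or parameters the study used. Recursive feature elimination and the classifiers themselves are omitted.

```python
import math
from collections import Counter


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def chi2_stat(feature, labels):
    """Chi-squared statistic of independence between a categorical feature and the label."""
    n = len(labels)
    observed = Counter(zip(feature, labels))  # contingency-table cell counts
    feature_totals = Counter(feature)
    label_totals = Counter(labels)
    stat = 0.0
    for f_value, f_count in feature_totals.items():
        for l_value, l_count in label_totals.items():
            expected = f_count * l_count / n  # count expected under independence
            stat += (observed[(f_value, l_value)] - expected) ** 2 / expected
    return stat


def select_top_k(columns, labels, score_fn, k):
    """Hypothetical helper: keep the k columns with the highest score."""
    scores = {name: score_fn(values, labels) for name, values in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Synthetic toy data (NOT the Kaggle thyroid dataset); 1 = disease, 0 = healthy.
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Hypothetical numeric columns, scored by |Pearson r| against the label.
numeric_columns = {
    "hormone_level": [9.1, 8.4, 7.9, 8.8, 2.1, 1.7, 2.5, 3.0],
    "noise": [5, 1, 4, 2, 5, 1, 4, 2],
}

# Hypothetical categorical columns, scored by the chi-squared statistic.
categorical_columns = {
    "on_medication": ["yes", "yes", "yes", "no", "no", "no", "no", "no"],
    "sex": ["F", "M", "F", "M", "F", "M", "F", "M"],
}


def abs_pearson(values, target):
    return abs(pearson_r(values, target))


top_numeric, numeric_scores = select_top_k(numeric_columns, labels, abs_pearson, k=1)
top_categorical, categorical_scores = select_top_k(categorical_columns, labels, chi2_stat, k=1)

print("kept numeric:", top_numeric, numeric_scores)
print("kept categorical:", top_categorical, categorical_scores)
```

Both scores are computed per feature, independently of any classifier, which is what distinguishes these filter methods from recursive feature elimination: the latter repeatedly refits a model and drops the weakest features. Library implementations may define the chi-squared score differently from the contingency-table form used here, so absolute values are not comparable across tools.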