A comparative analysis of the effectiveness of feature engineering techniques on thyroid disease prediction
Authors: Aidah Mashonga, Leslie KudzaiNyandoro, Kudakwashe Zvarevashe
Venue: 2022 1st Zimbabwe Conference of Information and Communication Technologies (ZCICT)
Published: 2022-11-09
DOI: https://doi.org/10.1109/ZCICT55726.2022.10045927
Thyroid illness is caused by an abnormal proliferation of thyroid tissue at the edge of the thyroid gland. The two primary types of thyroid disorder are hypothyroidism and hyperthyroidism, which typically result when the gland releases too little or too much hormone, respectively. To identify and diagnose thyroid disease, this study proposes combining effective classifiers with feature selection strategies, evaluated on accuracy and other performance measures. It compares three classifiers: the support vector machine, logistic regression, and extreme gradient boosting. Each classifier is paired with three feature selection strategies: recursive feature elimination, Pearson's correlation, and chi-squared statistics. Thyroid data from Kaggle datasets were used for the experiments. The results were evaluated on accuracy, precision, and the area under the receiver operating characteristic curve (AUC). Classifiers that used feature selection achieved higher overall accuracy (extreme gradient boosting 98% and support vector machine 95%) than without feature selection (support vector machine 89%). Logistic regression was the exception: it performed slightly better without feature selection (95%) than with it (94%).
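As an illustration of one of the three strategies named in the abstract, the following is a minimal sketch of filter-style feature selection by Pearson's correlation: each feature is scored by the absolute value of its correlation with the class label, and the top k are kept. The abstract does not describe the authors' implementation, so this is not their code. The function names (`pearson`, `select_top_k`), the feature names (`tsh`, `t3`, `age`), and the toy values are all hypothetical and are not taken from the Kaggle data used in the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def select_top_k(features, labels, k):
    """Score each feature column by |r| with the label and keep the k best names."""
    scores = {name: abs(pearson(col, labels)) for name, col in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical toy data for illustration only (not the paper's dataset).
features = {
    "tsh": [0.5, 1.0, 6.0, 8.0, 0.8, 9.5],
    "age": [30, 62, 45, 51, 38, 47],
    "t3": [2.0, 1.9, 1.0, 0.8, 2.1, 0.7],
}
labels = [0, 0, 1, 1, 0, 1]  # 1 = disease present (assumed encoding)

selected, scores = select_top_k(features, labels, k=2)
print(selected)
for name, score in scores.items():
    print(f"{name}: |r| = {score:.3f}")
```

On this toy data the two hormone-like columns score far higher than `age`, so they are the ones retained. A filter of this kind ranks features independently of the classifier, whereas recursive feature elimination repeatedly refits a model and drops its weakest features, which is one reason the strategies can favour different classifiers.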