Improving SVM performance for type II diabetes prediction with an improved non-linear kernel: Insights from the PIMA dataset

Md. Shamim Reza, Umme Hafsha, Ruhul Amin, Rubia Yasmin, Sabba Ruhi
Computer Methods and Programs in Biomedicine Update, 2023. DOI: 10.1016/j.cmpbup.2023.100118
https://www.sciencedirect.com/science/article/pii/S2666990023000265

Abstract

Type 2 diabetes is a chronic metabolic disease that affects a significant portion of the world's population. Predicting the disease with machine learning (ML) algorithms has gained substantial attention because of its potential for early detection and effective intervention. The support vector machine (SVM), one of the most powerful ML algorithms, has proven effective in a variety of classification tasks, including diabetes prediction. However, the choice of kernel function has a substantial effect on the performance of SVM classifiers. This paper proposes an improved non-linear kernel for the SVM model to enhance Type 2 diabetes classification. The new kernel combines the radial basis function (RBF) kernel and the RBF city block kernel, enabling the SVM to learn complex decision boundaries and adapt to the intricacies of the PIMA dataset. The PIMA dataset contains various clinical and demographic features of individuals. To address missing values and outliers, we impute them using the median, ensuring the integrity of the dataset. We tackle the class imbalance issue by leveraging a robust synthetic-based over-sampling approach.
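
The abstract does not state how the two kernels are combined, so the following is only an illustrative sketch under a loud assumption: a convex weighted sum of a Gaussian RBF kernel and an RBF kernel built on the city block (L1) distance. The function names, the `gamma` and `weight` parameters, and the use of `None` as the missing-value marker are all hypothetical and are not taken from the paper. A weighted sum with non-negative weights of two valid kernels is itself a valid kernel, which is why this form is a natural first guess.

```python
import math
from statistics import median


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian RBF kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def cityblock_rbf_kernel(x, y, gamma=1.0):
    """RBF kernel on the city block (L1) distance: exp(-gamma * L1 distance)."""
    l1_dist = sum(abs(a - b) for a, b in zip(x, y))
    return math.exp(-gamma * l1_dist)


def combined_kernel(x, y, gamma=1.0, weight=0.5):
    """ASSUMPTION: convex combination of the two kernels (not confirmed by the paper)."""
    return (weight * rbf_kernel(x, y, gamma)
            + (1.0 - weight) * cityblock_rbf_kernel(x, y, gamma))


def gram_matrix(rows, gamma=1.0, weight=0.5):
    """Pairwise kernel values for a list of feature vectors."""
    return [[combined_kernel(a, b, gamma, weight) for b in rows] for a in rows]


def impute_median(column):
    """Replace missing entries (None here, by assumption) with the column median."""
    observed = [v for v in column if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in column]
```

A Gram matrix built this way could be handed to any SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`. The over-sampling step mentioned in the abstract is not sketched here because the abstract does not name the specific method.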

A comparative analysis against several existing kernel functions shows that the proposed approach is superior in terms of various prediction evaluation metrics. Our recommended integrated kernel model also showed improved performance (ACC = 85.5, Recall = 87.0, Precision = 83.4, F1 score = 85.2, and AUC = 85.5) compared with other approaches in the literature. The results of this study indicate that the proposed non-linear kernel in SVM outperforms existing kernel functions for predicting Type 2 diabetes on the PIMA dataset. Furthermore, a simulation study is carried out to assess the robustness of the proposed kernel in SVM, where it also performs well. The improved accuracy and robustness of the model suggest its potential utility in clinical settings, aiding in the early identification and management of individuals at risk of developing diabetes.
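
As a quick sanity check on the reported figures, the F1 score is the harmonic mean of precision and recall, so it can be recomputed from the two values given in the abstract. The sketch below does that and also shows how the four threshold-based metrics follow from confusion-matrix counts. The helper names are hypothetical, and the counts in the usage example are made up for illustration, not results from the paper.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)


def classification_metrics(tp, fp, tn, fn):
    """Accuracy, recall, precision and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "f1": f1_from(precision, recall),
    }


# Recompute F1 from the precision and recall reported in the abstract.
reported_f1 = f1_from(83.4, 87.0)
print(round(reported_f1, 1))  # 85.2, matching the reported F1 score

# Hypothetical counts, for illustration only.
example = classification_metrics(tp=87, fp=17, tn=83, fn=13)
```

The recomputed value agrees with the F1 score stated in the abstract, so the reported precision, recall, and F1 are mutually consistent. AUC cannot be derived from a single confusion matrix in general and is left out of the sketch.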
