Comparative Analysis of ADASYN-SVM and SMOTE-SVM Methods on the Detection of Type 2 Diabetes Mellitus

Nur Ghaniaviyanto Ramadhan
Journal: Scientific Journal of Informatics
Published: 2021-11-30
DOI: 10.15294/sji.v8i2.32484 (https://doi.org/10.15294/sji.v8i2.32484)
Citations: 11

Abstract

Most people with diabetes worldwide have type 2 diabetes. Diabetes can be detected early, and undesirable outcomes prevented, by having a doctor check blood sugar and insulin levels. People with diabetes can also be grouped based on data from diabetes examination results. However, most health examination data contain several parameters that are difficult for the public to understand. This problem can be addressed through automatic classification. A second problem is class imbalance: the numbers of records for diabetics and non-diabetics are unequal. This can be addressed by balancing the data, either by increasing the share of the smaller class (oversampling) or by decreasing the share of the larger class (undersampling).

Purpose: This study aims to detect type 2 diabetes mellitus using an SVM classification model and to determine which of two data balancing techniques, SMOTE or ADASYN, gives the better result.

Methods/Study design/approach: The research begins with collecting the diabetes dataset, followed by a cleaning step that checks the dataset for null values. Two oversampling methods are then applied to determine which is more appropriate. After oversampling, the data are classified with a support vector machine model and the resulting accuracy is compared.

Result/Findings: The ADASYN-SVM method outperforms SMOTE-SVM. ADASYN-SVM achieves an accuracy of 87.3%, while SMOTE-SVM achieves 85.4%.

Novelty/Originality/Value: The data used in this study come from the Karya Medika clinic in Indonesia and contain parameters related to type 2 diabetes.
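
The abstract names SMOTE and ADASYN but does not say how they differ. The sketch below is a minimal illustration of the general idea using only the Python standard library; it is not the author's implementation, and the function names (`smote`, `adasyn`, `adasyn_counts`) and the toy points are assumptions made up for this example. Both methods create synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. SMOTE picks the starting point uniformly, while ADASYN gives more synthetic samples to minority points whose neighbourhood is dominated by the majority class.

```python
import math
import random


def minority_neighbours(minority, i, k):
    """Indices of the k minority points nearest to minority[i], excluding i."""
    others = [j for j in range(len(minority)) if j != i]
    others.sort(key=lambda j: math.dist(minority[i], minority[j]))
    return others[:k]


def synthesize(minority, i, k, rng):
    """New point on the segment between minority[i] and a random near neighbour."""
    j = rng.choice(minority_neighbours(minority, i, k))
    t = rng.random()
    return tuple(a + t * (b - a) for a, b in zip(minority[i], minority[j]))


def smote(minority, n_new, k=3, seed=0):
    """SMOTE-style oversampling: every minority point is an equally likely seed."""
    rng = random.Random(seed)
    return [synthesize(minority, rng.randrange(len(minority)), k, rng)
            for _ in range(n_new)]


def adasyn_counts(minority, majority, n_new, k=3):
    """Synthetic samples per minority point, proportional to the share of
    majority points among its k nearest neighbours."""
    ratios = []
    for i, p in enumerate(minority):
        pool = [(q, True) for q in majority]
        pool += [(q, False) for j, q in enumerate(minority) if j != i]
        pool.sort(key=lambda item: math.dist(p, item[0]))
        ratios.append(sum(is_major for _, is_major in pool[:k]) / k)
    total = sum(ratios)
    if total == 0:
        # Assumption for this sketch: fall back to an even split when no
        # minority point has a majority neighbour.
        ratios, total = [1.0] * len(minority), float(len(minority))
    return [round(n_new * r / total) for r in ratios]


def adasyn(minority, majority, n_new, k=3, seed=0):
    """ADASYN-style oversampling: harder minority points get more samples."""
    rng = random.Random(seed)
    counts = adasyn_counts(minority, majority, n_new, k)
    return [synthesize(minority, i, k, rng)
            for i, count in enumerate(counts) for _ in range(count)]


# Toy data: three minority points far from the majority class, one inside it.
minority = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0)]
majority = [(5.0, 6.0), (6.0, 5.0), (6.0, 6.0), (5.5, 5.5)]

print(adasyn_counts(minority, majority, n_new=6))  # -> [0, 0, 0, 6]
print(len(smote(minority, n_new=6)), len(adasyn(minority, majority, n_new=6)))
```

On this toy data ADASYN assigns all six synthetic samples to the one minority point surrounded by majority points, whereas SMOTE spreads its seeds over all four. In practice a maintained library such as imbalanced-learn would normally be used for both oversamplers, with the balanced data then passed to an SVM classifier; which tools the study itself used is not stated in the abstract.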