PREDICTIVE SIMULATION FOR TYPE II DIABETES USING DATA MINING STRATEGIES APPLIED TO BIG DATA

M. Turnea, M. Ilea
14th International Conference eLearning and Software for Education, published 2018-04-19
DOI: 10.12753/2066-026x-18-213 (https://doi.org/10.12753/2066-026x-18-213)
Citations: 3

Abstract

According to recent estimates, more than 30 million people in the USA alone have diabetes, of whom around 7 million are thought to be undiagnosed. Several countries have made efforts to predict and avoid the risk of developing complications from this disease. Implementing Electronic Health Records and collecting data in a national register for all patients who have developed diabetes is a key issue in building a valid predictor of diabetes mellitus evolution, of the e-health stage of the population, and of the risk arising from the various causative factors responsible for T2DM (type 2 diabetes mellitus). One approach frequently used in diabetes prediction, inspired by data mining algorithms, is the decision tree, used alone or mixed with SVM (support vector machine), inductive learning, and clustering techniques. Data mining has been applied to existing diabetes records for many years. In this case it is applied to analyze a large number of records and extract new knowledge for prediction and classification. Decision trees and associative classification are used as tools in this paper. Genetic data are difficult to integrate into a predictor that uses big data collected at national level, so the main individual attributes are collected from three sources: clinical data, anthropological measures, and personal and family history (related to T2DM and vascular diseases). Irrelevant rules, those below a threshold, are deleted in a pruning process in order to make the classification tree more efficient. The preliminary results are presented along with directions for future research. We propose an architecture that can collect records and predict the risk for existing records, and that analyzes the risk for a new record when triggered by an update or append operation, with possible storage in the cloud.
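The abstract does not specify how rules are mined or which threshold is used for pruning, so the following is only a minimal sketch of associative classification with threshold pruning, not the authors' implementation. The toy records, the attribute names (`bmi`, `family_history`, `glucose`), the function names, and the support and confidence thresholds are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy records (attributes, class label). These values are
# invented for illustration and are not taken from the paper's data.
RECORDS = [
    ({"bmi": "high", "family_history": "yes", "glucose": "high"}, "risk"),
    ({"bmi": "high", "family_history": "no", "glucose": "high"}, "risk"),
    ({"bmi": "normal", "family_history": "yes", "glucose": "high"}, "risk"),
    ({"bmi": "normal", "family_history": "no", "glucose": "normal"}, "no_risk"),
    ({"bmi": "high", "family_history": "no", "glucose": "normal"}, "no_risk"),
    ({"bmi": "normal", "family_history": "no", "glucose": "normal"}, "no_risk"),
]


def mine_rules(records, min_support=0.3, min_confidence=0.8, max_len=2):
    """Mine class association rules and prune those below the thresholds.

    A rule is (antecedent, label, support, confidence), where the antecedent
    is a tuple of (attribute, value) pairs.
    """
    n = len(records)
    antecedent_counts = Counter()
    rule_counts = Counter()
    for attrs, label in records:
        items = sorted(attrs.items())
        for k in range(1, max_len + 1):
            for combo in combinations(items, k):
                antecedent_counts[combo] += 1
                rule_counts[(combo, label)] += 1

    rules = []
    for (antecedent, label), count in rule_counts.items():
        support = count / n
        confidence = count / antecedent_counts[antecedent]
        # Pruning step: drop rules that fall below either threshold.
        if support >= min_support and confidence >= min_confidence:
            rules.append((antecedent, label, support, confidence))

    # Prefer confident, well-supported, short rules.
    rules.sort(key=lambda r: (-r[3], -r[2], len(r[0])))
    return rules


def classify(rules, record, default="unknown"):
    """Label a record with the first (best-ranked) rule whose antecedent matches."""
    for antecedent, label, _, _ in rules:
        if all(record.get(attr) == value for attr, value in antecedent):
            return label
    return default


RULES = mine_rules(RECORDS)
NEW_RECORD = {"bmi": "high", "family_history": "no", "glucose": "high"}
PREDICTION = classify(RULES, NEW_RECORD)
```

In an architecture like the one proposed, `classify` would be the step invoked when an update or append operation delivers a new record, while `mine_rules` would be rerun periodically over the stored records. That split is itself an assumption about how such a system might be organized.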