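Machine Learning for the Prediction of Anemia in Children Under 5 Years of Age by Analyzing their Nutritional Status Using Data Mining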

Alexander J. Marcos Valdez, Eduardo G. Navarro Ortiz, Rodrigo E. Quinteros Peralta, Juan J. Tirado Julca, David F. Valentin Ricaldi, Hugo D. Calderon Vilca
Journal: Computación y Sistemas
Publication date: 2023-09-29
DOI: 10.13053/cys-27-3-4315 (https://doi.org/10.13053/cys-27-3-4315)
Citations: 0

Abstract

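Child malnutrition is one of the main public health problems: it harms individuals throughout their lives, limits the development of society, and makes poverty harder to eradicate. This research has two objectives. The first is to apply data mining techniques (preprocessing, cleaning, reduction, and transformation) to a data lake in order to analyze anemia in children under 5 years of age. The second is to apply machine learning algorithms to obtain the best model for predicting anemia in that age group. The data set was extracted from the open data platform of the government of Peru and covers South Lima, North Lima, East Lima, Central Lima, and rural Lima. It comprises 138,369 instances and 36 variables, of which 30 are categorical and 6 numeric, and it is imbalanced. To select the best predictor variables, the ANOVA F-test and chi-square filters were applied, reducing the set to 10 variables; experiments were also run without one of the filters and without both. Five algorithms were tested to find the best prediction model: decision tree, logistic regression, k-nearest neighbors, random forest, and naive Bayes. Naive Bayes proved the best algorithm for predicting anemia in children under 5 years of age, with the highest recall (74%), a precision of 43%, and an accuracy of 70%.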
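The two filter statistics named in the abstract can be sketched in plain Python. This is only an illustration of how a chi-square score (for a categorical variable) and a one-way ANOVA F score (for a numeric variable) are computed against a binary class label; the function names and the toy data are invented here, and the abstract does not say which implementation or selection rule the authors used.

```python
from collections import Counter


def chi_square_score(feature, target):
    """Pearson chi-square statistic between a categorical feature and a class label."""
    n = len(feature)
    observed = Counter(zip(feature, target))  # joint counts per (category, class)
    feature_totals = Counter(feature)
    target_totals = Counter(target)
    score = 0.0
    for category, cat_count in feature_totals.items():
        for cls, cls_count in target_totals.items():
            expected = cat_count * cls_count / n  # count expected under independence
            score += (observed[(category, cls)] - expected) ** 2 / expected
    return score


def anova_f_score(values, target):
    """One-way ANOVA F statistic of a numeric feature grouped by class label."""
    groups = {}
    for value, label in zip(values, target):
        groups.setdefault(label, []).append(value)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = {label: sum(g) / len(g) for label, g in groups.items()}
    ss_between = sum(len(g) * (means[label] - grand_mean) ** 2
                     for label, g in groups.items())
    ss_within = sum((v - means[label]) ** 2
                    for label, g in groups.items() for v in g)
    # Mean square between groups divided by mean square within groups.
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Invented toy data (NOT from the study): label 1 = anemic, 0 = not anemic.
toy_category = ["a"] * 40 + ["b"] * 60
toy_label_cat = [1] * 30 + [0] * 10 + [1] * 20 + [0] * 40
toy_numeric = [1.0, 2.0, 3.0, 2.0, 3.0, 4.0]
toy_label_num = [0, 0, 0, 1, 1, 1]

chi2_value = chi_square_score(toy_category, toy_label_cat)
f_value = anova_f_score(toy_numeric, toy_label_num)
print(f"chi-square = {chi2_value:.3f}, F = {f_value:.3f}")
```

A filter of this kind would typically score every candidate variable and keep the highest-scoring ones. The abstract reports that 10 variables were kept but does not state the cutoff rule, so none is assumed here.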
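The reported recall, precision, and accuracy all derive from a binary confusion matrix. The sketch below shows the standard definitions using invented counts (they are not the study's confusion matrix, which the abstract does not give). It also illustrates how, on an imbalanced data set, a model can reach fairly high recall and accuracy while precision stays low.

```python
def binary_metrics(tp, fp, fn, tn):
    """Recall, precision, and accuracy from binary confusion-matrix counts."""
    return {
        "recall": tp / (tp + fn),        # share of actual positives that were found
        "precision": tp / (tp + fp),     # share of predicted positives that were right
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Hypothetical counts for an imbalanced problem: 40 positives, 160 negatives.
metrics = binary_metrics(tp=30, fp=40, fn=10, tn=120)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

When missing a positive case costs more than raising a false alarm, as is plausible in screening for anemia, ranking models by recall is a reasonable choice, which is consistent with the abstract highlighting recall first.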