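Machine Learning for the Prediction of Anemia in Children Under 5 Years of Age by Analyzing their Nutritional Status Using Data Mining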

Computación Y Sistemas. Pub Date: 2023-09-29. DOI: 10.13053/cys-27-3-4315

Alexander J. Marcos Valdez, Eduardo G. Navarro Ortiz, Rodrigo E. Quinteros Peralta, Juan J. Tirado Julca, David F. Valentin Ricaldi, Hugo D. Calderon Vilca



Abstract


Machine Learning for the Prediction of Anemia in Children Under 5 Years of Age by Analyzing their Nutritional Status Using Data Mining

Child malnutrition is one of the main public health problems: it harms individuals throughout their lives, limits the development of society, and makes poverty harder to eradicate. This research has two objectives. The first is to apply data mining techniques (preprocessing, cleaning, reduction, and transformation) to a data lake in order to analyze anemia in children under 5 years of age. The second is to apply machine learning algorithms to obtain the best model for predicting anemia in children under 5 years of age. The data set was extracted from the open data platform of the government of Peru and covers South Lima, North Lima, East Lima, Central Lima, and rural Lima. It contains 138,369 instances and 36 variables, of which 30 are categorical and 6 are numeric, and it is unbalanced. To select the best predictor variables, the ANOVA F-test and chi-square filters were used, which reduced the set to 10 variables; experiments were also run without one of the filters and without both filters. To find the best prediction model, five algorithms were tested: decision tree, logistic regression, k-nearest neighbors, random forest, and naive Bayes. The results show that the best algorithm for predicting anemia in children under 5 years of age is naive Bayes, which achieved the highest recall (74%), with a precision of 43% and an accuracy of 70%.
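The abstract ranks models by recall on an unbalanced data set. The following minimal sketch illustrates why accuracy alone can mislead in that setting. It is not the authors' code: the `Confusion` class, the `confusion_from_labels` helper, and all label counts are hypothetical and chosen only for illustration; they are not taken from the paper's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Binary confusion-matrix counts; the positive class stands for 'anemia'."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives

    def recall(self) -> float:
        # Share of actual positives that the model finds.
        denom = self.tp + self.fn
        return self.tp / denom if denom else 0.0

    def precision(self) -> float:
        # Share of predicted positives that are actually positive.
        denom = self.tp + self.fp
        return self.tp / denom if denom else 0.0

    def accuracy(self) -> float:
        # Share of all predictions that are correct.
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.tn) / total if total else 0.0

    def f1(self) -> float:
        # Harmonic mean of precision and recall.
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r) if (p + r) else 0.0


def confusion_from_labels(y_true, y_pred) -> Confusion:
    """Count outcomes for 0/1 labels (hypothetical helper)."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        else:
            tn += 1
    return Confusion(tp, fp, fn, tn)


# Hypothetical unbalanced sample: 20 positives, 80 negatives.
y_true = [1] * 20 + [0] * 80

# Baseline that always predicts the majority (negative) class.
always_negative = [0] * 100

# Hypothetical screening model: finds 15 of the 20 positives
# and raises 20 false alarms among the 80 negatives.
screening = [1] * 15 + [0] * 5 + [1] * 20 + [0] * 60

baseline = confusion_from_labels(y_true, always_negative)
model = confusion_from_labels(y_true, screening)

for name, c in (("baseline", baseline), ("screening", model)):
    print(f"{name}: accuracy={c.accuracy():.2f} "
          f"recall={c.recall():.2f} precision={c.precision():.2f}")
```

In this toy sample the majority-class baseline has the higher accuracy yet finds no positive case at all, which is one reason a screening task on unbalanced data may reasonably prioritize recall and accept lower precision.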
