Machine learning on national shopping data reliably estimates childhood obesity prevalence and socio-economic deprivation
Gavin Long, Georgiana Nica-Avram, John Harvey, Evgeniya Lukinova, Roberto Mansilla, Simon Welham, Gregor Engelmann, Elizabeth Dolan, Kuzivakwashe Makokoro, Michelle Thomas, Edward Powell, James Goulding
Food Policy, Volume 131, Article 102826. Published 2025-02-01. DOI: 10.1016/j.foodpol.2025.102826
https://www.sciencedirect.com/science/article/pii/S0306919225000302
Journal impact factor: 6.8 (JCR Q1, Agricultural Economics & Policy)
Citations: 0
Abstract
Deprivation pushes people to choose cheap, calorie-dense foods instead of nutritious but expensive alternatives. Diseases resulting from these poor dietary choices, such as obesity, cardiovascular disease, and diabetes, place a significant burden on public health systems. Nutritional insecurity is difficult to measure at scale, which makes it very challenging to study the relationship between nutritional outcomes and deprivation at a national level, and in turn to understand the effect of new policies or track changes over time. To address this challenge, we develop a machine learning approach using massive anonymised transactional data (4 million members and 2.5 billion transactions) in partnership with the retailer The Co-operative Group UK. We engineer a series of variables related to obesogenic diets, including a new measure called ‘Calorie-oriented purchasing’. These variables help illustrate how large-scale transactional data can discriminate between neighbourhoods most affected by deprivation and childhood obesity. In a comparative assessment of machine learning approaches, tree-based models (Random Forest, XGBoost) perform better, with the best reaching an accuracy of 0.88 for predicting deprivation and 0.79 for childhood obesity. Calorie-oriented purchasing emerges as a robust predictor of deprivation and childhood obesity at the census area level. Results show this approach can help summarise nutritional insecurity and support its spatio-temporal monitoring. We conclude with policy implications and recommend that retailers adopt new measures for measuring national nutrition insecurity.
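To make the pipeline in the abstract concrete, the following is a minimal sketch of its general shape: aggregate transactions to one feature per area, predict a binary area-level label, and score the prediction by accuracy. Everything in it is an assumption for illustration. The abstract does not define 'Calorie-oriented purchasing', so the sketch substitutes a hypothetical proxy (calories purchased per pound spent). The data, labels, and threshold are invented, and a single-threshold rule stands in for the Random Forest and XGBoost models the authors actually compare.

```python
from collections import defaultdict

# Hypothetical toy transactions: (area_id, calories_kcal, spend_gbp).
# All values are invented for illustration; none come from the study.
transactions = [
    ("A", 2400, 2.0),
    ("A", 1800, 1.5),
    ("B", 900, 3.0),
    ("B", 1200, 4.0),
    ("C", 3000, 2.5),
    ("D", 800, 4.0),
    ("E", 1400, 2.0),
]

# Hypothetical binary labels: 1 = area in the most deprived group.
labels = {"A": 1, "B": 0, "C": 1, "D": 0, "E": 0}


def calories_per_pound(rows):
    """Aggregate transactions to one kcal-per-GBP value per area.

    This is an assumed proxy, not the paper's definition of
    calorie-oriented purchasing.
    """
    kcal = defaultdict(float)
    spend = defaultdict(float)
    for area, calories, cost in rows:
        kcal[area] += calories
        spend[area] += cost
    return {area: kcal[area] / spend[area] for area in kcal}


def accuracy(y_true, y_pred):
    """Share of areas whose predicted label matches the true label."""
    hits = sum(1 for area in y_true if y_true[area] == y_pred[area])
    return hits / len(y_true)


feature = calories_per_pound(transactions)

# Assumed cut-off of 600 kcal per GBP, chosen only for this toy data.
THRESHOLD = 600.0
predicted = {area: int(value > THRESHOLD) for area, value in feature.items()}

print(accuracy(labels, predicted))
```

On this toy data the rule misclassifies one area (E) out of five. Accuracy here is the same metric the abstract reports (0.88 for deprivation, 0.79 for childhood obesity), but the toy score has no bearing on those results.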
About the journal:
Food Policy is a multidisciplinary journal publishing original research and novel evidence on issues in the formulation, implementation, and evaluation of policies for the food sector in developing, transition, and advanced economies.
Our main focus is on the economic and social aspects of food policy, and we prioritize empirical studies informing international food policy debates. Provided that articles make a clear and explicit contribution to food policy debates of international interest, we consider papers from any of the social sciences. Papers from other disciplines (e.g., law) will be considered only if they provide a key policy contribution and are written in a style that is accessible to a social science readership.