Amanda Doggett, Ashok Chaurasia, Jean-Philippe Chaput, Scott T Leatherdale
Using classification and regression trees to model missingness in youth BMI, height and body mass data
Health Promotion and Chronic Disease Prevention in Canada: Research, Policy and Practice, vol. 43, no. 5, pp. 231-242. Published 2023-05-01. DOI: https://doi.org/10.24095/hpcdp.43.5.03. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10237263/pdf/43_5_3.pdf
Citations: 1
Abstract
Introduction: Research suggests that there is often a high degree of missingness in youth body mass index (BMI) data derived from self-reported measures, which may have a large effect on research findings. The first step in handling missing data is to examine the levels and patterns of missingness. However, previous studies examining youth BMI missingness used logistic regression, which is limited in its ability to discern subgroups or identify a hierarchy of importance for variables, aspects that may go a long way in helping understand missing data patterns.
Methods: This study used sex-stratified classification and regression tree (CART) models to examine missingness in height, body mass and BMI data among 74 501 youth participating in the 2018/19 COMPASS study (a prospective cohort study examining health behaviours among Canadian youth), where 31% of BMI data were missing. Diet, movement, academic, mental health and substance use variables were examined for associations with missingness in height, body mass and BMI.
Results: CART models indicated that the combination of being younger, having a self-perception of being overweight, being less physically active and having poorer mental health yielded female and male subgroups highly likely to be missing BMI values. Survey respondents who did not perceive themselves as overweight and who were older were unlikely to be missing BMI values.
Conclusion: The subgroups identified by the CART models indicate that a sample that deletes cases with missing BMI would be biased towards physically, emotionally and mentally healthier youth. Given the ability of CART models to identify these subgroups and a hierarchy of variable importance, they are an invaluable tool for examining missing data patterns and appropriate handling of missing data.
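To illustrate the approach described in the Methods, the following is a minimal sketch of a classification tree fitted to a missingness indicator (1 if BMI is missing, 0 otherwise). It is not the authors' code: the records are invented toy data, the variable names `age` and `perceives_overweight` are hypothetical stand-ins for survey items, the tree uses plain Gini impurity with a fixed depth limit, and the sex stratification used in the study is omitted for brevity.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, features):
    """Return (feature, threshold, weighted_gini) for the best binary split, or None."""
    best = None
    n = len(rows)
    for f in features:
        values = sorted({r[f] for r in rows})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (f, t, score)
    return best

def grow(rows, labels, features, depth=0, max_depth=2):
    """Grow a small tree; each leaf stores its size and share of missing BMI."""
    leaf = {"n": len(labels), "missing_share": sum(labels) / len(labels)}
    split = best_split(rows, labels, features) if depth < max_depth else None
    # Stop when no split exists or the best split does not reduce impurity.
    if split is None or split[2] >= gini(labels):
        return leaf
    f, t, _ = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": grow([r for r, _ in left], [y for _, y in left],
                     features, depth + 1, max_depth),
        "right": grow([r for r, _ in right], [y for _, y in right],
                      features, depth + 1, max_depth),
    }

# Invented toy records (NOT COMPASS data): (age, perceives_overweight, bmi).
raw = [
    (13, 1, None), (13, 1, None), (14, 1, None), (14, 1, None),
    (16, 1, 25.8), (17, 1, 27.1), (13, 0, 19.5), (14, 0, 20.3),
    (15, 0, 21.0), (16, 0, 21.7), (17, 0, 22.4), (13, 0, None),
]
rows = [{"age": a, "perceives_overweight": p} for a, p, _ in raw]
missing = [1 if bmi is None else 0 for _, _, bmi in raw]  # the modelled outcome

tree = grow(rows, missing, ["age", "perceives_overweight"])
print(tree)
```

The order of the splits gives a rough hierarchy of variable importance, and each leaf is a subgroup with its own share of missing values. This is the property the abstract contrasts with logistic regression. On real survey data, an established implementation with pruning and cross-validation would be the more appropriate choice.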
About the journal:
Health Promotion and Chronic Disease Prevention in Canada: Research, Policy and Practice (the HPCDP Journal) is the monthly, online scientific journal of the Health Promotion and Chronic Disease Prevention Branch of the Public Health Agency of Canada. The journal publishes articles on disease prevention, health promotion and health equity in the areas of chronic diseases, injuries and life course health. Content includes research from fields such as public/community health, epidemiology, biostatistics, the behavioural and social sciences, and health services or economics.