Outliers in nutrient intake data for U.S. adults: national health and nutrition examination survey 2017–2018

Epidemiologic Methods (JCR Q3, Mathematics) | Pub Date: 2023-01-01 | DOI: 10.1515/em-2023-0018
Sara Burcham, Yuki Liu, Ashley L. Merianos, Angelico Mendy

Abstract

Objectives: An important step in preparing data for statistical analysis is outlier detection and removal, yet no gold standard exists in current literature. The objective of this study is to identify the ideal decision test using the National Health and Nutrition Examination Survey (NHANES) 2017–2018 dietary data.

Methods: We conducted a secondary analysis of NHANES 24-h dietary recalls, considering the survey's multi-stage cluster design. Six outlier detection and removal strategies were assessed by evaluating the decision tests' impact on the Pearson's correlation coefficient among macronutrients. Furthermore, we assessed changes in the effect size estimates based on pre-defined sample sizes. The data were collected as part of the 2017–2018 24-h dietary recall among adult participants (N=4,893).

Results: Effect estimate changes for macronutrients varied from 6.5 % for protein to 39.3 % for alcohol across all decision tests. The largest proportion of outliers removed was 4.0 % in the large sample size, for the decision test, >2 standard deviations from the mean. The smallest sample size, particularly for alcohol analysis, was most affected by the six decision tests when compared to no decision test.

Conclusions: This study, the first to use 2017–2018 NHANES dietary data for outlier evaluation, emphasizes the importance of selecting an appropriate decision test considering factors such as statistical power, sample size, normality assumptions, the proportion of data removed, effect estimate changes, and the consistency of estimates across sample sizes. We recommend the use of non-parametric tests for non-normally distributed variables of interest.
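The kind of comparison the abstract describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' procedure: the abstract names just one of the six decision tests (>2 standard deviations from the mean), so the 1.5 x IQR rule below is an assumed stand-in for a non-parametric alternative; the helper names (`sd_rule_mask`, `iqr_rule_mask`, `compare`) are hypothetical; the intake values are made up rather than taken from NHANES; and the sketch ignores the survey weights and multi-stage cluster design that the study accounts for.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def sd_rule_mask(values, k=2.0):
    """True for values within k sample standard deviations of the mean."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [abs(v - mean) <= k * sd for v in values]


def iqr_rule_mask(values, k=1.5):
    """True for values inside [Q1 - k*IQR, Q3 + k*IQR].

    Assumption: this quartile-based rule is an illustrative non-parametric
    choice; the abstract does not say which rules the study used.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return [q1 - k * iqr <= v <= q3 + k * iqr for v in values]


def compare(x, y, mask_fn):
    """Drop pairs flagged on either variable; report share removed and change in r."""
    keep = [a and b for a, b in zip(mask_fn(x), mask_fn(y))]
    xk = [v for v, m in zip(x, keep) if m]
    yk = [v for v, m in zip(y, keep) if m]
    r_all, r_kept = pearson_r(x, y), pearson_r(xk, yk)
    return {
        "removed_pct": 100.0 * (len(x) - len(xk)) / len(x),
        "r_all": r_all,
        "r_kept": r_kept,
        "r_change_pct": 100.0 * (r_kept - r_all) / abs(r_all),
    }


# Made-up intakes in grams/day (NOT NHANES values); the last pair is an extreme recall.
protein = [62, 75, 80, 91, 68, 77, 85, 70, 73, 88, 79, 300]
fat = [55, 70, 72, 88, 60, 71, 80, 66, 69, 83, 74, 40]

for name, fn in [("> 2 SD", sd_rule_mask), ("1.5 x IQR", iqr_rule_mask)]:
    print(name, compare(protein, fat, fn))
```

With so few observations, a single extreme pair is enough to flip the sign of the correlation, which is consistent with the abstract's remark that the smallest sample size was the most affected. The SD rule also depends on the mean and standard deviation, which the outlier itself inflates, whereas the quartile-based rule does not.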
Source journal
Epidemiologic Methods (Mathematics: Applied Mathematics)
CiteScore: 2.10
Self-citation rate: 0.00%
Articles published: 7
About the journal: Epidemiologic Methods (EM) seeks contributions comparable to those of the leading epidemiologic journals, but also invites papers that may be more technical or of greater length than what has traditionally been allowed by journals in epidemiology. Applications and examples with real data to illustrate methodology are strongly encouraged but not required. Topics include genetic epidemiology, infectious disease, pharmaco-epidemiology, ecologic studies, environmental exposures, screening, surveillance, social networks, comparative effectiveness, statistical modeling, causal inference, measurement error, study design, and meta-analysis.