Handling missing data and measurement error for early-onset myopia risk prediction models.

BMC Medical Research Methodology | IF 3.9 | Zone 3 (Medicine) | JCR Q1 (Health Care Sciences & Services) | Pub Date: 2024-09-06 | DOI: 10.1186/s12874-024-02319-x
Hongyu Lai, Kaiye Gao, Meiyan Li, Tao Li, Xiaodong Zhou, Xingtao Zhou, Hui Guo, Bo Fu
BMC Medical Research Methodology 24(1):194. Journal Article. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11378546/pdf/

Abstract

Background: Early identification of children at high risk of developing myopia is essential to prevent myopia progression by introducing timely interventions. However, missing data and measurement error (ME) are common challenges in risk prediction modelling that can introduce bias in myopia prediction.

Methods: We explore four imputation methods to address missing data and ME: single imputation (SI), multiple imputation under missing at random (MI-MAR), multiple imputation with a calibration procedure (MI-ME), and multiple imputation under missing not at random (MI-MNAR). We compare four machine-learning models (Decision Tree, Naive Bayes, Random Forest, and XGBoost) and three statistical models (logistic regression, stepwise logistic regression, and least absolute shrinkage and selection operator (LASSO) logistic regression) in myopia risk prediction. We apply these models to the Shanghai Jinshan Myopia Cohort Study and also conduct a simulation study to investigate the impact of missingness mechanisms, the degree of ME, and the importance of predictors on model performance. Model performance is evaluated using the area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC).
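
As a rough illustration of the two evaluation metrics named above, the sketch below computes AUROC and an AUPRC estimate in plain Python. It is not the authors' code: the function names `auroc` and `average_precision` and the toy labels and scores are assumptions made for this example, and AUPRC is approximated here by average precision, one common step-wise estimator.

```python
from typing import Sequence


def auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive case is scored
    higher than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision: mean of the precision values reached at each
    positive case when cases are ranked by descending score.
    Assumes no tied scores (ties would need grouped thresholds)."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


# Hypothetical toy data: 1 = develops myopia, 0 = does not.
labels = [0, 0, 1, 1]
predicted_risk = [0.1, 0.4, 0.35, 0.8]
print(auroc(labels, predicted_risk))              # 0.75
print(average_precision(labels, predicted_risk))  # about 0.833
```

AUPRC is often reported alongside AUROC when the outcome is uncommon, because precision depends on the outcome prevalence while AUROC does not.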

Results: Our findings indicate that in scenarios with missing data and ME, using MI-ME in combination with logistic regression yields the best prediction results. In scenarios without ME, employing MI-MAR to handle missing data outperforms SI regardless of the missingness mechanism. When ME has a greater impact on prediction than missing data, the relative advantage of MI-MAR diminishes, and MI-ME becomes superior. Furthermore, our results demonstrate that statistical models exhibit better prediction performance than machine-learning models.
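
To give a feel for why measurement error matters for prediction, the following sketch simulates classical ME in a single predictor and applies a simple reliability-ratio correction. It is a generic textbook illustration, not the MI-ME procedure evaluated in the paper: it assumes a continuous outcome and a linear model rather than logistic regression, it assumes the error variance is known, and every name and parameter value (`simulate`, `ols_slope`, `sd_u`, and so on) is hypothetical.

```python
import random


def ols_slope(x, y):
    """Ordinary least squares slope of y on a single predictor x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate(n=20000, beta=1.0, sd_x=1.0, sd_u=1.0, sd_e=0.5, seed=0):
    """Simulate classical measurement error: we observe w = x + u, not x.

    Returns (naive slope, corrected slope, reliability ratio).
    Assumption: sd_x and sd_u are known, which is rarely true in practice;
    they would normally be estimated from replicate or validation data.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, sd_x) for _ in range(n)]          # true predictor
    w = [xi + rng.gauss(0.0, sd_u) for xi in x]           # error-prone measurement
    y = [beta * xi + rng.gauss(0.0, sd_e) for xi in x]    # outcome depends on true x
    naive = ols_slope(w, y)                                # attenuated toward zero
    reliability = sd_x ** 2 / (sd_x ** 2 + sd_u ** 2)      # attenuation factor
    corrected = naive / reliability                        # simple calibration step
    return naive, corrected, reliability


naive, corrected, reliability = simulate()
print(f"reliability ratio: {reliability:.2f}")
print(f"naive slope:       {naive:.3f}")      # expected near beta * reliability
print(f"corrected slope:   {corrected:.3f}")  # expected near beta
```

With equal true and error variances the reliability ratio is 0.5, so the naive slope is roughly halved. The larger the error variance relative to the predictor's true variance, the stronger this attenuation, which is consistent with the abstract's point that correcting for ME matters more as ME grows.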

Conclusion: MI-ME emerges as a reliable method for handling missing data and ME in important predictors for early-onset myopia risk prediction.

Source journal: BMC Medical Research Methodology (Medicine, Health Care)
CiteScore: 6.50
Self-citation rate: 2.50%
Articles published: 298
Review time: 3-8 weeks
About the journal: BMC Medical Research Methodology is an open access journal publishing original peer-reviewed research articles in methodological approaches to healthcare research. Articles on the methodology of epidemiological research, clinical trials and meta-analysis/systematic review are particularly encouraged, as are empirical studies of the associations between choice of methodology and study outcomes. BMC Medical Research Methodology does not aim to publish articles describing scientific methods or techniques: these should be directed to the BMC journal covering the relevant biomedical subject area.