Replicability and reproducibility of predictive models for diagnosis of depression among young adults using Electronic Health Records.

David Nickson, Henrik Singmann, Caroline Meyer, Carla Toro, Lukasz Walasek
{"title":"Replicability and reproducibility of predictive models for diagnosis of depression among young adults using Electronic Health Records.","authors":"David Nickson, Henrik Singmann, Caroline Meyer, Carla Toro, Lukasz Walasek","doi":"10.1186/s41512-023-00160-2","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Recent advances in machine learning combined with the growing availability of digitized health records offer new opportunities for improving early diagnosis of depression. An emerging body of research shows that Electronic Health Records can be used to accurately predict cases of depression on the basis of individual's primary care records. The successes of these studies are undeniable, but there is a growing concern that their results may not be replicable, which could cast doubt on their clinical usefulness.</p><p><strong>Methods: </strong>To address this issue in the present paper, we set out to reproduce and replicate the work by Nichols et al. (2018), who trained predictive models of depression among young adults using Electronic Healthcare Records. Our contribution consists of three parts. First, we attempt to replicate the methodology used by the original authors, acquiring a more up-to-date set of primary health care records to the same specification and reproducing their data processing and analysis. Second, we test models presented in the original paper on our own data, thus providing out-of-sample prediction of the predictive models. Third, we extend past work by considering several novel machine-learning approaches in an attempt to improve the predictive accuracy achieved in the original work.</p><p><strong>Results: </strong>In summary, our results demonstrate that the work of Nichols et al. is largely reproducible and replicable. This was the case both for the replication of the original model and the out-of-sample replication applying NRCBM coefficients to our new EHRs data. Although alternative predictive models did not improve model performance over standard logistic regression, our results indicate that stepwise variable selection is not stable even in the case of large data sets.</p><p><strong>Conclusion: </strong>We discuss the challenges associated with the research on mental health and Electronic Health Records, including the need to produce interpretable and robust models. We demonstrated some potential issues associated with the reliance on EHRs, including changes in the regulations and guidelines (such as the QOF guidelines in the UK) and reliance on visits to GP as a predictor of specific disorders.</p>","PeriodicalId":72800,"journal":{"name":"Diagnostic and prognostic research","volume":"7 1","pages":"25"},"PeriodicalIF":0.0000,"publicationDate":"2023-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10696659/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Diagnostic and prognostic research","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1186/s41512-023-00160-2","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Background: Recent advances in machine learning combined with the growing availability of digitized health records offer new opportunities for improving early diagnosis of depression. An emerging body of research shows that Electronic Health Records can be used to accurately predict cases of depression on the basis of individual's primary care records. The successes of these studies are undeniable, but there is a growing concern that their results may not be replicable, which could cast doubt on their clinical usefulness.

Methods: To address this issue in the present paper, we set out to reproduce and replicate the work by Nichols et al. (2018), who trained predictive models of depression among young adults using Electronic Healthcare Records. Our contribution consists of three parts. First, we attempt to replicate the methodology used by the original authors, acquiring a more up-to-date set of primary health care records to the same specification and reproducing their data processing and analysis. Second, we test models presented in the original paper on our own data, thus providing out-of-sample prediction of the predictive models. Third, we extend past work by considering several novel machine-learning approaches in an attempt to improve the predictive accuracy achieved in the original work.

Results: In summary, our results demonstrate that the work of Nichols et al. is largely reproducible and replicable. This was the case both for the replication of the original model and the out-of-sample replication applying NRCBM coefficients to our new EHRs data. Although alternative predictive models did not improve model performance over standard logistic regression, our results indicate that stepwise variable selection is not stable even in the case of large data sets.

Conclusion: We discuss the challenges associated with the research on mental health and Electronic Health Records, including the need to produce interpretable and robust models. We demonstrated some potential issues associated with the reliance on EHRs, including changes in the regulations and guidelines (such as the QOF guidelines in the UK) and reliance on visits to GP as a predictor of specific disorders.

查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
使用电子健康记录诊断年轻人抑郁症的预测模型的可复制性和再现性
背景:机器学习的最新进展与数字化健康记录的日益普及为改善抑郁症的早期诊断提供了新的机会。一项新兴的研究表明,电子健康记录可以在个人初级保健记录的基础上准确预测抑郁症病例。这些研究的成功是不可否认的,但越来越多的人担心它们的结果可能不可复制,这可能使人们怀疑它们的临床用途。方法:为了在本文中解决这一问题,我们着手重现和复制Nichols等人(2018)的工作,他们使用电子医疗记录训练了年轻人抑郁症的预测模型。我们的贡献由三部分组成。首先,我们试图复制原作者使用的方法,以相同的规格获取一组最新的初级卫生保健记录,并复制其数据处理和分析。其次,我们用自己的数据对原论文中的模型进行检验,从而对预测模型进行样本外预测。第三,我们通过考虑几种新的机器学习方法来扩展过去的工作,试图提高原始工作中实现的预测准确性。结果:总之,我们的结果表明Nichols等人的工作在很大程度上是可重复和可复制的。对于原始模型的复制和将NRCBM系数应用于我们的新ehr数据的样本外复制都是如此。虽然替代预测模型并没有提高标准逻辑回归模型的性能,但我们的结果表明,即使在大数据集的情况下,逐步变量选择也不稳定。结论:我们讨论了与心理健康和电子健康记录研究相关的挑战,包括需要产生可解释和稳健的模型。我们展示了一些与依赖电子病历相关的潜在问题,包括法规和指南的变化(如英国的QOF指南)以及依赖GP访问作为特定疾病的预测因子。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
审稿时长
18 weeks
期刊最新文献
Risk prediction tools for pressure injury occurrence: an umbrella review of systematic reviews reporting model development and validation methods. Rehabilitation outcomes after comprehensive post-acute inpatient rehabilitation following moderate to severe acquired brain injury-study protocol for an overall prognosis study based on routinely collected health data. Validation of prognostic models predicting mortality or ICU admission in patients with COVID-19 in low- and middle-income countries: a global individual participant data meta-analysis. Reported prevalence and comparison of diagnostic approaches for Candida africana: a systematic review with meta-analysis. The relative data hungriness of unpenalized and penalized logistic regression and ensemble-based machine learning methods: the case of calibration.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1