Applying machine learning approaches for predicting obesity risk using US health administrative claims database.

BMJ Open Diabetes Research & Care (IF 3.7; Zone 2, Medicine; JCR Q2, Endocrinology & Metabolism). Pub Date: 2024-09-26. DOI: 10.1136/bmjdrc-2024-004193
Casey Choong, Alan Brnabic, Chanadda Chinthammit, Meena Ravuri, Kendra Terrell, Hong Kan
BMJ Open Diabetes Research & Care, volume 12, issue 5. Full text: https://doi.org/10.1136/bmjdrc-2024-004193 (PMC PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11429277/pdf/)

Abstract

Introduction: Body mass index (BMI) is inadequately recorded in US administrative claims databases. We aimed to validate the sensitivity and positive predictive value (PPV) of BMI-related diagnosis codes using an electronic medical records (EMR) claims-linked database. Additionally, we applied machine learning (ML) to identify features in US claims databases to predict obesity status.

Research design and methods: This observational, retrospective analysis included 692 119 people ≥18 years of age, with ≥1 BMI reading in MarketScan Explorys Claims-EMR data (January 2013-December 2019). Claims-based obesity status was compared with EMR-based BMI (gold standard) to assess BMI-related diagnosis code sensitivity and PPV. Logistic regression (LR), penalized LR with L1 penalty (Least Absolute Shrinkage and Selection Operator), extreme gradient boosting (XGBoost) and random forest, with features drawn from insurance claims, were trained to predict obesity status (BMI≥30 kg/m²) from EMR as the gold standard. Model performance was compared using several metrics, including the area under the receiver operating characteristic curve. The best-performing model was applied to assess feature importance. Obesity risk scores were computed from the best model generated from the claims database and compared against the BMI recorded in the EMR.
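
To make the validation step concrete, the following is a minimal sketch of how sensitivity and PPV of a claims-based obesity flag could be computed against EMR-recorded BMI. It is not the authors' code: the `Person` record, the `has_claims_obesity_code` field, and the toy cohort are hypothetical; only the BMI≥30 kg/m² threshold and the use of EMR BMI as the gold standard come from the abstract.

```python
from dataclasses import dataclass

# Obesity definition stated in the abstract: BMI >= 30 kg/m².
OBESITY_BMI_THRESHOLD = 30.0


@dataclass
class Person:
    emr_bmi: float                 # BMI recorded in the EMR (gold standard)
    has_claims_obesity_code: bool  # hypothetical flag: obesity-related diagnosis code found in claims


def sensitivity_and_ppv(people):
    """Compare the claims-based flag with EMR-based obesity status."""
    tp = fp = fn = 0
    for p in people:
        truly_obese = p.emr_bmi >= OBESITY_BMI_THRESHOLD
        if p.has_claims_obesity_code and truly_obese:
            tp += 1  # coded in claims and obese per EMR
        elif p.has_claims_obesity_code:
            fp += 1  # coded in claims but not obese per EMR
        elif truly_obese:
            fn += 1  # obese per EMR but never coded in claims
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, ppv


# Toy cohort (invented values, for illustration only).
cohort = [
    Person(34.2, True),
    Person(31.0, False),
    Person(36.5, False),
    Person(27.9, True),
    Person(22.4, False),
    Person(30.0, True),
    Person(33.3, True),
    Person(41.0, False),
]

sens, ppv = sensitivity_and_ppv(cohort)
print(f"sensitivity={sens:.2f}, PPV={ppv:.2f}")
```

In this toy cohort the flag is usually right when present but misses half of the people with obesity, which is the same qualitative pattern (high PPV, low sensitivity) that the study reports for claims codes.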

Results: The PPV of diagnosis codes from claims alone remained high over the study period (85.4-89.2%); sensitivity was low (16.8-44.8%). XGBoost performed the best at predicting obesity with the highest area under the curve (AUC; 79.4%) and the lowest Brier score. The number of obesity diagnoses and obesity diagnoses from inpatient settings were the most important predictors of obesity. XGBoost showed an AUC of 74.1% when trained without an obesity diagnosis.
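
The two headline metrics, AUC and Brier score, can be illustrated with a short dependency-free sketch. The function names and the toy labels and scores below are assumptions made for illustration; they are not the study's data or implementation. AUC is computed here as the fraction of (obese, non-obese) pairs in which the obese person receives the higher predicted risk, with ties counted as half; the Brier score is the mean squared difference between predicted risk and observed status.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def brier_score(labels, scores):
    """Mean squared error between predicted probability and the 0/1 outcome."""
    return sum((s - y) ** 2 for y, s in zip(labels, scores)) / len(labels)


# Toy example: 1 = obese per EMR BMI, 0 = not obese; scores are invented risk predictions.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.1]

auc = roc_auc(labels, scores)
brier = brier_score(labels, scores)
print(f"AUC={auc:.3f}, Brier={brier:.3f}")
```

A higher AUC indicates better ranking of people by risk, while a lower Brier score indicates predicted probabilities that sit closer to the observed outcomes, which is why the abstract reports the best model as having the highest AUC and the lowest Brier score.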

Conclusions: Obesity prevalence is under-reported in claims databases. ML models, with or without explicit obesity diagnoses as predictors, show promise in improving obesity prediction accuracy compared with obesity codes alone. Improved obesity status prediction may help practitioners and payors estimate the burden of obesity and investigate the potential unmet needs of current treatments.
