Novel integration of governmental data sources using machine learning to identify super-utilization among U.S. counties

Iben M. Ricket, Michael E. Matheny, Todd A. MacKenzie, Jennifer A. Emond, Kusum L. Ailawadi, Jeremiah R. Brown
Intelligence-based medicine, volume 7, Article 100093. Published 2023-01-01. DOI: 10.1016/j.ibmed.2023.100093
Full text: https://www.sciencedirect.com/science/article/pii/S2666521223000078
PDF (PMC): https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/d0/83/nihms-1909855.PMC10358365.pdf
Citations: 1

Abstract

Background

Super-utilizers consume the greatest share of resource-intensive healthcare (RIHC), and reducing their utilization remains a crucial challenge for healthcare systems in the United States (U.S.). The objective of this study was to predict RIHC among U.S. counties using routinely collected data from the U.S. government, including information on consumer spending, offering an alternative method for identifying super-utilization among population units rather than individuals.

Methods

Cross-sectional data from 5 governmental sources in 2017 were used in a machine learning pipeline, where target-prediction features were selected and used in 4 distinct algorithms. Outcome metrics of RIHC utilization came from the American Hospital Association and included yearly (1) emergency room visits, (2) inpatient days, and (3) hospital expenditures. Target-prediction features included 149 demographic characteristics from the U.S. Census Bureau, 151 adult and child health characteristics from the Centers for Disease Control and Prevention, 151 community characteristics from the American Community Survey, and 571 consumer expenditures from the Bureau of Labor Statistics. SHAP (SHapley Additive exPlanations) analysis identified important target-prediction features for the 3 RIHC outcome metrics.
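
The abstract does not say how features were selected or which 4 algorithms were used, so the following is only a minimal sketch of the selection step under stated assumptions. It uses a simple correlation filter as a stand-in technique, fitted on training rows only so that the held-out counties do not influence which features are kept. The data are synthetic, and every name here (`pearson`, `select_features`, `make_synthetic_counties`, `train_test_indices`, the `x0`..`x5` columns) is hypothetical.

```python
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def select_features(columns, target, k):
    """Return the names of the k features with the largest |correlation| to target."""
    ranked = sorted(
        columns,
        key=lambda name: abs(pearson(columns[name], target)),
        reverse=True,
    )
    return ranked[:k]


def train_test_indices(n, test_fraction=0.2, seed=0):
    """Shuffle row indices and split them into train and test lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = n - int(n * test_fraction)
    return idx[:cut], idx[cut:]


def make_synthetic_counties(n=200, seed=0):
    """Synthetic stand-in for county data: 6 features, outcome driven by x0 and x3."""
    rng = random.Random(seed)
    columns = {f"x{j}": [rng.gauss(0, 1) for _ in range(n)] for j in range(6)}
    target = [
        2 * columns["x0"][i] - 3 * columns["x3"][i] + rng.gauss(0, 0.5)
        for i in range(n)
    ]
    return columns, target


columns, target = make_synthetic_counties()
train_idx, test_idx = train_test_indices(len(target))

# Select features using the training rows only.
train_columns = {name: [vals[i] for i in train_idx] for name, vals in columns.items()}
train_target = [target[i] for i in train_idx]
selected = select_features(train_columns, train_target, k=2)
print(selected)
```

In a fuller pipeline the selected columns would then be passed to each candidate model, and the test rows would be used only for the final evaluation.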

Results

A total of 2475 counties with emergency rooms and 2491 counties with hospitals were included. The median yearly emergency room visits per capita was 0.450 [IQR: 0.318, 0.618], the median inpatient days per capita was 0.368 [IQR: 0.176, 0.826], and the median hospital expenditures per capita was $2104 [IQR: $1299.93, $3362.97]. The coefficient of determination (R²), calculated on the test set, ranged between 0.267 and 0.447. Demographic and community characteristics were among the important predictors for all 3 RIHC outcome metrics.
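
For readers less familiar with the summary statistics above, here is a minimal sketch of how a median with interquartile range and a held-out coefficient of determination can be computed. The numbers are illustrative and are not study data. The helper names `median_iqr` and `r_squared` are hypothetical, and the quartile convention is Python's default "exclusive" method, which may differ from the one the authors used.

```python
import statistics


def median_iqr(values):
    """Return (median, first quartile, third quartile) of a numeric sample."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return q2, q1, q3


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Illustrative per-capita rates for seven made-up counties (not study data).
visits_per_capita = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.9]
med, q1, q3 = median_iqr(visits_per_capita)
print(f"median {med:.3f} [IQR: {q1:.3f}, {q3:.3f}]")

# Illustrative observed and predicted values for a held-out test set.
held_out_true = [1.0, 2.0, 3.0, 4.0]
held_out_pred = [1.1, 1.9, 3.2, 3.8]
print(f"test-set R^2 = {r_squared(held_out_true, held_out_pred):.3f}")
```

An R² of 0 corresponds to predicting the test-set mean for every county, so values between 0.267 and 0.447 indicate that the models explain roughly a quarter to under half of the county-level variance.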

Conclusions

Integrating diverse population characteristics from numerous governmental sources, we predicted 3 outcome metrics of RIHC among U.S. counties with good performance, offering a novel and actionable tool for identifying super-utilizer segments in the population. Wider integration of routinely collected data can be used to develop alternative methods for predicting RIHC among population units.
