An interpretable machine learning study for developing a binary classifier for predicting rehospitalization from skilled nursing facilities

Zhouyang Lou, Zachary Hass, Nan Kong
Journal: Healthcare analytics (New York, N.Y.), volume 7, Article 100387
Publication date: 2025-02-15
DOI: 10.1016/j.health.2025.100387
URL: https://www.sciencedirect.com/science/article/pii/S2772442525000061

Abstract

Reducing hospital readmissions for older adults discharged to a skilled nursing facility (SNF) is important to the United States (U.S.) from both financial and care-quality perspectives. To identify potential risk factors, researchers have used data from claims, national surveys, and administrative databases to train models that predict hospital readmissions occurring within 30 days of discharge. Machine learning techniques hold promise for this binary classification task. However, analysis pipelines are underdeveloped in data balancing, feature selection, and model interpretability. In this paper, we used individual resident-level data from the Long-Term Care Minimum Data Set (MDS) collected from SNFs in a midwestern U.S. state (n = 93,058). We further triangulated these data with publicly available facility quality and staffing data from the Nursing Home Compare tool on Medicare.gov and with facility neighborhood data from the National Neighborhood Data Archive. We compared several machine learning models, data balancing techniques, and feature selection methods for the prediction task. We found that XGBoost, combined with Synthetic Minority Oversampling Edited Nearest Neighbor (SMOTE-ENN) to balance the data and hierarchical clustering based on Spearman correlation to select the features, produced the best prediction performance. We then used SHapley Additive exPlanations (SHAP) values to identify the features that contribute most to the performance and used partial dependence plots to examine curvilinear and moderating relationships between features and the risk of 30-day rehospitalization.
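
The feature selection step named in the abstract, hierarchical clustering based on Spearman correlation, can be illustrated with a small dependency-free sketch. This is not the authors' implementation. The distance 1 - |rho|, the average-linkage rule, the 0.3 cut-off, the choice of the first-listed feature as each cluster's representative, and the toy feature names are all assumptions made for illustration. In practice, library routines such as `scipy.stats.spearmanr` and `scipy.cluster.hierarchy` are the usual choice.

```python
from itertools import combinations
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = tied_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no rank information
    return cov / (var_x * var_y) ** 0.5


def cluster_features(columns, threshold=0.3):
    """Average-linkage agglomerative clustering on distance 1 - |rho|.

    `columns` maps feature name -> list of values. Clusters are merged
    until the closest pair is farther apart than `threshold` (assumed value).
    """
    names = list(columns)
    dist = {
        frozenset((a, b)): 1.0 - abs(spearman(columns[a], columns[b]))
        for a, b in combinations(names, 2)
    }
    clusters = [[name] for name in names]

    def linkage(c1, c2):
        return mean(dist[frozenset((a, b))] for a in c1 for b in c2)

    while len(clusters) > 1:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        if linkage(clusters[i], clusters[j]) > threshold:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


def select_representatives(columns, threshold=0.3):
    """Keep the first-listed feature of each cluster as its representative."""
    order = {name: pos for pos, name in enumerate(columns)}
    clusters = cluster_features(columns, threshold)
    return sorted((min(c, key=order.get) for c in clusters), key=order.get)


# Toy data with hypothetical feature names (not the study's variables).
toy = {
    "age": [70, 75, 80, 85, 90, 95],
    "age_squared": [4900, 5625, 6400, 7225, 8100, 9025],  # monotone in age
    "staff_hours": [3.1, 4.5, 2.8, 3.9, 4.2, 3.0],
    "adl_score": [12, 3, 9, 15, 6, 10],
}
selected = select_representatives(toy)
print(selected)
```

Because Spearman correlation works on ranks, `age` and `age_squared` are treated as perfectly redundant even though their relationship is nonlinear, so only one of them survives the selection. The threshold controls how aggressively correlated features are collapsed and would need tuning on real data.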