Health State Risk Categorization: A Machine Learning Clustering Approach Using Health and Retirement Study Data

The Journal of Financial Data Science. Pub Date: 2022-04-30. DOI: 10.3905/jfds.2022.4.2.139

F. Tan, D. Mehta


Citations: 2

Abstract

For countries such as the United States, which lacks a universal health care system, future health care costs can create significant uncertainty that a retirement investment strategy must be built to manage. One of the most important factors determining health care costs is the individual’s health status. Hence, categorizing individuals into meaningful health risk types is an essential task. The conventional approach is to rely on individuals’ self-rated health state. In this work, the authors provide an objective, data-driven machine learning (ML)–based approach to categorizing health state risk, using the most widely used US household survey on older Americans, the Health and Retirement Study (HRS). The authors propose using the K-modes clustering method to cluster individuals on an exhaustive list of categorical health-related variables in the HRS. The resulting clusters are shown to provide an objective, interpretable, and practical health state risk categorization. The authors then compare the ML-based and self-rated health state categorizations and discuss the implications of the differences. They also illustrate the difficulty of predicting out-of-pocket costs from self-rated health status and show how ML-based categorizations can generate more accurate health care cost estimates for personalized retirement planning. The results in this article open different avenues of research, including behavioral science analysis for health and retirement research.
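To make the K-modes idea concrete, the following is a minimal Python sketch of the general technique, not the authors' implementation. K-modes replaces the means and Euclidean distance of K-means with per-attribute modes and a simple matching (Hamming) dissimilarity, which is what makes it suitable for categorical survey answers. The three yes/no variables, the toy records, the choice of k = 2, and the random initialization are all assumptions made for illustration; none of them come from the HRS or from the article.

```python
import random
from collections import Counter


def hamming(a, b):
    """Simple matching dissimilarity: number of attributes that differ."""
    return sum(x != y for x, y in zip(a, b))


def k_modes(records, k, n_iter=20, seed=0):
    """Cluster tuples of categorical values; returns (labels, modes)."""
    rng = random.Random(seed)
    # Initialize the modes with k distinct records chosen at random.
    distinct = list(dict.fromkeys(records))
    modes = rng.sample(distinct, k)
    labels = None
    for _ in range(n_iter):
        # Assignment step: each record joins its nearest mode.
        new_labels = [
            min(range(k), key=lambda j: hamming(r, modes[j])) for r in records
        ]
        if new_labels == labels:
            break  # assignments are stable, so the modes are too
        labels = new_labels
        # Update step: each mode becomes the most frequent value per attribute.
        for j in range(k):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if members:
                modes[j] = tuple(
                    Counter(column).most_common(1)[0][0]
                    for column in zip(*members)
                )
    return labels, modes


# Hypothetical answers to (hypertension, diabetes, difficulty walking).
records = [
    ("no", "no", "no"),
    ("no", "no", "no"),
    ("no", "yes", "no"),
    ("yes", "yes", "yes"),
    ("yes", "yes", "yes"),
    ("yes", "no", "yes"),
]

labels, modes = k_modes(records, k=2)
print(labels)
print(modes)
```

Each resulting mode is itself a readable profile of categorical answers, which is one reason this family of methods lends itself to interpretable risk categories. Like K-means, the procedure only reaches a local optimum, so in practice it is usually rerun from several initializations.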

