Prediction of Hospital Charges for the Cancer Patients with Data Mining Techniques

J. Kang, Suk-Hoon Chung, Yong-Moo Suh
Journal of Korean Society of Medical Informatics. Published 2009-03-01. DOI: 10.4258/JKSMI.2009.15.1.13 (https://doi.org/10.4258/JKSMI.2009.15.1.13)
Citations: 22

Abstract

Objective: Predicting hospital charges for cancer patients is important because such predictions provide a basis for allocating medical resources within a hospital and for establishing national medical policy. Previous studies that predicted hospital charges relied mainly on statistical analysis, which used only a small portion of the large volume of available medical data, so their predictive power was limited. We therefore developed four data mining models, two artificial neural network (ANN) models and two classification and regression tree (CART) models, to predict both the total hospital charges and the amount paid by insurance for cancer patients, and compared their performance.

Methods: The data were drawn from 400,625 medical records of 1,605 cancer patients hospitalized at Kyung Hee University Hospital from March 1, 2003 to February 29, 2004. Clementine 8.1 was used to build the four prediction models, two for the total amount and two for the amount paid by insurance. The variables included all data fields of the standard Korean medical record form. The neural network models were feed-forward networks with 2 hidden layers, trained by back-propagation. For the decision tree models, the RELIEFF method was used and the maximum tree depth was set to 30. The dataset was divided by stratified sampling into a training set (67%) and a test set (33%). The models were compared on linear correlation coefficients and gain charts.

Results: The ANN models achieved higher linear correlation coefficients than the CART models for both the total amount (0.824 vs. 0.791) and the amount paid by insurance (0.838 vs. 0.699). The estimated accuracy of the ANN models exceeded 98% for both the total amount and the amount paid by insurance. In the CART model for the total amount, the reported variables and their relative importance were duration of admission (0.073), number of consultations (0.061), and treatment group 16 (0.06). In the CART model for the amount paid by insurance, they were duration of admission (0.09), number of ICU admissions (0.063), and number of consultations (0.062). In the gain charts, the ANN model showed a better percent gain than the CART model for the total amount, whereas for the amount paid by insurance the two models showed similar patterns.

Conclusion: The ANN models predicted more accurately than the CART models. However, the CART models, which provide different information from the ANN models, can be used to allocate limited medical resources effectively and efficiently. For establishing medical policies and strategies, using the two kinds of models together is warranted.
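The evaluation steps named in the Methods (a 67%/33% stratified split, the linear correlation coefficient, and a gain measure) can be illustrated with a minimal Python sketch. This is not the authors' Clementine 8.1 workflow. The abstract does not say which variable was used for stratification or how gain was defined, so the sketch assumes stratification on quantile bins of the charge amount and defines gain as the share of actual charges captured by the top-ranked predictions. All function names and parameters are hypothetical.

```python
import math
import random


def stratified_split(targets, train_frac=0.67, n_bins=4, seed=0):
    """Split case indices into train/test sets.

    Assumption: strata are equal-sized quantile bins of the continuous
    target (the abstract does not name the stratification variable).
    """
    rng = random.Random(seed)
    # Sort indices by target value, then cut the ordering into bins.
    order = sorted(range(len(targets)), key=lambda i: targets[i])
    bin_size = math.ceil(len(order) / n_bins)
    train, test = [], []
    for start in range(0, len(order), bin_size):
        stratum = order[start:start + bin_size]
        rng.shuffle(stratum)
        cut = round(len(stratum) * train_frac)
        train.extend(stratum[:cut])
        test.extend(stratum[cut:])
    return sorted(train), sorted(test)


def pearson_r(xs, ys):
    """Linear (Pearson) correlation between actual and predicted values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def cumulative_gain(actual, predicted, top_frac):
    """Share of total actual charges found in the top_frac of cases,
    ranked by predicted charge (an assumed definition of gain)."""
    order = sorted(range(len(actual)), key=lambda i: predicted[i], reverse=True)
    k = max(1, round(len(order) * top_frac))
    return sum(actual[i] for i in order[:k]) / sum(actual)
```

In use, a model would be fitted on the cases returned as the training indices, and `pearson_r` and `cumulative_gain` would be computed on the test cases only, comparing actual charges with each model's predictions.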