Machine learning for an explainable cost prediction of medical insurance

Ugochukwu Orji, Elochukwu Ukwandu
Journal: Machine Learning with Applications, Volume 15, Article 100516
Published: 2023-11-24
DOI: 10.1016/j.mlwa.2023.100516
Full text: https://www.sciencedirect.com/science/article/pii/S2666827023000695
Citations: 0

Abstract


Predictive modeling in healthcare remains an active topic in actuarial research as more insurance companies seek to use Machine Learning (ML) approaches to improve their productivity and efficiency. In this paper, the authors deployed three regression-based ensemble ML models that combine variations of decision trees, namely Extreme Gradient Boosting (XGBoost), Gradient Boosting Machine (GBM), and Random Forest (RF), to predict medical insurance costs. Two Explainable Artificial Intelligence (XAI) methods, SHapley Additive exPlanations (SHAP) and Individual Conditional Expectation (ICE) plots, were used to discover and explain the key determinant factors that influence medical insurance premium prices in the dataset. The dataset comprises 986 records and is publicly available in the Kaggle repository. The models were evaluated using four performance metrics: R-squared (R²), Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and Mean Absolute Percentage Error (MAPE). The results show that all models performed well; the XGBoost model achieved a better overall performance, although it also expended more computational resources, while the RF model recorded a lower prediction error and consumed far fewer computing resources than the XGBoost model. Furthermore, the authors compared the outcomes of both XAI methods in identifying the key determinant features that influenced the PremiumPrices for each model. Although both XAI methods produced similar outcomes, the ICE plots showed the interactions between each variable in more detail than the SHAP analysis, which appeared to be more high-level. The authors intend the contributions of this study to help policymakers, insurers, and potential medical insurance buyers select the policies that meet their specific needs.
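
The four evaluation metrics and the idea behind an ICE curve can be illustrated with a minimal sketch. This is not the authors' code: the function names, the feature names (`age`, `chronic`), the toy predictor, and all numbers below are hypothetical and chosen only for illustration. An ICE curve is built here by taking one record, sweeping a single feature over a grid while holding the other features fixed, and recording the model's prediction at each grid value.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R-squared, MAE, RMSE, and MAPE (as a fraction) for paired values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(ss_res / n),
        # MAPE is undefined if any true value is zero.
        "mape": sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
    }


def ice_curves(predict, rows, feature, grid):
    """One ICE curve per row: predictions as `feature` sweeps over `grid`
    while every other feature of that row stays fixed."""
    return [[predict({**row, feature: value}) for value in grid] for row in rows]


def toy_predict(row):
    """Hypothetical stand-in for a fitted model (e.g. RF, GBM, or XGBoost)."""
    return 15000 + 200 * row["age"] + 5000 * row["chronic"]


# Hypothetical premium values, not taken from the paper's dataset.
y_true = [20000.0, 25000.0, 30000.0, 40000.0]
y_pred = [21000.0, 24000.0, 31000.0, 38000.0]
metrics = regression_metrics(y_true, y_pred)

# Two hypothetical records; sweep `age` to get one curve per record.
rows = [{"age": 30, "chronic": 0}, {"age": 50, "chronic": 1}]
curves = ice_curves(toy_predict, rows, "age", [20, 40, 60])

print(metrics)
print(curves)
```

In practice the toy predictor would be replaced by the `predict` method of a fitted model, and the curves would be plotted, one line per record, rather than printed.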

Source journal
Machine Learning with Applications
Subject areas: Management Science and Operations Research, Artificial Intelligence, Computer Science Applications
Self-citation rate: 0.00%
Articles published: 0
Review time: 98 days