Authors: Shuyuan Dong, Dingzhou Fei
Published in: 2021 International Conference on Computer Engineering and Artificial Intelligence (ICCEAI), 2021-08-01
DOI: https://doi.org/10.1109/ICCEAI52939.2021.00065
Improve the interpretability by decision tree regression: exampled by an insurance dataset
Rapidly rising national health care expenditure is a problem for both developed and developing countries. Using medical insurance data from insurance companies, this study explores the factors that influence medical insurance cost. These factors are then used as feature variables to build a decision tree regression model and a linear regression model, each of which predicts medical insurance cost. The main conclusions are as follows: (1) The features “region” and “sex” do not affect insurance cost. (2) Smoking has the greatest influence on insurance cost; smoking, together with body mass index (BMI), has a driving effect on insurance cost. (3) The regression correlation coefficient of the decision tree is about 81%, whereas that of the linear regression is 65%; that is, the decision tree predicts more accurately.
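The comparison described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the authors' code and does not use their dataset: the synthetic data, the feature set (`age`, `bmi`, `smoker`), the cost formula, and the helper names `make_data`, `fit_tree`, `fit_linear`, and `r2_score` are all assumptions made for illustration. The sketch also assumes that the "regression correlation coefficient" can be read as the coefficient of determination (R^2), which the abstract does not confirm.

```python
import random

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean_y) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot

def fit_linear(X, y):
    """Ordinary least squares via the normal equations (intercept included)."""
    A = [[1.0] + list(row) for row in X]
    k = len(A[0])
    # Augmented matrix [A^T A | A^T y].
    M = [[sum(r[i] * r[j] for r in A) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(A, y))] for i in range(k)]
    for c in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(k):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][k] / M[i][i] for i in range(k)]

def predict_linear(w, X):
    return [w[0] + sum(wi * xi for wi, xi in zip(w[1:], row)) for row in X]

def sse(vals):
    """Sum of squared deviations from the mean."""
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals)

def fit_tree(X, y, depth=3, min_leaf=10):
    """Greedy regression tree: each split minimizes the summed squared error."""
    leaf = sum(y) / len(y)
    if depth == 0 or len(y) < 2 * min_leaf:
        return leaf
    best = None
    for j in range(len(X[0])):
        for t in sorted(set(row[j] for row in X))[1:]:
            left = [yi for row, yi in zip(X, y) if row[j] < t]
            right = [yi for row, yi in zip(X, y) if row[j] >= t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, j, t)
    if best is None:
        return leaf
    _, j, t = best
    lo = [(row, yi) for row, yi in zip(X, y) if row[j] < t]
    hi = [(row, yi) for row, yi in zip(X, y) if row[j] >= t]
    return (j, t,
            fit_tree([r for r, _ in lo], [v for _, v in lo], depth - 1, min_leaf),
            fit_tree([r for r, _ in hi], [v for _, v in hi], depth - 1, min_leaf))

def predict_tree(node, row):
    # Internal nodes are (feature, threshold, left, right); leaves are floats.
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if row[j] < t else right
    return node

def make_data(n, rng):
    """Synthetic records; the cost formula is an assumption, not the paper's data."""
    X, y = [], []
    for _ in range(n):
        age = rng.randint(18, 64)
        bmi = round(rng.uniform(18.0, 42.0), 1)
        smoker = 1 if rng.random() < 0.2 else 0
        # Assumed smoker-by-BMI interaction that a straight line cannot capture.
        cost = (2000 + 100 * (age - 18) + 15000 * smoker
                + 20000 * smoker * (bmi > 30) + rng.gauss(0, 1000))
        X.append([age, bmi, smoker])
        y.append(cost)
    return X, y

rng = random.Random(0)
X_train, y_train = make_data(400, rng)
X_test, y_test = make_data(200, rng)
tree = fit_tree(X_train, y_train)
w = fit_linear(X_train, y_train)
r2_tree = r2_score(y_test, [predict_tree(tree, r) for r in X_test])
r2_lin = r2_score(y_test, predict_linear(w, X_test))
print(f"decision tree R^2:     {r2_tree:.3f}")
print(f"linear regression R^2: {r2_lin:.3f}")
```

Because the assumed cost formula contains a threshold interaction between smoking and BMI, the tree fits the held-out data better than the additive linear model here. The actual scores depend entirely on the synthetic setup and should not be compared with the 81% and 65% reported in the abstract.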