Data Sets Modeling and Frequency Prediction via Machine Learning and Neural Network
Ziqi Zhang
2021 IEEE International Conference on Emergency Science and Information Technology (ICESIT), published 2021-11-22
DOI: 10.1109/ICESIT53460.2021.9696532 (https://doi.org/10.1109/ICESIT53460.2021.9696532)
Citations: 1
Abstract
In recent years, generalized linear models (GLMs) have been widely used in auto insurance pricing. Some studies show that machine learning outperforms GLMs in certain respects, but each of those results is based on a single data set. To compare GLMs and machine learning methods more comprehensively on the problem of car insurance claim frequency prediction, a comparative test was carried out on 7 car insurance data sets using deep learning, random forest, support vector machine, XGBoost, and other machine learning methods. On the same training set, different GLMs were fitted to predict claim frequency, and the best GLM was selected by the minimum Akaike information criterion (AIC); the best machine learning parameters and models were obtained through cross-validation tuning. The results show that XGBoost consistently predicts better than the GLM on all data sets; for some data sets with more independent variables and strong correlation between variables, neural networks, deep learning, and random forests also predict better than GLMs.
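The AIC-based selection step mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the toy policies, the `age_band` factor, and all function names are hypothetical, and only two candidate models are compared. It assumes a Poisson claim-frequency GLM with a log link and an exposure offset, for which the maximum-likelihood rate of an intercept-only or single-factor model has a closed form (total claims divided by total exposure, overall or per level), so no numerical fitting library is needed.

```python
import math
from collections import defaultdict

# Hypothetical toy policies: (age_band, exposure in years, claim count).
# These values are made up for illustration; they are not from the paper.
POLICIES = [
    ("young", 1.0, 1), ("young", 0.5, 1), ("young", 1.0, 0), ("young", 1.0, 2),
    ("adult", 1.0, 0), ("adult", 1.0, 0), ("adult", 0.5, 0), ("adult", 1.0, 1),
    ("senior", 1.0, 0), ("senior", 1.0, 1), ("senior", 1.0, 0), ("senior", 0.5, 0),
]


def poisson_loglik(rows, rate_of):
    """Poisson log-likelihood with mean = rate(band) * exposure."""
    total = 0.0
    for band, exposure, claims in rows:
        mu = rate_of(band) * exposure
        # When claims == 0 the log term vanishes, so skip it to avoid log(0).
        log_term = claims * math.log(mu) if claims else 0.0
        total += log_term - mu - math.lgamma(claims + 1)
    return total


def fit_intercept_only(rows):
    """One common claim rate for all policies (1 parameter)."""
    rate = sum(c for _, _, c in rows) / sum(e for _, e, _ in rows)
    return (lambda band: rate), 1


def fit_by_band(rows):
    """One claim rate per age band (one parameter per level)."""
    claims, exposure = defaultdict(float), defaultdict(float)
    for band, e, c in rows:
        claims[band] += c
        exposure[band] += e
    rates = {band: claims[band] / exposure[band] for band in claims}
    return (lambda band: rates[band]), len(rates)


def aic(rows, fit):
    """AIC = 2k - 2 * log-likelihood at the fitted parameters."""
    rate_of, n_params = fit(rows)
    return 2 * n_params - 2 * poisson_loglik(rows, rate_of)


candidates = {"intercept_only": fit_intercept_only, "age_band": fit_by_band}
scores = {name: aic(POLICIES, fit) for name, fit in candidates.items()}
best = min(scores, key=scores.get)  # candidate with the minimum AIC

for name, score in scores.items():
    print(f"{name}: AIC = {score:.3f}")
print("selected:", best)
```

On a sample this small, the parameter penalty can outweigh the gain in likelihood, so the simpler model may be selected even when the fitted rates differ between groups. The cross-validation tuning of the machine learning models is a separate step and is not shown here.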