AIRA-ML: Auto Insurance Risk Assessment-Machine Learning Model using Resampling Methods

IF 0.7 Q3 COMPUTER SCIENCE, THEORY & METHODS International Journal of Advanced Computer Science and Applications Pub Date : 2023-01-01 DOI:10.14569/ijacsa.2023.0140966
Ahmed Shawky Elbhrawy, Mohamed A. Belal, Mohamed Sameh Hassanein
Citations: 0

Abstract

Predicting underwriting risk is a major challenge because datasets in the field are imbalanced. This work uses a real-world imbalanced dataset of 30,144 cases with 12 variables, in which most cases are classified as "accepting the insurance request" and a small percentage as "refusing insurance". The work developed 55 machine learning (ML) models to predict whether or not to renew policies. To address the class imbalance, the models were built by applying 11 ML algorithms to the original dataset and to datasets produced by four data-level resampling techniques: random oversampling, SMOTE, random undersampling, and hybrid methods (11 ML algorithms × (4 resampling techniques + the unbalanced dataset) = 55 ML models). The 11 ML algorithms are logistic regression (LR), random forest (RF), artificial neural network (ANN), multilayer perceptron (MLP), support vector machine (SVM), naive Bayes (NB), decision tree (DT), XGBoost, k-nearest neighbors (KNN), stochastic gradient boosting (SGB), and AdaBoost. The 55 models were evaluated with seven classifier efficiency measures: accuracy, sensitivity, specificity, AUC, precision, F1-measure, and kappa. The CRISP-DM methodology was used to ensure that the study was conducted in a rigorous and systematic manner. RapidMiner software was used to apply the algorithms and analyze the data, which highlighted the potential of ML to improve the accuracy of risk assessment in insurance underwriting. The results showed that all ML classifiers became more effective when using resampling strategies; hybrid resampling methods improved the performance of ML models on imbalanced data, with the RF classifier reaching an accuracy of 0.9967 and a kappa statistic of 0.992.
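The experimental grid and two of the resampling techniques named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the study used RapidMiner, and the function names below (`random_oversample`, `random_undersample`, `binary_measures`) are hypothetical helpers written only for illustration. The sketch assumes a binary label in which `1` marks the class treated as positive; SMOTE and the hybrid methods are not implemented because the abstract does not specify their configuration, and AUC is omitted because it requires predicted scores rather than hard labels.

```python
import random
from collections import Counter
from itertools import product

# Names taken from the abstract; the grid only counts the combinations.
ALGORITHMS = ["LR", "RF", "ANN", "MLP", "SVM", "NB", "DT",
              "XGBoost", "KNN", "SGB", "AdaBoost"]
DATASETS = ["original", "random oversampling", "SMOTE",
            "random undersampling", "hybrid"]
MODEL_GRID = list(product(ALGORITHMS, DATASETS))  # 11 x (4 + 1) = 55


def _pools(rows, labels):
    """Group rows by their class label."""
    pools = {}
    for row, label in zip(rows, labels):
        pools.setdefault(label, []).append(row)
    return pools


def random_oversample(rows, labels, seed=0):
    """Duplicate random rows of smaller classes up to the largest class size."""
    rng = random.Random(seed)
    pools = _pools(rows, labels)
    target = max(len(pool) for pool in pools.values())
    out_rows, out_labels = list(rows), list(labels)
    for label, pool in pools.items():
        for _ in range(target - len(pool)):
            out_rows.append(rng.choice(pool))
            out_labels.append(label)
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    pools = _pools(rows, labels)
    target = min(len(pool) for pool in pools.values())
    out_rows, out_labels = [], []
    for label, pool in pools.items():
        out_rows.extend(rng.sample(pool, target))
        out_labels.extend([label] * target)
    return out_rows, out_labels


def _ratio(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_measures(y_true, y_pred, positive=1):
    """Six of the seven measures, computed from the confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    n = tp + tn + fp + fn
    accuracy = _ratio(tp + tn, n)
    sensitivity = _ratio(tp, tp + fn)   # recall on the positive class
    specificity = _ratio(tn, tn + fp)   # recall on the negative class
    precision = _ratio(tp, tp + fp)
    f1 = _ratio(2 * precision * sensitivity, precision + sensitivity)
    # Cohen's kappa: observed agreement corrected for chance agreement.
    chance = _ratio((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp), n * n)
    kappa = _ratio(accuracy - chance, 1 - chance)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "precision": precision,
            "f1": f1, "kappa": kappa}
```

Kappa is worth reporting alongside accuracy on imbalanced data because a classifier that always predicts the majority class can score high accuracy while its kappa stays at zero.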
Source journal
CiteScore: 2.30
Self-citation rate: 22.20%
Articles published: 519
Journal description: IJACSA is a scholarly computer science journal representing the best in research. Its mission is to provide an outlet for quality research to be publicised and published to a global audience. The journal aims to publish papers selected through rigorous double-blind peer review to ensure originality, timeliness, relevance, and readability. In sync with the Journal's vision "to be a respected publication that publishes peer reviewed research articles, as well as review and survey papers contributed by International community of Authors", we have drawn reviewers and editors from Institutions and Universities across the globe. A double-blind peer review process is conducted to ensure that we retain high standards. At IJACSA, we stand strong because we know that global challenges make way for new innovations, new ways and new talent. International Journal of Advanced Computer Science and Applications publishes carefully refereed research, review and survey papers which offer a significant contribution to the computer science literature, and which are of interest to a wide audience. Coverage extends to all mainstream branches of computer science and related applications.