MiNB:少数派敏感Naïve非平衡数据多类分类的贝叶斯算法

Int. Arab J. Inf. Technol. Pub Date : 2022-01-01 DOI:10.34028/iajit/19/4/5

Pratik A. Barot, H. Jethva

{"title":"MiNB:少数派敏感Naïve非平衡数据多类分类的贝叶斯算法","authors":"Pratik A. Barot, H. Jethva","doi":"10.34028/iajit/19/4/5","DOIUrl":null,"url":null,"abstract":"The unbalanced nature of data makes it tough to achieve the desire performance goal for classification algorithms. The sub-optimal prediction system isn't a viable solution due to the high misclassification cost of minority events. Thus accurate imbalanced data classification could be a path changer for prediction in domains like medical diagnosis, judiciary, and disaster management systems. To date, most of the existing studies of imbalanced data are for the binary class dataset and supported by data sampling techniques that suffer from loss of information and over-fitting. In this paper, we present the modified naïve Bayesian algorithm for unbalanced data classification that eliminates the requirement of data level sampling. We compared our proposed model with the data sampling technique and cost-sensitive techniques. We use minority sensitive TP Rate, class-specific misclassification rate, and overall performance parameters such as accuracy, f-measure and G-mean. The result shows that our proposed algorithm shows a more optimal result for unbalanced data classification. Results shows reduction in misclassification rate and improve predictive performance for the minority class.","PeriodicalId":13624,"journal":{"name":"Int. Arab J. Inf. Technol.","volume":"47 1","pages":"609-616"},"PeriodicalIF":0.0000,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"MiNB: Minority Sensitive Naïve Bayesian Algorithm for Multi-Class Classification of Unbalanced Data\",\"authors\":\"Pratik A. Barot, H. Jethva\",\"doi\":\"10.34028/iajit/19/4/5\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The unbalanced nature of data makes it tough to achieve the desire performance goal for classification algorithms. The sub-optimal prediction system isn't a viable solution due to the high misclassification cost of minority events. Thus accurate imbalanced data classification could be a path changer for prediction in domains like medical diagnosis, judiciary, and disaster management systems. To date, most of the existing studies of imbalanced data are for the binary class dataset and supported by data sampling techniques that suffer from loss of information and over-fitting. In this paper, we present the modified naïve Bayesian algorithm for unbalanced data classification that eliminates the requirement of data level sampling. We compared our proposed model with the data sampling technique and cost-sensitive techniques. We use minority sensitive TP Rate, class-specific misclassification rate, and overall performance parameters such as accuracy, f-measure and G-mean. The result shows that our proposed algorithm shows a more optimal result for unbalanced data classification. Results shows reduction in misclassification rate and improve predictive performance for the minority class.\",\"PeriodicalId\":13624,\"journal\":{\"name\":\"Int. Arab J. Inf. Technol.\",\"volume\":\"47 1\",\"pages\":\"609-616\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Int. Arab J. Inf. Technol.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.34028/iajit/19/4/5\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Int. Arab J. Inf. Technol.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.34028/iajit/19/4/5","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 1

摘要

数据的不平衡特性使得分类算法很难达到理想的性能目标。由于少数事件的错误分类成本高，次优预测系统不是一个可行的解决方案。因此，准确的不平衡数据分类可能会改变医疗诊断、司法和灾害管理系统等领域的预测路径。到目前为止，大多数对不平衡数据的研究都是针对二值类数据集的，并且通过数据采样技术来支持，这些技术存在信息丢失和过拟合的问题。本文提出了一种改进的naïve贝叶斯算法用于非平衡数据分类，该算法消除了对数据级采样的要求。我们将所提出的模型与数据抽样技术和成本敏感技术进行了比较。我们使用少数敏感的TP率、特定类别的误分类率和总体性能参数，如准确性、f-measure和G-mean。结果表明，本文提出的算法对不平衡数据的分类具有较好的效果。结果表明，少数类别的错误分类率降低，预测性能提高。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

MiNB: Minority Sensitive Naïve Bayesian Algorithm for Multi-Class Classification of Unbalanced Data

The unbalanced nature of data makes it tough to achieve the desire performance goal for classification algorithms. The sub-optimal prediction system isn't a viable solution due to the high misclassification cost of minority events. Thus accurate imbalanced data classification could be a path changer for prediction in domains like medical diagnosis, judiciary, and disaster management systems. To date, most of the existing studies of imbalanced data are for the binary class dataset and supported by data sampling techniques that suffer from loss of information and over-fitting. In this paper, we present the modified naïve Bayesian algorithm for unbalanced data classification that eliminates the requirement of data level sampling. We compared our proposed model with the data sampling technique and cost-sensitive techniques. We use minority sensitive TP Rate, class-specific misclassification rate, and overall performance parameters such as accuracy, f-measure and G-mean. The result shows that our proposed algorithm shows a more optimal result for unbalanced data classification. Results shows reduction in misclassification rate and improve predictive performance for the minority class.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Int. Arab J. Inf. Technol.

自引率

0.00%

发文量