A New Fuzzy Adaptive Algorithm to Classify Imbalanced Data

IF 1.7 · Zone 4 (Computer Science) · JCR Q3 (COMPUTER SCIENCE, INFORMATION SYSTEMS) · Cmc-computers Materials & Continua · Pub Date: 2022-01-01 · DOI: 10.32604/cmc.2022.017114

Harshita Patel, D. Rajput, O. Stan, L. Miclea


Citations: 7

Abstract

Classification of imbalanced data, where one class is heavily outnumbered by the others, is a well-explored problem in the data mining and machine learning community. Imbalanced class distributions occur naturally in real-world datasets, so they must be handled carefully to extract important insights. When a dataset is imbalanced, traditional classifiers lose performance, which leads to misclassifications. This paper proposes a fuzzy weighted nearest-neighbor approach to address this issue. We adapt the 'existing algorithm modification solution' so that it learns from imbalanced datasets and classifies data without altering the natural distribution of the data, unlike other popular data-balancing methods. K-nearest neighbor (KNN) is a non-parametric classification method widely used in machine learning problems. Fuzzy nearest-neighbor classification clarifies the degree to which an instance belongs to each class, while optimal weights combined with an improved nearest-neighbor concept help classify imbalanced data correctly. The proposed hybrid approach accounts for the imbalanced nature of the data and reduces the inaccuracies that arise when the original and traditional classifiers are applied. Results show that it outperforms existing fuzzy nearest-neighbor and weighted nearest-neighbor strategies for imbalanced learning.
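The abstract combines two ideas: fuzzy class memberships computed from the K nearest neighbors, and weights that keep the majority class from dominating. The paper's exact weighting scheme is not stated here, so the Python sketch below is only an illustration under stated assumptions: it uses the classic fuzzy KNN rule of Keller, Gray and Givens (1985) with crisp training labels and adds a hypothetical inverse-class-frequency weight. The function name `fuzzy_weighted_knn` and its parameters `k`, `m` (fuzzifier, must be greater than 1) and `balance` are illustrative, not taken from the paper.

```python
import math
from collections import Counter


def fuzzy_weighted_knn(train_X, train_y, x, k=3, m=2.0, balance=True):
    """Classify query point x; return (label, {class: membership}).

    Illustrative sketch only: fuzzy KNN with crisp training labels plus a
    hypothetical inverse-class-frequency weight. This is NOT the weighting
    scheme proposed in the paper.
    """
    counts = Counter(train_y)
    n, n_classes = len(train_y), len(counts)
    # Hypothetical class weight n / (n_classes * n_c): minority classes get
    # weights > 1, majority classes < 1, and every weight is 1 when balanced.
    weight = {c: (n / (n_classes * cnt) if balance else 1.0)
              for c, cnt in counts.items()}

    # The k nearest training points by Euclidean distance (stable sort, so
    # distance ties keep their input order).
    neighbours = sorted(zip(train_X, train_y),
                        key=lambda pair: math.dist(pair[0], x))[:k]

    # Fuzzy KNN vote: inverse distance ** (2 / (m - 1)), scaled by class weight.
    scores = {c: 0.0 for c in counts}
    for xi, yi in neighbours:
        d = max(math.dist(xi, x), 1e-12)  # avoid division by zero on exact matches
        scores[yi] += weight[yi] / d ** (2.0 / (m - 1.0))

    # Normalise so memberships sum to 1; predict the class with the largest one.
    total = sum(scores.values())
    memberships = {c: s / total for c, s in scores.items()}
    label = max(memberships, key=memberships.get)
    return label, memberships


# Toy 1-D data: 8 majority points ("A") and 2 minority points ("B").
X = [(float(i),) for i in range(8)] + [(10.0,), (11.0,)]
y = ["A"] * 8 + ["B"] * 2
print(fuzzy_weighted_knn(X, y, (8.4,), balance=False)[0])  # A
print(fuzzy_weighted_knn(X, y, (8.4,))[0])                 # B
```

In this toy example the three nearest neighbors of 8.4 are two "A" points and one "B" point, so the unweighted rule favors the majority class, while the class weights shift the decision to the minority class. Like the algorithm-modification approach described in the abstract, the weighting changes only how neighbors vote and leaves the training data unresampled.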




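Source journal: Cmc-computers Materials & Continua (Engineering & Technology: Materials Science, Multidisciplinary)

CiteScore: 5.30
Self-citation rate: 19.40%
Articles published: 345
Review time: 1 month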

Journal description: This journal publishes original research papers in the areas of computer networks, artificial intelligence, big data management, software engineering, multimedia, cyber security, the internet of things, the materials genome, integrated materials science, data analysis, modeling, and the design and manufacturing engineering of modern functional and multifunctional materials. Novel high-performance computing methods, big data analysis, and artificial intelligence that advance material technologies are especially welcome.