A multi-stage multi-objective GWO based feature selection approach for multi-label text classification
Authors: Pradip Dhal, Chandrashekhar Azad
Published in: 2022 2nd International Conference on Intelligent Technologies (CONIT), 2022-06-24
DOI: 10.1109/CONIT55038.2022.9847886 (https://doi.org/10.1109/CONIT55038.2022.9847886)
Citations: 1
Abstract
In Information Retrieval (IR), Text Mining (TM), and web search, Multi-label Text Classification (MTC) plays an essential role. In MTC, a document can belong to more than one category. Text documents frequently contain High Dimensional (HD), non-discriminative (noisy and irrelevant) phrases, which raise computing costs and impoverish the learning performance of Text Classification (TC). The Feature Selection (FS) procedure is complicated by three issues caused by small samples and HD datasets. First, with limited samples and HD data, FS is unstable. Second, with HD data, FS takes longer. Third, a particular FS approach may not provide sufficient Classification Accuracy (CA). In this paper, we develop a two-stage FS approach for MTC based on a Meta-heuristic Algorithm (MA). The first stage applies a filter-based FS approach, while the second stage is based on the multi-objective Grey Wolf Optimization (GWO) algorithm. The first objective is to minimize the Hamming Loss (HL), and the second is to reduce the number of Selected Features (SF). We use a Multi-Layer Perceptron (MLP) model for the classification task. The experimental findings show that the proposed FS scheme achieves a better HL with fewer features.
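The two objectives named in the abstract can be made concrete with a small sketch. Hamming Loss is the fraction of (document, label) slots that are predicted incorrectly, and the second objective is simply the count of selected features. The function names below (`hamming_loss`, `objectives`, `dominates`) are illustrative, and comparing candidates by Pareto dominance is an assumption: the abstract says the GWO is multi-objective but does not say how the two objectives are traded off.

```python
from typing import Sequence, Tuple

LabelMatrix = Sequence[Sequence[int]]  # one row of 0/1 labels per document


def hamming_loss(y_true: LabelMatrix, y_pred: LabelMatrix) -> float:
    """Fraction of (document, label) slots that are predicted incorrectly."""
    n_slots = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / n_slots


def objectives(mask: Sequence[int], y_true: LabelMatrix,
               y_pred: LabelMatrix) -> Tuple[float, int]:
    """Both objectives (to be minimized) for one 0/1 feature mask.

    y_pred is assumed to come from a classifier trained on the masked features.
    """
    return hamming_loss(y_true, y_pred), sum(mask)


def dominates(a: Tuple[float, int], b: Tuple[float, int]) -> bool:
    """Pareto dominance: a is no worse in both objectives and better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


# Toy data: 2 documents, 3 labels, 2 of the 6 label slots are wrong.
y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 1, 1], [0, 1, 1]]
small = objectives([1, 0, 1, 0], y_true, y_pred)  # same loss, 2 features
large = objectives([1, 1, 1, 1], y_true, y_pred)  # same loss, 4 features
print(dominates(small, large))  # True
```

With equal Hamming Loss, the mask that uses fewer features dominates, which is the behaviour the two objectives are meant to reward.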
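The two-stage structure described in the abstract (a filter stage, then a GWO search over the surviving features) might look roughly like the sketch below. Several parts are assumptions rather than the authors' method: the abstract does not name the filter score, so `filter_stage` takes precomputed scores; `toy_fitness` is a stand-in for training the MLP and measuring Hamming Loss; and the two objectives are scalarized with a hypothetical weight `w` instead of being handled by a Pareto archive, so this is a simplification of a multi-objective GWO. The position update itself follows the standard GWO equations.

```python
import random
from typing import Callable, List, Sequence, Tuple

Fitness = Callable[[List[int]], Tuple[float, int]]  # mask -> (loss, n_selected)


def filter_stage(scores: Sequence[float], k: int) -> List[int]:
    """Stage 1: keep the indices of the k highest-scoring features."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def to_mask(position: Sequence[float]) -> List[int]:
    """Binarize a continuous wolf position (same as sigmoid(x) > 0.5)."""
    return [1 if x > 0 else 0 for x in position]


def gwo_select(fitness: Fitness, dim: int, n_wolves: int = 8,
               n_iter: int = 30, w: float = 0.9, seed: int = 0) -> List[int]:
    """Stage 2: GWO search over feature masks (scalarized simplification)."""
    rng = random.Random(seed)
    wolves = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_wolves)]

    def cost(pos: Sequence[float]) -> float:
        loss, n_sel = fitness(to_mask(pos))
        return w * loss + (1 - w) * n_sel / dim  # assumed weighted sum

    best = min(wolves, key=cost)[:]
    for t in range(n_iter):
        a = 2 - 2 * t / n_iter  # decreases linearly from 2 towards 0
        # Copy the three best wolves (alpha, beta, delta) before moving the pack.
        leaders = [p[:] for p in sorted(wolves, key=cost)[:3]]
        for wolf in wolves:
            for d in range(dim):
                estimate = 0.0
                for leader in leaders:
                    A = 2 * a * rng.random() - a
                    C = 2 * rng.random()
                    D = abs(C * leader[d] - wolf[d])
                    estimate += leader[d] - A * D
                wolf[d] = estimate / 3  # average of the three leader estimates
        candidate = min(wolves, key=cost)
        if cost(candidate) < cost(best):
            best = candidate[:]
    return to_mask(best)


def toy_fitness(mask: List[int]) -> Tuple[float, int]:
    """Stand-in for training the MLP on masked features and measuring HL."""
    informative = (0, 2)  # hypothetical positions of the useful features
    missed = sum(1 for i in informative if not mask[i])
    return missed / len(informative), sum(mask)


kept = filter_stage([0.1, 0.9, 0.5, 0.7, 0.05, 0.8], k=4)
mask = gwo_select(toy_fitness, dim=len(kept))
selected = [f for f, bit in zip(kept, mask) if bit]
print(kept, selected)
```

In a real pipeline, `toy_fitness` would be replaced by a function that trains the classifier on the masked feature columns and returns the validation Hamming Loss together with the number of selected features.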