Enhanced multi-label feature selection considering label-specific relevant information

Expert Systems with Applications · IF 7.5 · Zone 1 (Computer Science) · JCR Q1 (Computer Science, Artificial Intelligence) · Pub Date: 2024-11-23 · DOI: 10.1016/j.eswa.2024.125819

Qingqi Han, Zhanpeng Zhao, Liang Hu, Wanfu Gao

Expert Systems with Applications, Volume 264, Article 125819. Journal Article. https://www.sciencedirect.com/science/article/pii/S0957417424026861

Citations: 0

Abstract


Enhanced multi-label feature selection considering label-specific relevant information

In fields such as text classification and image recognition, multi-label data is frequently encountered. However, extracting information-rich and reliable features from high-dimensional multi-label datasets presents significant challenges in pattern recognition tasks. Traditional information-theoretic feature selection methods utilize a greedy algorithm strategy, selecting the feature that best meets the evaluation criteria in each iteration. However, the optimal result of each iteration does not necessarily yield a globally optimal solution. These methods primarily focus on the overall relevance of each feature with respect to all labels from a macro perspective, often overlooking the distribution of relevant information among features. This oversight can lead to the selection of features that are weakly correlated with the labels. Additionally, they neglect the impact of redundancy measures on feature scoring, resulting in the selection of some irrelevant features. To address these issues, we propose a novel multi-label feature selection method that evaluates the relevance between feature sets and label sets from both macro and micro perspectives. This method maximizes the relevance between features and the label set while ensuring the selection of features that are strongly correlated with each individual label. Classification experiments conducted on eight multi-label datasets demonstrate that the proposed method consistently outperforms seven comparative methods.
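The abstract does not give the scoring function of the proposed method, so the following is only a minimal sketch of the conventional greedy, information-theoretic baseline it criticizes: at each iteration, pick the feature with the highest summed ("macro") relevance to all labels, minus its average redundancy with the features already chosen. The function names `mutual_information` and `greedy_select`, the scoring formula, and the toy data are assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter
from math import log2


def mutual_information(x, y):
    """Empirical mutual information I(X;Y) in bits for two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * log2(c * n / (px[a] * py[b]))
        for (a, b), c in pxy.items()
    )


def greedy_select(features, labels, k):
    """Greedy forward selection (illustrative baseline, not the paper's method).

    features: dict mapping a feature name to a list of discrete values
    labels:   list of label columns, each a list of discrete values
    k:        number of features to select
    """
    # Macro relevance: mutual information summed over every label.
    relevance = {
        name: sum(mutual_information(col, lab) for lab in labels)
        for name, col in features.items()
    }
    selected = []
    while len(selected) < min(k, len(features)):

        def score(name):
            if not selected:
                return relevance[name]
            # Mean redundancy with the features picked so far.
            redundancy = sum(
                mutual_information(features[name], features[s]) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        candidates = [name for name in features if name not in selected]
        # Locally best choice; nothing guarantees a globally optimal subset.
        selected.append(max(candidates, key=score))
    return selected


# Hypothetical toy data: f2 duplicates f1, and f3 carries the second label.
features = {
    "f1": [0, 0, 1, 1, 0, 0, 1, 1],
    "f2": [0, 0, 1, 1, 0, 0, 1, 1],
    "f3": [0, 1, 0, 1, 0, 1, 0, 1],
}
labels = [
    [0, 0, 1, 1, 0, 0, 1, 1],
    [0, 1, 0, 1, 0, 1, 0, 1],
]
print(greedy_select(features, labels, 2))  # prints ['f1', 'f3']
```

In this toy case all three features have the same summed relevance, so only the redundancy term keeps the duplicate `f2` out of the result. A score that sums over labels also cannot tell whether a feature's relevance is concentrated on one label or spread thinly across many, which is the label-level ("micro") distinction the abstract says such methods overlook.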


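Source journal

Expert Systems with Applications (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 13.80

Self-citation rate: 10.60%

Articles published: 2045

Review time: 8.7 months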

About the journal: Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.