Adaptive ensemble learning for efficient keyphrase extraction: Diagnosis, aggregation, and distillation

Expert Systems with Applications · Impact Factor 7.5 · CAS Tier 1, JCR Q1 (Computer Science, Artificial Intelligence) · Published: 2025-06-10 (online 2025-03-25) · DOI: 10.1016/j.eswa.2025.127236
Kai Zhang, Hongbo Gang, Feng Hu, Runlong Yu, Qi Liu
Expert Systems with Applications, Volume 278, Article 127236 · Citations: 0

Abstract

Keyphrase extraction (KE) refers to the process of identifying words or phrases that signify the primary themes of a document. Although keyphrase extraction is important in many downstream applications, including scientific document indexing, search, and question answering, the challenge lies in executing this extraction both adaptively and effectively. To this end, we propose a novel Distillation-based Adaptive Ensemble Learning (DAEL) method specifically designed for efficient keyphrase extraction, encompassing diagnosis, aggregation, and distillation processes. Specifically, we initiate with a Cognitive Diagnosis Module (CDM) to evaluate the diverse capabilities of individual KE models. Following this, an Adaptive Aggregation Module (AAM) is employed to create a weight distribution uniquely suited to each data instance. The process concludes with a Knowledge Distillation Module (KDM) to distill the superior performance of the ensemble model into a single model, thereby refining its efficiency and reducing computational cost. Extensive testing on real-world datasets highlights the superior performance of the proposed model. In comparison with leading-edge methods, our approach notably excels in processing text with complex structures or significant noise, marking a substantial advancement in KE effectiveness.
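The abstract describes two core mechanisms: the AAM produces instance-specific weights over the base KE models, and the KDM distills the resulting ensemble into a single student model. The paper's implementation details are not given here, so the following is only a minimal sketch of those two ideas; the function names, the placeholder weight function, and the use of a softened KL-divergence distillation objective are all assumptions, not the authors' code.

```python
import numpy as np

def adaptive_aggregate(instance_features, model_scores, weight_fn):
    """Combine per-model keyphrase scores with instance-specific weights.

    model_scores: (n_models, n_candidates) array of candidate scores,
                  one row per base KE model.
    weight_fn:    maps instance features to an (n_models,) raw-weight vector
                  (stands in for the paper's Adaptive Aggregation Module).
    """
    raw = weight_fn(instance_features)
    weights = np.exp(raw) / np.exp(raw).sum()   # softmax over base models
    return weights @ model_scores                # (n_candidates,) ensemble scores

def distillation_loss(student_scores, teacher_scores, temperature=2.0):
    """KL divergence between temperature-softened teacher (ensemble)
    and student candidate distributions, as in standard knowledge distillation."""
    t = np.exp(teacher_scores / temperature); t /= t.sum()
    s = np.exp(student_scores / temperature); s /= s.sum()
    return float(np.sum(t * (np.log(t) - np.log(s))))
```

For example, with two base models scoring two candidate phrases and a uniform weight function, `adaptive_aggregate` simply averages the rows; a weight function that reacts to document features (length, noise level, structure) would instead tilt the ensemble toward whichever base model the diagnosis step rates as stronger for that instance.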
Source journal

Expert Systems with Applications (Engineering: Electrical & Electronic)
CiteScore: 13.80
Self-citation rate: 10.60%
Articles per year: 2,045
Average review time: 8.7 months
Journal description: Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.