Adaptive ensemble learning for efficient keyphrase extraction: Diagnosis, aggregation, and distillation

Expert Systems with Applications · Impact Factor 7.5 · CAS Tier 1, JCR Q1 (Computer Science, Artificial Intelligence) · Published: 2025-06-10 (online 2025-03-25) · DOI: 10.1016/j.eswa.2025.127236
Kai Zhang, Hongbo Gang, Feng Hu, Runlong Yu, Qi Liu
Expert Systems with Applications, Volume 278, Article 127236 · Citations: 0

Abstract

Keyphrase extraction (KE) refers to the process of identifying words or phrases that signify the primary themes of a document. Although keyphrase extraction is important in many downstream applications, including scientific document indexing, search, and question answering, the challenge lies in executing this extraction both adaptively and effectively. To this end, we propose a novel Distillation-based Adaptive Ensemble Learning (DAEL) method specifically designed for efficient keyphrase extraction, encompassing diagnosis, aggregation, and distillation processes. Specifically, we initiate with a Cognitive Diagnosis Module (CDM) to evaluate the diverse capabilities of individual KE models. Following this, an Adaptive Aggregation Module (AAM) is employed to create a weight distribution uniquely suited to each data instance. The process concludes with a Knowledge Distillation Module (KDM) to distill the superior performance of the ensemble model into a single model, thereby refining its efficiency and reducing computational cost. Extensive testing on real-world datasets highlights the superior performance of the proposed model. In comparison with leading-edge methods, our approach notably excels in processing text with complex structures or significant noise, marking a substantial advancement in KE effectiveness.
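The abstract describes two core mechanisms: the AAM produces instance-specific weights over the base KE models, and the KDM distills the resulting ensemble into a single student model. The paper's implementation details are not given here, so the following is only a minimal sketch of those two ideas; the function names, the placeholder weight function, and the use of a softened KL-divergence distillation objective are all assumptions, not the authors' code.

```python
import numpy as np

def adaptive_aggregate(instance_features, model_scores, weight_fn):
    """Combine per-model keyphrase scores with instance-specific weights.

    model_scores: (n_models, n_candidates) array of candidate scores,
                  one row per base KE model.
    weight_fn:    maps instance features to an (n_models,) raw-weight vector
                  (stands in for the paper's Adaptive Aggregation Module).
    """
    raw = weight_fn(instance_features)
    weights = np.exp(raw) / np.exp(raw).sum()   # softmax over base models
    return weights @ model_scores                # (n_candidates,) ensemble scores

def distillation_loss(student_scores, teacher_scores, temperature=2.0):
    """KL divergence between temperature-softened teacher (ensemble)
    and student candidate distributions, as in standard knowledge distillation."""
    t = np.exp(teacher_scores / temperature); t /= t.sum()
    s = np.exp(student_scores / temperature); s /= s.sum()
    return float(np.sum(t * (np.log(t) - np.log(s))))
```

For example, with two base models scoring two candidate phrases and a uniform weight function, `adaptive_aggregate` simply averages the rows; a weight function that reacts to document features (length, noise level, structure) would instead tilt the ensemble toward whichever base model the diagnosis step rates as stronger for that instance.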
Source journal

Expert Systems with Applications (Engineering: Electrical & Electronic)
CiteScore: 13.80
Self-citation rate: 10.60%
Articles per year: 2,045
Average review time: 8.7 months
Journal description: Expert Systems With Applications is an international journal dedicated to the exchange of information on expert and intelligent systems used globally in industry, government, and universities. The journal emphasizes original papers covering the design, development, testing, implementation, and management of these systems, offering practical guidelines. It spans various sectors such as finance, engineering, marketing, law, project management, information management, medicine, and more. The journal also welcomes papers on multi-agent systems, knowledge management, neural networks, knowledge discovery, data mining, and other related areas, excluding applications to military/defense systems.