RBCA-ETS: enhancing extractive text summarization with contextual embedding and word-level attention

Ravindra Gangundi, Rajeswari Sridhar
{"title":"RBCA-ETS:利用上下文嵌入和词级关注加强提取式文本摘要","authors":"Ravindra Gangundi, Rajeswari Sridhar","doi":"10.1007/s41870-024-02192-3","DOIUrl":null,"url":null,"abstract":"<p>The existing limitations in extractive text summarization encompass challenges related to preserving contextual features, limited feature extraction capabilities, and handling hierarchical and compositional aspects. To address these issues, the RoBERTa-BiLSTM-CNN-Attention Extractive Text Summarization, i.e., the RBCA-ETS model, is proposed in this work. RoBERTa word embedding is used to generate contextual embeddings. Parallelly connected CNN and BiLSTM layers extract textual features. CNN focuses more on local features, and BiLSTM captures long-range dependencies that extend across sentences. These two feature sets are concatenated and forwarded to the attention layer, highlighting the most relevant features. In the output layer, a fully connected layer receives the attention vector and calculates sentence scores for each sentence. This leads to the generation of the final summary. The RBCA-ETS model has demonstrated superior performance on the CNN-Daily Mail (CNN/DM) dataset compared to many state-of-the-art methods, and it has also outperformed existing state-of-the-art techniques when tested on the out-of-domain DUC 2002 dataset.</p>","PeriodicalId":14138,"journal":{"name":"International Journal of Information Technology","volume":"20 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"RBCA-ETS: enhancing extractive text summarization with contextual embedding and word-level attention\",\"authors\":\"Ravindra Gangundi, Rajeswari Sridhar\",\"doi\":\"10.1007/s41870-024-02192-3\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>The existing limitations in extractive text summarization encompass challenges related to preserving contextual features, limited feature extraction capabilities, and handling hierarchical and compositional aspects. To address these issues, the RoBERTa-BiLSTM-CNN-Attention Extractive Text Summarization, i.e., the RBCA-ETS model, is proposed in this work. RoBERTa word embedding is used to generate contextual embeddings. Parallelly connected CNN and BiLSTM layers extract textual features. CNN focuses more on local features, and BiLSTM captures long-range dependencies that extend across sentences. These two feature sets are concatenated and forwarded to the attention layer, highlighting the most relevant features. In the output layer, a fully connected layer receives the attention vector and calculates sentence scores for each sentence. This leads to the generation of the final summary. 
The RBCA-ETS model has demonstrated superior performance on the CNN-Daily Mail (CNN/DM) dataset compared to many state-of-the-art methods, and it has also outperformed existing state-of-the-art techniques when tested on the out-of-domain DUC 2002 dataset.</p>\",\"PeriodicalId\":14138,\"journal\":{\"name\":\"International Journal of Information Technology\",\"volume\":\"20 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal of Information Technology\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1007/s41870-024-02192-3\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Information Technology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1007/s41870-024-02192-3","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

The existing limitations of extractive text summarization include difficulty in preserving contextual features, limited feature-extraction capability, and poor handling of hierarchical and compositional structure. To address these issues, this work proposes the RoBERTa-BiLSTM-CNN-Attention Extractive Text Summarization (RBCA-ETS) model. RoBERTa word embeddings generate contextual representations of the input text. CNN and BiLSTM layers connected in parallel then extract textual features: the CNN focuses on local features, while the BiLSTM captures long-range dependencies that extend across sentences. The two feature sets are concatenated and forwarded to an attention layer, which highlights the most relevant features. In the output layer, a fully connected layer receives the attention vector and computes a score for each sentence, and the highest-scoring sentences form the final summary. The RBCA-ETS model outperforms many state-of-the-art methods on the CNN-Daily Mail (CNN/DM) dataset and also surpasses existing state-of-the-art techniques on the out-of-domain DUC 2002 dataset.
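The abstract does not include an implementation, but the pipeline it describes (RoBERTa embeddings, parallel CNN and BiLSTM branches, word-level attention, per-sentence scoring) maps naturally onto a small PyTorch module. The sketch below follows that wiring; all concrete choices beyond it, such as the hidden sizes, kernel width, masked-softmax attention, and sigmoid scoring, are illustrative assumptions and not the authors' configuration.

```python
# Minimal sketch of the RBCA-ETS pipeline as described in the abstract.
# Hyperparameters (hidden sizes, kernel width) are illustrative assumptions.
import torch
import torch.nn as nn
from transformers import RobertaModel


class RBCAETS(nn.Module):
    def __init__(self, hidden: int = 256, kernel: int = 3):
        super().__init__()
        # 1. Contextual embeddings from pretrained RoBERTa (768-dim for roberta-base).
        self.roberta = RobertaModel.from_pretrained("roberta-base")
        emb = self.roberta.config.hidden_size
        # 2a. CNN branch: local n-gram features over the token sequence.
        self.cnn = nn.Conv1d(emb, hidden, kernel_size=kernel, padding=kernel // 2)
        # 2b. BiLSTM branch: long-range dependencies; hidden//2 per direction.
        self.bilstm = nn.LSTM(emb, hidden // 2, batch_first=True, bidirectional=True)
        # 3. Word-level attention over the concatenated feature sets.
        self.attn = nn.Linear(2 * hidden, 1)
        # 4. Fully connected output layer: one relevance score per sentence.
        self.score = nn.Linear(2 * hidden, 1)

    def forward(self, input_ids, attention_mask):
        # input_ids: (batch, sentences, tokens) -- one row per candidate sentence.
        b, s, t = input_ids.shape
        flat_ids = input_ids.view(b * s, t)
        flat_mask = attention_mask.view(b * s, t)
        emb = self.roberta(flat_ids, attention_mask=flat_mask).last_hidden_state

        # Parallel feature extraction, then concatenation along the feature axis.
        cnn_out = torch.relu(self.cnn(emb.transpose(1, 2))).transpose(1, 2)
        lstm_out, _ = self.bilstm(emb)
        feats = torch.cat([cnn_out, lstm_out], dim=-1)  # (b*s, t, 2*hidden)

        # Word-level attention: masked softmax ignores padding tokens.
        logits = self.attn(feats).squeeze(-1)
        logits = logits.masked_fill(flat_mask == 0, float("-inf"))
        weights = torch.softmax(logits, dim=-1).unsqueeze(-1)
        sent_vec = (weights * feats).sum(dim=1)  # attention-pooled sentence vector

        scores = torch.sigmoid(self.score(sent_vec)).view(b, s)
        return scores  # per-sentence relevance scores
```

Given the per-sentence scores, an extractive summary can then be assembled by taking the top-scoring sentences in document order; here `sentences` is a hypothetical list of the original sentence strings, and keeping three sentences is an arbitrary choice:

```python
scores = model(input_ids, attention_mask)         # (batch, num_sentences)
keep = scores[0].topk(k=3).indices.sort().values  # 3 best sentences, document order
summary = " ".join(sentences[i] for i in keep.tolist())
```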
