A Hybrid Approach for Automatic Text Summarization by Handling Out-of-Vocabulary Words Using TextR-BLG Pointer Algorithm

Journal: Scientific and Technical Information Processing; IF: 0.4; JCR: Q4 (Information Science & Library Science); Pub Date: 2024-05-20; DOI: 10.3103/s0147688224010106
Sonali Mhatre, Lata L. Ragha
Citations: 0

Abstract

Long documents such as scientific papers and government reports often discuss substantial issues at length, which makes them time-consuming to read and understand. Abstractive summaries can help readers quickly grasp the main topics, yet prior work has mostly focused on short texts and suffers from drawbacks such as out-of-vocabulary (OOV) words, mismatched sentences, and meaningless summaries. To overcome these issues, a hybrid approach to automatic text summarization using the TextR-BLG pointer algorithm is proposed. In the designed model, a long document is given as input and its sentences are evaluated for word frequency length; based on a threshold value, the sentences are split between an extractive and an abstractive approach. The text algorithm used in this model computes sentence similarity scores, which are validated with the plotted graph. Sentences above the threshold level are handled by the abstractive approach: an optimized BERT-LSTM-BiGRU (BLG) based pointer algorithm learns the meaning of the sentences through word embedding and through encoding and decoding of the hidden states. The reframed sentences are then used to obtain similarity scores and plot the graph. The sentences from the abstractive and extractive approaches are ranked based on the plotted graph to generate the summary. The model attains ROUGE-1, ROUGE-2, ROUGE-L, BERTScore, BLEU, and METEOR scores of 59.2, 58.4, 62.3, 0.92, 0.78, and 0.67, respectively, which are compared with those of the existing model. In this evaluation, the proposed model attains better text summarization than the existing model. Thus, by handling out-of-vocabulary words, the hybrid approach using the TextR-BLG pointer algorithm performs better than the existing text summarization techniques.
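
The abstract describes scoring sentences by similarity and ranking them on a graph, but gives no formulas. The sketch below illustrates one common way to do this, a TextRank-style ranking with a word-overlap similarity. That "TextR" refers to TextRank is an assumption, and the function names, the damping factor of 0.85, and the iteration count are illustrative choices, not details taken from the paper.

```python
import math
import re


def tokenize(sentence):
    """Lowercase a sentence and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", sentence.lower()))


def similarity(a, b):
    """Word overlap normalized by the log of each sentence's word count."""
    wa, wb = tokenize(a), tokenize(b)
    if len(wa) < 2 or len(wb) < 2:
        # Treat one-word or empty sentences as unconnected; this also
        # rules out a zero denominator (log(1) + log(1)).
        return 0.0
    return len(wa & wb) / (math.log(len(wa)) + math.log(len(wb)))


def rank_sentences(sentences, damping=0.85, iterations=50):
    """Score sentences by power iteration over the similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    # weights[i][j] is the edge weight between sentence i and sentence j.
    weights = [[similarity(sentences[i], sentences[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        # Each sentence receives score from its neighbours in proportion
        # to the share of the neighbour's total edge weight it holds.
        scores = [
            (1 - damping) / n + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0)
            for i in range(n)
        ]
    return scores


sentences = [
    "The cat sat on the mat.",
    "The cat lay on the mat.",
    "Stock prices fell sharply today.",
]
scores = rank_sentences(sentences)
```

In this toy input the first two sentences share most of their words and so reinforce each other, while the third shares none and keeps only the baseline score. A summarizer built on this idea would keep the top-scoring sentences.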
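
The abstract credits a pointer algorithm with handling out-of-vocabulary words but does not spell out the mechanism. As a hedged illustration, the sketch below shows the copy step used in generic pointer-generator decoders: the output distribution mixes the decoder's vocabulary probabilities with attention weights over the source tokens, so a source word missing from the vocabulary can still be emitted. This is not the authors' BLG model; the function name `final_distribution`, the parameter `p_gen`, and all numbers are hypothetical.

```python
def final_distribution(vocab_probs, attention, source_tokens, p_gen):
    """Mix generation and copy probabilities into one word distribution.

    vocab_probs:   dict mapping each vocabulary word to its probability
    attention:     attention weights, one per source token (sum to 1)
    source_tokens: source words aligned with `attention`
    p_gen:         probability of generating from the vocabulary (0..1)
    """
    if len(attention) != len(source_tokens):
        raise ValueError("attention and source_tokens must have equal length")
    # Generation share: scale every vocabulary probability by p_gen.
    mixed = {word: p_gen * p for word, p in vocab_probs.items()}
    for token, weight in zip(source_tokens, attention):
        # Copy share: attention mass goes to the source word itself,
        # even when that word is not in the fixed vocabulary.
        mixed[token] = mixed.get(token, 0.0) + (1.0 - p_gen) * weight
    return mixed


# Toy example: "TextR-BLG" is absent from the vocabulary (hypothetical values).
vocab_probs = {"the": 0.5, "model": 0.3, "<unk>": 0.2}
source_tokens = ["the", "TextR-BLG", "model"]
attention = [0.1, 0.8, 0.1]

dist = final_distribution(vocab_probs, attention, source_tokens, p_gen=0.4)
best = max(dist, key=dist.get)
```

Here the decoder attends mostly to the out-of-vocabulary source token, so `best` is that token rather than `<unk>`. In a trained model, `p_gen` and the attention weights would be computed from the decoder state at each step rather than fixed by hand.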

Source journal
Scientific and Technical Information Processing
Category: Information Science & Library Science
CiteScore: 1.00
Self-citation rate: 42.90%
Articles published: 20
About the journal: Scientific and Technical Information Processing is a refereed journal that covers all aspects of the management and use of information technology in libraries and archives, information centres, and the information industry in general. Emphasis is on practical applications of new technologies and techniques for information analysis and processing.