A Hybrid Approach for Automatic Text Summarization by Handling Out-of-Vocabulary Words Using TextR-BLG Pointer Algorithm
Sonali Mhatre, Lata L. Ragha
Scientific and Technical Information Processing, journal article, published 2024-05-20
DOI: 10.3103/s0147688224010106 (https://doi.org/10.3103/s0147688224010106)
Citations: 0
Abstract
Long documents such as scientific papers and government reports often discuss substantial issues at length, which makes them time-consuming to read and understand. Generating abstractive summaries can help readers quickly grasp the main topics, yet prior work has mostly focused on short texts and suffers from drawbacks such as out-of-vocabulary (OOV) words, mismatched sentences, and meaningless summaries. To overcome these issues, this paper proposes a hybrid approach for automatic text summarization using the TextR-BLG pointer algorithm. In the proposed model, a long document is given as input and its sentences are scored by word frequency length; based on a threshold value, the sentences are split between an extractive and an abstractive approach. The TextR algorithm used in the model computes sentence similarity scores, which are validated with a plotted graph. Sentences above the threshold are handled by the abstractive approach, in which an optimized BERT-LSTM-BiGRU (BLG)-based pointer algorithm learns the meaning of the sentences through word embedding and the encoding and decoding of hidden states. The rewritten sentences are then scored for similarity and added to the graph, and the sentences from both the abstractive and extractive approaches are ranked on this graph to generate the summary. In evaluation, the model achieves ROUGE-1, ROUGE-2, ROUGE-L, BERTScore, BLEU, and METEOR scores of 59.2, 58.4, 62.3, 0.92, 0.78, and 0.67, respectively, which are compared with those of the existing model. The comparison shows that the proposed model produces better summaries than the existing model. Thus, by handling out-of-vocabulary words, the hybrid TextR-BLG pointer approach performs better than existing text summarization techniques.
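The abstract describes the hybrid pipeline only in prose. The Python sketch below shows one way the routing and graph-ranking steps could fit together. It is not the authors' implementation, and it rests on several assumptions. The hypothetical `frequency_scores` function scores a sentence as the mean normalized frequency of its words, because the paper's exact "word frequency length" measure is not given here. The threshold and `k` values are arbitrary. Similarity is bag-of-words cosine. Ranking uses TextRank-style PageRank by power iteration, on the assumption that TextR is a graph-based ranker of this kind. The BLG pointer model is replaced by a hypothetical `rewrite` callback that returns each sentence unchanged by default.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def frequency_scores(sentences):
    """Hypothetical score: mean of each word's in-document frequency divided by the top frequency."""
    counts = Counter(w for s in sentences for w in tokenize(s))
    top = max(counts.values(), default=1)
    scores = []
    for s in sentences:
        words = tokenize(s)
        scores.append(sum(counts[w] / top for w in words) / len(words) if words else 0.0)
    return scores


def split_by_threshold(sentences, threshold):
    """Return (abstractive, extractive) index lists; scores above the threshold go abstractive."""
    abstractive, extractive = [], []
    for i, score in enumerate(frequency_scores(sentences)):
        (abstractive if score > threshold else extractive).append(i)
    return abstractive, extractive


def cosine(a, b):
    """Bag-of-words cosine similarity between two sentences."""
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def textrank(sentences, damping=0.85, iterations=50):
    """PageRank by power iteration on the weighted sentence-similarity graph."""
    n = len(sentences)
    weights = [[cosine(sentences[i], sentences[j]) if i != j else 0.0 for j in range(n)]
               for i in range(n)]
    out_sums = [sum(row) for row in weights]
    ranks = [1.0 / n] * n
    for _ in range(iterations):
        # Sentences with no similar neighbours (out_sum == 0) pass no score along.
        ranks = [
            (1 - damping) / n
            + damping * sum(weights[j][i] / out_sums[j] * ranks[j] for j in range(n) if out_sums[j] > 0)
            for i in range(n)
        ]
    return ranks


def summarize(sentences, threshold=0.5, k=2, rewrite=lambda s: s):
    """Rewrite abstractive-branch sentences, rank every sentence on one graph, keep the top k."""
    abstractive, _ = split_by_threshold(sentences, threshold)
    chosen = set(abstractive)
    candidates = [rewrite(s) if i in chosen else s for i, s in enumerate(sentences)]
    ranks = textrank(candidates)
    best = sorted(range(len(candidates)), key=lambda i: -ranks[i])[:k]
    return [candidates[i] for i in sorted(best)]  # keep original document order


doc = [
    "The model splits long documents into sentences.",
    "Sentences are scored by word frequency.",
    "A similarity graph ranks the sentences.",
    "The weather was pleasant.",
]
print(summarize(doc, k=2))
```

Ranking the rewritten and the untouched sentences on a single graph follows the abstract's description of scoring the reframed sentences and ranking both approaches together. Passing a real sequence-to-sequence model as `rewrite` would replace the identity stub.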
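The abstract credits the pointer component with handling OOV words but gives no equations. As a hedged illustration, the sketch below implements the copy mechanism of the widely used pointer-generator formulation (See et al., 2017), which may differ from the BLG pointer in this paper. In that formulation, the final probability of a word mixes the decoder's vocabulary distribution, weighted by p_gen, with the attention weights on the source positions that hold that word, weighted by 1 - p_gen. The function name `final_distribution` and all numbers are hypothetical. In a real model, p_gen, the vocabulary distribution, and the attention weights would come from the decoder (here, the BERT-LSTM-BiGRU stack) rather than being passed in by hand.

```python
def final_distribution(p_gen, vocab_probs, attention, source_tokens):
    """Pointer-generator mixture over an extended vocabulary.

    P(w) = p_gen * P_vocab(w) + (1 - p_gen) * (sum of attention on source positions holding w).
    Words that appear only in the source (OOV for the fixed vocabulary) still receive
    probability through the copy term.
    """
    if not 0.0 <= p_gen <= 1.0:
        raise ValueError("p_gen must lie in [0, 1]")
    if len(attention) != len(source_tokens):
        raise ValueError("attention and source_tokens must have the same length")
    # Generation term: scale the fixed-vocabulary distribution by p_gen.
    dist = {word: p_gen * prob for word, prob in vocab_probs.items()}
    # Copy term: add (1 - p_gen) * attention weight for every source occurrence.
    for token, weight in zip(source_tokens, attention):
        dist[token] = dist.get(token, 0.0) + (1.0 - p_gen) * weight
    return dist


# Toy numbers made up for illustration; "TextR-BLG" is not in the fixed vocabulary.
vocab_probs = {"the": 0.5, "model": 0.3, "summary": 0.2}
source_tokens = ["the", "TextR-BLG", "model"]
attention = [0.2, 0.7, 0.1]
dist = final_distribution(0.4, vocab_probs, attention, source_tokens)
best = max(dist, key=dist.get)
print(best, round(dist[best], 2))  # TextR-BLG 0.42
```

Because "TextR-BLG" receives probability only through the copy term, a decoder that uses this mixture can still emit it even though the fixed vocabulary lacks it.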
About the journal:
Scientific and Technical Information Processing is a refereed journal that covers all aspects of management and use of information technology in libraries and archives, information centres, and the information industry in general. Emphasis is on practical applications of new technologies and techniques for information analysis and processing.