Abstractive Text Summarization Using BERT and GPT-2 Models

2023 International Conference on Signal Processing, Computation, Electronics, Power and Telecommunication (IConSCEPT). Pub Date: 2023-05-25. DOI: 10.1109/IConSCEPT57958.2023.10170093

Narayana Darapaneni, R. Prajeesh, Payel Dutta, Venkat K Pillai, Anirban Karak, A. Paduri


Citations: 0

Abstract

This paper studies research articles related to Covid-19 and provides abstractive summaries of them, with the aim of dispelling myths about Covid-19 and identifying possible root causes of vaccine hesitancy. According to the Government of India's official site, as of 20 January 2023, 1 billion of India's total population of 1.4 billion had been fully vaccinated. To fully eradicate Covid-19, which emerged three years ago, the entire population needs to be vaccinated, but that has not happened: people hesitate to get vaccinated because of articles of unknown authenticity published in newspapers, on social media, and elsewhere. In this paper we summarize the articles available in the CORD-19 dataset using BERT and GPT-2 models. BERT models performed well for extractive summarization, but there is scope for improvement in abstractive summarization. Our approach uses regularization to suppress local similarity while promoting global similarity, using the distilled GPT-2 (distill-gpt2) version with greater computing resources. We used a V100 GPU with 100 computing engines from Google Colab Pro for faster computation and higher accuracy. The results report ROUGE and BLEU scores and discuss their significance for summarization.
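The abstract names ROUGE and BLEU as its evaluation metrics but does not show how they are computed. The sketch below illustrates the core idea of both (clipped n-gram overlap between a generated summary and a reference), under stated assumptions: it is not the authors' evaluation code, the function names `ngrams`, `rouge_n`, and `bleu` are hypothetical, tokenization is plain lowercase whitespace splitting, and the BLEU variant uses a single reference with no smoothing. The example sentences are invented for illustration only. Published results are normally computed with an established evaluation package rather than hand-rolled code like this.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """ROUGE-N precision, recall and F1 from clipped n-gram overlap."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Counter intersection keeps the minimum count per n-gram (clipping).
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


def bleu(candidate, reference, max_n=2):
    """Single-reference BLEU: geometric mean of clipped n-gram
    precisions (n = 1..max_n) times a brevity penalty. No smoothing,
    so any n-gram order with zero overlap gives a score of 0."""
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()
    if not cand_tokens:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(cand_tokens, n)
        ref = ngrams(ref_tokens, n)
        overlap = sum((cand & ref).values())
        total = sum(cand.values())
        if overlap == 0 or total == 0:
            return 0.0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    if len(cand_tokens) > len(ref_tokens):
        penalty = 1.0
    else:
        penalty = math.exp(1 - len(ref_tokens) / len(cand_tokens))
    return penalty * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    # Invented example pair, not taken from CORD-19.
    reference = "vaccines reduce the risk of severe covid-19"
    candidate = "vaccines reduce severe covid-19 risk"
    print("ROUGE-1:", rouge_n(candidate, reference, n=1))
    print("ROUGE-2:", rouge_n(candidate, reference, n=2))
    print("BLEU-2: ", bleu(candidate, reference, max_n=2))
```

ROUGE is recall-oriented (how much of the reference the summary covers), while BLEU is precision-oriented with a penalty for overly short output, which is why the two are often reported together for summarization.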



