Narayana Darapaneni, R. Prajeesh, Payel Dutta, Venkat K Pillai, Anirban Karak, A. Paduri
Title: Abstractive Text Summarization Using BERT and GPT-2 Models
Published in: 2023 International Conference on Signal Processing, Computation, Electronics, Power and Telecommunication (IConSCEPT)
Publication date: 2023-05-25
DOI: 10.1109/IConSCEPT57958.2023.10170093
Abstract
This paper studies research articles related to COVID-19 and provides abstractive summaries of them, with the aim of dispelling myths about COVID-19 and identifying possible root causes of vaccine hesitancy. According to the Government of India’s official site, as of 20 January 2023, 1 billion people had been fully vaccinated out of India’s total population of 1.4 billion. To fully eradicate COVID-19, which emerged three years ago, the entire population needs to be vaccinated, but this has not happened because people hesitate to get vaccinated after reading articles of unknown authenticity published in newspapers, on social media, and elsewhere. In this paper we summarize the articles available in the CORD-19 dataset using BERT and GPT-2 models. BERT models perform well for extractive summarization, but there is scope for improvement in abstractive summarization. Our approach uses regularization to suppress local similarity while simultaneously promoting global similarity, using the DistilGPT2 version with higher computing resources. We used a V100 GPU with 100 computing engines from Google Colab Pro for faster computation and higher accuracy. The results report the ROUGE and BLEU scores and discuss their significance for summarization.
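To make the evaluation metrics concrete, the following is a minimal sketch of how a ROUGE-N score can be computed from n-gram overlap. It is not the authors' evaluation code: the function names `ngrams` and `rouge_n`, the whitespace tokenization, and the example sentences are all assumptions chosen for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams found in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(reference, candidate, n=1):
    """Compute ROUGE-N recall, precision, and F1 from clipped n-gram overlap.

    Tokenization is a plain lowercase whitespace split (an assumption;
    published ROUGE implementations usually add stemming and other cleanup).
    """
    ref = ngrams(reference.lower().split(), n)
    cand = ngrams(candidate.lower().split(), n)
    # Counter intersection keeps the minimum count per n-gram (clipping).
    overlap = sum((ref & cand).values())
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"recall": recall, "precision": precision, "f1": f1}


# Hypothetical reference and generated summaries, for illustration only.
reference = "vaccines reduce the risk of severe illness"
candidate = "vaccines reduce severe illness"

rouge1 = rouge_n(reference, candidate, n=1)
rouge2 = rouge_n(reference, candidate, n=2)
print(rouge1)
print(rouge2)
```

In this sketch, recall measures how much of the reference summary the generated summary covers, which is the emphasis of ROUGE. The precision value is the clipped n-gram precision that BLEU builds on; a full BLEU score additionally combines several n-gram orders and applies a brevity penalty.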