Abstractive Text Summarization Using BERT and GPT-2 Models

Narayana Darapaneni, R. Prajeesh, Payel Dutta, Venkat K Pillai, Anirban Karak, A. Paduri
{"title":"基于BERT和GPT-2模型的抽象文本摘要","authors":"Narayana Darapaneni, R. Prajeesh, Payel Dutta, Venkat K Pillai, Anirban Karak, A. Paduri","doi":"10.1109/IConSCEPT57958.2023.10170093","DOIUrl":null,"url":null,"abstract":"This paper aims to study the research articles related to Covid-19 and provide abstractive summarization on the same, demystifying the myths related to covid-19 as well as finding the possible root cause of hesitation in taking the vaccine. As per the government of India’s official site, as on 20th Jan 2023, 1 billion people have been fully vaccinated out of India’s total population of 1.4 billion. To fully eradicate Covid - 19, which emerged 3 years ago, the entire population needs to be vaccinated, but that’s not the case as people hesitate to get vaccinated due to various articles published in newspapers, social media, etc., the authenticity of which are unknown. In this paper we will try to summarize all the articles, as available in the CORD-19 dataset, using BERT and GPT-2 models. For extractive summarization, BERT models performed well, but there is a scope for improvement in abstractive summarization. Our approach involves utilizing regularization to suppress local similarity while simultaneously promoting global similarity, using the distill-gpt2 version with higher computing resources. We used V100 GPU with 100 computing engines from google collab-pro for faster computation and higher accuracy. 
The result will elaborate on the Rouge and Bleu score and its relevant significance for summarization.","PeriodicalId":240167,"journal":{"name":"2023 International Conference on Signal Processing, Computation, Electronics, Power and Telecommunication (IConSCEPT)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Abstractive Text Summarization Using BERT and GPT-2 Models\",\"authors\":\"Narayana Darapaneni, R. Prajeesh, Payel Dutta, Venkat K Pillai, Anirban Karak, A. Paduri\",\"doi\":\"10.1109/IConSCEPT57958.2023.10170093\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper aims to study the research articles related to Covid-19 and provide abstractive summarization on the same, demystifying the myths related to covid-19 as well as finding the possible root cause of hesitation in taking the vaccine. As per the government of India’s official site, as on 20th Jan 2023, 1 billion people have been fully vaccinated out of India’s total population of 1.4 billion. To fully eradicate Covid - 19, which emerged 3 years ago, the entire population needs to be vaccinated, but that’s not the case as people hesitate to get vaccinated due to various articles published in newspapers, social media, etc., the authenticity of which are unknown. In this paper we will try to summarize all the articles, as available in the CORD-19 dataset, using BERT and GPT-2 models. For extractive summarization, BERT models performed well, but there is a scope for improvement in abstractive summarization. Our approach involves utilizing regularization to suppress local similarity while simultaneously promoting global similarity, using the distill-gpt2 version with higher computing resources. We used V100 GPU with 100 computing engines from google collab-pro for faster computation and higher accuracy. 
The result will elaborate on the Rouge and Bleu score and its relevant significance for summarization.\",\"PeriodicalId\":240167,\"journal\":{\"name\":\"2023 International Conference on Signal Processing, Computation, Electronics, Power and Telecommunication (IConSCEPT)\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-05-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2023 International Conference on Signal Processing, Computation, Electronics, Power and Telecommunication (IConSCEPT)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IConSCEPT57958.2023.10170093\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 International Conference on Signal Processing, Computation, Electronics, Power and Telecommunication (IConSCEPT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IConSCEPT57958.2023.10170093","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

This paper studies research articles related to COVID-19 and provides abstractive summaries of them, demystifying myths about COVID-19 and investigating possible root causes of vaccine hesitancy. According to the Government of India's official site, as of 20 January 2023, 1 billion of India's total population of 1.4 billion had been fully vaccinated. To fully eradicate COVID-19, which emerged three years ago, the entire population needs to be vaccinated; this has not happened, as people hesitate to get vaccinated because of articles of unknown authenticity published in newspapers, on social media, and elsewhere. In this paper we summarize the articles available in the CORD-19 dataset using BERT and GPT-2 models. BERT models performed well for extractive summarization, but there is scope for improvement in abstractive summarization. Our approach uses regularization to suppress local similarity while promoting global similarity, using the DistilGPT-2 model with higher computing resources. We used a V100 GPU with 100 compute engines from Google Colab Pro for faster computation and higher accuracy. The results elaborate on the ROUGE and BLEU scores and their significance for summarization.
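The abstract reports results in terms of ROUGE and BLEU scores, which measure n-gram overlap between a generated summary and a reference summary. As a minimal illustration only (not the paper's evaluation code), the sketch below computes unigram ROUGE-1 precision, recall, and F1 in plain Python; real evaluations typically also use stemming, multiple references, and ROUGE-2/ROUGE-L:

```python
from collections import Counter

def rouge_1(candidate: str, reference: str) -> dict:
    """Unigram ROUGE-1: clipped word overlap between candidate and reference.

    Simplified sketch for illustration; production metrics add stemming,
    tokenization rules, and higher-order variants (ROUGE-2, ROUGE-L).
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

# Hypothetical example sentences, not drawn from the paper's data.
scores = rouge_1("the vaccine is safe", "studies show the vaccine is safe")
print(scores)  # precision 1.0, recall 4/6, f1 0.8
```

BLEU works analogously but is precision-oriented and combines several n-gram orders with a brevity penalty, which is why the two metrics are usually reported together for summarization.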