A novel centroid based sentence classification approach for extractive summarization of COVID-19 news reports.

Sumanta Banerjee, Shyamapada Mukherjee, Sivaji Bandyopadhyay
DOI: 10.1007/s41870-023-01221-x
Journal: International Journal of Information Technology (an official journal of Bharati Vidyapeeth's Institute of Computer Applications and Management)
Published: 2023 (online 24 March 2023)
Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10036244/pdf/
Citations: 0

Abstract

A COVID-19 news report covers subtopics such as infections, deaths, the economy, and jobs. The proposed method generates a news summary based on the subtopics of a reader's interest. It extracts a centroid that captures the lexical pattern of sentences on those subtopics from the words used most frequently in them. The centroid is then used as a query in the vector space model (VSM) for sentence classification and extraction, producing a query-focused summarization (QFS) of the documents. Three approaches, TF-IDF, word-vector averaging, and an autoencoder, are evaluated for generating the sentence embeddings used in the VSM. The sentence embeddings are ranked by their similarity to the query embedding. A novel supervised technique is introduced to find the value of the similarity parameter used to classify the sentences. Finally, the performance of the method is assessed in two ways: in the first assessment, all sentences in the dataset are considered together; in the second, each document's group of sentences is considered separately, using fivefold cross-validation. The proposed method achieves mean F1 scores ranging from 0.60 to 0.63 with the three sentence-encoding approaches on the test dataset.
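The abstract describes the pipeline only at a high level, so the following is a minimal Python sketch of the TF-IDF variant under explicit assumptions, not the authors' implementation. It assumes that the centroid is the k most frequent non-stop-words in training sentences labelled as belonging to the reader's subtopics, and that the "similarity parameter" is a single cosine-similarity cut-off chosen to maximise F1 on labelled sentences. The stop-word list, function names, and toy sentences are all hypothetical, and the word-vector-averaging and autoencoder encoders are not covered.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list: the abstract does not describe the paper's preprocessing.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "to", "is", "are",
             "was", "were", "for", "has", "have", "by", "with"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def build_centroid(relevant_sentences, k=5):
    """Assumed centroid: the k most frequent words in sentences on the chosen subtopics."""
    counts = Counter(w for s in relevant_sentences for w in tokenize(s))
    return [w for w, _ in counts.most_common(k)]


def tfidf_vectors(docs):
    """Sparse TF-IDF vectors (word -> weight) with smoothed IDF over token lists."""
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))
    idf = {w: math.log((1 + n) / (1 + c)) + 1 for w, c in df.items()}
    return [{w: tf * idf[w] for w, tf in Counter(d).items()} for d in docs]


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def similarities(sentences, query_tokens):
    """Embed sentences and the centroid query in one TF-IDF space; score each sentence."""
    vecs = tfidf_vectors([tokenize(s) for s in sentences] + [query_tokens])
    query_vec = vecs[-1]
    return [cosine(v, query_vec) for v in vecs[:-1]]


def f1(pred, gold):
    """F1 score of boolean predictions against boolean gold labels."""
    tp = sum(p and g for p, g in zip(pred, gold))
    fp = sum(p and not g for p, g in zip(pred, gold))
    fn = sum(g and not p for p, g in zip(pred, gold))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def fit_threshold(scores, labels):
    """Assumed supervised step: choose the similarity cut-off that maximises F1
    on labelled sentences (ties keep the smallest cut-off)."""
    best_t, best_f = 0.0, -1.0
    for t in sorted(set(scores)):
        score = f1([s >= t for s in scores], labels)
        if score > best_f:
            best_t, best_f = t, score
    return best_t, best_f


# Toy, made-up training data: True = on the reader's subtopics (infections, deaths).
train = [
    "New infections rose sharply as cases spread in the city.",
    "Deaths from the virus climbed to a record high.",
    "The stock market rallied on Tuesday.",
    "Hospitals reported more infections and deaths this week.",
    "The football season was postponed.",
]
labels = [True, True, False, True, False]

query = build_centroid([s for s, y in zip(train, labels) if y], k=2)
scores = similarities(train, query)
threshold, train_f1 = fit_threshold(scores, labels)
summary = [s for s, sc in zip(train, scores) if sc >= threshold]
print(query, round(train_f1, 2))
print(summary)
```

On this toy data the fitted cut-off separates the three subtopic sentences from the two off-topic ones. Embedding the centroid in the same TF-IDF space as the sentences makes the cosine score a measure of overlap with the subtopic vocabulary; the word-vector-averaging and autoencoder variants named in the abstract would presumably change only how the vectors in `similarities` are built.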
