Maximal gSpan: Multi-Document Summarization through Frequent Subgraph Mining

Riva Malik, Kifayat-Ullah Khan, Waqas Nawaz
DOI: 10.1109/IMCOM56909.2023.10035618
Venue: 2023 17th International Conference on Ubiquitous Information Management and Communication (IMCOM)
Publication date: 2023-01-03
URL: https://doi.org/10.1109/IMCOM56909.2023.10035618
Citations: 1

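Abstract

Multi-document summarization (MDS) extracts salient information from multiple documents and represents it in a compressed yet comprehensible form. Existing approaches to MDS rely on deep learning models. These approaches are data hungry and use the complete search space of the input documents to generate summaries. Frequent subgraph mining (FSM), by contrast, can serve as an unsupervised approach that reduces the search space, since only the frequent subgraphs are treated as representative of the documents. gSpan is a state-of-the-art FSM algorithm. The problem with using gSpan to generate subgraphs for summarization is that the resulting subgraphs contain repetitive words, which degrades the summary. To address this problem, we propose Maximal gSpan, an extension of gSpan that mines maximal frequent subgraphs. These subgraphs contain diverse words and therefore provide better document coverage for summarization. The sentences of the summary are selected from these subgraphs. The proposed approach achieves better ROUGE scores, the summarization evaluation metric used, than TextRank, another unsupervised graph-based extractive summarization technique.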
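
The notion of a maximal frequent subgraph can be illustrated with a small sketch. This is not the authors' Maximal gSpan and does not implement gSpan's DFS-code search. It assumes a hypothetical graph model in which each sentence becomes a set of edges between adjacent words; because every node carries a unique word label, subgraph containment reduces to edge-set inclusion. Candidate patterns are enumerated by brute force up to a small size cap, connectivity is not enforced, and the function names (`sentence_graph`, `frequent_patterns`, `maximal_only`) are invented for illustration.

```python
from itertools import combinations


def sentence_graph(sentence):
    """Model a sentence as a set of undirected edges between adjacent words (assumed model)."""
    words = sentence.lower().split()
    return frozenset(frozenset((a, b)) for a, b in zip(words, words[1:]) if a != b)


def frequent_patterns(graphs, min_support, max_edges=3):
    """Brute-force every edge set of up to max_edges edges contained in >= min_support graphs."""
    universe = sorted({e for g in graphs for e in g}, key=sorted)
    found = []
    for k in range(1, max_edges + 1):
        for combo in combinations(universe, k):
            pattern = frozenset(combo)
            # Support = number of sentence graphs that contain every edge of the pattern.
            if sum(pattern <= g for g in graphs) >= min_support:
                found.append(pattern)
    return found


def maximal_only(patterns):
    """Keep only patterns that are not a proper subset of another frequent pattern."""
    return [p for p in patterns if not any(p < q for q in patterns)]


sentences = [
    "graph mining finds frequent patterns",
    "graph mining finds maximal patterns",
    "frequent patterns summarize documents",
]
graphs = [sentence_graph(s) for s in sentences]
maximal = maximal_only(frequent_patterns(graphs, min_support=2))
for pattern in maximal:
    print(sorted(sorted(edge) for edge in pattern))
```

With these three toy sentences and a minimum support of 2, the frequent single edges graph-mining and mining-finds are absorbed into the two-edge pattern that contains both, while the frequent-patterns edge remains as its own maximal pattern. Dropping the absorbed patterns is what removes the repeated words the abstract describes. Maximality here is relative to the enumerated patterns, so the `max_edges` cap is a limitation of the sketch.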