Maximal gSpan: Multi-Document Summarization through Frequent Subgraph Mining
Authors: Riva Malik, Kifayat-Ullah Khan, Waqas Nawaz
Venue: 2023 17th International Conference on Ubiquitous Information Management and Communication (IMCOM)
Publication date: 2023-01-03
DOI: https://doi.org/10.1109/IMCOM56909.2023.10035618
Multi-document summarization (MDS) involves extracting salient information from multiple documents and representing it in a compressed yet comprehensible form. Existing approaches to MDS rely on deep learning models. These approaches are data-hungry and use the complete search space of the input documents to generate summaries. Frequent subgraph mining (FSM), by contrast, can serve as an unsupervised approach that reduces the search space by treating only the frequent subgraphs as representative of the documents. gSpan is a state-of-the-art FSM algorithm. The problem with using gSpan to generate subgraphs for summarization is that the resulting subgraphs contain repetitive words, which degrades the summary. To address this problem, we propose Maximal gSpan, an extension of gSpan that mines maximal frequent subgraphs. These subgraphs contain diverse words and therefore give better document coverage for summarization. The sentences of the summary are selected from these subgraphs. The proposed approach achieves better results on the summarization evaluation metric, i.e., ROUGE scores, than TextRank, another unsupervised graph-based extractive summarization technique.
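The maximality condition mentioned in the abstract can be illustrated with a small sketch. A frequent subgraph is maximal when no proper supergraph of it is also frequent. The code below is not the authors' Maximal gSpan. It assumes the frequent subgraphs have already been mined (for example by gSpan) and that each one is represented as a set of word-pair edges, so that subgraph containment reduces to a subset test. The function name `maximal_patterns` and the sample data are hypothetical.

```python
def maximal_patterns(frequent):
    """Keep only patterns that are not strictly contained in another pattern.

    Assumption: each pattern is a frozenset of (word, word) edges, and word
    labels are unique within the graph, so containment is a subset test.
    """
    unique = list(dict.fromkeys(frequent))  # drop duplicates, keep input order
    return [p for p in unique if not any(p < q for q in unique)]


# Hypothetical output of a frequent subgraph miner on a small document set.
frequent = [
    frozenset({("storm", "hit")}),
    frozenset({("storm", "hit"), ("hit", "coast")}),
    frozenset({("power", "outage")}),
]

maximal = maximal_patterns(frequent)
for pattern in maximal:
    print(sorted(pattern))
```

Here the single-edge pattern ("storm", "hit") is dropped because it is contained in the larger two-edge pattern, which is the kind of repetition the abstract attributes to plain gSpan output. How the paper then selects summary sentences from the maximal subgraphs is not described in the abstract and is not modeled here.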