Graph-based Growing self-organizing map for Single Document Summarization (GGSDS)

Mahmoud R. Alfarra, Abdalfattah M. Alfarra, Ahmed Salahedden
Published in: 2019 IEEE 7th Palestinian International Conference on Electrical and Computer Engineering (PICECE)
Publication date: 2019-03-01
DOI: https://doi.org/10.1109/PICECE.2019.8747236
Citations: 1

Abstract

The huge volume of available text poses a considerable challenge for processing and exploiting it across many fields. Consequently, many approaches have been proposed for automatic text summarization, one of the most common text-mining tasks, yet more accurate and better-performing models are still needed. This paper proposes a novel Graph-based Growing self-organizing map for Single Document Summarization (GGSDS). GGSDS is an unsupervised extractive summarization approach composed of five main tasks: text pre-processing, document representation, sub-topic identification, sentence ranking, and summary generation. GGSDS represents the entire text of a document as a single accumulative graph. This representation supports the extraction of all the features required to produce the most suitable summary, especially the phrases shared between sentences. The design of GGSDS takes into account the impact of sub-topics on the accuracy and comprehensiveness of the generated summary. To this end, G-GSOM is employed to group sentences into clusters that represent the sub-topics of the text. Next, sentences are scored using the TextRank algorithm, under the assumption that a sentence with more relations to other sentences is more important and more representative of its sub-topic. Finally, the highest-scoring sentences in each cluster are selected to generate the summary. Experimental results showed that GGSDS generated single-document summaries with more than 80% accuracy on two datasets. Furthermore, these summaries covered most of the sub-topics of the documents.
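The last two stages described in the abstract (TextRank scoring, then picking the top sentence per sub-topic cluster) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The cluster labels are assumed to be given, since G-GSOM itself is not reproduced here. The Jaccard word-overlap edge weight is a stand-in assumption for the paper's shared-phrase graph features. The function names `tokenize`, `similarity`, `textrank`, and `summarize` are hypothetical.

```python
import re


def tokenize(sentence):
    """Lowercase word set of a sentence (assumed, simplistic pre-processing)."""
    return set(re.findall(r"[a-z]+", sentence.lower()))


def similarity(a, b):
    """Jaccard overlap of two word sets; a stand-in for shared-phrase features."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def textrank(sentences, damping=0.85, iterations=50):
    """Score sentences by weighted PageRank over a sentence-similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    tokens = [tokenize(s) for s in sentences]
    # Symmetric edge weights; no self-loops.
    weights = [
        [similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            # Each neighbour j passes on a share of its score proportional
            # to the weight of the edge j -> i.
            rank = sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            new_scores.append((1 - damping) / n + damping * rank)
        scores = new_scores
    return scores


def summarize(sentences, cluster_ids):
    """Keep the highest-scoring sentence of each cluster, in document order.

    cluster_ids[i] is the (assumed, externally computed) sub-topic cluster
    of sentences[i].
    """
    scores = textrank(sentences)
    best = {}
    for idx, cid in enumerate(cluster_ids):
        if cid not in best or scores[idx] > scores[best[cid]]:
            best[cid] = idx
    return [sentences[i] for i in sorted(best.values())]
```

Calling `summarize(sentences, cluster_ids)` with one cluster label per sentence returns one sentence per cluster. Selecting per cluster rather than globally is what lets the summary cover each sub-topic, which is the motivation the abstract gives for the clustering step.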