面向单文档摘要(GGSDS)的基于图的生长自组织映射

2019 IEEE 7th Palestinian International Conference on Electrical and Computer Engineering (PICECE) Pub Date : 2019-03-01 DOI:10.1109/PICECE.2019.8747236

Mahmoud R. Alfarra, Abdalfattah M. Alfarra, Ahmed Salahedden

{"title":"面向单文档摘要(GGSDS)的基于图的生长自组织映射","authors":"Mahmoud R. Alfarra, Abdalfattah M. Alfarra, Ahmed Salahedden","doi":"10.1109/PICECE.2019.8747236","DOIUrl":null,"url":null,"abstract":"The huge collection of text available represents a remarkable challenge to process and exploit it in many fields. Therefore, there is a multitude of articles that are being proposed to summarize text automatically. More accurate and higher performing models are still required for text summarization. It is one of the most common tasks of text mining. In this paper, a novel Graph-based Growing self-organizing map for Single Document Summarization (GGSDS). GGSDS is an unsupervised extractive summarization approach composed mainly of five tasks: text pre-processing, document representation, sub-topics identification, sentence ranking and finally summary generation. The entire text of a document is represented in GGSDS by one accumulative graph. The choice of this representation model supports the extraction of all required features as to achieve the most suitable summary of text, especially the shared phrases between sentences. The impact of the sub-topics on the accuracy and comprehensiveness of the generated summary is taken into account in the design of GGSDS model. For this purpose, G-GSOM is employed to cluster sentences into clusters to represent the sub-topics of text. Next, sentences are scored using TextRank algorithm under the assumption that when a sentence has more relation with others, it is considered as more important and more representative to a sub-topic. Finally, the sentences with the highest score in each cluster are selected for generating the summary. Experimental results showed that GGSDS generated summaries of single documents with more than 80% accuracy of two datasets. Furthermore, these summaries covered most of the sub-topics of the documents.","PeriodicalId":375980,"journal":{"name":"2019 IEEE 7th Palestinian International Conference on Electrical and Computer Engineering (PICECE)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Graph-based Growing self-organizing map for Single Document Summarization (GGSDS)\",\"authors\":\"Mahmoud R. Alfarra, Abdalfattah M. Alfarra, Ahmed Salahedden\",\"doi\":\"10.1109/PICECE.2019.8747236\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The huge collection of text available represents a remarkable challenge to process and exploit it in many fields. Therefore, there is a multitude of articles that are being proposed to summarize text automatically. More accurate and higher performing models are still required for text summarization. It is one of the most common tasks of text mining. In this paper, a novel Graph-based Growing self-organizing map for Single Document Summarization (GGSDS). GGSDS is an unsupervised extractive summarization approach composed mainly of five tasks: text pre-processing, document representation, sub-topics identification, sentence ranking and finally summary generation. The entire text of a document is represented in GGSDS by one accumulative graph. The choice of this representation model supports the extraction of all required features as to achieve the most suitable summary of text, especially the shared phrases between sentences. The impact of the sub-topics on the accuracy and comprehensiveness of the generated summary is taken into account in the design of GGSDS model. For this purpose, G-GSOM is employed to cluster sentences into clusters to represent the sub-topics of text. Next, sentences are scored using TextRank algorithm under the assumption that when a sentence has more relation with others, it is considered as more important and more representative to a sub-topic. Finally, the sentences with the highest score in each cluster are selected for generating the summary. Experimental results showed that GGSDS generated summaries of single documents with more than 80% accuracy of two datasets. Furthermore, these summaries covered most of the sub-topics of the documents.\",\"PeriodicalId\":375980,\"journal\":{\"name\":\"2019 IEEE 7th Palestinian International Conference on Electrical and Computer Engineering (PICECE)\",\"volume\":\"32 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-03-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 IEEE 7th Palestinian International Conference on Electrical and Computer Engineering (PICECE)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/PICECE.2019.8747236\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE 7th Palestinian International Conference on Electrical and Computer Engineering (PICECE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/PICECE.2019.8747236","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 1

摘要

大量的可用文本代表了在许多领域处理和利用它的显著挑战。因此，有许多文章建议自动总结文本。文本摘要仍然需要更准确和更高性能的模型。这是文本挖掘中最常见的任务之一。本文提出了一种新的基于图的单文档摘要(GGSDS)增长自组织映射。GGSDS是一种无监督抽取摘要方法，主要由五个任务组成:文本预处理、文档表示、子主题识别、句子排序和最后的摘要生成。在GGSDS中，文档的整个文本由一个累积图表示。该表示模型的选择支持提取所有必需的特征，以实现最合适的文本摘要，特别是句子之间的共享短语。在设计GGSDS模型时，考虑了子主题对生成摘要的准确性和全面性的影响。为此，使用G-GSOM将句子聚类成簇来表示文本的子主题。接下来，使用TextRank算法对句子进行评分，假设一个句子与其他句子的关系越密切，则认为它对子主题越重要，越具有代表性。最后，在每个聚类中选择得分最高的句子生成摘要。实验结果表明，GGSDS对两个数据集生成单个文档摘要的准确率在80%以上。此外，这些摘要涵盖了文档的大部分子主题。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Graph-based Growing self-organizing map for Single Document Summarization (GGSDS)

The huge collection of text available represents a remarkable challenge to process and exploit it in many fields. Therefore, there is a multitude of articles that are being proposed to summarize text automatically. More accurate and higher performing models are still required for text summarization. It is one of the most common tasks of text mining. In this paper, a novel Graph-based Growing self-organizing map for Single Document Summarization (GGSDS). GGSDS is an unsupervised extractive summarization approach composed mainly of five tasks: text pre-processing, document representation, sub-topics identification, sentence ranking and finally summary generation. The entire text of a document is represented in GGSDS by one accumulative graph. The choice of this representation model supports the extraction of all required features as to achieve the most suitable summary of text, especially the shared phrases between sentences. The impact of the sub-topics on the accuracy and comprehensiveness of the generated summary is taken into account in the design of GGSDS model. For this purpose, G-GSOM is employed to cluster sentences into clusters to represent the sub-topics of text. Next, sentences are scored using TextRank algorithm under the assumption that when a sentence has more relation with others, it is considered as more important and more representative to a sub-topic. Finally, the sentences with the highest score in each cluster are selected for generating the summary. Experimental results showed that GGSDS generated summaries of single documents with more than 80% accuracy of two datasets. Furthermore, these summaries covered most of the sub-topics of the documents.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2019 IEEE 7th Palestinian International Conference on Electrical and Computer Engineering (PICECE)

自引率

0.00%

发文量