Graph-based Density Peaks Ranking Approach for Extracting KeyPhrases (GDREK)
Mahmoud R. Alfarra, Abdalfattah M. Alfarra, Ahmed Salahedden
2019 IEEE 7th Palestinian International Conference on Electrical and Computer Engineering (PICECE), published 2019-03-01
DOI: 10.1109/PICECE.2019.8747175 (https://doi.org/10.1109/PICECE.2019.8747175)
Citation count: 3
Abstract
Surprisingly, a Google Scholar search returns more than 1,500,000 recently published articles on keyphrase extraction (KE), 21,000 of them from the current year alone. This large number implies that researchers still need more accurate and better-performing models for KE from text, a subtask of text mining and summarization. This paper presents a novel KE design, GDREK, which combines graph-based representation, sentence clustering, and ranking based on density peaks to extract keyphrases from single or multiple documents, and which can also be used for extractive text summarization. The principle of GDREK is to represent the text with a graph model and then group and rank the sentences in a mutual manner. In this model, sentence grouping and ranking proceed incrementally by discovering the main topics of the text and finding the central sentences of each topic. In this incremental step, as the sentences are grouped by the Graph-based Growing Self-Organizing Map (G-GSOM), they are ranked using the Density Peaks (DP) concept according to a measure of similarity between sentences. Our similarity measure is based on shared phrases and the cosine function. Sentences are scored under the assumption that a sentence with more similar sentences is more important (higher density) and more representative. Finally, the most frequent words or phrases in the sentences are selected as the key phrases of the text. Experimental results on two datasets show that our technique extracts most of the key phrases and words, drawn from most sub-topics of the text, and yields over 75% accuracy.
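The density-based scoring idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: it omits the graph representation and the G-GSOM clustering step entirely, substitutes a plain bag-of-words cosine similarity for the paper's shared-phrase measure, and treats density as a simple count of neighbours above a cutoff. The function names (`tokenize`, `cosine`, `density_scores`, `key_terms`), the `threshold` value, and the `stopwords` parameter are all assumptions introduced for illustration.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def density_scores(sentences, threshold=0.3):
    """Assumed density: how many other sentences have similarity >= threshold."""
    vectors = [Counter(tokenize(s)) for s in sentences]
    scores = []
    for i, vi in enumerate(vectors):
        density = sum(
            1
            for j, vj in enumerate(vectors)
            if j != i and cosine(vi, vj) >= threshold
        )
        scores.append(density)
    return scores


def key_terms(sentences, top_sentences=2, top_terms=3,
              threshold=0.3, stopwords=frozenset()):
    """Pick the densest sentences, then return their most frequent terms."""
    scores = density_scores(sentences, threshold)
    # sorted() is stable, so sentences with equal density keep document order.
    ranked = sorted(range(len(sentences)), key=lambda i: -scores[i])
    counts = Counter()
    for i in ranked[:top_sentences]:
        counts.update(t for t in tokenize(sentences[i]) if t not in stopwords)
    return [term for term, _ in counts.most_common(top_terms)]
```

Calling `key_terms(list_of_sentences, stopwords={"of", "the"})` returns single-word terms only; extracting multi-word phrases, as the paper describes, would need an additional candidate-phrase step that this sketch does not attempt.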