A novel keyphrase extraction method by combining FP-growth and LDA
Authors: Hao Sun, Bing Li, Bo Han
Published in: 2017 13th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD), 2017-07-29
DOI: 10.1109/FSKD.2017.8393033 (https://doi.org/10.1109/FSKD.2017.8393033)
Citation count: 0
Abstract
Fast-growing technologies such as cloud computing, big data, the mobile Internet, and artificial intelligence have driven the emergence of many new phrases. In this paper, we propose a novel two-step keyphrase extraction method that combines the FP-growth algorithm with Latent Dirichlet Allocation (LDA) topic modeling. In the first step, we apply the FP-growth algorithm to find neighboring words that frequently co-occur and take them as candidate phrases. In the second step, we use LDA models to extract the significant keyphrases from these candidates. Our experiments on two datasets, CVE-2015 and 20-newsgroups, show that the proposed approach can extract significant keyphrases and that these phrases can help improve text classification accuracy.
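To make the first step more concrete, the following is a minimal sketch of candidate-phrase generation. It is not the authors' implementation: a plain counting pass over adjacent word pairs stands in for FP-growth, and the function name `candidate_phrases`, the `min_support` threshold, the whitespace tokenization, and the sample documents are all assumptions made for illustration. The LDA-based second step is not sketched, because the abstract does not say how significance is scored.

```python
from collections import Counter


def candidate_phrases(docs, min_support=2):
    """Return adjacent word pairs found in at least `min_support` documents.

    Simplified stand-in for FP-growth: it only counts two-word
    neighbors and does not build an FP-tree.
    """
    support = Counter()
    for doc in docs:
        tokens = doc.lower().split()  # assumed tokenization: whitespace only
        # A set counts each pair once per document, so the support
        # of a pair is the number of documents that contain it.
        pairs = set(zip(tokens, tokens[1:]))
        support.update(pairs)
    return {" ".join(pair): n for pair, n in support.items() if n >= min_support}


# Hypothetical sample documents, not taken from CVE-2015 or 20-newsgroups.
docs = [
    "buffer overflow in the kernel driver",
    "remote buffer overflow allows code execution",
    "kernel driver allows code execution",
]
candidates = candidate_phrases(docs, min_support=2)
print(candidates)
```

With these sample documents, the pairs that reach the threshold are "buffer overflow", "kernel driver", "allows code", and "code execution". A pair such as "allows code" shows why a second filtering step is needed: frequent neighbors are not necessarily meaningful phrases.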