Stop-words in keyphrase extraction problem

14th Conference of Open Innovation Association FRUCT. Pub Date: 2013-11-01. DOI: 10.1109/FRUCT.2013.6737953

S. Popova, Liubov Kovriguina, D. Mouromtsev, I. Khodyrev

Citations: 19

Abstract

Keyword extraction is one of the most significant tasks in information retrieval. High-quality keyword extraction significantly influences progress in several subtasks of information retrieval: classification and clustering, data mining, knowledge extraction and representation, etc. The research community has settled on a standard scheme for keyphrase extraction; however, some possible solutions remain outside this paradigm. In this paper the authors survey the range of interdisciplinary methods applicable to building stop lists automatically. The chosen method belongs to the class of experiential models. The research procedure based on this method improves the quality of keyphrase extraction at the stage of candidate keyphrase building. Several ways of building stop lists automatically are also proposed. One of them is based on principles of lexical statistics, and the results of applying it to this task point to the non-Gaussian nature of text corpora. The second, which uses the Inspec training collection to populate the stop lists, improves quality considerably.
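
The abstract does not spell out how a stop list interacts with candidate keyphrase building, so the following Python sketch is only an illustration under stated assumptions, not the authors' procedure. It assumes a training collection of (text, gold keyphrases) pairs, treats as stop words those words that occur in at least `min_doc_freq` training texts but never inside a gold keyphrase, and then builds candidates by splitting text at stop words and punctuation. The names `tokenize`, `build_stop_list`, `candidate_phrases` and the threshold `min_doc_freq` are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens (hyphens allowed inside)."""
    return re.findall(r"[a-z][a-z\-]*", text.lower())


def build_stop_list(train_docs, min_doc_freq=2):
    """Derive a stop list from (text, gold_keyphrases) pairs.

    Assumption for illustration: a word is a stop word if it appears in at
    least `min_doc_freq` training texts and in no gold keyphrase.
    """
    doc_freq = Counter()
    keyphrase_words = set()
    for text, keyphrases in train_docs:
        doc_freq.update(set(tokenize(text)))  # count each word once per text
        for phrase in keyphrases:
            keyphrase_words.update(tokenize(phrase))
    return {
        word
        for word, freq in doc_freq.items()
        if freq >= min_doc_freq and word not in keyphrase_words
    }


def candidate_phrases(text, stop_list):
    """Split text into candidate keyphrases at stop words and punctuation."""
    candidates = []
    for fragment in re.split(r"[.,;:!?()]", text.lower()):
        current = []
        for word in tokenize(fragment):
            if word in stop_list:
                if current:
                    candidates.append(" ".join(current))
                current = []
            else:
                current.append(word)
        if current:
            candidates.append(" ".join(current))
    return candidates


if __name__ == "__main__":
    # Toy training data, invented for this example only.
    train = [
        ("The method of keyword extraction is useful.", ["keyword extraction"]),
        ("The quality of the stop list is important.", ["stop list"]),
    ]
    stops = build_stop_list(train)
    print(sorted(stops))  # ['is', 'of', 'the']
    print(candidate_phrases(
        "The stop list improves the quality of keyword extraction.", stops))
    # ['stop list improves', 'quality', 'keyword extraction']
```

In this toy run the word "improves" stays inside a candidate because it never reached the document-frequency threshold in the training texts, which hints at why the coverage of the stop list matters for the quality of the candidates.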
