News-Oriented Automatic Chinese Keyword Indexing

Workshop on Chinese Language Processing. Pub Date: 2003-07-11. DOI: 10.3115/1119250.1119263

Sujian Li, Houfeng Wang, Shiwen Yu, Chengsheng Xin

Citations: 9

Abstract

In the information era, keywords are very useful for information retrieval, text clustering, and related tasks. News is a domain that consistently attracts a large amount of attention. However, most news articles come without keywords, and indexing them manually is costly. Taking into account the characteristics of news articles and the resources available, this paper introduces a simple keyword indexing procedure based on a scoring system. During indexing, we use some relatively mature linguistic techniques and tools to filter out meaningless candidate items. Furthermore, by exploiting the hierarchical relations among content words, keywords are not restricted to those that appear in the text. These methods have improved our system considerably. Finally, experimental results are given and analyzed, showing that the quality of the extracted keywords is satisfactory.
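The abstract does not specify the scoring formula, the filtering tools, or the word hierarchy used. As a rough illustration only, the following sketch shows one way a score-based indexer with candidate filtering and hierarchy-derived keywords might look. Everything in it is an assumption made for illustration: the stop list, the `title_bonus` weight, the toy `HYPERNYMS` map, and all function names are hypothetical, and the input is assumed to be already word-segmented (English tokens are used here for readability).

```python
from collections import Counter

# Hypothetical resources: the paper's real stop list, weights, and
# concept hierarchy are not given in the abstract.
STOPWORDS = {"the", "a", "of", "in", "and", "to", "is"}
HYPERNYMS = {"football": "sports", "basketball": "sports"}  # child -> parent


def score_candidates(title_tokens, body_tokens, title_bonus=2.0):
    """Score each non-stopword: 1 per body occurrence, title_bonus per title occurrence."""
    scores = Counter()
    for tok in body_tokens:
        if tok not in STOPWORDS:
            scores[tok] += 1.0
    for tok in title_tokens:
        if tok not in STOPWORDS:
            scores[tok] += title_bonus
    return scores


def add_hypernym_candidates(scores):
    """Add each parent concept with the summed scores of its children found in the text."""
    parent_scores = Counter()
    for tok, score in scores.items():
        parent = HYPERNYMS.get(tok)
        if parent is not None:
            parent_scores[parent] += score
    merged = Counter(scores)
    merged.update(parent_scores)  # Counter.update adds counts together
    return merged


def index_keywords(title_tokens, body_tokens, top_k=3):
    """Return the top_k candidates by score, breaking ties alphabetically."""
    merged = add_hypernym_candidates(score_candidates(title_tokens, body_tokens))
    ranked = sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tok for tok, _ in ranked[:top_k]]


if __name__ == "__main__":
    title = ["football", "final"]
    body = ["the", "football", "final", "and", "basketball", "game", "football"]
    print(index_keywords(title, body))
```

In this toy run the parent concept "sports" is ranked first even though it never occurs in the article, which mirrors the abstract's point that keywords need not be extracted from the text itself.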



