CHINERS: A Chinese Named Entity Recognition System for the Sports Domain

Workshop on Chinese Language Processing. Pub Date: 2003-07-11. DOI: 10.3115/1119250.1119258

Tianfang Yao, Wei Ding, G. Erbach


Citations: 13

Abstract

Chinese named entity (NE) recognition poses two principal challenges. The first is ensuring the quality of word segmentation and part-of-speech (POS) tagging, because errors in these steps degrade NE recognition performance. The second is recognizing NEs flexibly, reliably, and accurately. To address both, we propose a two-phase system architecture. The first phase reduces, as far as possible, the word segmentation and POS tagging errors passed on to the second phase; for this purpose we use machine learning techniques to repair such errors. In the second phase, we design Finite State Cascades (FSC), which are constructed automatically from the recognition rule sets and serve as a shallow parser for recognizing NEs. The advantages of this approach are that the FSC are reliable, accurate, and easy to maintain. In addition, we develop dedicated strategies for special NEs to improve recognition correctness. Experimental evaluation shows that the total average recall and precision over six NE types are 83% and 85%, respectively, which indicates that the system architecture is reasonable and effective.
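To make the second phase more concrete, the following is a minimal Python sketch of a finite-state cascade running over POS-tagged tokens. It is not the CHINERS implementation: the tag names (`LOC`, `TEAM_KW`, `VS`), the output labels (`TEAM`, `FIXTURE`), the rules themselves, and the function names `apply_level` and `run_cascade` are all hypothetical and chosen only for illustration. The sketch assumes that segmentation and tagging errors have already been repaired in the first phase.

```python
from typing import List, Tuple

Token = Tuple[str, str]          # (surface text, tag)
Rule = Tuple[str, List[str]]     # (output label, sequence of input tags)

# Hypothetical rule sets, one list per cascade level (not taken from the paper).
LEVELS: List[List[Rule]] = [
    # Level 1: a location followed by a team keyword forms a team name.
    [("TEAM", ["LOC", "TEAM_KW"])],
    # Level 2: two team names joined by a "versus" marker form a fixture.
    [("FIXTURE", ["TEAM", "VS", "TEAM"])],
]


def apply_level(tokens: List[Token], rules: List[Rule]) -> List[Token]:
    """Scan left to right; replace each matching tag sequence by one token."""
    out: List[Token] = []
    i = 0
    while i < len(tokens):
        for label, pattern in rules:
            window = tokens[i:i + len(pattern)]
            if [tag for _, tag in window] == pattern:
                # Merge the matched tokens into a single labeled unit.
                out.append(("".join(word for word, _ in window), label))
                i += len(pattern)
                break
        else:
            # No rule matched at this position: copy the token unchanged.
            out.append(tokens[i])
            i += 1
    return out


def run_cascade(tokens: List[Token], levels: List[List[Rule]]) -> List[Token]:
    """Feed the output of each level into the next one."""
    for rules in levels:
        tokens = apply_level(tokens, rules)
    return tokens


if __name__ == "__main__":
    sentence = [("北京", "LOC"), ("队", "TEAM_KW"), ("对", "VS"),
                ("上海", "LOC"), ("队", "TEAM_KW")]
    print(run_cascade(sentence, LEVELS))
```

Because each level is built directly from a rule list, changing the recognizer means editing the rules rather than the matching code, which is one plausible reading of the maintainability advantage the abstract claims for automatically constructed cascades.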


