汉语动词识别:从规范到实现

Binggong Ding, C. Huang, Degen Huang
{"title":"汉语动词识别:从规范到实现","authors":"Binggong Ding, C. Huang, Degen Huang","doi":"10.30019/IJCLCLP.200503.0004","DOIUrl":null,"url":null,"abstract":"Main verb identification is the task of automatically identifying the predicate-verb in a sentence. It is useful for many applications in Chinese Natural Language Processing. Although most studies have focused on the model used to identify the main verb, the definition of the main verb should not be overlooked. In our specification design, we have found many complicated issues that still need to be resolved since they haven't been well discussed in previous works. Thus, the first novel aspect of our work is that we carefully design a specification for annotating the main verb and investigate various complicated cases. We hope this discussion will help to uncover the difficulties involved in this problem. Secondly, we present an approach to realizing main verb identification based on the use of chunk information, which leads to better results than the approach based on part-of-speech. Finally, based on careful observation of the studied corpus, we propose new local and contextual features for main verb identification. According to our specification, we annotate a corpus and then use a Support Vector Machine (SVM) to integrate all the features we propose. Our model, which was trained on our annotated corpus, achieved a promising F score of 92.8%. Furthermore, we show that main verb identification can improve the performance of the Chinese Sentence Breaker, one of the applications of main verb identification, by 2.4%.","PeriodicalId":436300,"journal":{"name":"Int. J. Comput. Linguistics Chin. Lang. Process.","volume":"58 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2005-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":"{\"title\":\"Chinese Main Verb Identification: From Specification to Realization\",\"authors\":\"Binggong Ding, C. Huang, Degen Huang\",\"doi\":\"10.30019/IJCLCLP.200503.0004\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Main verb identification is the task of automatically identifying the predicate-verb in a sentence. It is useful for many applications in Chinese Natural Language Processing. Although most studies have focused on the model used to identify the main verb, the definition of the main verb should not be overlooked. In our specification design, we have found many complicated issues that still need to be resolved since they haven't been well discussed in previous works. Thus, the first novel aspect of our work is that we carefully design a specification for annotating the main verb and investigate various complicated cases. We hope this discussion will help to uncover the difficulties involved in this problem. Secondly, we present an approach to realizing main verb identification based on the use of chunk information, which leads to better results than the approach based on part-of-speech. Finally, based on careful observation of the studied corpus, we propose new local and contextual features for main verb identification. According to our specification, we annotate a corpus and then use a Support Vector Machine (SVM) to integrate all the features we propose. Our model, which was trained on our annotated corpus, achieved a promising F score of 92.8%. Furthermore, we show that main verb identification can improve the performance of the Chinese Sentence Breaker, one of the applications of main verb identification, by 2.4%.\",\"PeriodicalId\":436300,\"journal\":{\"name\":\"Int. J. Comput. Linguistics Chin. Lang. Process.\",\"volume\":\"58 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2005-03-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"7\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Int. J. Comput. Linguistics Chin. Lang. Process.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.30019/IJCLCLP.200503.0004\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Int. J. Comput. Linguistics Chin. Lang. Process.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.30019/IJCLCLP.200503.0004","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 7

摘要

主谓识别是对句子中谓语动词进行自动识别的任务。该方法在汉语自然语言处理中具有广泛的应用价值。虽然大多数研究都集中在识别主动词的模型上,但主动词的定义也不容忽视。在我们的规范设计中,我们发现许多复杂的问题仍然需要解决,因为它们在以前的工作中没有得到很好的讨论。因此,我们工作的第一个新颖方面是,我们仔细设计了一个注释主动词的规范,并研究了各种复杂的情况。我们希望这次讨论将有助于揭示这个问题所涉及的困难。其次,我们提出了一种基于块信息的主动词识别方法,该方法比基于词性的方法具有更好的识别效果。最后,在仔细观察所研究的语料库的基础上,我们提出了新的局部和语境特征来识别主动词。根据我们的规范,我们对语料库进行标注,然后使用支持向量机(SVM)对我们提出的所有特征进行集成。我们的模型在我们标注的语料库上进行了训练,获得了92.8%的F分。此外,我们发现主动词识别可以使汉语断句(主动词识别的应用之一)的性能提高2.4%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Chinese Main Verb Identification: From Specification to Realization
Main verb identification is the task of automatically identifying the predicate-verb in a sentence. It is useful for many applications in Chinese Natural Language Processing. Although most studies have focused on the model used to identify the main verb, the definition of the main verb should not be overlooked. In our specification design, we have found many complicated issues that still need to be resolved since they haven't been well discussed in previous works. Thus, the first novel aspect of our work is that we carefully design a specification for annotating the main verb and investigate various complicated cases. We hope this discussion will help to uncover the difficulties involved in this problem. Secondly, we present an approach to realizing main verb identification based on the use of chunk information, which leads to better results than the approach based on part-of-speech. Finally, based on careful observation of the studied corpus, we propose new local and contextual features for main verb identification. According to our specification, we annotate a corpus and then use a Support Vector Machine (SVM) to integrate all the features we propose. Our model, which was trained on our annotated corpus, achieved a promising F score of 92.8%. Furthermore, we show that main verb identification can improve the performance of the Chinese Sentence Breaker, one of the applications of main verb identification, by 2.4%.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Enriching Cold Start Personalized Language Model Using Social Network Information Detecting and Correcting Syntactic Errors in Machine Translation Using Feature-Based Lexicalized Tree Adjoining Grammars TQDL: Integrated Models for Cross-Language Document Retrieval Evaluation of TTS Systems in Intelligibility and Comprehension Tasks: a Case Study of HTS-2008 and Multisyn Synthesizers Effects of Combining Bilingual and Collocational Information on Translation of English and Chinese Verb-Noun Pairs
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1