
Workshop on Chinese Language Processing: Latest Articles

Annotating the Propositions in the Penn Chinese Treebank
Pub Date : 2003-07-11 DOI: 10.3115/1119250.1119257
Nianwen Xue, Martha Palmer
In this paper, we describe an approach to annotate the propositions in the Penn Chinese Treebank. We describe how diathesis alternation patterns can be used to make coarse sense distinctions for Chinese verbs as a necessary step in annotating the predicate-structure of Chinese verbs. We then discuss the representation scheme we use to label the semantic arguments and adjuncts of the predicates. We discuss several complications for this type of annotation and describe our solutions. We then discuss how a lexical database with predicate-argument structure information can be used to ensure consistent annotation. Finally, we discuss possible applications for this resource.
Citations: 87
The Semantic Knowledge-base of Contemporary Chinese and Its Applications in WSD
Pub Date : 2003-07-11 DOI: 10.3115/1119250.1119266
Hui Wang, Shiwen Yu
The Semantic Knowledge-base of Contemporary Chinese (SKCC) is a large-scale Chinese semantic resource developed by the Institute of Computational Linguistics of Peking University. It provides a large amount of semantic information, such as semantic hierarchy and collocation features, for 66,539 Chinese words and their English counterparts. Its POS and semantic classification represent the latest progress in Chinese linguistics and language engineering. The descriptions of semantic attributes are fairly thorough, comprehensive and authoritative. The paper outlines SKCC and indicates that it is effective for word sense disambiguation in MT applications and is likely to be important for general Chinese language processing.
Citations: 15
Chunking-based Chinese Word Tokenization
Pub Date : 2003-07-11 DOI: 10.3115/1119250.1119281
Guodong Zhou
This paper introduces a Chinese word tokenization system built on HMM-based chunking. Experiments show that the system handles the unknown-word problem in Chinese word tokenization well.
Citations: 1
Unsupervised Training for Overlapping Ambiguity Resolution in Chinese Word Segmentation
Pub Date : 2003-07-11 DOI: 10.3115/1119250.1119251
Mu Li, Jianfeng Gao, C. Huang, Jianfeng Li
This paper proposes an unsupervised training approach to resolving overlapping ambiguities in Chinese word segmentation. We present an ensemble of adapted Naive Bayesian classifiers that can be trained using an unlabelled Chinese text corpus. These classifiers differ in that they use context words within windows of different sizes as features. The performance of our approach is evaluated on a manually annotated test set. Experimental results show that the proposed approach achieves an accuracy of 94.3%, rivaling the rule-based and supervised training methods.
Citations: 32
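The ensemble idea in the abstract by Li et al. above (several Naive Bayes classifiers that differ only in the size of their context window) can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `WindowNaiveBayes`, the labels `"AB|C"` and `"A|BC"`, the majority vote, and the toy training data are all assumptions made for illustration, and the toy data is labelled by hand, whereas the paper derives its training signal from an unlabelled corpus.

```python
import math
from collections import Counter, defaultdict


class WindowNaiveBayes:
    """Naive Bayes over the context words within `window` positions
    of an ambiguous string (hypothetical helper, not from the paper)."""

    def __init__(self, window):
        self.window = window
        self.class_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()

    def _features(self, left, right):
        # Keep only the `window` nearest words on each side.
        return left[-self.window:] + right[:self.window]

    def fit(self, examples):
        for left, right, label in examples:
            self.class_counts[label] += 1
            for word in self._features(left, right):
                self.word_counts[label][word] += 1
                self.vocab.add(word)
        return self

    def predict(self, left, right):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.class_counts):
            score = math.log(self.class_counts[label] / total)
            # Add-one smoothing, with one extra slot for unseen words.
            denom = sum(self.word_counts[label].values()) + len(self.vocab) + 1
            for word in self._features(left, right):
                score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def ensemble_predict(classifiers, left, right):
    """Majority vote over the member classifiers (an assumed combination rule)."""
    votes = Counter(clf.predict(left, right) for clf in classifiers)
    return votes.most_common(1)[0][0]


# Toy data: (left context, right context, preferred segmentation of "ABC").
train = [
    (["a", "x"], ["p"], "AB|C"),
    (["b", "x"], ["q"], "AB|C"),
    (["a", "y"], ["r"], "A|BC"),
    (["b", "y"], ["s"], "A|BC"),
]
classifiers = [WindowNaiveBayes(w).fit(train) for w in (1, 2, 3)]
decision = ensemble_predict(classifiers, ["c", "x"], ["t"])
```

In this toy setup the context word `x` only ever co-occurs with the `"AB|C"` reading, so every member classifier votes for it regardless of window size.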
Utterance Segmentation Using Combined Approach Based on Bi-directional N-gram and Maximum Entropy
Pub Date : 2003-07-11 DOI: 10.3115/1119250.1119253
Ding Liu, Chengqing Zong
This paper proposes a new approach to segmentation of utterances into sentences using a new linguistic model based upon Maximum-entropy-weighted Bi-directional N-grams. The usual N-gram algorithm searches for sentence boundaries in a text from left to right only. Thus a candidate sentence boundary in the text is evaluated mainly with respect to its left context, without fully considering its right context. Using this approach, utterances are often divided into incomplete sentences or fragments. In order to make use of both the right and left contexts of candidate sentence boundaries, we propose a new linguistic modeling approach based on Maximum-entropy-weighted Bi-directional N-grams. Experimental results indicate that the new approach significantly outperforms the usual N-gram algorithm for segmenting both Chinese and English utterances.
Citations: 12
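The motivation in the abstract by Liu and Zong above, scoring a candidate sentence boundary from both its left and its right context, can be shown with a deliberately small sketch. It is an illustration under stated assumptions, not the paper's model: it conditions only on the single word on each side, the function names are hypothetical, and the fixed weights `w_fwd` and `w_bwd` stand in for the maximum-entropy weighting, which is not reproduced here.

```python
from collections import Counter


def train_counts(sentences):
    """Count how often each word occurs, ends a sentence, and starts one.
    Every sentence is assumed to be a non-empty list of words."""
    ends, starts, totals = Counter(), Counter(), Counter()
    for sent in sentences:
        totals.update(sent)
        starts[sent[0]] += 1
        ends[sent[-1]] += 1
    return ends, starts, totals


def boundary_score(left_word, right_word, model, w_fwd=0.5, w_bwd=0.5):
    """Combine a left-context and a right-context estimate of a boundary."""
    ends, starts, totals = model
    # P(boundary | word on the left), with add-one smoothing.
    p_fwd = (ends[left_word] + 1) / (totals[left_word] + 2)
    # P(boundary | word on the right), with add-one smoothing.
    p_bwd = (starts[right_word] + 1) / (totals[right_word] + 2)
    return w_fwd * p_fwd + w_bwd * p_bwd


def split_utterance(words, model, threshold=0.5):
    """Insert a sentence boundary wherever the combined score exceeds `threshold`."""
    sentences, current = [], []
    for i, word in enumerate(words):
        current.append(word)
        if i + 1 < len(words) and boundary_score(word, words[i + 1], model) > threshold:
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)
    return sentences


# Toy training sentences (assumed data).
model = train_counts([["i", "see"], ["you", "know"], ["i", "know"], ["you", "see"]])
result = split_utterance(["i", "know", "you", "see"], model)
```

Setting `w_bwd=0` reduces the score to the left-context-only estimate, which is the one-directional behaviour the abstract contrasts against.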
A Chinese Efficient Analyser Integrating Word Segmentation, Part-Of-Speech Tagging, Partial Parsing and Full Parsing
Pub Date : 2003-07-11 DOI: 10.3115/1119250.1119261
Guodong Zhou, Jian Su
This paper introduces an efficient analyser for the Chinese language that integrates word segmentation, part-of-speech tagging, partial parsing and full parsing. The analyser is based on a Hidden Markov Model (HMM) and an HMM-based tagger; that is, all the components share the same HMM-based tagging engine. One advantage of using a single engine is that it greatly reduces code size and eases maintenance. Another is that the code is easy to optimise for speed, which is critically important in many applications. Finally, all the components benefit when existing algorithms in the single engine are optimised or replaced with better ones. Experiments show that all the components achieve state-of-the-art performance with high efficiency for Chinese.
Citations: 9
Semantic Maps for Word Alignment in Bilingual Parallel Corpora
Pub Date : 2003-07-11 DOI: 10.3115/1119250.1119264
Q. Ma, Yujie Zhang, M. Murata, H. Isahara
Effective self-organizing techniques for constructing monolingual semantic maps of Japanese and Chinese have already been developed. By extending the monolingual map to a bilingual semantic map, we have proposed a semantics-based approach for word alignment in a Japanese/Chinese bilingual corpus.
Citations: 4
Integrating Ngram Model and Case-based Learning for Chinese Word Segmentation
Pub Date : 2003-07-11 DOI: 10.3115/1119250.1119274
C. Kit, Zhiming Xu, J. Webster
This paper presents our recent work for participation in the First International Chinese Word Segmentation Bake-off (ICWSB-1). It is based on a general-purpose ngram model for word segmentation and a case-based learning approach to disambiguation. This system excels in identifying in-vocabulary (IV) words, achieving a recall of around 96-98%. Here we present our strategies for language model training and disambiguation rule learning, analyze the system's performance, and discuss areas for further improvement, e.g., out-of-vocabulary (OOV) word discovery.
Citations: 5
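As a rough illustration of ngram-based word segmentation of the kind the abstract by Kit, Xu and Webster above refers to, the sketch below finds the most probable segmentation under a unigram model with dynamic programming. The choice of a unigram order, the function name `segment_words`, the `max_len` and `oov_logp` parameters, and the toy probabilities are all assumptions; the case-based disambiguation step described in the abstract is not modelled.

```python
import math


def segment_words(text, probs, max_len=4, oov_logp=-20.0):
    """Return the highest-probability segmentation of `text` under a
    unigram word model. Unknown single characters get a fixed penalty."""
    n = len(text)
    best = [-math.inf] * (n + 1)  # best[i]: best log-probability of text[:i]
    back = [0] * (n + 1)          # back[i]: start index of the last word in text[:i]
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = text[start:end]
            if piece in probs:
                logp = math.log(probs[piece])
            elif len(piece) == 1:
                logp = oov_logp   # fallback so every position stays reachable
            else:
                continue          # unknown multi-character strings are not proposed
            if best[start] + logp > best[end]:
                best[end] = best[start] + logp
                back[end] = start
    # Follow the back-pointers from the end of the text.
    words, end = [], n
    while end > 0:
        words.append(text[back[end]:end])
        end = back[end]
    return words[::-1]


# Toy unigram probabilities (assumed values, not from any corpus).
toy_probs = {"研究": 0.3, "生命": 0.3, "研究生": 0.1, "命": 0.01, "的": 0.09, "起源": 0.2}
words = segment_words("研究生命的起源", toy_probs)
```

With these toy values the path 研究 / 生命 / 的 / 起源 scores higher than the competing path that starts with 研究生, which is the kind of overlapping ambiguity a language model is meant to resolve.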