Shujun Wan, Peter Bourgonje, Hongling Xiao, Clara Wan Ching Ho
Language Resources and Evaluation. Published 2024-08-18. DOI: https://doi.org/10.1007/s10579-024-09761-9
Chinese-DiMLex: a lexicon of Chinese discourse connectives
Machine-readable inventories of connectives that provide information on multiple levels are a useful resource for automated discourse parsing, machine translation, text summarization, and argumentation mining. Although Chinese is one of the world’s most widely spoken languages and has a wealth of annotated corpora, no such lexicon exists for it yet, whereas lexicons for many other languages have long been established. In this paper, we present 226 Chinese discourse connectives, augmented with morphological variations, syntactic (part-of-speech) and semantic (PDTB 3.0 sense inventory) information, usage examples, and English translations. The resulting lexicon, Chinese-DiMLex, is publicly available in XML format and is included in connective-lex.info, a platform designed for human-friendly browsing of connective lexicons across languages. We describe how the lexicon was created and discuss several Chinese-specific considerations and issues that arose in the process. By documenting the process, we hope not only to support research and education, but also to encourage researchers to use our method as a reference for building lexicons for their own (native) languages.
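To illustrate how a lexicon of this kind might be consumed, here is a minimal Python sketch that reads connective entries from XML and looks them up by sense. The element and attribute names (`lexicon`, `entry`, `variants`, `pos`, `senses`, `example`, `translation`) and the sample entry are assumptions made up for illustration; they are not the published Chinese-DiMLex schema, which should be checked against the released file.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified entry. The tag names and the content are
# illustrative assumptions, not taken from the released lexicon.
SAMPLE = """
<lexicon>
  <entry id="1" word="因为">
    <variants>
      <variant>因</variant>
    </variants>
    <pos>conjunction</pos>
    <senses>
      <sense>Contingency.Cause.Reason</sense>
    </senses>
    <example>因为下雨，比赛取消了。</example>
    <translation>because</translation>
  </entry>
</lexicon>
"""


def load_entries(xml_text):
    """Parse the assumed XML layout into a list of plain dictionaries."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.findall("entry"):
        entries.append({
            "word": entry.get("word"),
            "variants": [v.text for v in entry.findall("variants/variant")],
            "pos": entry.findtext("pos"),
            "senses": [s.text for s in entry.findall("senses/sense")],
            "example": entry.findtext("example"),
            "translation": entry.findtext("translation"),
        })
    return entries


def connectives_for_sense(entries, sense_prefix):
    """Return the connectives that have at least one sense starting with sense_prefix."""
    return [
        e["word"]
        for e in entries
        if any(s.startswith(sense_prefix) for s in e["senses"])
    ]


entries = load_entries(SAMPLE)
causal = connectives_for_sense(entries, "Contingency")
```

Matching on a sense prefix rather than the full label lets a caller query at any level of a hierarchical sense inventory, for example all `Contingency` connectives or only `Contingency.Cause` ones.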
About the journal:
Language Resources and Evaluation is the first publication devoted to the acquisition, creation, annotation, and use of language resources, together with methods for evaluation of resources, technologies, and applications.
Language resources include language data and descriptions in machine-readable form used to assist and augment language processing applications, such as written or spoken corpora and lexica, multimodal resources, grammars, terminology or domain-specific databases and dictionaries, ontologies, multimedia databases, etc., as well as basic software tools for their acquisition, preparation, annotation, management, customization, and use.
Evaluation of language resources concerns assessing the state of the art for a given technology, comparing different approaches to a given problem, assessing the availability of resources and technologies for a given application, benchmarking, and assessing system usability and user satisfaction.