CAT: A Collaborative Annotation Tool for Chinese Genealogy Textual Documents
Authors: Huan Jiang, Zihao Wang, Rongrong Li, Yuwei Peng, Zhiyong Peng, Bin Xu
DOI: 10.1109/CSCWD57460.2023.10152659 (https://doi.org/10.1109/CSCWD57460.2023.10152659)
Journal: Computer Supported Cooperative Work-The Journal of Collaborative Computing, volume 18 1, pages 1043-1048
Publication date: 2023-05-24
Publication type: Journal Article
Impact factor: 2.0 (JCR Q3, Computer Science, Interdisciplinary Applications)
Open access: no
Source platform: Semanticscholar
Citations: 0
Abstract
Annotating Chinese genealogy textual documents helps in constructing genealogy knowledge graphs, training effective machine learning models for knowledge extraction, and similar tasks. However, documents of this kind are difficult to annotate. The primary reason is that the texts are written in both classical and vernacular Chinese. They also contain numerous ancient characters and usually lack punctuation. Understanding genealogy texts requires considerable expertise. When multiple users label the same text, conflicts may occur. Existing annotation tools are ill-suited to this work. In this paper, we propose a novel interactive labeling tool that provides text segmentation, entity tagging, relationship tagging, and related functions. The annotated information makes it convenient to construct a knowledge graph from textual documents, which can then be used to analyze Chinese genealogy texts. Furthermore, we introduce a weakly supervised mechanism based on a Hidden Markov Model for collaborative, crowdsourced annotation. Practice shows that our approach is effective for collaborative annotation. It also facilitates knowledge graph construction and yields more high-quality data sets. The tool is currently in service.
About the journal:
Computer Supported Cooperative Work (CSCW): The Journal of Collaborative Computing and Work Practices is devoted to innovative research in computer-supported cooperative work (CSCW). It provides an interdisciplinary and international forum for the debate and exchange of ideas concerning theoretical, practical, technical, and social issues in CSCW.
The CSCW Journal arose in response to the growing interest in the design, implementation and use of technical systems (including computing, information, and communications technologies) which support people working cooperatively, and its scope continues to encompass the multifarious aspects of research within CSCW and related areas.
The CSCW Journal focuses on research oriented towards the development of collaborative computing technologies on the basis of studies of actual cooperative work practices (where ‘work’ is used in the wider sense). That is, it welcomes in particular submissions that (a) report on findings from ethnographic or similar kinds of in-depth fieldwork of work practices with a view to their technological implications, (b) report on empirical evaluations of the use of extant or novel technical solutions under real-world conditions, and/or (c) develop technical or conceptual frameworks for practice-oriented computing research based on previous fieldwork and evaluations.