Automatic construction of a dictionary of variant forms of Chinese characters
X. Shi
Chinese Language and Discourse (journal article), published 2022-06-09
DOI: 10.1075/cld.21037.shi (https://doi.org/10.1075/cld.21037.shi)
Abstract
Many Chinese characters have more than one written form, owing to the complex way in which characters were created and the
long evolution of the writing system. Most existing Chinese dictionaries list these variant forms but do not explain
systematically why a specific character is a variant of another; they cite only a few older key references, many of which are
themselves dictionaries of various kinds. In this article we present a new theory and practice for determining whether a Chinese
character is a variant of another, and show how a dictionary of variant characters can be derived automatically, using
artificial intelligence techniques, from a corpus of ancient Chinese texts totaling 2.3 billion characters. The key insight of our
theory is to find synonymous words written with variant characters. Results show that, of more than 74,000 identified instances of
variant character groups, more than 20,000 are new instances found by our algorithm. We have compiled all the instances into a
dictionary, which we call the Dictionary of Chinese Variant Words (異體字詞典, Yiti Zi Cidian). The dictionary has been online for
several years, and anyone can freely access and edit it, much as they would Wikipedia.
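
The abstract does not spell out the algorithm, so the following is only a toy sketch of the stated insight, not the method used in the article. It assumes that pairs of synonymous words have already been identified by some other means, and that a variant pair shows up as two synonymous words of equal length that differ in exactly one character. The function name `variant_candidates`, the `min_support` threshold, and the sample word pairs are all hypothetical and chosen for illustration.

```python
from collections import Counter


def variant_candidates(synonym_pairs, min_support=2):
    """Collect candidate variant-character pairs from synonymous word pairs.

    Assumption (not from the article): two synonymous words of the same
    length that differ in exactly one position suggest that the two
    differing characters may be variants of each other.
    """
    counts = Counter()
    for w1, w2 in synonym_pairs:
        if len(w1) != len(w2):
            continue  # only compare words of equal length
        # Character pairs at the positions where the two words differ.
        diffs = [(a, b) for a, b in zip(w1, w2) if a != b]
        if len(diffs) == 1:
            # Unordered pair, so (a, b) and (b, a) count as the same evidence.
            counts[frozenset(diffs[0])] += 1
    # Keep only pairs supported by at least `min_support` word pairs.
    return {pair: n for pair, n in counts.items() if n >= min_support}


if __name__ == "__main__":
    # Hypothetical input: word pairs assumed to be synonymous.
    pairs = [("羣臣", "群臣"), ("羣書", "群書"), ("峯頂", "峰頂"), ("天下", "海內")]
    for pair, n in variant_candidates(pairs).items():
        print(sorted(pair), n)
```

Requiring support from several distinct word pairs is one simple way to reduce accidental matches; a real system working on a corpus of billions of characters would also need a reliable way to establish that the words are synonymous in the first place.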