Improving Chinese Character Representation with Formation Graph Attention Network
Xiaosu Wang, Yun Xiong, Hao Niu, Jingwen Yue, Yangyong Zhu, Philip S. Yu
Proceedings of the 30th ACM International Conference on Information & Knowledge Management (CIKM '21), published 2021-10-26
DOI: https://doi.org/10.1145/3459637.3482265
Citations: 3
Abstract
Chinese characters are often composed of subcharacter components that are themselves semantically informative, and the component-level internal semantic features of a character inherently carry additional information that benefits its semantic representation. Several studies have therefore exploited subcharacter component information (e.g., radicals, fine-grained components, and stroke n-grams) to improve Chinese character representation. However, we argue that the best way of modeling and encoding a Chinese character has not been fully explored. To enrich the representation of a character, existing methods introduce more component-level internal semantic features, but in doing so they also introduce more semantically irrelevant subcharacter component information, which acts as noise when representing the character. Moreover, existing methods cannot discriminate the importance of the introduced subcharacter components, so they are unable to filter out this noisy component information. In this paper, we first decompose Chinese characters into components according to their formations, and then model a character together with its decomposed components as a graph structure named the Chinese character formation graph. The formation graph preserves the positional relationships among subcharacter components and lends itself to explicitly modeling the component-level internal semantic features of a character. We further propose a novel model, the Chinese Character Formation Graph Attention Network (FGAT), which discriminates the importance of the introduced subcharacter components and efficiently extracts the component-level internal semantic features of a character. To demonstrate the effectiveness of our approach, we conducted extensive experiments. The results show that our model outperforms state-of-the-art (SOTA) approaches.
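To make the formation-graph idea concrete, below is a minimal sketch of a graph-attention layer applied to a toy character formation graph, assuming PyTorch. The class name FormationGraphAttention, the edge layout (component nodes pointing to a character node), and all dimensions are illustrative assumptions for exposition, not the authors' released FGAT implementation.

```python
# Minimal sketch, assuming PyTorch: one graph-attention layer over a toy
# character formation graph. Node 0 stands for the character; the remaining
# nodes are its decomposed components. Attention weights let the layer
# down-weight uninformative (noisy) components. All names are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FormationGraphAttention(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim, bias=False)
        self.attn = nn.Linear(2 * out_dim, 1, bias=False)

    def forward(self, x, edge_index):
        # x: (num_nodes, in_dim); edge_index: (2, num_edges) with rows (src, dst)
        h = self.proj(x)
        src, dst = edge_index
        # Unnormalised attention score for every edge.
        e = F.leaky_relu(self.attn(torch.cat([h[dst], h[src]], dim=-1))).squeeze(-1)
        # Softmax over the incoming edges of each destination node.
        alpha = torch.zeros_like(e)
        for node in dst.unique():
            mask = dst == node
            alpha[mask] = F.softmax(e[mask], dim=0)
        # Attention-weighted sum of neighbour messages.
        out = torch.zeros_like(h)
        out.index_add_(0, dst, alpha.unsqueeze(-1) * h[src])
        return F.elu(out)


# Toy formation graph for a character with two components (e.g. a left-right
# formation): edges from each component node (1, 2) into the character node (0).
x = torch.randn(3, 16)                       # initial node embeddings
edge_index = torch.tensor([[1, 2], [0, 0]])  # (src, dst) pairs
layer = FormationGraphAttention(16, 32)
char_repr = layer(x, edge_index)[0]          # updated character representation
print(char_repr.shape)                       # torch.Size([32])
```

In this sketch the per-edge softmax is what allows the model to assign low weight to a semantically irrelevant component; the actual FGAT architecture, graph construction, and training objective are described in the paper itself.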