Improving Chinese Character Representation with Formation Graph Attention Network

Xiaosu Wang, Yun Xiong, Hao Niu, Jingwen Yue, Yangyong Zhu, Philip S. Yu
Proceedings of the 30th ACM International Conference on Information & Knowledge Management, published 2021-10-26
DOI: 10.1145/3459637.3482265 (https://doi.org/10.1145/3459637.3482265)
Citations: 3

Abstract

Chinese characters are often composed of subcharacter components that are themselves semantically informative, and these component-level internal semantic features carry additional information that benefits the semantic representation of the character. Several studies have therefore used subcharacter component information (e.g. radicals, fine-grained components, and stroke n-grams) to improve Chinese character representation. However, we argue that the best way to model and encode a Chinese character has not been fully explored. To improve the representation of a character, existing methods introduce more component-level internal semantic features, but in doing so they also introduce more semantically irrelevant subcharacter component information, which acts as noise. Moreover, existing methods cannot discriminate the importance of the introduced subcharacter components, so they cannot filter out this noisy component information. In this paper, we first decompose Chinese characters into components according to their formations, then model a character and its decomposed components as a graph structure called the Chinese character formation graph. The formation graph preserves the positional (azimuth) relationships among subcharacter components and makes it possible to model the component-level internal semantic features of a character explicitly. Furthermore, we propose a novel model, the Chinese Character Formation Graph Attention Network (FGAT), which is able to discriminate the importance of the introduced subcharacter components and to extract the component-level internal semantic features of a character efficiently. To demonstrate the effectiveness of our approach, we conducted extensive experiments. The experimental results show that our model achieves better results than state-of-the-art (SOTA) approaches.
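The two ideas in the abstract, a formation graph that keeps each component's position and an attention step that weighs components by importance, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the tiny `DECOMPOSITION` table, the position labels, the function names, and the generic GAT-style scoring (LeakyReLU over a concatenated pair, then softmax) are not taken from the paper, and the learned projection matrix of a real graph attention layer is omitted. The attention step here also ignores the position labels, which the actual model may well use.

```python
import math

# Hypothetical decomposition table: character -> (formation, components).
# A real system would load a full decomposition resource instead.
DECOMPOSITION = {
    "好": ("left-right", ["女", "子"]),
    "明": ("left-right", ["日", "月"]),
    "森": ("top-bottom", ["木", "林"]),
    "林": ("left-right", ["木", "木"]),
}

# Position labels assigned to components for each formation (assumed names).
POSITIONS = {
    "left-right": ["left", "right"],
    "top-bottom": ["top", "bottom"],
}


def build_formation_graph(char):
    """Return edges (component, position, parent), decomposing recursively."""
    edges = []
    stack = [char]
    while stack:
        node = stack.pop()
        if node not in DECOMPOSITION:
            continue  # atomic component: nothing further to decompose
        formation, parts = DECOMPOSITION[node]
        for position, part in zip(POSITIONS[formation], parts):
            edges.append((part, position, node))
            stack.append(part)
    return edges


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def attend(char_vec, comp_vecs, attn_vec):
    """GAT-style attention of one character node over its component nodes.

    Scores each component with LeakyReLU(attn_vec . [char_vec || comp_vec]),
    normalises the scores with a softmax, and returns the weights together
    with the attention-weighted sum of the component vectors.
    """
    scores = [
        leaky_relu(sum(a * x for a, x in zip(attn_vec, char_vec + comp)))
        for comp in comp_vecs
    ]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [
        sum(w * comp[d] for w, comp in zip(weights, comp_vecs))
        for d in range(len(char_vec))
    ]
    return weights, pooled


if __name__ == "__main__":
    for edge in build_formation_graph("森"):
        print(edge)
    weights, pooled = attend(
        [1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0, 1.0, 0.0]
    )
    print(weights, pooled)
```

In this toy setup a component whose vector scores low against the attention vector receives a small softmax weight, which is the mechanism by which an attention model can down-weight noisy components rather than treating all of them equally.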