MVT: Chinese NER Using Multi-View Transformer
Authors: Yinlong Xiao; Zongcheng Ji; Jianqiang Li; Mei Han
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 32, pp. 3656-3668 (impact factor 4.1; JCR Q1, Acoustics)
Publication date: 2024-07-10
Publication type: Journal Article
DOI: 10.1109/TASLP.2024.3426287
URL: https://ieeexplore.ieee.org/document/10592817/
Citations: 0
Abstract
Integrating lexical knowledge into Chinese named entity recognition (NER) has proven effective. Among existing methods, the Flat-LAttice Transformer (FLAT) has achieved great success in both performance and efficiency. FLAT performs lexical enhancement for each sentence by constructing a flat lattice (i.e., a sequence of tokens comprising the characters in the sentence and the matched words from a lexicon) and computing self-attention over a fully-connected structure. However, fully-connected self-attention cannot capture well the different interactions between tokens, which contribute different aspects of semantic information to Chinese NER. In this paper, we propose a novel Multi-View Transformer (MVT) to capture these different interactions effectively. We first define four views that capture four different token interaction structures. We then construct a view-aware visible matrix for each view according to the corresponding structure, and introduce a view-aware dot-product attention for each view that limits the attention scope by incorporating the corresponding visible matrix. Finally, we design three MVT variants that fuse the multi-view features at different levels of the Transformer architecture. Experiments on four public Chinese NER datasets show the effectiveness of the proposed method. Specifically, on Weibo, the most challenging dataset, which is written in an informal style, MVT outperforms FLAT in F1 score by 2.56%, and when combined with BERT, MVT outperforms FLAT in F1 score by 3.03%.
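The two mechanisms named in the abstract, the flat lattice and attention restricted by a visible matrix, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the function names are hypothetical, the "containment" view is a stand-in and not one of the paper's four views (the abstract does not define them), and masking scores with negative infinity before the softmax is one common way to limit attention scope, not necessarily the formulation used in MVT.

```python
import math

def build_flat_lattice(sentence, lexicon):
    """Flat lattice: every character, then every lexicon word found in the sentence.
    Each token is (text, head, tail), with head/tail as character positions."""
    tokens = [(ch, i, i) for i, ch in enumerate(sentence)]
    n = len(sentence)
    for start in range(n):
        for end in range(start + 2, n + 1):  # candidate words of length >= 2
            word = sentence[start:end]
            if word in lexicon:
                tokens.append((word, start, end - 1))
    return tokens

def containment_visible_matrix(tokens):
    """Hypothetical view (NOT from the paper): token i may attend to token j
    only if one token's span contains the other's."""
    def contains(a, b):
        return a[1] <= b[1] and b[2] <= a[2]
    return [[contains(a, b) or contains(b, a) for b in tokens] for a in tokens]

def masked_attention(queries, keys, values, visible):
    """Scaled dot-product attention where invisible pairs get a score of -inf,
    so their softmax weight is exactly zero."""
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for q, row in zip(queries, visible):
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / scale if seen else float("-inf")
            for k, seen in zip(keys, row)
        ]
        top = max(scores)  # finite, because every token can see itself
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        outputs.append([
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ])
    return outputs

# Illustrative sentence and lexicon, chosen for this sketch only.
lexicon = {"南京", "南京市", "市长", "长江", "长江大桥", "大桥"}
tokens = build_flat_lattice("南京市长江大桥", lexicon)
visible = containment_visible_matrix(tokens)
for text, head, tail in tokens:
    print(text, head, tail)
```

Under this reading, a multi-view model would build one such matrix per view and run one masked attention per matrix before fusing the results. How MVT fuses them (the three variants) is not specified in the abstract, so it is left out of the sketch.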
About the journal:
The IEEE/ACM Transactions on Audio, Speech, and Language Processing covers audio, speech and language processing and the sciences that support them. In audio processing: transducers, room acoustics, active sound control, human audition, analysis/synthesis/coding of music, and consumer audio. In speech processing: areas such as speech analysis, synthesis, coding, speech and speaker recognition, speech production and perception, and speech enhancement. In language processing: speech and text analysis, understanding, generation, dialog management, translation, summarization, question answering and document indexing and retrieval, as well as general language modeling.