{"title":"Chinese Named Entity Recognition Method Based On Multi-head Attention Enhancing Word Information","authors":"Ting Wang, Songze He","doi":"10.1145/3548608.3559300","DOIUrl":null,"url":null,"abstract":"Chinese named entity recognition (CNER) is one of the important tasks in natural language processing. Unlike the English, Chinese lacks explicit word boundaries. Therefore, many models were designed to address this issue by incorporating word lexicon information into the CNER. However, lots of irrelevant information may be included when matching the entire lexicon for each character. Inspired by the SoftLexicon method, we propose a multi-head attention based model to simplify the introduced lexicon information to generate word-level attention vector. In this method, a word vector matched for each character is first obtained and further weighted by the relevance with the character-level vector to calculate the word-level attention vector. In this way, only the words existing in the sentence are matched, which reduces the scope of word matching. The effectiveness of this method is verified on multiple Chinese datasets.","PeriodicalId":201434,"journal":{"name":"Proceedings of the 2022 2nd International Conference on Control and Intelligent Robotics","volume":"57 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2022 2nd International Conference on Control and Intelligent Robotics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3548608.3559300","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Chinese named entity recognition (CNER) is one of the important tasks in natural language processing. Unlike English, Chinese lacks explicit word boundaries, so many models address this issue by incorporating word lexicon information into CNER. However, much irrelevant information may be included when the entire lexicon is matched for each character. Inspired by the SoftLexicon method, we propose a multi-head attention based model that simplifies the introduced lexicon information to generate a word-level attention vector. In this method, a word vector matched for each character is first obtained and then weighted by its relevance to the character-level vector to compute the word-level attention vector. In this way, only the words that actually occur in the sentence are matched, which reduces the scope of word matching. The effectiveness of this method is verified on multiple Chinese datasets.
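The weighting step the abstract describes, where the word vectors matched for a character are weighted by their relevance to the character-level vector via multi-head attention, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name, dimensions, and the random projection matrices (stand-ins for learned parameters) are all assumptions.

```python
import numpy as np

def multihead_word_attention(char_vec, word_vecs, num_heads=4, seed=0):
    """Hypothetical sketch: weight the word vectors matched for one
    character by their per-head relevance to the character vector,
    producing a word-level attention vector of the same dimension.

    char_vec:  (d,)   character-level vector (the query)
    word_vecs: (M, d) vectors of the M words matched in the sentence
    """
    d = char_vec.shape[0]
    assert d % num_heads == 0, "dimension must split evenly across heads"
    dh = d // num_heads
    rng = np.random.default_rng(seed)
    # Random projections stand in for learned Q/K/V weight matrices.
    Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    q = (char_vec @ Wq).reshape(num_heads, dh)          # (H, dh)
    K = (word_vecs @ Wk).reshape(-1, num_heads, dh)     # (M, H, dh)
    V = (word_vecs @ Wv).reshape(-1, num_heads, dh)     # (M, H, dh)
    # Scaled dot-product relevance of each matched word, per head.
    scores = np.einsum('hd,mhd->hm', q, K) / np.sqrt(dh)  # (H, M)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)         # softmax over words
    heads = np.einsum('hm,mhd->hd', weights, V)           # (H, dh)
    return heads.reshape(d)  # word-level attention vector for the character
```

Because only words occurring in the sentence contribute rows to `word_vecs`, the softmax is taken over a small candidate set rather than the whole lexicon, which mirrors the reduced matching scope claimed in the abstract.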