{"title":"An Improved Chinese Named Entity Recognition Method with TB-LSTM-CRF","authors":"Jiazheng Li, Tao Wang, Weiwen Zhang","doi":"10.1145/3421515.3421534","DOIUrl":null,"url":null,"abstract":"Owing to the lack of natural delimiters, Chinese named entity recognition (NER) is more challenging than it in English. While Chinese word segmentation (CWS) is generally regarded as key and open problem for Chinese NER, its accuracy is critical for the downstream models trainings and it also often suffers from out-of-vocabulary (OOV). In this paper, we propose an improved Chinese NER model called TB-LSTM-CRF, which introduces a Transformer Block on top of LSTM-CRF. The proposed model with Transformer Block exploits the self-attention mechanism to capture the information from adjacent characters and sentence contexts. It is more practical with using small-size character embeddings. Compared with the baseline using LSTM-CRF, experiment results show our method TB-LSTM-CRF is competitive without the need of any external resources, for instance other dictionaries.","PeriodicalId":294293,"journal":{"name":"2020 2nd Symposium on Signal Processing Systems","volume":"7 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 2nd Symposium on Signal Processing Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3421515.3421534","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 3
Abstract
Owing to the lack of natural delimiters, Chinese named entity recognition (NER) is more challenging than it is in English. Chinese word segmentation (CWS) is generally regarded as a key and open problem for Chinese NER: its accuracy is critical for training the downstream models, and it often suffers from out-of-vocabulary (OOV) words. In this paper, we propose an improved Chinese NER model called TB-LSTM-CRF, which introduces a Transformer Block on top of LSTM-CRF. The Transformer Block exploits the self-attention mechanism to capture information from adjacent characters and from the sentence context. The model is also practical in that it uses small-size character embeddings. Experimental results show that TB-LSTM-CRF is competitive with the LSTM-CRF baseline without requiring any external resources, such as dictionaries.
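To make the described architecture concrete, below is a minimal PyTorch sketch of a Transformer-Block-plus-BiLSTM-CRF character tagger. The layer sizes, head count, and the use of the `pytorch-crf` package are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch of the TB-LSTM-CRF pipeline described in the abstract:
# character embeddings -> Transformer block (self-attention) -> BiLSTM -> CRF.
# Hyperparameters and the `pytorch-crf` dependency are assumptions for
# illustration only.
import torch
import torch.nn as nn
from torchcrf import CRF  # pip install pytorch-crf


class TBLSTMCRF(nn.Module):
    def __init__(self, vocab_size, num_tags, emb_dim=100, hidden=128, heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        # Transformer Block: self-attention over the characters of a sentence,
        # letting each character attend to its neighbors and the full context.
        self.transformer = nn.TransformerEncoderLayer(
            d_model=emb_dim, nhead=heads, dim_feedforward=4 * emb_dim,
            batch_first=True)
        # BiLSTM over the attention-enriched character representations.
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                            bidirectional=True)
        self.fc = nn.Linear(2 * hidden, num_tags)  # per-character tag scores
        self.crf = CRF(num_tags, batch_first=True)  # structured tag decoding

    def forward(self, chars, tags=None, mask=None):
        x = self.transformer(self.embed(chars))
        x, _ = self.lstm(x)
        emissions = self.fc(x)
        if tags is not None:
            # Training: negative log-likelihood of the gold tag sequence.
            return -self.crf(emissions, tags, mask=mask)
        # Inference: Viterbi-decode the best tag path per sentence.
        return self.crf.decode(emissions, mask=mask)
```

Because the model operates directly on characters, it sidesteps the CWS and OOV issues noted above: no word segmenter or external dictionary is needed before tagging.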