Chinese Named Entity Recognition Based on Location Information and Word Frequency
Authors: Zhibo Chen, Jun-Shon Huang, Ya Wang
Published in: 2022 15th International Conference on Advanced Computer Theory and Engineering (ICACTE), 2022-09-23
DOI: https://doi.org/10.1109/ICACTE55855.2022.9943631
Traditional Chinese named entity recognition suffers from error propagation caused by word segmentation mistakes, and it makes insufficient use of word vectors. To address both problems, this paper proposes an improved method based on location information and word frequency. The input is processed character by character: each character is matched against the words in a lexicon, the matched words are classified according to the position of the character within each word, a weight is computed for the words in each position class, the weights are used to fuse the corresponding word vectors, and the fused result is finally concatenated with the character vector. The concatenated representation is fed into a bidirectional long short-term memory (BiLSTM) network and decoded by a conditional random field (CRF). Experiments on the People’s Daily dataset achieve an F1 score of 95.80%, outperforming BiLSTM-CRF, BiLSTM-CNN, and other models.
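The feature-construction step can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so several details here are assumptions: the four position classes (B for begin, M for middle, E for end, S for single-character word), the use of a frequency-weighted average as the fusion rule, and the toy `LEXICON` with its counts and two-dimensional vectors are all hypothetical. The BiLSTM-CRF tagger that would consume these features is not shown.

```python
from collections import defaultdict

# Hypothetical toy lexicon: word -> (frequency count, word vector).
LEXICON = {
    "北京": (3, [1.0, 0.0]),
    "北京大学": (1, [0.0, 1.0]),
    "大学": (2, [1.0, 1.0]),
}
DIM = 2
POSITIONS = ("B", "M", "E", "S")  # assumed position classes


def match_words(sentence, lexicon, max_len=4):
    """For each character, group matched lexicon words by the
    character's position inside the word (B, M, E or S)."""
    groups = [defaultdict(list) for _ in sentence]
    n = len(sentence)
    for start in range(n):
        for end in range(start + 1, min(n, start + max_len) + 1):
            word = sentence[start:end]
            if word not in lexicon:
                continue
            if end - start == 1:
                groups[start]["S"].append(word)
                continue
            groups[start]["B"].append(word)
            groups[end - 1]["E"].append(word)
            for mid in range(start + 1, end - 1):
                groups[mid]["M"].append(word)
    return groups


def weighted_pool(words, lexicon, dim):
    """Frequency-weighted average of word vectors (assumed fusion rule).
    Returns a zero vector when no word matched."""
    total = sum(lexicon[w][0] for w in words)
    pooled = [0.0] * dim
    if total == 0:
        return pooled
    for w in words:
        freq, vec = lexicon[w]
        for i in range(dim):
            pooled[i] += freq / total * vec[i]
    return pooled


def char_features(sentence, char_vectors, lexicon, dim):
    """Concatenate each character vector with the pooled vector of
    every position class, giving one feature row per character."""
    features = []
    for ch, group in zip(sentence, match_words(sentence, lexicon)):
        feat = list(char_vectors[ch])
        for pos in POSITIONS:
            feat.extend(weighted_pool(group[pos], lexicon, dim))
        features.append(feat)
    return features
```

Calling `char_features("北京大学", char_vectors, LEXICON, DIM)` with a dictionary of two-dimensional character vectors would yield four rows of length 10 (the character vector plus four pooled position vectors). Such a sequence is the kind of input a BiLSTM-CRF tagger could consume.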