Compositional Recurrent Neural Networks for Chinese Short Text Classification
Yujun Zhou, Bo Xu, Jiaming Xu, Lei Yang, Changliang Li, Bo Xu
2016 IEEE/WIC/ACM International Conference on Web Intelligence (WI), pp. 137-144. Published 2016-10-01.
DOI: 10.1109/WI.2016.0029 (https://doi.org/10.1109/WI.2016.0029)
Citations: 52
Abstract
Word segmentation is the first step in Chinese natural language processing, and errors introduced at this step can propagate through the whole system. To reduce the impact of word segmentation and improve the overall performance of Chinese short text classification, we propose a hybrid model of character-level and word-level features based on a recurrent neural network (RNN) with long short-term memory (LSTM). By integrating character-level features into word-level features, semantic information lost through segmentation errors is reconstructed, while spurious semantic relevance is reduced. The final feature representation suppresses segmentation errors while preserving most of the semantic features of the sentence. The whole model is trained end-to-end on a supervised Chinese short text classification task. Results demonstrate that the proposed model represents Chinese short text effectively, and that its performance on 32-class and 5-class categorization exceeds that of several notable methods.
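The idea of combining character-level and word-level features can be illustrated with a minimal sketch. The abstract does not say how the two levels are integrated, so everything below is an assumption made for illustration: a character-level LSTM composes the characters of each word into a vector, that vector is concatenated with the word embedding, and a sentence-level LSTM feeds a softmax classifier. The names `LSTMCell` and `HybridEncoder`, the dimensions, and the zero vector for unknown tokens are hypothetical. Weights are random and no training is performed, so the output probabilities carry no meaning; only the data flow is shown.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Small random weight matrix stored as a list of rows."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class LSTMCell:
    """Plain LSTM cell: input (i), forget (f), output (o) gates and candidate (g)."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        # One weight matrix per gate, applied to the concatenation [x; h_prev].
        self.w = {gate: make_matrix(hidden_size, input_size + hidden_size, rng)
                  for gate in "ifog"}

    def step(self, x, h, c):
        xh = x + h  # list concatenation, not element-wise addition
        pre = {gate: matvec(self.w[gate], xh) for gate in "ifog"}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        cand = [math.tanh(v) for v in pre["g"]]
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, cand)]
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        return h_new, c_new

    def run(self, inputs):
        """Return the final hidden state after reading the whole sequence."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for x in inputs:
            h, c = self.step(x, h, c)
        return h


class HybridEncoder:
    """Hypothetical character + word encoder (assumed design, untrained)."""

    def __init__(self, char_vocab, word_vocab, emb=4, hidden=5, n_classes=5, seed=0):
        rng = random.Random(seed)
        self.char_emb = {ch: [rng.uniform(-0.1, 0.1) for _ in range(emb)]
                         for ch in char_vocab}
        self.word_emb = {w: [rng.uniform(-0.1, 0.1) for _ in range(emb)]
                         for w in word_vocab}
        self.unk = [0.0] * emb  # assumption: unknown tokens map to a zero vector
        self.char_lstm = LSTMCell(emb, emb, rng)       # composes characters of one word
        self.sent_lstm = LSTMCell(2 * emb, hidden, rng)  # reads the word sequence
        self.out = make_matrix(n_classes, hidden, rng)

    def encode_word(self, word):
        chars = [self.char_emb.get(ch, self.unk) for ch in word]
        char_vec = self.char_lstm.run(chars)
        word_vec = self.word_emb.get(word, self.unk)
        return word_vec + char_vec  # concatenate word-level and character-level parts

    def predict_proba(self, segmented_sentence):
        h = self.sent_lstm.run([self.encode_word(w) for w in segmented_sentence])
        logits = matvec(self.out, h)
        top = max(logits)
        exps = [math.exp(v - top) for v in logits]
        total = sum(exps)
        return [e / total for e in exps]


# "研究生命起源" segmented correctly vs. incorrectly (a common ambiguity example).
words = ["研究", "生命", "起源"]
model = HybridEncoder(char_vocab=sorted(set("".join(words))), word_vocab=words)
good = ["研究", "生命", "起源"]
bad = ["研究生", "命", "起源"]  # "研究生" and "命" are missing from the word vocabulary
probs = model.predict_proba(bad)
```

In this sketch the mis-segmented tokens "研究生" and "命" receive a zero word-level vector because they are out of vocabulary, yet their character-level part is still built from known characters. This is one way the character level could compensate for segmentation errors; the actual integration used in the paper may differ.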