A Construction Engineering Domain New Word Detection Method with the Combination of BiLSTM-CRF and Information Entropy
Authors: Ling Sun, Jing Wan, Lidong Xing
Published in: 2019 IEEE 14th International Conference on Intelligent Systems and Knowledge Engineering (ISKE), 2019-11-01
DOI: 10.1109/ISKE47853.2019.9170377
Citations: 1
Abstract
New word detection is important for improving the performance of Chinese natural language processing tasks. To address inconsistent boundaries of coarse-grained long words and the difficulty of detecting compound words, we propose a new word detection method that combines BiLSTM-CRF with information entropy (IE). First, the BiLSTM model extracts candidate new words. Then, information entropy is used to splice candidate new words and redefine word boundaries. The BiLSTM model makes effective use of context information, while the CRF layer accounts for the relationship between adjacent labels, yielding sentence-level sequence labeling that helps identify compound words and long words that are otherwise difficult to detect. Experimental results show that our model achieves better performance on construction engineering datasets.
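The abstract does not say how information entropy decides which candidates to splice. One common heuristic is boundary (neighbor) entropy: if a candidate is almost always followed by the same next candidate, and that next candidate is almost always preceded by it, the boundary between them is probably inside a single word. The Python sketch below illustrates that idea only and is not the authors' implementation. It assumes the candidates have already been extracted and are given as token lists. The function names and the `max_entropy` and `min_count` thresholds are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(counts):
    """Shannon entropy (in bits) of a Counter of neighbor frequencies."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def build_neighbor_counts(sentences):
    """Count the left and right neighbors of every candidate in the corpus."""
    left, right = defaultdict(Counter), defaultdict(Counter)
    for sent in sentences:
        for prev, nxt in zip(sent, sent[1:]):
            right[prev][nxt] += 1
            left[nxt][prev] += 1
    return left, right


def should_splice(a, b, left, right, max_entropy=0.5, min_count=2):
    """Assumed rule: merge a+b when the boundary between them is predictable."""
    if right.get(a, Counter())[b] < min_count:
        return False  # too little evidence for this pair
    right_h = entropy(right.get(a, Counter()))
    left_h = entropy(left.get(b, Counter()))
    return right_h <= max_entropy and left_h <= max_entropy


def splice_candidates(sentence, left, right, **kwargs):
    """Greedily merge adjacent candidates that pass should_splice."""
    if not sentence:
        return []
    groups = [[sentence[0]]]
    for prev, nxt in zip(sentence, sentence[1:]):
        if should_splice(prev, nxt, left, right, **kwargs):
            groups[-1].append(nxt)
        else:
            groups.append([nxt])
    return ["".join(g) for g in groups]


# Toy corpus of pre-segmented candidates (illustrative data, not from the paper).
sentences = [
    ["钢筋", "混凝土", "结构"],
    ["钢筋", "混凝土", "梁"],
    ["浇筑", "钢筋", "混凝土"],
]
left, right = build_neighbor_counts(sentences)
print(splice_candidates(["钢筋", "混凝土", "结构"], left, right))
```

In this toy corpus, "钢筋" is always followed by "混凝土", so the pair is merged into one long word, while "混凝土" has varied right neighbors and keeps its boundary. The `min_count` guard avoids merging pairs seen only once, where zero entropy reflects sparse data rather than a real word.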