Automatic filtration of multiword units
Y. Liu, Zheng Tie
Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering (NLPKE-2010), published 2010-09-30
DOI: 10.1109/NLPKE.2010.5587783
Citations: 0
Abstract
This paper studies how to filter multiword units. We use normalized expectation (NE) to extract multiword unit candidates from a patent corpus. The candidates are then filtered using five criteria: stop words, frequency, stop words in the first position, stop words in the last position, and contextual entropy. Experimental results show that filtering improves the precision of the extracted multiword units by 8.7%.
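The abstract names the pipeline but gives no formulas, thresholds, or stop lists, so the Python sketch below is a minimal, assumption-laden illustration of it, not the authors' implementation. It scores contiguous n-grams with the usual NE definition from the mutual-expectation literature, NE(w1…wn) = f(w1…wn) / ((1/n) Σi f(w1…_…wn)), where the i-th word is replaced by a gap. It approximates each gapped frequency by counting n-word windows that match the other n−1 words in place. It then applies the five filters in the order the abstract lists them. The stop lists, the thresholds (`min_ne`, `min_freq`, `min_entropy`), the rule that a candidate fails the contextual-entropy test when either its left or its right neighbor entropy is too low, and all function names are hypothetical. The toy corpus uses whitespace-separated English tokens rather than real patent text.

```python
import math
from collections import Counter

GAP = None  # placeholder for the word omitted from a gapped n-gram

# Hypothetical filter lists; the abstract does not give the paper's lists.
STOP_WORDS = {"a", "an", "the", "is", ".", ","}     # never inside a unit
FIRST_STOP_WORDS = {"of", "and", "said", "wherein"}  # may not start a unit
LAST_STOP_WORDS = {"of", "and", "comprising"}        # may not end a unit


def ngrams(tokens, n):
    """Yield every contiguous n-gram of `tokens` as a tuple."""
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])


def gapped(gram, i):
    """Return `gram` with the word at position i replaced by GAP."""
    return gram[:i] + (GAP,) + gram[i + 1:]


def normalized_expectation(tokens, n):
    """Return (counts, NE scores) for all n-grams of length n.

    NE(g) = f(g) / ((1/n) * sum_i f(g with word i gapped)). All counts come
    from the same n-word windows, so the shared probability denominator
    cancels and raw counts can be used directly.
    """
    counts = Counter(ngrams(tokens, n))
    gap_counts = Counter()
    for gram, c in counts.items():
        for i in range(n):
            gap_counts[gapped(gram, i)] += c
    scores = {}
    for gram, c in counts.items():
        mean_gap = sum(gap_counts[gapped(gram, i)] for i in range(n)) / n
        scores[gram] = c / mean_gap
    return counts, scores


def entropy(counter):
    """Shannon entropy (bits) of a frequency distribution."""
    total = sum(counter.values())
    return -sum(c / total * math.log2(c / total) for c in counter.values())


def context_entropy(tokens, gram):
    """Return (left, right) entropy of the words adjacent to `gram`."""
    n = len(gram)
    left, right = Counter(), Counter()
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n]) == gram:
            left[tokens[i - 1] if i > 0 else "<s>"] += 1
            right[tokens[i + n] if i + n < len(tokens) else "</s>"] += 1
    return entropy(left), entropy(right)


def filter_candidates(tokens, n=2, min_ne=0.6, min_freq=2, min_entropy=0.5):
    """Extract NE candidates, then apply the five filters in turn.

    Returns (kept, rejected), where `rejected` maps each dropped
    candidate to the first filter that removed it.
    """
    counts, scores = normalized_expectation(tokens, n)
    kept, rejected = [], {}
    for gram, ne in scores.items():
        if ne < min_ne:
            continue  # not an NE candidate in the first place
        if any(w in STOP_WORDS for w in gram):
            rejected[gram] = "stop word"
        elif counts[gram] < min_freq:
            rejected[gram] = "frequency"
        elif gram[0] in FIRST_STOP_WORDS:
            rejected[gram] = "first stop word"
        elif gram[-1] in LAST_STOP_WORDS:
            rejected[gram] = "last stop word"
        elif min(context_entropy(tokens, gram)) < min_entropy:
            rejected[gram] = "contextual entropy"
        else:
            kept.append(gram)
    return kept, rejected


if __name__ == "__main__":
    text = ("a heat sink and a fan . the heat sink cools . "
            "said heat sink of the fan")
    kept, rejected = filter_candidates(text.split())
    print("kept:", kept)
    print("rejected:", rejected)
```

On the toy sentence in the demo, only `('heat', 'sink')` survives: it scores NE 1.0, occurs three times, contains no stop word, and has varied neighbors on both sides (left and right entropy log2 3 ≈ 1.58 bits). `('and', 'a')` also scores NE 1.0 but is removed by the stop-word filter. Recording which filter rejected each candidate makes it easy to see what each filter contributes.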