Study on Feature Selection and Weighting Based on Synonym Merge in Text Categorization
Zhenyu Lu, Yongmin Liu, Shuang Zhao, Xuebin Chen
2010 Second International Conference on Future Networks, published 2010-01-22
DOI: 10.1109/ICFN.2010.70 (https://doi.org/10.1109/ICFN.2010.70)
Citations: 8
Abstract
Feature selection and weighting is one of the key problems in text categorization. The chief obstacles to feature selection are noise and sparseness. This paper presents an approach to Chinese text feature selection and weighting based on semantic statistics. First, we use synonymous concepts to extract feature values from text, based on the thesaurus TongYiCi CiLin. Then, we introduce a new weight function based on term frequency and entropy, which adjusts the effect of a feature term in the classifier according to that term's strength. Experiments show that our method performs much better than several traditional feature selection methods and improves the performance of text categorization systems.
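The two steps described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's actual weight function, so the formula below (term frequency scaled by one minus the normalized entropy of the feature across classes) is only an assumed stand-in. The thesaurus entries, concept codes, and function names are likewise hypothetical and are not real TongYiCi CiLin data.

```python
import math

# Hypothetical toy thesaurus: each word maps to a concept code.
# The codes are invented for illustration; they are not real CiLin entries.
SYNONYM_TO_CONCEPT = {
    "电脑": "C001",
    "计算机": "C001",
    "足球": "C002",
    "比赛": "C003",
    "竞赛": "C003",
}


def merge_synonyms(tokens, thesaurus=SYNONYM_TO_CONCEPT):
    """Replace each token with its concept code; unknown tokens are kept as-is."""
    return [thesaurus.get(tok, tok) for tok in tokens]


def class_entropy(feature, docs_by_class):
    """Normalized entropy (0 to 1) of a feature's counts across the classes."""
    counts = [
        sum(doc.count(feature) for doc in docs)
        for docs in docs_by_class.values()
    ]
    total = sum(counts)
    if total == 0 or len(counts) < 2:
        return 0.0
    h = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return h / math.log(len(counts))


def weight(feature, doc, docs_by_class):
    """Assumed weight: term frequency times (1 - normalized class entropy)."""
    tf = doc.count(feature)
    return tf * (1.0 - class_entropy(feature, docs_by_class))


# Tiny pre-segmented corpus, merged into concept space.
docs_by_class = {
    "tech": [merge_synonyms(["电脑", "比赛"]), merge_synonyms(["计算机"])],
    "sport": [merge_synonyms(["足球", "竞赛"])],
}
doc = docs_by_class["tech"][0]
w_computer = weight("C001", doc, docs_by_class)
w_contest = weight("C003", doc, docs_by_class)
```

In this sketch, a concept that occurs in only one class keeps its full term frequency as its weight, while a concept spread evenly over all classes is pushed toward zero. Merging synonyms first pools the counts of words such as 电脑 and 计算机 into one feature, which is one way to reduce the sparseness the abstract mentions.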