An Efficient Feature Selection Using Hidden Topic in Text Categorization
Zhiwei Zhang, X. Phan, S. Horiguchi
22nd International Conference on Advanced Information Networking and Applications - Workshops (aina workshops 2008), published 2008-03-25
DOI: 10.1109/WAINA.2008.137 (https://doi.org/10.1109/WAINA.2008.137)
Citation count: 18
Abstract
Text categorization is an important research area in information retrieval. To save storage space and improve accuracy, efficient and effective feature selection methods that reduce the data before analysis are highly desirable. Research on feature selection usually relies on a single measure, such as information gain. In this paper, we propose a new feature selection method that combines hidden topic analysis with entropy-based feature ranking. Experiments on the well-known Reuters-21578 and Ohsumed datasets show that our method achieves better classification accuracy while dramatically reducing the feature dimension.
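For readers unfamiliar with the baseline measure named in the abstract, the following is a minimal sketch of ranking terms by information gain. It is not the method proposed in the paper (the hidden topic analysis is not reproduced here). The function names and the toy documents and labels are hypothetical, chosen only for illustration, and each document is assumed to be represented as a set of terms.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels).values()
    return -sum((n / total) * math.log2(n / total) for n in counts)


def information_gain(docs, labels, term):
    """Information gain of the binary feature 'term occurs in the document'."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    # Entropy of the labels that remains after splitting on the term.
    conditional = (
        (len(with_term) / n) * entropy(with_term)
        + (len(without_term) / n) * entropy(without_term)
    )
    return entropy(labels) - conditional


def rank_terms(docs, labels):
    """Sort the vocabulary by decreasing information gain (ties alphabetical)."""
    vocab = set().union(*docs)
    return sorted(vocab, key=lambda t: (-information_gain(docs, labels, t), t))


# Hypothetical toy corpus: each document is a set of terms.
docs = [
    {"wheat", "price"},
    {"wheat", "export"},
    {"heart", "price"},
    {"heart", "patient"},
]
labels = ["grain", "grain", "medicine", "medicine"]

ranking = rank_terms(docs, labels)
print(ranking)
```

In this toy corpus, "wheat" and "heart" separate the two classes perfectly and rank first, while "price" occurs equally in both classes and ranks last. Keeping only the top-ranked terms is the kind of single-measure selection that the abstract contrasts with the proposed approach.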