{"title":"基于分布信息的文本分类特征选择新方法","authors":"Nianyun Shi, Lingling Liu","doi":"10.1109/PIC.2010.5687404","DOIUrl":null,"url":null,"abstract":"Feature Selection (FS) is one of the most important issues in Text Classification (TC). A good feature selection can improve the efficiency and accuracy of a text classifier. Based on the analysis of the feature's distributional information, this paper presents a feature selection method named DIFS. In DIFS a new estimation mechanism is proposed to measure the relevance between feature's distribution characteristics and contribution to categorization. In addition, two kinds of algorithms are designed to implement DIFS. Experiments are carried out on a Chinese corpus and by comparison the proposed approach shows a better performance.","PeriodicalId":142910,"journal":{"name":"2010 IEEE International Conference on Progress in Informatics and Computing","volume":"101 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"A new feature selection method based on distributional information for Text Classification\",\"authors\":\"Nianyun Shi, Lingling Liu\",\"doi\":\"10.1109/PIC.2010.5687404\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Feature Selection (FS) is one of the most important issues in Text Classification (TC). A good feature selection can improve the efficiency and accuracy of a text classifier. Based on the analysis of the feature's distributional information, this paper presents a feature selection method named DIFS. In DIFS a new estimation mechanism is proposed to measure the relevance between feature's distribution characteristics and contribution to categorization. In addition, two kinds of algorithms are designed to implement DIFS. Experiments are carried out on a Chinese corpus and by comparison the proposed approach shows a better performance.\",\"PeriodicalId\":142910,\"journal\":{\"name\":\"2010 IEEE International Conference on Progress in Informatics and Computing\",\"volume\":\"101 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2010-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2010 IEEE International Conference on Progress in Informatics and Computing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/PIC.2010.5687404\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2010 IEEE International Conference on Progress in Informatics and Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/PIC.2010.5687404","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A new feature selection method based on distributional information for Text Classification
Feature Selection (FS) is one of the most important issues in Text Classification (TC). Good feature selection can improve both the efficiency and the accuracy of a text classifier. Based on an analysis of features' distributional information, this paper presents a feature selection method named DIFS. DIFS introduces a new estimation mechanism that measures how a feature's distribution characteristics relate to its contribution to categorization. In addition, two algorithms are designed to implement DIFS. Experiments carried out on a Chinese corpus show that, in comparison with existing feature selection methods, the proposed approach performs better.
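The abstract does not give the DIFS scoring formula. As illustration only, the following is a minimal sketch of a generic distribution-based feature-selection score: it ranks terms by how unevenly their document frequency is spread across classes, a common way to exploit distributional information. The function name, the entropy-based concentration measure, and all parameters are assumptions for this sketch, not the paper's method.

from collections import Counter, defaultdict
import math

def distribution_score(docs, labels):
    """Hypothetical distribution-based term scoring (not the paper's DIFS).

    docs: list of token lists; labels: parallel list of class labels.
    Returns {term: score}; a higher score means the term's document
    frequency is more concentrated in a few classes.
    """
    classes = sorted(set(labels))
    df_per_class = defaultdict(Counter)   # class -> term -> document frequency
    docs_per_class = Counter(labels)      # class -> number of documents
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            df_per_class[label][term] += 1

    scores = {}
    terms = {t for counter in df_per_class.values() for t in counter}
    for term in terms:
        # Relative document frequency of the term within each class.
        rates = [df_per_class[c][term] / docs_per_class[c] for c in classes]
        total = sum(rates)
        if total == 0.0:
            scores[term] = 0.0
            continue
        # Entropy of the normalized per-class rates: lower entropy means the
        # term is concentrated in fewer classes, hence more discriminative.
        probs = [r / total for r in rates if r > 0.0]
        entropy = -sum(p * math.log(p) for p in probs)
        scores[term] = math.log(len(classes)) - entropy
    return scores

# Usage sketch: keep the top-k terms by score as the selected feature set.
# scores = distribution_score(tokenized_docs, doc_labels)
# selected = sorted(scores, key=scores.get, reverse=True)[:k]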