Title: Use relative weight to improve the kNN for unbalanced text category
Authors: Xiaodong Liu, F. Ren, Caixia Yuan
Venue: Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering (NLPKE-2010)
Publication date: 2010-09-30
DOI: 10.1109/NLPKE.2010.5587799 (https://doi.org/10.1109/NLPKE.2010.5587799)
Citations: 4
Abstract
Text categorization is widely used in natural language processing. As one of the best text categorization algorithms, kNN is popular in many applications. Traditional kNN assumes that the training data are evenly distributed across categories; however, this is not the case in many situations. When we used kNN in our Topic Detection and Tracking (TDT) system, it did not perform well because of the bias in the training data set. To overcome the obstacle caused by data bias, this paper proposes an approach that uses relative weight to adjust the weight of kNN (RWKNN). When evaluated on the TDT2 and TDT3 Chinese corpora, RWKNN proves to be robust on unbalanced data and yields better performance than traditional kNN.
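The abstract does not state the RWKNN weighting formula, so the following is only an illustrative sketch of the general idea: a kNN vote in which each neighbour's contribution is scaled relative to the size of its category. The inverse-category-size weight, the cosine similarity, the function names, and the toy data are all assumptions made for this example, not the authors' method.

```python
import math
from collections import Counter, defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_scores(query, train, k, relative=False):
    """Sum similarity votes per category over the k nearest documents.

    train is a list of (vector, label) pairs. With relative=True each vote
    is divided by the training size of the neighbour's category. This
    weighting is an assumption for illustration, not the paper's formula.
    """
    sizes = Counter(label for _, label in train)
    sims = [(cosine(query, vec), label) for vec, label in train]
    neighbours = sorted(sims, key=lambda s: s[0], reverse=True)[:k]
    scores = defaultdict(float)
    for sim, label in neighbours:
        weight = 1.0 / sizes[label] if relative else 1.0
        scores[label] += weight * sim
    return dict(scores)


def classify(query, train, k, relative=False):
    """Return the category with the highest (optionally weighted) score."""
    scores = knn_scores(query, train, k, relative)
    return max(scores, key=scores.get)


# Toy, deliberately unbalanced training set: 2 finance vs. 6 sports documents.
train = [
    ({"bank": 1.0, "market": 1.0}, "finance"),
    ({"bank": 1.0, "stock": 1.0}, "finance"),
] + [({"game": 1.0, "market": 1.0}, "sports") for _ in range(6)]

query = {"bank": 1.0, "stock": 1.0, "market": 1.0}

plain = classify(query, train, k=7)
weighted = classify(query, train, k=7, relative=True)
print("plain kNN:", plain)
print("size-weighted kNN:", weighted)
```

On this toy data the two finance documents are the closest neighbours, yet the unweighted vote is dominated by the five weakly similar sports documents that fill the rest of the neighbourhood. Dividing each vote by its category size removes that advantage. Whether this matches the relative weight used in RWKNN cannot be determined from the abstract alone.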