Extracting latent structures in numerical classification: an investigation using two factor models
Arindam Choudhury, Y. Ong, A. Keane
Proceedings of the 9th International Conference on Neural Information Processing, 2002. ICONIP '02. Published 2002-11-18.
DOI: 10.1109/ICONIP.2002.1198992 (https://doi.org/10.1109/ICONIP.2002.1198992)
Citations: 2
Abstract
We investigate the use of SVD-based two-factor models for numerical data classification. Motivations for such a study include the widespread success of such models (e.g., LSI) in textual information retrieval, emerging connections with well-established statistical techniques, and the increasing occurrence of mixed-mode (text-and-numeric) data. A direct extension of the LSI model to numerical data problems, as well as an efficient modification of it, are presented, and the associated problems and likely remedies are discussed. The techniques under investigation are shown to perform competitively with popular existing numerical classification techniques on a range of synthetic and real-world benchmark data. In particular, we show that the modified LSI proposed in this work avoids confronting the optimal subspace selection problem, yet generalizes well and remains computationally efficient for large data.
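To make the idea of carrying LSI over to numerical data concrete, the following is a minimal sketch, not the authors' implementation. It assumes the standard LSI recipe: take a truncated SVD of the feature-by-sample matrix (the analogue of a term-by-document matrix), fold new samples into the rank-k latent space, and assign the label of the most cosine-similar training sample. The function names (`fit_lsi`, `project`, `predict`), the nearest-neighbour decision rule, and the hand-picked rank `k` are all illustrative assumptions. The modified LSI mentioned in the abstract is not reproduced here, since the abstract does not specify it.

```python
import numpy as np

def fit_lsi(X_train, k):
    """Truncated SVD of the feature-by-sample matrix A = X_train.T.

    Assumes k does not exceed the numerical rank of A, so the k leading
    singular values are nonzero.
    """
    A = X_train.T                                  # shape: features x samples
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k, s_k = U[:, :k], s[:k]
    train_latent = Vt[:k, :].T                     # shape: samples x k
    return U_k, s_k, train_latent

def project(X, U_k, s_k):
    """Fold samples into the latent space: q_hat = q^T U_k S_k^{-1}."""
    return (X @ U_k) / s_k

def predict(X_test, U_k, s_k, train_latent, y_train):
    """Label each test sample by its most cosine-similar training sample.

    The nearest-neighbour rule is an assumption made for this sketch.
    """
    eps = 1e-12                                    # guards against zero norms
    Q = project(X_test, U_k, s_k)
    Qn = Q / (np.linalg.norm(Q, axis=1, keepdims=True) + eps)
    Tn = train_latent / (np.linalg.norm(train_latent, axis=1, keepdims=True) + eps)
    sims = Qn @ Tn.T                               # shape: test x train
    return y_train[np.argmax(sims, axis=1)]
```

In this sketch the rank `k` has to be chosen by hand, which is the optimal subspace selection problem that the abstract says the modified LSI avoids.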