Histogram-Based Dimensionality Reduction of Term Vector Space
K. Ciesielski, M. Kłopotek, S. Wierzchon
6th International Conference on Computer Information Systems and Industrial Management Applications (CISIM'07), 2007-06-28
DOI: 10.1109/CISIM.2007.35
Citation count: 3
Abstract
One of the most important problems in free-text document processing is the curse of dimensionality. The paper presents a dimensionality reduction algorithm based on informed feature selection. The terms describing a document are chosen using histogram-like statistics, which can be both computed and incrementally updated at low complexity. The document representation can therefore adapt to changes in the characteristics of the document collection. Along with the fundamental concepts, we present an empirical verification of the approach.
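The abstract does not spell out the statistics or the selection rule. The following Python sketch illustrates only the general idea it describes: per-term histograms that are cheap to update as documents are added to or removed from the collection, and a feature-selection step driven by those histograms. Everything concrete here is an assumption for illustration, not the authors' algorithm: the class name `TermHistogramSelector` is hypothetical, the four frequency bins are arbitrary, the document-frequency band (`min_df_ratio`, `max_df_ratio`) is invented, and the score (the share of a term's documents in which it occurs more than once, a simple burstiness heuristic) is swapped in as a placeholder.

```python
from collections import Counter, defaultdict

# Assumed bin layout: within-document term frequency 1, 2, 3 and 4-or-more.
NUM_BINS = 4


class TermHistogramSelector:
    """Hypothetical sketch: per-term histograms of within-document term frequency,
    updated incrementally, with an assumed rule for selecting informative terms."""

    def __init__(self, min_df_ratio=0.01, max_df_ratio=0.5):
        self.min_df_ratio = min_df_ratio  # assumed lower bound on document frequency share
        self.max_df_ratio = max_df_ratio  # assumed upper bound on document frequency share
        self.num_docs = 0
        # term -> [docs with tf == 1, tf == 2, tf == 3, tf >= 4]
        self.hist = defaultdict(lambda: [0] * NUM_BINS)

    @staticmethod
    def _bin(tf):
        return min(tf, NUM_BINS) - 1

    def _update(self, tokens, delta):
        # Work is proportional to the number of distinct terms in one document,
        # independent of the collection size.
        for term, tf in Counter(tokens).items():
            counts = self.hist[term]
            counts[self._bin(tf)] += delta
            if sum(counts) == 0:
                del self.hist[term]  # drop terms that no longer occur anywhere
        self.num_docs += delta

    def add_document(self, tokens):
        self._update(tokens, +1)

    def remove_document(self, tokens):
        # Assumes the same token list was added earlier; lets the statistics
        # follow a changing collection.
        self._update(tokens, -1)

    def score(self, term):
        """Assumed score: share of the term's documents in which it occurs more than
        once, or 0.0 if its document frequency falls outside the allowed band."""
        counts = self.hist.get(term)
        if counts is None or self.num_docs == 0:
            return 0.0
        df = sum(counts)
        if not self.min_df_ratio <= df / self.num_docs <= self.max_df_ratio:
            return 0.0
        return (df - counts[0]) / df

    def select(self, k):
        """Return up to k terms with positive score, best first, ties broken alphabetically."""
        scored = [(self.score(t), t) for t in self.hist]
        ranked = sorted((st for st in scored if st[0] > 0), key=lambda st: (-st[0], st[1]))
        return [t for _, t in ranked[:k]]


docs = [
    "graph graph cluster data".split(),
    "graph cluster cluster data".split(),
    "data mining text text".split(),
    "data text".split(),
]
selector = TermHistogramSelector(min_df_ratio=0.25, max_df_ratio=0.75)
for d in docs:
    selector.add_document(d)
print(selector.select(2))  # ['cluster', 'graph']
selector.remove_document(docs[1])
print(selector.select(2))  # ['graph', 'text']
```

In this sketch, adding or removing a document touches only that document's distinct terms, which mirrors the low-cost incremental update the abstract emphasizes; the selection step itself rescans the vocabulary, so a larger-scale version would likely maintain a ranked structure instead.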