Similarity-based matrix completion algorithm for latent semantic indexing
Andri Mirzal
2013 IEEE International Conference on Control System, Computing and Engineering, 2013-11-01
DOI: https://doi.org/10.1109/ICCSCE.2013.6719936
Citations: 6
Abstract
Latent semantic indexing (LSI) is an indexing method that improves the performance of an information retrieval system by indexing terms that appear in related documents and weakening the influence of terms that appear in unrelated documents. LSI is usually performed using the truncated singular value decomposition (SVD). The main difficulty with this technique is that its retrieval performance depends strongly on choosing an appropriate decomposition rank. In this paper, observing that the truncated SVD makes related documents more connected, we devise a matrix completion algorithm that mimics this capability. The proposed algorithm is nonparametric, has a convergence guarantee, and produces a unique solution for each input. It is therefore more practical and easier to use than the truncated SVD. Experimental results on four standard datasets in LSI research show that the retrieval performance of the proposed algorithm is comparable to the best results obtained by the truncated SVD over a set of decomposition ranks.
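
The rank dependence of the truncated SVD baseline mentioned in the abstract can be illustrated with a minimal sketch. This is not the proposed matrix completion algorithm, which the abstract does not specify; it only shows conventional LSI on a toy term-by-document matrix. The matrix, the term labels, and the function names `lsi_truncated_svd` and `cosine_scores` are assumptions made for illustration.

```python
import numpy as np


def lsi_truncated_svd(A, k):
    """Return the rank-k approximation of a term-by-document matrix A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]


def cosine_scores(query, A_k):
    """Cosine similarity between a query vector and each document column."""
    denom = np.linalg.norm(A_k, axis=0) * np.linalg.norm(query)
    denom[denom == 0] = 1.0  # avoid division by zero for empty columns
    return (query @ A_k) / denom


# Hypothetical toy corpus: rows are terms, columns are documents d0..d3.
A = np.array([
    [1, 0, 0, 0],  # "car"
    [0, 1, 0, 0],  # "automobile"
    [1, 1, 0, 0],  # "engine"
    [0, 0, 1, 1],  # "flower"
    [0, 0, 1, 0],  # "petal"
], dtype=float)

# Query containing only the term "car".
query = np.array([1, 0, 0, 0, 0], dtype=float)

# d1 never contains "car", so it only receives a nonzero score when the
# chosen rank is low enough to merge it with the related document d0.
for k in (2, 4):
    scores = cosine_scores(query, lsi_truncated_svd(A, k))
    print(f"rank {k}: {np.round(scores, 3)}")
```

In this toy case a rank of 2 connects the two related documents, while the full rank of 4 reproduces the original matrix and the connection disappears, which is the kind of sensitivity to rank that the abstract describes.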