{"title":"User interest modeling by labeled LDA with topic features","authors":"Wenfeng Li, Xiaojie Wang, Rile Hu, Jilei Tian","doi":"10.1109/CCIS.2011.6045022","DOIUrl":null,"url":null,"abstract":"As well known, the user interest is carried in the user's web browsing history that can be mined out. This paper presents an innovative method to extract user's interests from his/her web browsing history. We first apply an efficient algorithm to extract useful texts from the web pages in user's browsed URL sequence. We then proposed a Labeled Latent Dirichlet Allocation with Topic Feature (LLDA-TF) to mine user's interests from the texts. Unlike other works that need a lot of training data to train a model to adopt supervised information, we directly introduce the raw supervised information to the procedure of LLDA-TF. As shown in the experimental results, results given by LLDA-TF fit predefined categories well. Furthermore, LLDA-TF model can name the user interests by category words as well as a keyword list for each category.","PeriodicalId":128504,"journal":{"name":"2011 IEEE International Conference on Cloud Computing and Intelligence Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 IEEE International Conference on Cloud Computing and Intelligence Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CCIS.2011.6045022","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4
Abstract
As well known, the user interest is carried in the user's web browsing history that can be mined out. This paper presents an innovative method to extract user's interests from his/her web browsing history. We first apply an efficient algorithm to extract useful texts from the web pages in user's browsed URL sequence. We then proposed a Labeled Latent Dirichlet Allocation with Topic Feature (LLDA-TF) to mine user's interests from the texts. Unlike other works that need a lot of training data to train a model to adopt supervised information, we directly introduce the raw supervised information to the procedure of LLDA-TF. As shown in the experimental results, results given by LLDA-TF fit predefined categories well. Furthermore, LLDA-TF model can name the user interests by category words as well as a keyword list for each category.