A user-oriented semi-supervised probabilistic topic model
Jing Li, Yongbin Qin, Ruizhang Huang
2016 2nd IEEE International Conference on Computer and Communications (ICCC), published 2016-10-01
DOI: 10.1109/COMPCOMM.2016.7924706 (https://doi.org/10.1109/COMPCOMM.2016.7924706)
Topic modeling is widely used to mine topics from text. However, the needs of individual users are seldom considered, even though personalization is becoming increasingly important. In this work, we propose a user-oriented probabilistic topic model based on Latent Dirichlet Allocation. Words that a user marks as interesting or uninteresting serve as supervised information that captures the user's preferences. A self-learning algorithm that effectively increases the quantity of supervised information is also presented. Because the model is semi-supervised, data with and without attached supervised information are treated differently. For parameter inference, we integrate the Pólya urn model into the Gibbs sampling process to make efficient use of the different kinds of supervised information. Experiments on real datasets show that the model outperforms state-of-the-art models.
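The abstract does not give the sampling equations, so the following is only a minimal sketch of the general idea of combining a Pólya-urn-style update with collapsed Gibbs sampling for Latent Dirichlet Allocation. It is not the authors' algorithm. The function name `gibbs_lda_polya`, the `promote` weight, and the rule "assigning an interesting word to a topic also adds a fractional pseudo-count for the other interesting words in that topic" are all assumptions made for illustration. Uninteresting words and the self-learning step are not modeled here.

```python
import random


def gibbs_lda_polya(docs, vocab_size, num_topics, interested,
                    alpha=0.1, beta=0.01, promote=0.25, iters=50, seed=0):
    """Collapsed Gibbs sampling for LDA with a hypothetical urn-style promotion.

    docs: list of documents, each a list of integer word ids.
    interested: set of word ids the user marked as interesting (assumed input).
    promote: fractional pseudo-count shared among interesting words (assumption).
    """
    rng = random.Random(seed)
    n_dk = [[0] * num_topics for _ in docs]                 # document-topic counts
    n_kw = [[0.0] * vocab_size for _ in range(num_topics)]  # topic-word counts
    n_k = [0.0] * num_topics                                # topic totals
    z = []                                                  # topic of every token

    def update(d, w, k, sign):
        # Add (sign=+1) or remove (sign=-1) one token of word w in topic k.
        n_dk[d][k] += sign
        n_kw[k][w] += sign
        n_k[k] += sign
        # Urn-style step: an interesting word drags the other interesting
        # words into the same topic with a fractional pseudo-count.
        if w in interested:
            for v in interested:
                if v != w:
                    n_kw[k][v] += sign * promote
                    n_k[k] += sign * promote

    # Random initialization of topic assignments.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(num_topics)
            update(d, w, k, +1)
            zd.append(k)
        z.append(zd)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, w, z[d][i], -1)  # remove the current assignment
                weights = [
                    (n_dk[d][k] + alpha)
                    * (max(n_kw[k][w], 0.0) + beta)
                    / (max(n_k[k], 0.0) + vocab_size * beta)
                    for k in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                update(d, w, k, +1)        # record the new assignment
    return z, n_dk, n_kw


if __name__ == "__main__":
    toy_docs = [[0, 1, 2, 0], [3, 4, 5, 3], [0, 1, 4]]
    z, n_dk, n_kw = gibbs_lda_polya(toy_docs, vocab_size=6, num_topics=2,
                                    interested={0, 1}, iters=20, seed=1)
    print(z)
```

Because the pseudo-counts are added and removed symmetrically, the count tables stay consistent with the current assignments after every step. The default `promote=0.25` is exactly representable in binary floating point, which keeps the repeated additions and subtractions free of rounding drift in this toy setting.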