πLDA: document clustering with selective structural constraints
Siliang Tang, Hanqi Wang, Jian Shao, Fei Wu, Ming Chen, Yueting Zhuang
Proceedings of the 21st ACM international conference on Multimedia, 2013-10-21
DOI: 10.1145/2502081.2502196 (https://doi.org/10.1145/2502081.2502196)
Citations: 3
Abstract
Segments, such as sentence boundaries in texts or annotated regions in images, can serve as useful structural constraints (i.e., priors) for unsupervised topic modeling. However, some segment units (e.g., words in texts or visual words in images) inside a given segment may be irrelevant to that segment's topic because of their own characteristics. This paper proposes πLDA, which introduces a latent variable π into Latent Dirichlet Allocation (LDA), a traditional topic model, to capture the characteristic of each segment unit. Specifically, πLDA determines whether each segment unit is assigned (or selected) to the topic embedded in its segment. Unlike approaches that assume all units in a segment share a common topic, our proposed πLDA can selectively discover discriminative segment units (e.g., informative words or visual words). Experimental results, together with their interpretation, demonstrate the promising performance of our method.
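The abstract does not give the full generative process, so the following is only a rough sketch of how a per-unit binary switch π could sit on top of LDA. Everything beyond "one latent switch per segment unit" is an assumption, not the paper's model: each segment draws one topic from the document's topic proportions; π is a Bernoulli draw with a hypothetical parameter `select_prob`; and a unit that is not selected (π = 0) falls back to drawing its own topic as in plain LDA. The function names are hypothetical, and inference (learning π from data) is not shown.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma(a, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    if total == 0.0:  # guard against numerical underflow with tiny alphas
        return [1.0 / len(alpha)] * len(alpha)
    return [d / total for d in draws]

def sample_categorical(probs, rng):
    """Return index i with probability proportional to probs[i]."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_document(segment_lengths, phi, alpha, select_prob, rng):
    """Hypothetical piLDA-style generative sketch for a single document.

    segment_lengths: number of units (words / visual words) in each segment
    phi: K topic-word distributions, each a list of weights over the vocabulary
    alpha: Dirichlet prior over the K topics
    select_prob: ASSUMED Bernoulli parameter for the switch pi
    Returns (segment_topics, units), where units[s] is a list of (word, topic, pi).
    """
    theta = sample_dirichlet(alpha, rng)  # document-topic proportions
    segment_topics, units = [], []
    for length in segment_lengths:
        z_seg = sample_categorical(theta, rng)  # topic embedded in this segment
        seg_units = []
        for _ in range(length):
            # pi = 1: the unit is selected by (assigned to) the segment topic.
            pi = 1 if rng.random() < select_prob else 0
            # ASSUMPTION: an unselected unit draws its own topic from theta, as in LDA.
            topic = z_seg if pi == 1 else sample_categorical(theta, rng)
            word = sample_categorical(phi[topic], rng)
            seg_units.append((word, topic, pi))
        segment_topics.append(z_seg)
        units.append(seg_units)
    return segment_topics, units

if __name__ == "__main__":
    rng = random.Random(0)
    phi = [[0.7, 0.2, 0.1, 0.0], [0.0, 0.1, 0.2, 0.7]]  # two toy topics, 4-word vocabulary
    topics, units = generate_document([3, 5], phi, [1.0, 1.0], 0.8, rng)
    print(topics, units)
```

In this sketch, `select_prob = 1.0` reproduces the "all units in a segment share one topic" assumption that the abstract contrasts against, while `select_prob = 0.0` ignores segments and reduces to plain LDA sampling; intermediate values let some units opt out of their segment's topic.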