{"title":"ContraVis: Contrastive and Visual Topic Modeling for Comparing Document Collections","authors":"T. Le, L. Akoglu","doi":"10.1145/3308558.3313617","DOIUrl":null,"url":null,"abstract":"Given posts on 'abortion' and posts on 'religion' from a political forum, how can we find topics that are discriminative and those in common? In general, (1) how can we compare and contrast two or more different ('labeled') document collections? Moreover, (2) how can we visualize the data (in 2-d or 3-d) to best reflect the similarities and differences between the collections? We introduce (to the best of our knowledge) the first contrastive and visual topic model, called ContraVis, that jointly addresses both problems: (1) contrastive topic modeling, and (2) contrastive visualization. That is, ContraVis learns not only latent topics but also embeddings for the documents, topics and labels for visualization. ContraVis exhibits three key properties by design. It is (i) Contrastive: It enables comparative analysis of different document corpora by extracting latent discriminative and common topics across labeled documents; (ii) Visually-expressive: Different from numerous existing models, it also produces a visualization for all of the documents, labels, and the extracted topics, where proximity in the coordinate space is reflective of proximity in semantic space; (iii) Unified: It extracts topics and visual coordinates simultaneously under a joint model. Through extensive experiments on real-world datasets, we show ContraVis 's potential for providing visual contrastive analysis of multiple document collections. We show both qualitatively and quantitatively that ContraVis significantly outperforms both unsupervised and supervised state-of-the-art topic models in contrastive power, semantic coherence and visual effectiveness.","PeriodicalId":23013,"journal":{"name":"The World Wide Web Conference","volume":"27 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2019-05-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"12","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"The World Wide Web Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3308558.3313617","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Cited by: 12
Abstract
Given posts on 'abortion' and posts on 'religion' from a political forum, how can we find topics that are discriminative and topics that are shared? In general, (1) how can we compare and contrast two or more different ('labeled') document collections? Moreover, (2) how can we visualize the data (in 2-d or 3-d) to best reflect the similarities and differences between the collections? We introduce, to the best of our knowledge, the first contrastive and visual topic model, called ContraVis, that jointly addresses both problems: (1) contrastive topic modeling and (2) contrastive visualization. That is, ContraVis learns not only latent topics but also embeddings of the documents, topics, and labels for visualization. ContraVis exhibits three key properties by design. It is (i) Contrastive: it enables comparative analysis of different document corpora by extracting latent discriminative and common topics across labeled documents; (ii) Visually expressive: unlike numerous existing models, it also produces a visualization of all the documents, labels, and extracted topics, in which proximity in the coordinate space reflects proximity in semantic space; (iii) Unified: it extracts topics and visual coordinates simultaneously under a joint model. Through extensive experiments on real-world datasets, we demonstrate ContraVis's potential for visual contrastive analysis of multiple document collections. We show both qualitatively and quantitatively that ContraVis significantly outperforms state-of-the-art unsupervised and supervised topic models in contrastive power, semantic coherence, and visual effectiveness.
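The abstract only sketches the two tasks at a high level. As a rough illustration, the following is a minimal Python sketch of a naive baseline for them, not the ContraVis model itself: it fits one shared topic model over two labeled collections, scores each topic as discriminative or common, and projects documents to 2-d from their topic proportions. The LDA-plus-PCA pipeline, the toy documents, and the `discrim` score are all assumptions made for illustration; ContraVis instead learns topics and visual coordinates jointly under a single probabilistic model.

```python
# Hedged sketch -- NOT the ContraVis model. A crude baseline for the two
# tasks named in the abstract: (1) score topics as discriminative vs.
# common across two labeled collections, and (2) embed documents in 2-d so
# that topical proximity roughly maps to spatial proximity. LDA and PCA
# are stand-ins; ContraVis does both steps under one joint model.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation, PCA

# Toy labeled corpora (0 = 'abortion' posts, 1 = 'religion' posts).
docs = [
    "abortion rights law court ruling women health",
    "abortion debate policy clinic access law",
    "church faith religion prayer scripture belief",
    "religion god worship faith community church",
    "law policy court belief debate community",   # mixed-theme post
]
labels = np.array([0, 0, 1, 1, 0])

X = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=3, random_state=0)
theta = lda.fit_transform(X)  # per-document topic proportions

# Contrast score per topic: near 0 means common to both collections,
# near +1 / -1 means discriminative for label 0 / label 1.
mass0 = theta[labels == 0].mean(axis=0)
mass1 = theta[labels == 1].mean(axis=0)
discrim = (mass0 - mass1) / (mass0 + mass1)
print("topic contrast scores:", np.round(discrim, 2))

# Stand-in for 'contrastive visualization': 2-d projection of topic space.
coords = PCA(n_components=2).fit_transform(theta)
print("2-d document coordinates:\n", np.round(coords, 2))
```

On this toy data the score separates 'abortion'-heavy topics (positive), 'religion'-heavy topics (negative), and shared topics (near zero), which mirrors, in a very loose way, the discriminative-versus-common distinction the paper formalizes.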