{"title":"Uncertainty-aware Topic Modeling Visualization","authors":"Valerie Müller, Christian Sieg, L. Linsen","doi":"10.1109/VIS4DH53644.2021.00007","journal":{"name":"2021 IEEE 6th Workshop on Visualization for the Digital Humanities (VIS4DH)"},"publicationDate":"2021-10-01","publicationTypes":"Journal Article","isOpenAccess":false,"citationCount":"1","platform":"Semanticscholar","ListUrlMain":"https://doi.org/10.1109/VIS4DH53644.2021.00007"}
Topic modeling is a state-of-the-art technique for analyzing text corpora. It uses a statistical model, most commonly Latent Dirichlet Allocation (LDA), to discover abstract topics that occur in a document collection. However, LDA-based topic modeling starts from a randomly selected initial configuration and depends on a number of parameter values that need to be chosen. This introduces uncertainty into the topic modeling results, and visualization methods should convey this uncertainty during the analysis process. We propose a visual uncertainty-aware topic modeling analysis. We capture the uncertainty by computing topic modeling ensembles and propose measures for estimating topic modeling uncertainty from the ensemble. We then enhance state-of-the-art topic modeling visualization methods to convey the uncertainty in the topic modeling process. We visualize the entire ensemble of topic modeling results at different levels for topic and document analysis. We apply our visualization methods to a text corpus to document the impact of uncertainty on the analysis.
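The abstract does not say which uncertainty measures the authors use, so the following is only an illustrative sketch of how a per-topic uncertainty score might be estimated from an ensemble. It assumes the ensemble has already been computed (for example, by repeated LDA runs with different random seeds), that every run has the same number of topics, and that each topic is given as a word-probability dictionary. The function names `js_divergence`, `align_to_reference`, and `topic_uncertainty`, the use of Jensen-Shannon divergence, and the brute-force topic matching are all assumptions made for illustration, not the method of the paper.

```python
import math
from itertools import permutations


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2, range 0..1) between two
    word distributions given as {word: probability} dicts."""
    total = 0.0
    for word in set(p) | set(q):
        pw, qw = p.get(word, 0.0), q.get(word, 0.0)
        mw = 0.5 * (pw + qw)
        if pw > 0:
            total += 0.5 * pw * math.log2(pw / mw)
        if qw > 0:
            total += 0.5 * qw * math.log2(qw / mw)
    return total


def align_to_reference(reference, run):
    """Reorder the topics of `run` so that topic i best matches
    reference topic i. Brute force over all permutations, so this is
    only practical for a small number of topics (assumption)."""
    k = len(reference)
    best = min(
        permutations(range(k)),
        key=lambda perm: sum(
            js_divergence(reference[i], run[perm[i]]) for i in range(k)
        ),
    )
    return [run[j] for j in best]


def topic_uncertainty(ensemble):
    """Hypothetical per-topic uncertainty: the mean divergence between
    each topic of the first run and its matched topic in every other
    run. Requires at least two runs."""
    reference = ensemble[0]
    aligned = [align_to_reference(reference, run) for run in ensemble[1:]]
    return [
        sum(js_divergence(reference[i], run[i]) for run in aligned) / len(aligned)
        for i in range(len(reference))
    ]


# Toy ensemble of three runs with two topics each (made-up numbers).
ensemble = [
    [{"ship": 0.6, "sea": 0.4}, {"vote": 0.7, "law": 0.3}],
    [{"vote": 0.7, "law": 0.3}, {"ship": 0.5, "sea": 0.5}],
    [{"ship": 0.6, "sea": 0.4}, {"vote": 0.2, "law": 0.3, "tax": 0.5}],
]
scores = topic_uncertainty(ensemble)
for index, score in enumerate(scores):
    print(f"topic {index}: uncertainty {score:.3f}")
```

In this toy data the first topic is nearly identical across runs while the second drifts in one run, so the second topic receives the higher score. A score of this kind could, for instance, be mapped to opacity or an error glyph in a topic view, although how the paper encodes uncertainty visually is not stated in the abstract.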