GPU-BTM: A Topic Model for Short Text using Auxiliary Information
Yibing Guo, Yutao Huang, Ye Ding, Shuhan Qi, Xuan Wang, Qing Liao
2020 IEEE Fifth International Conference on Data Science in Cyberspace (DSC), published 2020-07-01
DOI: https://doi.org/10.1109/DSC50466.2020.00037
Citations: 0
Abstract
Short texts have recently become pervasive in social life. To understand them, researchers have developed topic models that extract topic information. However, conventional topic models are designed mainly for long documents and cannot cope with the sparsity of short texts. In this paper, we propose a novel topic model for short text, GPU-BTM, which incorporates the Generalized Pólya Urn (GPU) technique into the Biterm Topic Model (BTM). GPU-BTM uses word similarity information and word co-occurrence patterns simultaneously to handle the sparsity problem. Specifically, the GPU module exploits similarity information among words, so that GPU-BTM generates more coherent topics, while the BTM module captures word co-occurrence patterns, so that the enriched contexts relieve data sparsity. Experiments on two real-world short-text datasets show that GPU-BTM outperforms four recent baseline models.
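The two ingredients named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy documents, the `similar_words` table, and the promotion weight of 0.3 are all assumptions made for illustration. It shows only how biterms (unordered word pairs from one short text) might be extracted, and how a Pólya-urn-style update might add a fractional count for words similar to the one being assigned to a topic.

```python
from collections import defaultdict
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered word pair in one short text (a 'biterm')."""
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def gpu_increment(topic_word_counts, topic, word, similar_words, promotion=0.3):
    """Count `word` under `topic` and promote its similar words.

    The full count goes to the word itself; each word listed as similar
    receives the smaller `promotion` weight (a hypothetical value).
    """
    topic_word_counts[topic][word] += 1.0
    for other in similar_words.get(word, ()):
        topic_word_counts[topic][other] += promotion


# Toy corpus and a hand-made similarity table (both hypothetical).
docs = [["apple", "iphone", "launch"], ["banana", "fruit"]]
similar_words = {"apple": ["iphone"], "iphone": ["smartphone"]}

# Corpus-level biterm set, as BTM models the whole corpus rather than documents.
biterms = [b for doc in docs for b in extract_biterms(doc)]

# Pretend every biterm of the first document was assigned to topic 0.
counts = defaultdict(lambda: defaultdict(float))
for w1, w2 in extract_biterms(docs[0]):
    gpu_increment(counts, 0, w1, similar_words)
    gpu_increment(counts, 0, w2, similar_words)
```

In this sketch "smartphone" never occurs in the corpus, yet it still accumulates weight under topic 0 because it is listed as similar to "iphone". A real sampler would also assign topics by Gibbs sampling and derive similarity from auxiliary information such as word embeddings, which is omitted here.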