Collective Latent Dirichlet Allocation

2008 Eighth IEEE International Conference on Data Mining
Pub Date: 2008-12-15
DOI: 10.1109/ICDM.2008.75

Zhiyong Shen, Junyi Sun, Yi-Dong Shen

Citations: 17

Abstract

In this paper, we propose Collective LDA (C-LDA), a new variant of latent Dirichlet allocation (LDA) for modeling multiple corpora. C-LDA combines multiple corpora during learning so that it can transfer knowledge from one corpus to another; at the same time, it keeps a discriminative node representing the corpus ID to constrain the topics learned in each corpus. Compared with LDA applied locally to the target corpus, C-LDA yields refined topic-word distributions; compared with LDA applied globally and directly to the combined corpus, C-LDA keeps each topic specific to a single corpus. Experiments on several benchmark document data sets demonstrate that these advantages give C-LDA improved performance.
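The abstract does not give the model's equations, so the following is only a toy illustration of the one structural constraint it states: every topic belongs to exactly one corpus, while all corpora share a vocabulary. It is not the authors' C-LDA model or inference procedure. The function name `sample_corpora`, the `owner` array, the symmetric Dirichlet priors `alpha` and `beta`, and all sizes are assumptions made for this sketch.

```python
import numpy as np

def sample_corpora(n_corpora=2, topics_per_corpus=3, vocab_size=20,
                   docs_per_corpus=4, doc_len=30, alpha=0.5, beta=0.5, seed=0):
    """Toy generative sampler (hypothetical, not the paper's model).

    Each topic is owned by one corpus; documents only use topics
    owned by their own corpus, but all topics share one vocabulary.
    """
    rng = np.random.default_rng(seed)
    n_topics = n_corpora * topics_per_corpus
    # owner[k] is the corpus ID that topic k belongs to.
    owner = np.repeat(np.arange(n_corpora), topics_per_corpus)
    # One word distribution per topic, all over the same shared vocabulary.
    phi = rng.dirichlet(np.full(vocab_size, beta), size=n_topics)
    docs = []
    for c in range(n_corpora):
        allowed = np.flatnonzero(owner == c)  # topics visible to corpus c
        for _ in range(docs_per_corpus):
            # Topic proportions are drawn only over the corpus's own topics.
            theta = rng.dirichlet(np.full(len(allowed), alpha))
            z = rng.choice(allowed, size=doc_len, p=theta)
            w = np.array([rng.choice(vocab_size, p=phi[k]) for k in z])
            docs.append((c, z, w))
    return owner, phi, docs
```

Calling `sample_corpora()` returns the topic ownership array, the topic-word matrix, and a list of `(corpus_id, topic_assignments, word_ids)` triples. In this sketch the corpus ID acts only as a hard mask on which topics a document may use; how C-LDA transfers knowledge between corpora is described in the paper itself and is not modeled here.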
