基于词嵌入的非参数球形主题建模。

Proceedings of the conference. Association for Computational Linguistics. Meeting Pub Date : 2016-08-01 DOI:10.18653/v1/P16-2087

Kayhan Batmanghelich, Ardavan Saeedi, Karthik Narasimhan, Sam Gershman

{"title":"基于词嵌入的非参数球形主题建模。","authors":"Kayhan Batmanghelich, Ardavan Saeedi, Karthik Narasimhan, Sam Gershman","doi":"10.18653/v1/P16-2087","DOIUrl":null,"url":null,"abstract":"Traditional topic models do not account for semantic regularities in language. Recent distributional representations of words exhibit semantic consistency over directional metrics such as cosine similarity. However, neither categorical nor Gaussian observational distributions used in existing topic models are appropriate to leverage such correlations. In this paper, we propose to use the von Mises-Fisher distribution to model the density of words over a unit sphere. Such a representation is well-suited for directional data. We use a Hierarchical Dirichlet Process for our base topic model and propose an efficient inference algorithm based on Stochastic Variational Inference. This model enables us to naturally exploit the semantic structures of word embeddings while flexibly discovering the number of topics. Experiments demonstrate that our method outperforms competitive approaches in terms of topic coherence on two different text corpora while offering efficient inference.","PeriodicalId":74541,"journal":{"name":"Proceedings of the conference. Association for Computational Linguistics. Meeting","volume":"2016 ","pages":"537-542"},"PeriodicalIF":0.0000,"publicationDate":"2016-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6327958/pdf/nihms-999400.pdf","citationCount":"78","resultStr":"{\"title\":\"Nonparametric Spherical Topic Modeling with Word Embeddings.\",\"authors\":\"Kayhan Batmanghelich, Ardavan Saeedi, Karthik Narasimhan, Sam Gershman\",\"doi\":\"10.18653/v1/P16-2087\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Traditional topic models do not account for semantic regularities in language. Recent distributional representations of words exhibit semantic consistency over directional metrics such as cosine similarity. However, neither categorical nor Gaussian observational distributions used in existing topic models are appropriate to leverage such correlations. In this paper, we propose to use the von Mises-Fisher distribution to model the density of words over a unit sphere. Such a representation is well-suited for directional data. We use a Hierarchical Dirichlet Process for our base topic model and propose an efficient inference algorithm based on Stochastic Variational Inference. This model enables us to naturally exploit the semantic structures of word embeddings while flexibly discovering the number of topics. Experiments demonstrate that our method outperforms competitive approaches in terms of topic coherence on two different text corpora while offering efficient inference.\",\"PeriodicalId\":74541,\"journal\":{\"name\":\"Proceedings of the conference. Association for Computational Linguistics. Meeting\",\"volume\":\"2016 \",\"pages\":\"537-542\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-08-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6327958/pdf/nihms-999400.pdf\",\"citationCount\":\"78\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the conference. Association for Computational Linguistics. Meeting\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.18653/v1/P16-2087\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the conference. Association for Computational Linguistics. Meeting","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.18653/v1/P16-2087","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 78

摘要

传统的主题模型没有考虑语言的语义规律。最近的词的分布表示在诸如余弦相似度的方向度量上表现出语义一致性。然而，现有主题模型中使用的分类分布和高斯观测分布都不适合利用这种相关性。在本文中，我们建议使用von Mises-Fisher分布来模拟单位球上的单词密度。这种表示非常适合于定向数据。我们将层次狄利克雷过程用于基本主题模型，提出了一种基于随机变分推理的高效推理算法。该模型使我们能够自然地利用词嵌入的语义结构，同时灵活地发现主题的数量。实验表明，我们的方法在两种不同文本语料库的主题一致性方面优于竞争方法，同时提供了有效的推理。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

摘要图片

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Nonparametric Spherical Topic Modeling with Word Embeddings.

Traditional topic models do not account for semantic regularities in language. Recent distributional representations of words exhibit semantic consistency over directional metrics such as cosine similarity. However, neither categorical nor Gaussian observational distributions used in existing topic models are appropriate to leverage such correlations. In this paper, we propose to use the von Mises-Fisher distribution to model the density of words over a unit sphere. Such a representation is well-suited for directional data. We use a Hierarchical Dirichlet Process for our base topic model and propose an efficient inference algorithm based on Stochastic Variational Inference. This model enables us to naturally exploit the semantic structures of word embeddings while flexibly discovering the number of topics. Experiments demonstrate that our method outperforms competitive approaches in terms of topic coherence on two different text corpora while offering efficient inference.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the conference. Association for Computational Linguistics. Meeting

自引率

0.00%

发文量