Discovering Coherent Topics with Entity Topic Models

2016 IEEE/WIC/ACM International Conference on Web Intelligence (WI) Pub Date: 2016-10-01 DOI: 10.1109/WI.2016.0015

M. Allahyari, K. Kochut


Citations: 20

Abstract

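Probabilistic topic models are powerful techniques widely used to discover topics, or semantic content, in large document collections. Because they are entirely unsupervised, however, they can produce topics that are hard to interpret in applications. Several knowledge-based topic models have recently been proposed, but they rely mainly on word-level domain knowledge to improve topic coherence and ignore the rich information carried by entities (e.g., persons, locations, organizations) associated with the documents. Moreover, a vast amount of prior (background) knowledge is available as ontologies and Linked Open Data (LOD) and can be incorporated into topic models to produce coherent topics. In this paper, we introduce a novel entity-based topic model, EntLDA, that integrates an ontology with an entity topic model to improve the topic modeling process. To further increase the coherence of the identified topics, we introduce a novel ontology-based regularization framework and integrate it with EntLDA. Our experimental results demonstrate the effectiveness of the proposed model in improving topic coherence.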
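The abstract does not say how topic coherence is measured. As an illustration only, the sketch below computes a UMass-style co-occurrence score, one commonly used coherence measure; choosing it here is an assumption and not necessarily the metric used in the paper. The function name `umass_coherence` and the toy documents are hypothetical.

```python
import math


def umass_coherence(top_words, documents):
    """UMass-style coherence for one topic (illustrative sketch).

    top_words: topic words ordered from most to least probable.
    documents: list of tokenized documents (lists of strings).
    A higher (less negative) score means the words co-occur more often.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents containing all of the given words.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            w_m, w_l = top_words[i], top_words[j]
            d_l = doc_freq(w_l)
            if d_l == 0:
                continue  # skip words that never occur in the corpus
            # +1 smoothing keeps the logarithm finite when the pair never co-occurs.
            score += math.log((doc_freq(w_m, w_l) + 1) / d_l)
    return score


# Toy corpus (hypothetical data, for demonstration only).
docs = [
    ["obama", "president", "election"],
    ["obama", "election", "vote"],
    ["football", "goal", "league"],
    ["league", "goal", "obama"],
]

coherent = umass_coherence(["obama", "election"], docs)
mixed = umass_coherence(["obama", "goal"], docs)
print(coherent, mixed)
```

On this toy corpus the pair that co-occurs in more documents receives the higher score, which is the intuition behind calling a topic "coherent".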


