Link-PLSA-LDA: A New Unsupervised Model for Topics and Influence of Blogs

Ramesh Nallapati, William W. Cohen
Proceedings of the International AAAI Conference on Web and Social Media
DOI: https://doi.org/10.1609/icwsm.v2i1.18621
Publication date (as listed): 2021-09-25
Citations: 145

Abstract

In this work, we address the twin problems of unsupervised topic discovery and estimation of the topic-specific influence of blogs. We propose a new model that can provide a user with highly influential blog postings on the topic of the user's interest. We adopt the framework of Latent Dirichlet Allocation (LDA), an unsupervised model known for its effectiveness in topic discovery. An extension of this model, which we call Link-LDA, defines a generative model for hyperlinks and thereby models the topic-specific influence of documents, which is the problem of interest here. However, Link-LDA does not exploit the topical relationship between the documents on either side of a hyperlink, i.e., the notion that documents tend to link to other documents on the same topic. We propose a new model, called Link-PLSA-LDA, that combines Probabilistic Latent Semantic Analysis (PLSA) and LDA into a single framework and explicitly models the topical relationship between the linking and the linked document. The output of the new model on blog data reveals interesting visualizations of topics and of the influential blogs on each topic. We also evaluate the model quantitatively, using the log-likelihood of unseen data and the task of link prediction. Both experiments show that the new model performs better, suggesting its superiority over Link-LDA in modeling topics and the topic-specific influence of blogs.
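The abstract does not spell out the generative process, so the following is only a minimal sketch of a Link-LDA-style generative story as it is commonly described: each document draws a topic mixture, and both its words and its hyperlinks are generated through topics drawn from that mixture. All names (`alpha`, `beta`, `omega`, `generate_document`, `most_influential`) and the toy sizes are assumptions made for illustration, not the authors' notation or code, and the sketch does not implement Link-PLSA-LDA itself.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(probs, rng):
    """Return one index drawn with the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_document(alpha, beta, omega, n_words, n_links, rng):
    """Generate one linking document under a Link-LDA-style process.

    beta[k]  : topic k's distribution over vocabulary ids (assumed name)
    omega[k] : topic k's distribution over linkable documents, read
               here as topic-specific influence (assumed name)
    """
    theta = sample_dirichlet(alpha, rng)  # the document's topic mixture
    words = []
    for _ in range(n_words):
        z = sample_categorical(theta, rng)       # topic for this word
        words.append(sample_categorical(beta[z], rng))
    links = []
    for _ in range(n_links):
        z = sample_categorical(theta, rng)       # topic for this link
        links.append(sample_categorical(omega[z], rng))
    return theta, words, links


def most_influential(omega, topic, top_n):
    """Rank linkable documents by their probability under one topic."""
    scores = omega[topic]
    ranked = sorted(range(len(scores)), key=lambda d: scores[d], reverse=True)
    return ranked[:top_n]


# Toy setup: 2 topics, 5 vocabulary ids, 4 linkable documents.
rng = random.Random(0)
K, V, D = 2, 5, 4
alpha = [0.5] * K
beta = [sample_dirichlet([1.0] * V, rng) for _ in range(K)]
omega = [[0.7, 0.1, 0.1, 0.1], [0.05, 0.05, 0.1, 0.8]]

theta, words, links = generate_document(alpha, beta, omega, 8, 3, rng)
top_for_topic_1 = most_influential(omega, topic=1, top_n=2)
```

In this sketch, a word and a link in the same document share only the document-level mixture `theta`; nothing ties the topic of a link to the content of the document it points to. That is the gap the abstract describes Link-PLSA-LDA as addressing.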