OLLDA: A Supervised and Dynamic Topic Mining Framework in Twitter

Shatha Jaradat, Nima Dokoohaki, M. Matskin
Published in: 2015 IEEE International Conference on Data Mining Workshop (ICDMW)
Publication date: 2015-11-14
DOI: 10.1109/ICDMW.2015.132 (https://doi.org/10.1109/ICDMW.2015.132)
Citations: 4

Abstract

Analyzing media in real time is of great importance, with social media platforms at the epicenter of crunching, digesting, and disseminating content to the individuals connected to them. In this context, topic models, especially LDA, have gained strong momentum due to their scalability, inference power, and compact semantics. However, state-of-the-art topic models fall short in handling large chunks of streaming data arriving dynamically on the platform, which hinders their quality of interpretation as well as their adaptability to information overload. In this manuscript we therefore propose a labeled and online extension to LDA (OLLDA), which incorporates supervision through external labeling and can quickly digest real-time updates, making it more adaptive to Twitter and similar platforms. The proposed extension can handle large quantities of newly arrived documents in a stream while achieving high topic inference quality despite the short and often sloppy text of tweets. Our approach mainly uses an approximate inference technique based on variational inference, coupled with a labeled LDA model. We conclude by presenting experiments on a one-year crawl of Twitter data that show significantly improved topical inference as well as temporal user profile classification compared to state-of-the-art baselines.
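The abstract does not give the update equations, so the following is only a minimal sketch of how online variational inference might be combined with a Labeled LDA constraint; it is not the authors' implementation. It assumes a one-to-one mapping between labels and topics (as in Labeled LDA) and a mini-batch update with a decaying step size (as in online variational Bayes for LDA). The class name OnlineLabeledLDA, the hyperparameters alpha, eta, tau0, and kappa, and the toy vocabulary are all hypothetical.

```python
import math
import random


def digamma(x):
    """Digamma for x > 0: shift x upward by recurrence, then asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 * inv - series


class OnlineLabeledLDA:
    """Hypothetical sketch: one topic per label, updated one mini-batch at a time."""

    def __init__(self, num_labels, vocab_size, total_docs,
                 alpha=0.1, eta=0.01, tau0=1.0, kappa=0.7, seed=0):
        rng = random.Random(seed)
        self.K, self.V, self.D = num_labels, vocab_size, total_docs
        self.alpha, self.eta, self.tau0, self.kappa = alpha, eta, tau0, kappa
        self.t = 0  # number of mini-batches seen so far
        # Variational parameters of the topic-word distributions (K x V).
        self.lam = [[rng.gammavariate(100.0, 0.01) for _ in range(vocab_size)]
                    for _ in range(num_labels)]

    def _e_log_beta(self):
        rows = []
        for row in self.lam:
            total = digamma(sum(row))
            rows.append([digamma(v) - total for v in row])
        return rows

    def _e_step(self, doc, labels, e_log_beta, iters=20):
        # Supervision: the document may only use the topics of its own labels.
        gamma = {k: 1.0 for k in labels}
        phi = {}
        for _ in range(iters):
            gsum = digamma(sum(gamma.values()))
            e_log_theta = {k: digamma(g) - gsum for k, g in gamma.items()}
            new_gamma = {k: self.alpha for k in labels}
            for w, n in doc.items():
                weights = {k: math.exp(e_log_theta[k] + e_log_beta[k][w])
                           for k in labels}
                norm = sum(weights.values())
                phi[w] = {k: v / norm for k, v in weights.items()}
                for k in labels:
                    new_gamma[k] += n * phi[w][k]
            gamma = new_gamma
        return phi

    def update(self, batch):
        """batch: list of (doc, labels); doc maps word id -> count."""
        e_log_beta = self._e_log_beta()
        sstats = [[0.0] * self.V for _ in range(self.K)]
        for doc, labels in batch:
            phi = self._e_step(doc, labels, e_log_beta)
            for w, n in doc.items():
                for k, p in phi[w].items():
                    sstats[k][w] += n * p
        # Decaying step size; with tau0=1 the first batch fully replaces lam.
        rho = (self.tau0 + self.t) ** (-self.kappa)
        scale = self.D / len(batch)  # treat the batch as if it were the corpus
        for k in range(self.K):
            for w in range(self.V):
                target = self.eta + scale * sstats[k][w]
                self.lam[k][w] = (1.0 - rho) * self.lam[k][w] + rho * target
        self.t += 1

    def topic_word(self, k):
        total = sum(self.lam[k])
        return [v / total for v in self.lam[k]]


# Toy stream of two mini-batches of labeled "tweets" (all data is made up).
vocab = ["goal", "match", "team", "vote", "election", "party"]
SPORTS, POLITICS = 0, 1
model = OnlineLabeledLDA(num_labels=2, vocab_size=len(vocab), total_docs=4)
stream = [
    [({0: 2, 1: 1}, [SPORTS]), ({3: 1, 4: 2}, [POLITICS])],
    [({1: 1, 2: 2}, [SPORTS]), ({4: 1, 5: 2, 2: 1}, [SPORTS, POLITICS])],
]
for batch in stream:
    model.update(batch)
```

In this sketch each mini-batch is processed once and then discarded, which is what would let such a model keep up with a stream, while the label restriction in the E-step is where the external supervision enters. Calling model.topic_word(SPORTS) returns the current word distribution for that label.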