Topic Detection from Large Scale of Microblog Stream with High Utility Pattern Clustering

Jiajia Huang, Min Peng, Hua Wang
DOI: 10.1145/2809890.2809894
URL: https://doi.org/10.1145/2809890.2809894
Published: 2015-10-18
Source (as listed): 车间管理
Citations: 35

Abstract

With the popularity of social media, detecting topics from microblog streams has become an increasingly important task. It is challenging, however, because microblog streams are high-dimensional, short and noisy in content, fast-changing, and huge in volume. In this paper, we propose a high utility pattern clustering (HUPC) framework over microblog streams. The framework first extracts a group of representative patterns from the microblog stream and then groups these patterns into topic clusters. This approach works well on large-scale microblog streams because it clusters the patterns that describe topics better, rather than clustering noise and microblogs directly. Furthermore, the proposed framework can detect coherent topics and newly emerging topics simultaneously. Extensive experiments on Twitter streams and Sina Weibo streams show that the method achieves better performance than other existing topic detection methods, making it a promising solution for detecting events from microblog streams.
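
The abstract does not state how utility is defined or how patterns are clustered, so the following is only an illustrative sketch of the two-stage idea (extract patterns, then cluster the patterns rather than the posts), not the authors' HUPC algorithm. Everything in it is an assumption made for illustration: patterns are word pairs, the toy utility is the summed in-post counts of the two words, similarity is the Jaccard overlap of supporting posts, and the names `mine_patterns`, `cluster_patterns`, `min_utility`, and `min_sim` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def mine_patterns(posts, min_utility):
    """Return {word pair: ids of posts containing it} for pairs whose toy
    utility reaches min_utility.

    Toy utility (an assumption, not the paper's definition): for every post
    that contains both words, add the in-post counts of the two words.
    """
    utility = defaultdict(int)
    support = defaultdict(set)
    for post_id, tokens in enumerate(posts):
        counts = defaultdict(int)
        for tok in tokens:
            counts[tok] += 1
        for a, b in combinations(sorted(counts), 2):
            utility[(a, b)] += counts[a] + counts[b]
            support[(a, b)].add(post_id)
    return {p: support[p] for p, u in utility.items() if u >= min_utility}


def jaccard(s, t):
    """Jaccard overlap of two non-empty sets."""
    return len(s & t) / len(s | t)


def cluster_patterns(patterns, min_sim):
    """Greedy single-pass clustering of patterns by supporting posts.

    A pattern joins the first cluster whose pooled support set is similar
    enough; otherwise it starts a new cluster.
    """
    clusters = []  # list of (set of patterns, pooled supporting post ids)
    for pattern in sorted(patterns):
        posts = patterns[pattern]
        for members, pooled in clusters:
            if jaccard(posts, pooled) >= min_sim:
                members.add(pattern)
                pooled |= posts  # in-place update of the cluster's support
                break
        else:
            clusters.append(({pattern}, set(posts)))
    return [members for members, _ in clusters]


# Hypothetical toy stream: two topics plus one noisy post.
posts = [
    ["earthquake", "japan", "tsunami"],
    ["earthquake", "japan"],
    ["japan", "tsunami", "earthquake"],
    ["iphone", "launch", "apple"],
    ["apple", "iphone", "launch"],
    ["lunch", "coffee"],
]
patterns = mine_patterns(posts, min_utility=4)
clusters = cluster_patterns(patterns, min_sim=0.5)
for members in clusters:
    print(sorted(members))
```

In this toy run the pair from the noisy post falls below the utility threshold and never reaches the clustering stage, which mirrors the abstract's point that clustering patterns avoids clustering noise directly.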