SocialStories: Segmenting Stories within Trending Twitter Topics

Kokil Jaidka, Kaushik Ramachandran, Prakhar Gupta, Sajal Rustagi
{"title":"SocialStories: Segmenting Stories within Trending Twitter Topics","authors":"Kokil Jaidka, Kaushik Ramachandran, Prakhar Gupta, Sajal Rustagi","doi":"10.1145/2888451.2888453","DOIUrl":null,"url":null,"abstract":"This study present SocialStories - a system based on incremental clustering for streaming tweets, for identifying fine-grained stories within a broader trending topic on Twitter. The contributions include a novel tf-metric, called the inverse cluster frequency, and a decay weighting for entities. We present our experiments on 0.19 million tweets posted in June 2014, revolving around the mentions of a software brand before, during and after a marketing conference and a software release. The novelty of our work is the text-based similarity calculation metrics, including a new similarity metric, called the inverse cluster frequency, and time-specific metrics that allow for the decay of old entities with the passage of time and preserve the homogeneity and the freshness of stories. We report improved performance and higher recall of 80%, against the gold standard (posthoc journalistic reports), as compared to LDA-, and Wavelet-based systems. Our algorithm is able to cluster 80% of all tweets into story-based clusters, which are 86% pure. It also enables earlier detection of trending stories than manual reports, and is far more accurate in identifying fine-grained stories within sub-topics as compared to baseline systems.","PeriodicalId":136431,"journal":{"name":"Proceedings of the 3rd IKDD Conference on Data Science, 2016","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2016-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 3rd IKDD Conference on Data Science, 2016","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2888451.2888453","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 6

Abstract

This study present SocialStories - a system based on incremental clustering for streaming tweets, for identifying fine-grained stories within a broader trending topic on Twitter. The contributions include a novel tf-metric, called the inverse cluster frequency, and a decay weighting for entities. We present our experiments on 0.19 million tweets posted in June 2014, revolving around the mentions of a software brand before, during and after a marketing conference and a software release. The novelty of our work is the text-based similarity calculation metrics, including a new similarity metric, called the inverse cluster frequency, and time-specific metrics that allow for the decay of old entities with the passage of time and preserve the homogeneity and the freshness of stories. We report improved performance and higher recall of 80%, against the gold standard (posthoc journalistic reports), as compared to LDA-, and Wavelet-based systems. Our algorithm is able to cluster 80% of all tweets into story-based clusters, which are 86% pure. It also enables earlier detection of trending stories than manual reports, and is far more accurate in identifying fine-grained stories within sub-topics as compared to baseline systems.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
SocialStories:在Twitter热门话题中分割故事
这项研究提出了SocialStories——一个基于流式tweet增量聚类的系统,用于在Twitter上更广泛的趋势主题中识别细粒度的故事。贡献包括一种新的tf度量,称为逆聚类频率,以及实体的衰减加权。我们对2014年6月发布的19万条推文进行了实验,围绕一个软件品牌在营销会议和软件发布之前、期间和之后被提及的情况。我们工作的新颖之处在于基于文本的相似性计算度量,包括一种新的相似性度量,称为逆聚类频率,以及特定于时间的度量,该度量允许旧实体随着时间的推移而衰减,并保持故事的同质性和新鲜度。与基于LDA和小波的系统相比,我们报告了针对黄金标准(后新闻报道)的改进性能和更高的80%召回率。我们的算法能够将80%的推文聚类到基于故事的聚类中,其纯度为86%。它还可以比手工报告更早地检测趋势故事,并且与基线系统相比,在子主题中识别细粒度故事方面要准确得多。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
On the Dynamics of Username Changing Behavior on Twitter Smart filters for social retrieval Improving Urban Transportation through Social Media Analytics AMEO 2015: A dataset comprising AMCAT test scores, biodata details and employment outcomes of job seekers Learning from Gurus: Analysis and Modeling of Reopened Questions on Stack Overflow
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1