SocialStories: Segmenting Stories within Trending Twitter Topics

Proceedings of the 3rd IKDD Conference on Data Science, 2016. Pub Date: 2016-03-13. DOI: 10.1145/2888451.2888453

Kokil Jaidka, Kaushik Ramachandran, Prakhar Gupta, Sajal Rustagi


Citations: 6

Abstract

This study presents SocialStories, a system based on incremental clustering of streaming tweets that identifies fine-grained stories within a broader trending topic on Twitter. The contributions include a novel tf-metric, called the inverse cluster frequency, and a decay weighting for entities. We present experiments on 0.19 million tweets posted in June 2014 that mention a software brand before, during, and after a marketing conference and a software release. The novelty of our work lies in its text-based similarity metrics, including the inverse cluster frequency, and in time-specific metrics that let old entities decay over time, preserving the homogeneity and freshness of stories. Measured against the gold standard (post hoc journalistic reports), we report improved performance and a higher recall of 80% compared with LDA- and wavelet-based systems. Our algorithm clusters 80% of all tweets into story-based clusters, which are 86% pure. It also detects trending stories earlier than manual reports and is far more accurate than the baseline systems at identifying fine-grained stories within sub-topics.
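The abstract does not give the formulas behind the inverse cluster frequency or the entity decay, so the following is only a minimal sketch of how incremental clustering with those two ingredients could fit together. Everything in it is an assumption made for illustration: the IDF-style form of `icf`, the exponential half-life decay, the cosine similarity, the `threshold` and `half_life` values, and all names (`Cluster`, `assign`, and so on). It should not be read as the authors' implementation.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Cluster:
    # term -> time-decayed weight (hypothetical representation of a story)
    weights: Counter = field(default_factory=Counter)
    last_decay: float = 0.0  # timestamp at which weights were last decayed
    size: int = 0            # number of tweets assigned so far


def decay(cluster, now, half_life):
    """Assumed decay: halve every term weight once per elapsed half_life."""
    factor = 0.5 ** ((now - cluster.last_decay) / half_life)
    for term in cluster.weights:
        cluster.weights[term] *= factor
    cluster.last_decay = now


def icf(term, clusters):
    """Assumed IDF-style weight: terms present in fewer clusters score higher."""
    containing = sum(1 for c in clusters if c.weights.get(term, 0.0) > 0.0)
    return math.log((1 + len(clusters)) / (1 + containing)) + 1.0


def similarity(terms, cluster, clusters):
    """Cosine similarity between a tweet and a cluster under tf * icf weights."""
    tweet_vec = {t: tf * icf(t, clusters) for t, tf in Counter(terms).items()}
    clus_vec = {t: w * icf(t, clusters) for t, w in cluster.weights.items()}
    dot = sum(w * clus_vec.get(t, 0.0) for t, w in tweet_vec.items())
    norm = (math.sqrt(sum(w * w for w in tweet_vec.values()))
            * math.sqrt(sum(w * w for w in clus_vec.values())))
    return dot / norm if norm else 0.0


def assign(terms, now, clusters, threshold=0.3, half_life=3600.0):
    """Put the tweet in its most similar cluster, or open a new one.

    Returns the index of the chosen cluster in `clusters`.
    """
    for c in clusters:
        decay(c, now, half_life)
    scores = [similarity(terms, c, clusters) for c in clusters]
    best = max(range(len(clusters)), key=scores.__getitem__, default=None)
    if best is None or scores[best] < threshold:
        clusters.append(Cluster(last_decay=now))
        best = len(clusters) - 1
    clusters[best].weights.update(terms)  # each occurrence adds 1 to the weight
    clusters[best].size += 1
    return best


if __name__ == "__main__":
    # Toy stream of (timestamp in seconds, tokenized tweet); data is invented.
    stream = [
        (0.0, ["keynote", "launch", "stage"]),
        (10.0, ["keynote", "launch", "demo"]),
        (20.0, ["pricing", "subscription", "refund"]),
        (30.0, ["refund", "pricing", "angry"]),
    ]
    clusters = []
    print([assign(terms, ts, clusters) for ts, terms in stream])
```

In this sketch each tweet is processed once, in arrival order, which is what makes the clustering incremental. Decaying the weights before scoring means a term that has not appeared recently contributes less to the match, so a stale story is less likely to absorb new tweets.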
