Extracting interesting related context-dependent concepts from social media streams using temporal distributions

2013 IEEE 29th International Conference on Data Engineering (ICDE) Pub Date : 2013-04-08 DOI:10.1109/ICDE.2013.6544931

C. Sayers, M. Hsu


Abstract

To enable interactive exploration of large social media datasets, we exploit the temporal distributions of word n-grams within the message stream to discover “interesting” concepts, determine “relatedness” between concepts, and find representative examples for display. We present a new algorithm for context-dependent “interestingness” based on the coefficient of variation of the temporal distribution; we apply the well-known Pearson correlation coefficient to tweets, using equi-height histograms to determine correlation; and we employ an asymmetric variant to compute “relatedness” in a way that encourages exploration. We further introduce techniques that use interestingness, correlation, and relatedness to discover concepts automatically and to select preferred word n-grams for display. These techniques are demonstrated on an 800,000-tweet dataset from the Academy Awards.
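The abstract does not give exact formulas, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that (a) equi-height bins are cut so that each bin holds roughly the same number of messages from the whole stream, (b) an n-gram's temporal distribution is the number of messages containing it in each bin, (c) interestingness is the coefficient of variation (population standard deviation divided by mean) of that distribution, and (d) correlation is Pearson's coefficient between two such distributions. The asymmetric relatedness variant and the n-gram selection step are not shown because the abstract does not define them. All function names, the lowercase whitespace tokenization, and the toy messages are hypothetical illustrations and do not come from the paper or its dataset.

```python
import math
from bisect import bisect_right
from statistics import mean, pstdev


def equi_height_edges(timestamps, num_bins):
    """Interior bin edges chosen so each bin holds roughly the same number of messages."""
    ts = sorted(timestamps)
    n = len(ts)
    return [ts[(k * n) // num_bins] for k in range(1, num_bins)]


def contains_ngram(text, term):
    """True if the (assumed) lowercase whitespace tokenization of `text` contains `term`."""
    tokens = text.lower().split()
    gram = tuple(term.lower().split())
    size = len(gram)
    return any(tuple(tokens[i:i + size]) == gram for i in range(len(tokens) - size + 1))


def temporal_distribution(messages, term, edges):
    """Count, per equi-height bin, how many (timestamp, text) messages contain `term`."""
    counts = [0] * (len(edges) + 1)
    for timestamp, text in messages:
        if contains_ngram(text, term):
            counts[bisect_right(edges, timestamp)] += 1
    return counts


def coefficient_of_variation(counts):
    """Population standard deviation divided by the mean; 0.0 for a term that never occurs."""
    mu = mean(counts)
    return pstdev(counts) / mu if mu > 0 else 0.0


def pearson(x, y):
    """Pearson correlation of two per-bin count vectors; 0.0 if either vector is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


# Hypothetical toy stream of (timestamp, text) pairs, invented for illustration.
messages = [
    (0, "red carpet arrivals"),
    (1, "red carpet looks"),
    (2, "opening monologue"),
    (3, "best actor goes to"),
    (4, "best actor winner"),
    (5, "best picture next"),
    (6, "best picture winner"),
    (7, "red carpet recap"),
]

edges = equi_height_edges([t for t, _ in messages], 4)  # two messages per bin
carpet = temporal_distribution(messages, "red carpet", edges)
picture = temporal_distribution(messages, "best picture", edges)
winner = temporal_distribution(messages, "winner", edges)
best = temporal_distribution(messages, "best", edges)

print(carpet)                                      # [2, 0, 0, 1]
print(round(coefficient_of_variation(carpet), 3))  # 1.106
print(round(pearson(picture, winner), 3))          # 1.0
print(round(pearson(carpet, best), 3))             # -0.853
```

Because every bin holds the same number of messages overall, a burst in per-bin counts here reflects a rise in the share of the stream mentioning the n-gram rather than a rise in total traffic; whether the paper normalizes in exactly this way is an assumption of this sketch.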
