Extracting interesting related context-dependent concepts from social media streams using temporal distributions

C. Sayers, M. Hsu
2013 IEEE 29th International Conference on Data Engineering (ICDE)
Published: 2013-04-08
DOI: 10.1109/ICDE.2013.6544931 (https://doi.org/10.1109/ICDE.2013.6544931)

Abstract

To enable the interactive exploration of large social media datasets, we exploit the temporal distributions of word n-grams within the message stream to discover “interesting” concepts, determine “relatedness” between concepts, and find representative examples for display. We present a new algorithm for context-dependent “interestingness” based on the coefficient of variation of the temporal distribution, apply the well-known technique of Pearson's correlation to tweets using equi-height histogramming to determine correlation, and employ an asymmetric variant for computing “relatedness” to encourage exploration. We further introduce techniques that use interestingness, correlation, and relatedness to automatically discover concepts and select preferred word n-grams for display. These techniques are demonstrated on an 800,000-tweet dataset from the Academy Awards.
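
The two named building blocks, the coefficient of variation of a temporal distribution and Pearson's correlation over equi-height histograms, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the details, so the function names, the choice to derive equi-height bin edges from the timestamps of the whole message stream, and the toy data are all assumptions made for illustration. The asymmetric "relatedness" variant is not specified in the abstract and is left out.

```python
import math
from bisect import bisect_right


def coefficient_of_variation(counts):
    """Population standard deviation divided by the mean of per-bin counts.

    Returns 0.0 for an empty or all-zero series (an assumed convention).
    """
    n = len(counts)
    if n == 0:
        return 0.0
    mean = sum(counts) / n
    if mean == 0:
        return 0.0
    variance = sum((c - mean) ** 2 for c in counts) / n
    return math.sqrt(variance) / mean


def equi_height_edges(reference_timestamps, num_bins):
    """Interior bin edges placed so that each bin holds roughly the same
    number of reference timestamps (assumed here: the whole stream)."""
    ts = sorted(reference_timestamps)
    n = len(ts)
    return [ts[(i * n) // num_bins] for i in range(1, num_bins)]


def histogram(timestamps, edges):
    """Count how many timestamps fall into each bin defined by the edges."""
    counts = [0] * (len(edges) + 1)
    for t in timestamps:
        counts[bisect_right(edges, t)] += 1
    return counts


def pearson(xs, ys):
    """Pearson's correlation of two equal-length series.

    Returns 0.0 when either series is constant (an assumed convention,
    since the coefficient is undefined in that case).
    """
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


# Toy data (hypothetical): 100 messages at times 0..99.
stream = list(range(100))
edges = equi_height_edges(stream, 5)

# An n-gram that appears in one burst, and one that appears evenly.
bursty = histogram([40, 41, 42, 43, 44, 45], edges)
steady = histogram([5, 25, 45, 65, 85], edges)
# A second n-gram whose burst falls in the same bin as the first.
co_bursty = histogram([42, 43, 46, 47], edges)

print(coefficient_of_variation(bursty), coefficient_of_variation(steady))
print(pearson(bursty, co_bursty))
```

In this sketch the bursty n-gram gets a higher coefficient of variation than the evenly spread one, and two n-grams that burst in the same bin are strongly correlated. Using equi-height bins derived from the whole stream means a busy period is split into more, narrower bins than a quiet one, which is one plausible reading of how the histogramming is applied to tweets.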