A unified model for stable and temporal topic detection from social media data

2013 IEEE 29th International Conference on Data Engineering (ICDE) | Pub Date: 2013-04-08 | DOI: 10.1109/ICDE.2013.6544864

Hongzhi Yin, B. Cui, Hua Lu, Yuxin Huang, Junjie Yao


Citations: 77

Abstract

Web 2.0 users generate and spread huge volumes of messages in online social media. Such user-generated content is a mixture of temporal topics (e.g., breaking events) and stable topics (e.g., user interests). Because the two kinds of topics differ in nature, it is important and useful to distinguish temporal topics from stable topics in social media. However, this distinction is very challenging because user-generated texts in social media are very short and thus lack the linguistic features that traditional approaches need for precise analysis. In this paper, we propose a novel solution that detects stable and temporal topics simultaneously from social media data. Specifically, we propose a unified user-temporal mixture model to distinguish temporal topics from stable topics. To improve the model's performance, we design a regularization framework that exploits prior spatial information in a social network, as well as a burst-weighted smoothing scheme that exploits prior temporal information in the time dimension. We conduct extensive experiments to evaluate our proposal on two real data sets obtained from Del.icio.us and Twitter. The experimental results verify that our mixture model can distinguish temporal topics from stable topics in a single detection process. Enhanced with the spatial regularization and the burst-weighted smoothing scheme, our mixture model significantly outperforms competitor approaches in terms of topic detection accuracy and discrimination between stable and temporal topics.
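The abstract does not state the model's equations, so the following is only a rough sketch of what a two-component user-temporal mixture could look like. It assumes each word is generated either from a user's stable topics or from a time slice's temporal topics, with a hypothetical mixing weight `lam`. The function names, parameters, and toy numbers are illustrative assumptions, not the authors' formulation, and the spatial regularization and burst-weighted smoothing are not modeled here.

```python
from typing import Dict, List

Topic = Dict[str, float]  # word -> probability within one topic


def component_prob(word: str, weights: List[float], topics: List[Topic]) -> float:
    """Sum over topics k of weight[k] * p(word | topic k)."""
    return sum(w * topic.get(word, 0.0) for w, topic in zip(weights, topics))


def mixture_word_prob(word: str,
                      user_weights: List[float], stable_topics: List[Topic],
                      time_weights: List[float], temporal_topics: List[Topic],
                      lam: float) -> float:
    """Assumed form: p(w | u, t) = lam * stable part + (1 - lam) * temporal part."""
    stable = component_prob(word, user_weights, stable_topics)
    temporal = component_prob(word, time_weights, temporal_topics)
    return lam * stable + (1.0 - lam) * temporal


def temporal_responsibility(word: str,
                            user_weights: List[float], stable_topics: List[Topic],
                            time_weights: List[float], temporal_topics: List[Topic],
                            lam: float) -> float:
    """Posterior probability that `word` came from the temporal component."""
    total = mixture_word_prob(word, user_weights, stable_topics,
                              time_weights, temporal_topics, lam)
    if total == 0.0:
        return 0.0  # word unseen by both components; no evidence either way
    temporal = component_prob(word, time_weights, temporal_topics)
    return (1.0 - lam) * temporal / total


# Toy data (made up for illustration): one stable topic, one temporal topic.
stable_topics = [{"python": 0.6, "code": 0.4}]
temporal_topics = [{"earthquake": 0.7, "code": 0.3}]
user_weights = [1.0]   # p(stable topic | user)
time_weights = [1.0]   # p(temporal topic | time slice)

for w in ("python", "code", "earthquake"):
    r = temporal_responsibility(w, user_weights, stable_topics,
                                time_weights, temporal_topics, lam=0.5)
    print(w, round(r, 3))
```

In this toy setup a word that appears only in the temporal topic is attributed entirely to the temporal component, while a word shared by both topics is split between them. This is the kind of per-word separation a mixture model of this shape makes possible in a single pass.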
