Online Emerging Topic Detection on Twitter Using Random Forest with Stock Indicator Features

2018 15th International Joint Conference on Computer Science and Software Engineering (JCSSE). Pub Date: 2018-07-01. DOI: 10.1109/JCSSE.2018.8457349

Ekapop Verasakulvong, P. Vateekul, Apivadee Piyatumrong, Chatchawal Sangkeettrakarn


Citations: 3

Abstract

Social media is one of the fastest and most impactful means of communication. By monitoring Twitter streams, we can detect emerging topics and understand events around the world. Some prior work aims to detect topics on Twitter online. However, these approaches can only detect bursty topics, using user-defined keywords along with simple rules. In this paper, we propose an algorithm to detect emerging topics on Twitter streams. To detect emerging topics, a clustering technique is applied to aggregate a set of keywords. Since an emerging topic occurs continuously, the emerging topics are merged using a stateful technique that accumulates topics from different time intervals. To detect both high-signal topics and small-to-medium-signal topics, we use statistical features based on average, acceleration, and z-score. Moreover, we propose to include two stock indicator features: Relative Strength Index (RSI) and Stochastic Oscillator (STOCH). These are common features for trend (oversold and overbought) detection in stock analysis, a task similar to our topic detection on Twitter. To capture any event pattern, Random Forest (RF) is used as a classifier to detect emerging keywords using the five features stated above. To evaluate performance, we created and published a corpus by collecting Twitter data for 10 days, comprising over 80 million tweets, and then labeling possible topics, in total 1,161 events, along with related keywords. The experiment was conducted on our collected data. The F1 results show that our model outperforms all baselines (TwitterMonitor, SigniTrend, and TopicSketch) in terms of detected keywords and topics.
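As an illustration of the five features named in the abstract, the sketch below computes them for a single keyword from its tweet counts per time interval. The abstract does not give the paper's exact formulas or window lengths, so everything here is an assumption: the function names, the default period of 14, the use of simple (unsmoothed) averages in RSI, and the neutral value of 50 returned for a flat series are all hypothetical choices based on the textbook definitions of these indicators, not the authors' implementation.

```python
from statistics import mean, pstdev


def z_score(counts):
    """Z-score of the latest count against the preceding intervals."""
    history, latest = counts[:-1], counts[-1]
    sigma = pstdev(history)
    if sigma == 0:
        return 0.0  # assumption: a flat history carries no signal
    return (latest - mean(history)) / sigma


def acceleration(counts):
    """Second difference of the last three counts."""
    return counts[-1] - 2 * counts[-2] + counts[-3]


def rsi(counts, period=14):
    """Relative Strength Index with simple averages of gains and losses."""
    window = counts[-(period + 1):]
    deltas = [b - a for a, b in zip(window, window[1:])]
    avg_gain = sum(d for d in deltas if d > 0) / len(deltas)
    avg_loss = sum(-d for d in deltas if d < 0) / len(deltas)
    if avg_loss == 0:
        # No losses: fully "overbought" if there were gains, else neutral.
        return 100.0 if avg_gain > 0 else 50.0
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)


def stochastic_k(counts, period=14):
    """Stochastic Oscillator %K: position of the latest count in its range."""
    window = counts[-period:]
    low, high = min(window), max(window)
    if high == low:
        return 50.0  # assumption: neutral value for a flat window
    return 100.0 * (counts[-1] - low) / (high - low)


def keyword_features(counts, period=14):
    """Bundle the five features for one keyword (needs at least 3 counts)."""
    return {
        "average": mean(counts[-period:]),
        "acceleration": acceleration(counts),
        "z_score": z_score(counts),
        "rsi": rsi(counts, period),
        "stoch": stochastic_k(counts, period),
    }


# Hypothetical counts for one keyword over eight consecutive intervals.
features = keyword_features([3, 4, 3, 5, 4, 6, 5, 30], period=7)
```

A feature dictionary like this, computed per keyword and per interval, could then be turned into a row of a training matrix for a Random Forest classifier, with the labeled events supplying the target column.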


