Design of a method to support Twitter based event detection with heterogeneous data resources
Koichi Sato, Junbo Wang, Zixue Cheng
2017 IEEE 8th International Conference on Awareness Science and Technology (iCAST), 2017-11-01
DOI: 10.1109/ICAWST.2017.8256476 (https://doi.org/10.1109/ICAWST.2017.8256476)
Citations: 2
Abstract
There is high demand for observing events of public concern in real time by analyzing Big Data. Twitter is a suitable data resource for event detection because of its large volume of data and users and its high frequency of data generation. Many studies have demonstrated the feasibility of event detection from tweets. However, two problems remain. The first is the reliability of information: tweets are typically very noisy and can contain fake information. The second is that an individual tweet carries little information, because it is restricted to 140 characters. One possible solution is to retrieve additional information related to a Twitter-based event detection result from heterogeneous data resources such as articles, web pages, and blog posts. Once retrieved, this information can be used both to validate the detection result and to enrich it with further detail. However, retrieving related content from heterogeneous data resources is difficult because the data types differ. To solve this problem, we propose a method that retrieves additional information related to a set of tweets detected as an event from heterogeneous data resources by measuring the similarity (distance) between them with Normalized Compression Distance. We mainly consider articles on the web as the additional information for Twitter-based event detection, since they are well validated and edited. We evaluate the proposed method in experiments, and the results show that it is highly robust to noise and performs well in practical situations.
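As a rough illustration of the similarity measure named above, Normalized Compression Distance is commonly defined as NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(.) is the compressed size of its argument. The sketch below is not the authors' implementation: the abstract does not say which compressor or preprocessing was used, so the choice of zlib, the joining of tweets into one string, and the function names `compressed_size`, `ncd`, and `rank_articles` are all assumptions made for illustration.

```python
from __future__ import annotations

import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed input (compressor choice is an assumption)."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized Compression Distance between two texts.

    Values near 0 suggest the texts share a lot of content; values near 1
    suggest they share little. Real compressors can give values slightly above 1.
    """
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + b" " + by)  # compress the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


def rank_articles(tweets: list[str], articles: dict[str, str]) -> list[tuple[str, float]]:
    """Rank candidate articles by NCD to a set of tweets detected as one event.

    Hypothetical helper: the tweets are simply joined into a single string,
    which is an assumption and not something stated in the abstract.
    """
    event_text = " ".join(tweets)
    scores = [(title, ncd(event_text, body)) for title, body in articles.items()]
    return sorted(scores, key=lambda pair: pair[1])  # smallest distance first
```

Calling `rank_articles(tweets, articles)` with a list of tweet texts and a mapping from article title to article body returns the titles ordered from most to least similar. Because the measure relies only on compression, it needs no format-specific features, which is one reason it suits heterogeneous sources.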