Comparison between LDA & NMF for event-detection from large text stream data
Authors: Pranav Suri, N. Roy
Published in: 2017 3rd International Conference on Computational Intelligence & Communication Technology (CICT), 2017-02-01
DOI: 10.1109/CIACT.2017.7977281 (https://doi.org/10.1109/CIACT.2017.7977281)
Citations: 22
Abstract
Using social networks for topic identification has become essential for event detection, especially when the events affect society. Machine learning algorithms and natural language processing techniques have been used extensively to address this task. This paper discusses an approach to obtaining meaningful data from Twitter. Latent Dirichlet Allocation (LDA) and Non-Negative Matrix Factorization (NMF) are then used to detect topics in this Twitter text, together with an RSS feed of news headlines. The observed results show that both algorithms perform well at detecting topics in text streams: LDA produces more semantically interpretable results, while NMF is the faster of the two.
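As an illustration of the NMF side of this comparison, the following is a minimal sketch and not the authors' implementation. It assumes a raw term-count matrix (rather than whatever weighting the paper used), the standard Lee-Seung multiplicative update rules, and four invented headlines; all function names (`build_matrix`, `nmf`, `top_terms`) are hypothetical helpers written for this example.

```python
import random

def tokenize(doc):
    # Naive whitespace tokenizer; real tweets would need proper cleaning.
    return doc.lower().split()

def build_matrix(docs):
    # Document-term count matrix V (rows: documents, columns: vocabulary terms).
    vocab = sorted({w for d in docs for w in tokenize(d)})
    index = {w: i for i, w in enumerate(vocab)}
    V = [[0.0] * len(vocab) for _ in docs]
    for r, d in enumerate(docs):
        for w in tokenize(d):
            V[r][index[w]] += 1.0
    return V, vocab

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def frobenius_error(V, W, H):
    # Squared Frobenius norm of V - WH.
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx))

def nmf(V, k, iters=200, seed=0, eps=1e-9):
    # Factor V (n x m) into non-negative W (n x k) and H (k x m)
    # using multiplicative updates for the Frobenius objective.
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        Wt = transpose(W)
        num = matmul(Wt, V)
        den = matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num = matmul(V, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return W, H

def top_terms(H, vocab, n=3):
    # For each topic (row of H), return the n highest-weighted terms.
    return [[vocab[j] for j in sorted(range(len(vocab)), key=lambda j: -row[j])[:n]]
            for row in H]

# Invented toy headlines, for illustration only.
docs = [
    "earthquake hits city rescue teams respond",
    "rescue teams search city after earthquake",
    "team wins football final match",
    "football fans celebrate final match win",
]

V, vocab = build_matrix(docs)
W, H = nmf(V, k=2)
topics = top_terms(H, vocab)
for t, terms in enumerate(topics):
    print(f"topic {t}: {', '.join(terms)}")
```

Each row of `H` is a topic expressed as non-negative term weights, and each row of `W` gives a document's loading on those topics. LDA would instead fit a probabilistic model over the same counts; in practice both are usually run through a library rather than written by hand.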