Public Health Surveillance System for Online Social Networks using One-Class Text Classification
Bilal Tahir, Kamran Amjad, Samar Firdous, M. Mehmood
2018 6th International Conference on Control Engineering & Information Technology (CEIT)
Published: 2018-10-01
DOI: 10.1109/CEIT.2018.8751852
Abstract
Public health surveillance by traditional means is a costly and time-consuming process. Today, the widespread use of social media enables researchers to study many aspects of life, such as health and lifestyle. Anonymous postings on these forums let people benefit from the collective experience of others facing similar problems. Separating target data from outliers in such a web corpus requires an efficient filtering mechanism. Traditional approaches such as keyword-based filtering lose relevant data because of their limited vocabulary and lack of contextual information. In this paper, we present a data filtration framework based on a long short-term memory (LSTM) recurrent neural network model for one-class text classification. For each disease, we regenerate texts with this model and compare them with the original texts using the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metric to filter outliers and classify posts. The optimal ROUGE similarity threshold is determined by introducing an optimization parameter that minimizes the misclassification rate. Using data from three major online health forums, we show that our classification technique outperforms keyword-based filtering and the conventional multi-class text classification approach. Our classification technique can be applied effectively to online social networks, search engines, and online recommender systems.
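The abstract names the decision rule (ROUGE similarity against a tuned threshold) but not its exact form. The following is a minimal sketch under stated assumptions: ROUGE-1 recall, with the original post as the reference and the LSTM's regenerated text as the candidate, and a threshold chosen by scanning observed scores on a small labeled validation set to minimize the misclassification rate. The paper's actual optimization parameter is not reproduced here. The LSTM regeneration step is not shown either; regenerated texts are passed in as plain strings. All function names, texts, and labels are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return contiguous n-grams (as tuples) from a list of tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def rouge_n_recall(reference, candidate, n=1):
    """ROUGE-N recall: share of reference n-grams that also occur in the candidate.

    Matches are clipped, so a candidate n-gram counts at most as many
    times as it appears in the reference. Tokenization is a simple
    lowercase whitespace split (an assumption, not the paper's setup).
    """
    ref_counts = Counter(ngrams(reference.lower().split(), n))
    cand_counts = Counter(ngrams(candidate.lower().split(), n))
    total = sum(ref_counts.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand_counts[gram]) for gram, count in ref_counts.items())
    return overlap / total


def misclassification_rate(scores, labels, threshold):
    """Fraction of texts whose prediction (score >= threshold means in-class) is wrong."""
    errors = sum((score >= threshold) != label for score, label in zip(scores, labels))
    return errors / len(scores)


def best_threshold(scores, labels):
    """Return the observed score that minimizes the misclassification rate.

    float('inf') is included so that "reject everything" is also considered.
    Ties go to the lowest threshold, since min() keeps the first minimum.
    """
    candidates = sorted(set(scores)) + [float("inf")]
    return min(candidates, key=lambda t: misclassification_rate(scores, labels, t))


# Hypothetical validation data for one disease model: (original post, text
# regenerated by that disease's LSTM), with True marking in-class posts.
pairs = [
    ("i have a persistent cough and fever", "i have a cough and fever"),
    ("my doctor prescribed inhaler for asthma", "my doctor prescribed an inhaler"),
    ("best pizza places downtown tonight", "i have a cough"),
]
labels = [True, True, False]

scores = [rouge_n_recall(original, regenerated) for original, regenerated in pairs]
threshold = best_threshold(scores, labels)
print(round(threshold, 3))  # -> 0.667
print(misclassification_rate(scores, labels, threshold))  # -> 0.0
```

Under these assumptions, each disease model would get its own threshold. A new post is kept for that disease when its score meets the threshold and is treated as an outlier otherwise.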