An Implementation of Large Scale Hate Speech Detection System for Streaming Social Media Data
Long-An Doan, Phuong-Thao Nguyen, Thi-Oanh Phan, Trong-Hop Do
2022 IEEE International Conference on Communication, Networks and Satellite (COMNETSAT), published 2022-11-03
DOI: 10.1109/COMNETSAT56033.2022.9994299
Abstract
The omnipresence of online social media has both positive and negative consequences for society. Alongside its benefits, social media can cause serious problems through hateful and offensive content. Detecting and removing such toxic content with machine learning is a major research topic in the study of social networks. Two challenges in this area are that the volume of social media data is very large and that the data must be processed in real time. In this paper, we develop a system that detects hate speech in Vietnamese YouTube comments using machine learning and big data technology. Streaming data from YouTube is processed in real time using Kafka, Spark, and machine learning. Finally, a dashboard built with Streamlit displays the results.
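The abstract describes a streaming pipeline: YouTube comments flow through Kafka and Spark, a machine learning model labels them, and a Streamlit dashboard shows the results. The following is a minimal, dependency-free Python sketch of that data flow under stated assumptions. An in-memory `deque` stands in for the Kafka topic, fixed-size micro-batches loosely approximate how a streaming engine consumes the topic, and `toy_classifier` is a hypothetical keyword lookup, not the authors' trained model. The label set (`clean`, `offensive`, `hate`), the keyword lists, and every function name here are illustrative only.

```python
from collections import Counter, deque
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass
class Comment:
    video_id: str
    text: str


# Hypothetical stand-in for the trained classifier: a keyword lookup.
# The real system uses a machine learning model; these word lists are invented.
HATE_WORDS = {"hateword"}
OFFENSIVE_WORDS = {"stupid"}


def toy_classifier(text: str) -> str:
    """Assign one of the assumed labels to a comment."""
    tokens = set(text.lower().split())
    if tokens & HATE_WORDS:
        return "hate"
    if tokens & OFFENSIVE_WORDS:
        return "offensive"
    return "clean"


def micro_batches(topic: deque, batch_size: int) -> Iterator[list]:
    """Drain the in-memory 'topic' in fixed-size micro-batches,
    loosely mimicking a streaming engine consuming a message queue."""
    while topic:
        batch = []
        while topic and len(batch) < batch_size:
            batch.append(topic.popleft())
        yield batch


def run_pipeline(
    comments: Iterable[Comment],
    classify: Callable[[str], str],
    batch_size: int = 2,
) -> Counter:
    """Classify every comment and return running label counts,
    the kind of aggregate a dashboard could plot."""
    topic = deque(comments)  # stand-in for a Kafka topic (assumption)
    totals = Counter()
    for batch in micro_batches(topic, batch_size):
        totals.update(classify(c.text) for c in batch)
    return totals


if __name__ == "__main__":
    stream = [
        Comment("v1", "great video"),
        Comment("v1", "you are stupid"),
        Comment("v2", "hateword here"),
        Comment("v2", "thanks for sharing"),
        Comment("v3", "so stupid"),
    ]
    print(run_pipeline(stream, toy_classifier))
```

In the actual stack, the `deque` would be replaced by a Kafka topic fed by a YouTube comment collector, the batching loop by a Spark streaming job, and the returned counts would be read by the Streamlit dashboard. The sketch keeps only the shape of that flow (ingest, classify in batches, aggregate) so it runs without any of those services.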