Long-An Doan, Phuong-Thao Nguyen, Thi-Oanh Phan, Trong-Hop Do
2022 IEEE International Conference on Communication, Networks and Satellite (COMNETSAT). Published 2022-11-03. DOI: 10.1109/COMNETSAT56033.2022.9994299 (https://doi.org/10.1109/COMNETSAT56033.2022.9994299)
An Implementation of Large Scale Hate Speech Detection System for Streaming Social Media Data
The omnipresence of online social media has both positive and negative consequences for society. Alongside its benefits, social media can cause serious problems through hateful and offensive content. Detecting and removing such toxic content with machine learning is a major research topic for social networks. Two challenges stand out: the volume of social media data is very large, and the data must be processed in real time. In this paper, we set out to develop a system that detects hate speech in Vietnamese YouTube comments using machine learning and big data technology. Streaming data from YouTube is processed in real time using Kafka, Spark, and machine learning models. Finally, a dashboard powered by Streamlit displays the results.
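The data flow described in the abstract (comments arrive as a stream, are scored by a model, and the aggregated results are displayed) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the in-memory queue merely stands in for a Kafka topic, the batching loop for Spark's stream processing, and the returned counts for what a Streamlit dashboard might show. The keyword list, the labels "offensive" and "clean", the English sample comments, and all function names are hypothetical and chosen only for illustration.

```python
from collections import Counter, deque
from typing import Callable, Deque, Iterable, Iterator, List

# Hypothetical placeholder lexicon; a real system would use a trained model.
OFFENSIVE_TERMS = {"idiot", "stupid"}


def classify(comment: str) -> str:
    """Toy stand-in for a trained classifier: flag comments containing a listed term."""
    tokens = comment.lower().split()
    return "offensive" if any(t in OFFENSIVE_TERMS for t in tokens) else "clean"


def micro_batches(stream: Deque[str], batch_size: int) -> Iterator[List[str]]:
    """Drain the queue in fixed-size batches, mimicking micro-batch stream processing."""
    while stream:
        n = min(batch_size, len(stream))
        yield [stream.popleft() for _ in range(n)]


def run_pipeline(
    comments: Iterable[str],
    batch_size: int = 2,
    model: Callable[[str], str] = classify,
) -> Counter:
    """Push comments through the queue, label each batch, and accumulate label counts."""
    topic: Deque[str] = deque(comments)  # stands in for a message-broker topic
    totals: Counter = Counter()
    for batch in micro_batches(topic, batch_size):
        totals.update(model(c) for c in batch)  # a dashboard would read these totals
    return totals


# Illustrative sample input, not data from the paper.
sample = ["great video", "you are an idiot", "thanks for sharing"]
counts = run_pipeline(sample)
```

Passing the classifier in as the `model` parameter keeps the streaming loop independent of the model, so a trained hate speech detector could replace the keyword rule without changing the rest of the sketch.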