An Implementation of Large Scale Hate Speech Detection System for Streaming Social Media Data

2022 IEEE International Conference on Communication, Networks and Satellite (COMNETSAT). Pub Date: 2022-11-03. DOI: 10.1109/COMNETSAT56033.2022.9994299

Long-An Doan, Phuong-Thao Nguyen, Thi-Oanh Phan, Trong-Hop Do





An Implementation of Large Scale Hate Speech Detection System for Streaming Social Media Data

The omnipresence of online social media has both positive and negative consequences for society. Alongside its benefits, social media can cause serious problems through hateful and offensive content. Detecting and removing such toxic content using machine learning is a major research topic in social network research. Two challenges stand out: the volume of social media data is very large, and the data must be processed in real time. In this paper, we set out to develop a system that detects hate speech in Vietnamese YouTube comments using machine learning and big data technology. Streaming data from YouTube is processed in real time using Kafka, Spark, and machine learning. Finally, a dashboard powered by Streamlit displays the results.
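The data flow described in the abstract (ingest comments, process them as a stream, classify each one, aggregate results for display) can be illustrated with a minimal standard-library sketch. Everything here is an assumption made for illustration and none of it is taken from the paper: the in-memory queue stands in for a Kafka topic, the batching loop loosely mimics micro-batch stream processing such as Spark performs, the keyword lookup in `classify` is a toy placeholder for a trained model, and the returned counts are the kind of summary a dashboard might display. The names `OFFENSIVE_TERMS`, `classify`, `micro_batches`, and `run_pipeline` are hypothetical, and the sample comments are in English only for readability.

```python
from collections import Counter, deque
from typing import Deque, Iterable, Iterator, List

# Hypothetical placeholder lexicon. The paper uses a machine learning model;
# this keyword set exists only so the sketch runs end to end.
OFFENSIVE_TERMS = {"hate", "stupid"}


def classify(comment: str) -> str:
    """Toy stand-in for a trained classifier: flag a comment that contains a listed term."""
    tokens = comment.lower().split()
    if any(token in OFFENSIVE_TERMS for token in tokens):
        return "hate/offensive"
    return "clean"


def micro_batches(stream: Deque[str], batch_size: int) -> Iterator[List[str]]:
    """Drain the queue in fixed-size batches, loosely mimicking micro-batch processing."""
    while stream:
        size = min(batch_size, len(stream))
        yield [stream.popleft() for _ in range(size)]


def run_pipeline(comments: Iterable[str], batch_size: int = 2) -> Counter:
    """Push comments through queue -> batches -> classifier -> running label counts."""
    queue: Deque[str] = deque(comments)  # assumption: stands in for a Kafka topic
    totals: Counter = Counter()
    for batch in micro_batches(queue, batch_size):
        totals.update(classify(comment) for comment in batch)
    return totals  # assumption: the summary a dashboard would render


if __name__ == "__main__":
    sample = ["I hate you", "nice video", "so stupid", "great song", "hello"]
    print(dict(run_pipeline(sample)))
```

In a real deployment each stand-in would be replaced by the corresponding component named in the abstract; the sketch only shows how the stages hand data to one another.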

