Towards Perso-Arabic Urdu Language Hate Detection Using Machine Learning: A Comparative Study Based on a Large Dataset and Time-Complexity

Mohsan Ali, Ali Muhammad, Muhammad Asad, Makhdoom Sajawal, C. Alexopoulos, Y. Charalabidis
Proceedings of the 26th Pan-Hellenic Conference on Informatics, published 2022-11-25
DOI: 10.1145/3575879.3576011 (https://doi.org/10.1145/3575879.3576011)
Citations: 1

Abstract

The number of social media users grows daily, with some networking sites reporting hundreds of millions of active users per month. For any administrative institution, regulating user content manually is challenging. Hundreds of languages are used on the web, and Urdu is among the most widely used languages in the world. We propose a fast way of detecting hateful Urdu-language content using machine learning models. We used an open data set together with manually created instances so that the investigation could be run on a balanced data set. In our experimental set-up, a support vector machine detected Urdu hate content with 81.87% accuracy. Training time, testing time, and accuracy guided our selection of the best model for Urdu hate detection on social media sites, and we compared the training and testing times of the various methods. Additionally, we demonstrate k-fold and stratified folding via indexing to give a better understanding of folding in machine learning. Finally, we compare our findings with previously published work on Urdu hate detection.
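The abstract mentions demonstrating k-fold and stratified folding via indexing but does not give the procedure. The following is a minimal sketch of the general idea, not the authors' implementation: the function names `k_fold_indices` and `stratified_k_fold_indices` and the toy labels are hypothetical, and only the Python standard library is used.

```python
from collections import defaultdict


def k_fold_indices(n_samples, k):
    """Split indices 0..n_samples-1 into k contiguous test folds (no shuffling)."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        folds.append((train, test))
        start += size
    return folds


def stratified_k_fold_indices(labels, k):
    """Deal each class's indices round-robin into k test folds so every fold
    keeps roughly the same class ratio as the full data set."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    test_folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for pos, idx in enumerate(indices):
            test_folds[pos % k].append(idx)
    all_indices = set(range(len(labels)))
    return [(sorted(all_indices - set(test)), sorted(test)) for test in test_folds]


# Hypothetical toy labels: 1 = hate, 0 = not hate, sorted by class on purpose.
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

for name, folds in [("k-fold", k_fold_indices(len(labels), 3)),
                    ("stratified", stratified_k_fold_indices(labels, 3))]:
    for train, test in folds:
        print(name, "test indices:", test, "labels:", [labels[i] for i in test])
```

On class-sorted data like this toy example, plain k-fold yields test folds containing only one class, while the stratified variant keeps both classes in every fold. This is one reason stratification is commonly preferred for classification tasks.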