Indonesian Tweets Hate Speech Target Classification using Machine Learning

Sandy Kurniawan, I. Budi
DOI: 10.1109/ICIC50835.2020.9288515 (https://doi.org/10.1109/ICIC50835.2020.9288515)
Published in: 2020 Fifth International Conference on Informatics and Computing (ICIC)
Publication date: 2020-11-03
Citations: 5

Abstract

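In recent years, hate speech on social media has been increasing. The increase is attributed to the growing number of active social media users around the world. Much of this hate speech is aimed at governments or particular individuals. Hate speech is harmful because it can negatively affect its target, whether that target is an individual or a group. Identifying the targets of hate speech is crucial because it can help prevent consequences such as exclusion, discrimination, and violence directed at those targets. In this paper, we present our study of hate speech target classification on Indonesian Twitter, comparing classification performance across the algorithms and feature representations used. Word n-grams were used as features, combined with Bag-of-Words and Term Frequency-Inverse Document Frequency (TF-IDF) weighting. Classification was performed using Naive Bayes, Support Vector Machine (SVM), and Random Forest Decision Tree (RFDT). The best result, an F1-score of 0.84772, was achieved using TF-IDF with word unigram features combined with an SVM classifier.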
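The feature representations and the evaluation metric named in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the whitespace tokenizer, the particular TF-IDF variant (idf = ln(N / df) + 1), the helper names, and the labels "individual" and "group" are all assumptions made for illustration. Classifier training (Naive Bayes, SVM, RFDT) is left out.

```python
import math
from collections import Counter


def word_ngrams(text, n=1):
    """Return the word n-grams of a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(docs, n=1):
    """Bag-of-Words: raw n-gram counts, one Counter per document."""
    return [Counter(word_ngrams(doc, n)) for doc in docs]


def tf_idf(docs, n=1):
    """TF-IDF weights per document.

    Assumed variant: tf = count / document length, idf = ln(N / df) + 1.
    Other libraries and papers use slightly different formulas.
    """
    counts = bag_of_words(docs, n)
    # Document frequency: in how many documents each n-gram occurs.
    df = Counter(term for c in counts for term in c)
    n_docs = len(docs)
    weighted = []
    for c in counts:
        total = sum(c.values())
        weighted.append(
            {t: (k / total) * (math.log(n_docs / df[t]) + 1.0) for t, k in c.items()}
        )
    return weighted


def f1_score(y_true, y_pred, positive):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Placeholder documents, not data from the paper.
    docs = ["a b", "a c"]
    print(word_ngrams("A b c", 2))
    print(tf_idf(docs, n=1))
    # Hypothetical target labels and predictions.
    y_true = ["individual", "group", "group", "individual"]
    y_pred = ["individual", "group", "individual", "individual"]
    print(f1_score(y_true, y_pred, positive="individual"))
```

In this sketch a term that occurs in every document keeps only its term-frequency weight, while rarer terms are weighted up, which is the property that distinguishes TF-IDF from plain Bag-of-Words counts.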