Should We Translate? Evaluating Toxicity in Online Comments when Translating from Portuguese to English

Jordan K. Kobellarz, Thiago H. Silva
Published in: Proceedings of the Brazilian Symposium on Multimedia and the Web, 2022-11-07
DOI: 10.1145/3539637.3556892 (https://doi.org/10.1145/3539637.3556892)
Citations: 3

Abstract

Social media and online discussion platforms suffer from the prevalence of uncivil behavior, such as harassment and abuse, and seek to curb toxic comments. There are several approaches to classifying toxic comments automatically. Some of them have more resources and are more advanced in English, which encourages translating text from other languages into English. While researchers have shown evidence that this practice is appropriate for certain tasks, such as sentiment analysis, little is known about it in the context of toxicity identification. In this research, we assess the performance of Perspective API, a freely available model for detecting toxic language in online comments that is widely adopted by well-known news media sites to identify different classes of toxicity. As a use case, we collected comments in Portuguese from two Brazilian news media websites during a politically polarized period. This dataset was then translated into English and compared to four baseline datasets: two composed of highly toxic comments, one in Portuguese and the other in English, and two composed of neutral comments, also one in Portuguese and the other in English, all in their original language rather than translated. Finally, we analyzed human-annotated comments from the news comments dataset to assess the scores that Perspective API assigns to the original and the translated versions. Results indicate that keeping texts in their original language is preferable, even when comparing across languages. Nevertheless, if a translated version is strictly necessary, we suggest ways of handling it that preserve as much information as possible from the original.
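To make the scoring step concrete, here is a minimal sketch (not the authors' code) of how one might score the same comment in its original Portuguese and in English translation with Perspective API, then compare the paired scores. The endpoint and the `comment`, `languages`, `requestedAttributes`, and `attributeScores`/`summaryScore` fields are assumed from Google's public Perspective API documentation; the helper names (`build_request`, `summary_scores`, `analyze`, `compare_pairs`) and the mean-difference comparison are hypothetical illustrations, not the statistic used in the paper. Calling `analyze` assumes a valid API key.

```python
import json
import urllib.request
from statistics import mean

# Assumed Perspective API endpoint for the comments:analyze method (v1alpha1),
# taken from Google's public docs, not from the paper.
ANALYZE_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"


def build_request(text, language, attributes=("TOXICITY",)):
    """Build an AnalyzeComment request body for one comment.

    The language is passed explicitly so that a Portuguese original can be
    scored as "pt" and its English translation as "en".
    """
    return {
        "comment": {"text": text},
        "languages": [language],
        "requestedAttributes": {attr: {} for attr in attributes},
    }


def summary_scores(response):
    """Map each requested attribute to its summary score (0 to 1)."""
    return {
        attr: data["summaryScore"]["value"]
        for attr, data in response["attributeScores"].items()
    }


def analyze(text, language, api_key):
    """Hypothetical helper: score one comment over HTTP (needs a valid key)."""
    body = json.dumps(build_request(text, language)).encode("utf-8")
    req = urllib.request.Request(
        f"{ANALYZE_URL}?key={api_key}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return summary_scores(json.load(resp))


def compare_pairs(original, translated, attribute="TOXICITY"):
    """Compare scores of the same comments before and after translation.

    Illustrative statistic only (not the paper's method): returns the mean
    signed difference (translated minus original) and the mean absolute
    difference for one attribute.
    """
    diffs = [t[attribute] - o[attribute] for o, t in zip(original, translated)]
    return mean(diffs), mean(abs(d) for d in diffs)


# Usage with made-up scores (not data from the paper), no network call needed.
orig_scores = [{"TOXICITY": 0.8}, {"TOXICITY": 0.1}]
trans_scores = [{"TOXICITY": 0.6}, {"TOXICITY": 0.2}]
signed, absolute = compare_pairs(orig_scores, trans_scores)
print(round(signed, 2), round(absolute, 2))  # -0.05 0.15
```

In this sketch each comment would be scored once with `language="pt"` on the original and once with `language="en"` on its translation; a negative signed difference would mean translation lowered the toxicity score on average, while the absolute difference captures how much the scores move regardless of direction.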