MEDs for PETs: Multilingual Euphemism Disambiguation for Potentially Euphemistic Terms

Findings | Pub Date: 2024-01-25 | DOI: 10.48550/arXiv.2401.14526

Patrick Lee, Alain Chirino Trujillo, Diana Cuevas Plancarte, O. E. Ojo, Xinyi Liu, Iyanuoluwa Shode, Yuan Zhao, Jing Peng, Anna Feldman

Journal: Findings, volume 281 3, pages 875-881.


Abstract

Euphemisms are found across the world’s languages, making them a universal linguistic phenomenon. As such, euphemistic data may have useful properties for computational tasks across languages. In this study, we explore this premise by training a multilingual transformer model (XLM-RoBERTa) to disambiguate potentially euphemistic terms (PETs) in multilingual and cross-lingual settings. In line with current trends, we demonstrate that zero-shot learning across languages takes place. We also show cases where multilingual models perform better on the task compared to monolingual models by a statistically significant margin, indicating that multilingual data presents additional opportunities for models to learn about cross-lingual, computational properties of euphemisms. In a follow-up analysis, we focus on universal euphemistic “categories” such as death and bodily functions among others. We test to see whether cross-lingual data of the same domain is more important than within-language data of other domains to further understand the nature of the cross-lingual transfer.
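The settings named in the abstract (zero-shot transfer across languages, and same-category cross-lingual data versus other-category within-language data) can be sketched as simple data splits. This is a minimal illustration only, not the authors' pipeline: the `Example` fields, the language codes, the category names, the helper functions, and the toy sentences are all hypothetical, and the abstract does not specify how the actual splits were built.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Example:
    lang: str      # hypothetical language code, e.g. "en"
    category: str  # hypothetical euphemism category, e.g. "death"
    text: str      # sentence containing the potentially euphemistic term
    label: int     # 1 = euphemistic use, 0 = literal use


def zero_shot_split(
    data: List[Example], train_lang: str, test_lang: str
) -> Tuple[List[Example], List[Example]]:
    """Train on one language only and evaluate on a different, unseen language."""
    if train_lang == test_lang:
        raise ValueError("zero-shot transfer needs two different languages")
    train = [ex for ex in data if ex.lang == train_lang]
    test = [ex for ex in data if ex.lang == test_lang]
    return train, test


def same_category_cross_lingual(
    data: List[Example], target_lang: str, category: str
) -> List[Example]:
    """Training pool: same category, taken from languages other than the target."""
    return [ex for ex in data if ex.lang != target_lang and ex.category == category]


def other_category_within_language(
    data: List[Example], target_lang: str, category: str
) -> List[Example]:
    """Training pool: target language, taken from categories other than the tested one."""
    return [ex for ex in data if ex.lang == target_lang and ex.category != category]


# Toy examples invented for illustration; they are not drawn from the paper's data.
DATA = [
    Example("en", "death", "She passed away in 2010.", 1),
    Example("en", "employment", "He is between jobs right now.", 1),
    Example("en", "employment", "Leave a gap between jobs in the queue.", 0),
    Example("es", "death", "Mi abuelo pasó a mejor vida.", 1),
    Example("es", "employment", "Hubo un recorte de personal.", 1),
]

train, test = zero_shot_split(DATA, train_lang="en", test_lang="es")
cross_pool = same_category_cross_lingual(DATA, target_lang="en", category="death")
within_pool = other_category_within_language(DATA, target_lang="en", category="death")
```

Under these assumptions, a classifier trained on `cross_pool` and one trained on `within_pool` would both be evaluated on the target language's examples of the tested category, which is one way to compare the two data sources the follow-up analysis contrasts.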


