Cross-domain Sentiment Classification in Spanish

Lautaro Estienne, Matías Vera, L. Vega
Published in: 2022 IEEE Biennial Congress of Argentina (ARGENCON), 2022-09-07
DOI: https://doi.org/10.1109/ARGENCON55245.2022.9940056

Abstract

Sentiment classification is a fundamental task in Natural Language Processing with important academic and commercial applications. It aims to automatically predict the degree of sentiment in text that contains opinions and some level of subjectivity, such as product reviews, movie reviews, or tweets. This is difficult, in part because different text domains contain different words and expressions. The difficulty increases when the text is written in a language other than English, owing to the lack of databases and resources. Consequently, cross-domain and cross-language techniques are often applied to this task to improve results. In this work we study the ability of a classification system trained on a large database of product reviews to generalize to different Spanish domains. Reviews were collected from the MercadoLibre website across seven Latin American countries, allowing the creation of a large and balanced dataset. Results suggest that generalization across domains is feasible, though very challenging, when training with these product reviews, and that it can be improved by pre-training and fine-tuning the classification model.
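
The cross-domain setup described in the abstract (train on product reviews, evaluate on a different domain) can be illustrated with a minimal sketch. The abstract does not specify the model or the data format, so everything here is an assumption: a small multinomial Naive Bayes classifier stands in for the authors' system, and the handful of Spanish sentences are invented toy examples, not samples from the MercadoLibre dataset.

```python
import math
from collections import Counter

def tokenize(text):
    # Whitespace tokenization; a real system would use a proper tokenizer.
    return text.lower().split()

def train_naive_bayes(examples):
    """Fit a multinomial Naive Bayes model on (text, label) pairs."""
    word_counts = {}        # label -> Counter of token frequencies
    doc_counts = Counter()  # label -> number of training documents
    vocab = set()
    for text, label in examples:
        doc_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for tok in tokenize(text):
            counts[tok] += 1
            vocab.add(tok)
    return {"word_counts": word_counts, "doc_counts": doc_counts, "vocab": vocab}

def predict(model, text):
    """Return the label with the highest log-posterior (Laplace smoothing)."""
    total_docs = sum(model["doc_counts"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n_docs in model["doc_counts"].items():
        counts = model["word_counts"][label]
        total_tokens = sum(counts.values())
        score = math.log(n_docs / total_docs)
        for tok in tokenize(text):
            if tok in model["vocab"]:  # tokens unseen in training are skipped
                score += math.log((counts[tok] + 1) / (total_tokens + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(model, examples):
    correct = sum(predict(model, text) == label for text, label in examples)
    return correct / len(examples)

# Hypothetical toy data: source domain (product reviews) ...
product_reviews = [
    ("excelente producto muy bueno", "pos"),
    ("llegó rápido y funciona perfecto", "pos"),
    ("muy mala calidad no funciona", "neg"),
    ("pésimo producto llegó roto", "neg"),
]
# ... and target domain (movie reviews), never seen during training.
movie_reviews = [
    ("una película excelente", "pos"),
    ("la trama es aburrida y lenta", "neg"),
]

model = train_naive_bayes(product_reviews)
in_domain_acc = accuracy(model, product_reviews)
cross_domain_acc = accuracy(model, movie_reviews)
print(f"in-domain accuracy:    {in_domain_acc:.2f}")
print(f"cross-domain accuracy: {cross_domain_acc:.2f}")
```

On this toy data the second movie review shares almost no vocabulary with the product reviews, so the classifier has little evidence to work with. This is the vocabulary mismatch between domains that the abstract names as a source of difficulty.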