Fine-tuning machine translation quality-rating scales for new digital genres: The case of user-generated content

Estudios de Linguistica-Universidad de Alicante-ELUA (LANGUAGE & LINGUISTICS, IF 0.1) Pub Date: 2022-07-19 DOI: 10.14198/elua.21900
Miguel A. Candel-Mora
Citations: 1

Abstract

With users actively participating in product review platforms, online consumer-generated content, and user-generated reviews in particular, has become a clear reference in purchasing decisions, sometimes exceeding the impact of advertising campaigns. A common feature of most tourism review platforms is the use of machine translation (MT) systems to make reviews immediately available to users in various languages. However, the quality of the MT output of these reviews varies greatly, primarily because of the subjective and unstructured nature of this digital genre. Different studies confirm that there are no universal quality rating scales: the assessment of MT output quality usually depends on factors such as the purpose of the text or the value given to the immediacy of the translation. New neural MT systems have revolutionized the quality of translated output; however, new lines of research are opening up to verify whether the quality of this new MT paradigm can be assessed with the existing scales, which derive mainly from earlier rule-based and statistical systems, or whether new quality metrics must be developed specifically for these systems. A further open question in the context of neural MT is whether training these systems on large amounts of textual data is as effective as training on less data of higher quality that is better adjusted to the specialty and text type for which the system is used. Based on the hypothesis that each genre requires specific quality rating scales, this work uses a corpus-based analysis to identify the error patterns and textual characteristics of online user reviews, which will contribute to adapting quality rating scales to this specific digital genre.