Fine-tuning machine translation quality-rating scales for new digital genres: The case of user-generated content

Estudios de Linguistica-Universidad de Alicante-ELUA | LANGUAGE & LINGUISTICS | IF 0.1 | Pub Date: 2022-07-19 | DOI: 10.14198/elua.21900

Miguel A. Candel-Mora


Citations: 1

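Abstract

With the active participation of users in product review platforms, online consumer-generated content, and more specifically user-generated reviews, has become a clear reference in purchasing decisions, with an impact that sometimes exceeds that of advertising campaigns. A common feature of most tourism review platforms is the use of machine translation (MT) systems to make reviews immediately available to users in various languages. However, the quality of the MT output for these reviews varies greatly, primarily because of the subjective and unstructured nature of this digital genre. Different studies confirm that there are no universal quality-rating scales: the assessment of MT output quality usually depends on factors such as the purpose of the text or the value placed on the immediacy of the translation. Neural MT systems have brought a revolution in the quality of translated output; however, new lines of research are opening up to verify whether the quality of this new MT paradigm can be assessed with the existing scales, which derive mainly from earlier rule-based and statistical systems, or whether new quality metrics need to be developed specifically for these systems. A further open question in the context of neural MT is whether training these systems on large amounts of textual data is as effective as training on less data that is of higher quality and better adjusted to the specialty and text type in question. Based on the hypothesis that each genre requires specific quality-rating scales, this work uses a corpus-based analysis to identify the error patterns and textual characteristics of online user reviews, which will contribute to adapting quality-rating scales to this specific digital genre.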




Source journal

Estudios de Linguistica-Universidad de Alicante-ELUA (LANGUAGE & LINGUISTICS)

Self-citation rate

0.00%

Articles published