We show that a previously proposed algorithm for the N-best trees problem can be made more efficient by changing how it arranges and explores the search space. Given an integer N and a weighted tree automaton (wta) M over the tropical semiring, the algorithm computes N trees of minimal weight with respect to M. Compared with the original algorithm, the modifications increase the laziness of the evaluation strategy, which makes the new algorithm asymptotically more efficient than its predecessor. The algorithm is implemented in the software Betty, and compared to the state-of-the-art algorithm for extracting the N best runs, implemented in the software toolkit Tiburon. The data sets used in the experiments are wtas resulting from real-world natural language processing tasks, as well as artificially created wtas with varying degrees of nondeterminism. We find that Betty outperforms Tiburon on all tested data sets with respect to running time, while Tiburon seems to be the more memory-efficient choice.
{"title":"Improved N-Best Extraction with an Evaluation on Language Data","authors":"Johanna Björklund, F. Drewes, Anna Jonsson","doi":"10.1162/coli_a_00427","DOIUrl":"https://doi.org/10.1162/coli_a_00427","url":null,"abstract":"We show that a previously proposed algorithm for the N-best trees problem can be made more efficient by changing how it arranges and explores the search space. Given an integer N and a weighted tree automaton (wta) M over the tropical semiring, the algorithm computes N trees of minimal weight with respect to M. Compared with the original algorithm, the modifications increase the laziness of the evaluation strategy, which makes the new algorithm asymptotically more efficient than its predecessor. The algorithm is implemented in the software Betty, and compared to the state-of-the-art algorithm for extracting the N best runs, implemented in the software toolkit Tiburon. The data sets used in the experiments are wtas resulting from real-world natural language processing tasks, as well as artificially created wtas with varying degrees of nondeterminism. We find that Betty outperforms Tiburon on all tested data sets with respect to running time, while Tiburon seems to be the more memory-efficient choice.","PeriodicalId":55229,"journal":{"name":"Computational Linguistics","volume":"48 1","pages":"119-153"},"PeriodicalIF":9.3,"publicationDate":"2021-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43997136","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
As more users across the world are interacting with dialog agents in their daily life, there is a need for better speech understanding that calls for renewed attention to the dynamics between research in automatic speech recognition (ASR) and natural language understanding (NLU). We briefly review these research areas and lay out the current relationship between them. In light of the observations we make in this article, we argue that (1) NLU should be cognizant of the presence of ASR models being used upstream in a dialog system’s pipeline, (2) ASR should be able to learn from errors found in NLU, (3) there is a need for end-to-end data sets that provide semantic annotations on spoken input, (4) there should be stronger collaboration between ASR and NLU research communities.
{"title":"Revisiting the Boundary between ASR and NLU in the Age of Conversational Dialog Systems","authors":"Manaal Faruqui, Dilek Z. Hakkani-Tür","doi":"10.1162/coli_a_00430","DOIUrl":"https://doi.org/10.1162/coli_a_00430","url":null,"abstract":"As more users across the world are interacting with dialog agents in their daily life, there is a need for better speech understanding that calls for renewed attention to the dynamics between research in automatic speech recognition (ASR) and natural language understanding (NLU). We briefly review these research areas and lay out the current relationship between them. In light of the observations we make in this article, we argue that (1) NLU should be cognizant of the presence of ASR models being used upstream in a dialog system’s pipeline, (2) ASR should be able to learn from errors found in NLU, (3) there is a need for end-to-end data sets that provide semantic annotations on spoken input, (4) there should be stronger collaboration between ASR and NLU research communities.","PeriodicalId":55229,"journal":{"name":"Computational Linguistics","volume":"48 1","pages":"221-232"},"PeriodicalIF":9.3,"publicationDate":"2021-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49216893","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Abstract Data-hungry deep neural networks have established themselves as the de facto standard for many NLP tasks, including the traditional sequence tagging ones. Despite their state-of-the-art performance on high-resource languages, they still fall behind their statistical counterparts in low-resource scenarios. One methodology to counterattack this problem is text augmentation, that is, generating new synthetic training data points from existing data. Although NLP has recently witnessed several new textual augmentation techniques, the field still lacks a systematic performance analysis on a diverse set of languages and sequence tagging tasks. To fill this gap, we investigate three categories of text augmentation methodologies that perform changes on the syntax (e.g., cropping sub-sentences), token (e.g., random word insertion), and character (e.g., character swapping) levels. We systematically compare the methods on part-of-speech tagging, dependency parsing, and semantic role labeling for a diverse set of language families using various models, including the architectures that rely on pretrained multilingual contextualized language models such as mBERT. Augmentation most significantly improves dependency parsing, followed by part-of-speech tagging and semantic role labeling. We find the experimented techniques to be effective on morphologically rich languages in general rather than analytic languages such as Vietnamese. Our results suggest that the augmentation techniques can further improve over strong baselines based on mBERT, especially for dependency parsing. We identify the character-level methods as the most consistent performers, while synonym replacement and syntactic augmenters provide inconsistent improvements. Finally, we discuss that the results most heavily depend on the task, language pair (e.g., syntactic-level techniques mostly benefit higher-level tasks and morphologically richer languages), and model type (e.g., token-level augmentation provides significant improvements for BPE, while character-level ones give generally higher scores for char and mBERT based models).
{"title":"To Augment or Not to Augment? A Comparative Study on Text Augmentation Techniques for Low-Resource NLP","authors":"Gözde Gül Şahin","doi":"10.1162/coli_a_00425","DOIUrl":"https://doi.org/10.1162/coli_a_00425","url":null,"abstract":"Abstract Data-hungry deep neural networks have established themselves as the de facto standard for many NLP tasks, including the traditional sequence tagging ones. Despite their state-of-the-art performance on high-resource languages, they still fall behind their statistical counterparts in low-resource scenarios. One methodology to counterattack this problem is text augmentation, that is, generating new synthetic training data points from existing data. Although NLP has recently witnessed several new textual augmentation techniques, the field still lacks a systematic performance analysis on a diverse set of languages and sequence tagging tasks. To fill this gap, we investigate three categories of text augmentation methodologies that perform changes on the syntax (e.g., cropping sub-sentences), token (e.g., random word insertion), and character (e.g., character swapping) levels. We systematically compare the methods on part-of-speech tagging, dependency parsing, and semantic role labeling for a diverse set of language families using various models, including the architectures that rely on pretrained multilingual contextualized language models such as mBERT. Augmentation most significantly improves dependency parsing, followed by part-of-speech tagging and semantic role labeling. We find the experimented techniques to be effective on morphologically rich languages in general rather than analytic languages such as Vietnamese. Our results suggest that the augmentation techniques can further improve over strong baselines based on mBERT, especially for dependency parsing. We identify the character-level methods as the most consistent performers, while synonym replacement and syntactic augmenters provide inconsistent improvements. Finally, we discuss that the results most heavily depend on the task, language pair (e.g., syntactic-level techniques mostly benefit higher-level tasks and morphologically richer languages), and model type (e.g., token-level augmentation provides significant improvements for BPE, while character-level ones give generally higher scores for char and mBERT based models).","PeriodicalId":55229,"journal":{"name":"Computational Linguistics","volume":"48 1","pages":"5-42"},"PeriodicalIF":9.3,"publicationDate":"2021-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49107971","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
away other aspects of information, such as the speaker’s empathy, distinction of old/new information, emphasis, and so on. Climbing up the hierarchy led to a loss of information at the lower levels of representation. In Tsujii (1986), instead of mapping at the abstract level, I proposed “transfer based on a bundle of features of all the levels”, in which the transfer would refer to all levels of representation in the source language to produce a corresponding representation in the target language (Figure 4). Because different levels of representation require different geometrical structures (i.e., different tree structures), the realization of this proposal had to wait for the development of a clear mathematical formulation of feature-based representation with reentrancy, which allowed multiple levels (i.e., multiple trees) to be represented with their mutual relationships (see the next section). [Footnote 6: IS (Interface Structure) is dependent on a specific language. In particular, unlike the interlingual approach, Eurotra did not assume language-independent leximemes in ISs, so the transfer phase between the two ISs (source and target) was indispensable. See footnote 5.] [Figure 4: Description-based transfer (Tsujii 1986).]

Another idea we adopted to systematize the transfer phase was recursive transfer (Nagao and Tsujii 1986), which was inspired by the idea of compositional semantics in CL. According to the views of linguists at the time, a language is an infinite set of expressions which, in turn, is defined by a finite set of rules. By applying this finite number of rules, one can generate infinitely many grammatical sentences of the language. Compositional semantics claimed that the meaning of a phrase was determined by combining the meanings of its subphrases, using the rules that generated the phrase. Compositional translation applied the same idea to translation. That is, the translation of a phrase was determined by combining the translations of its subphrases. In this way, translations of infinitely many sentences of the source language could be generated. Using the compositional translation approach, the translation of a sentence would be undertaken by recursively tracing the tree structure of the source sentence. The translation of a phrase would then be formulated by combining the translations of its subphrases. That is, translation would be constructed in a bottom-up manner, from smaller units of translation to larger units. Furthermore, because the mapping of a phrase from the source to the target would be determined by the lexical head of the phrase, the lexical entry for the head word specified how to map a phrase to the target. In the MU project, we called this lexicon-driven, recursive transfer (Nagao and Tsujii 1986) (Figure 5).
[Figure 5: Lexicon-driven recursive transfer (Nagao and Tsujii 1986). Figure 6: Disambiguation at lexical transfer.]

Compared with the first generation of MT systems, in which the order of replacing source expressions with target expressions was haphazard, the order of transfer in the MU project was clearly defined and systematically executed.

Lessons learned. Research and development of the second generation of MT systems benefited from research in CL, which allowed architectures and design principles to be defined far more clearly than in the first generation. The MU project successfully delivered English-Japanese and Japanese-English MT systems within its four-year span. Without these CL-driven design principles, we could not have delivered these results in such a short period. However, the differences between the goals of the two disciplines also became clear. Theories in CL tend to focus on specific aspects of language (morphology, syntax, semantics, discourse, and so on), whereas an MT system has to treat all aspects of the information that language conveys. As discussed above, focusing only on the hierarchy of propositional content does not yield good translations.

A more serious divergence between CL and NLP lies in the treatment of ambiguities of various kinds. Disambiguation is the most significant challenge in most NLP tasks; it requires processing of the context in which the expression to be disambiguated occurs. In other words, it requires understanding of context. A typical example of disambiguation is shown in Figure 6. The core meaning of the Japanese word “asobu” is “to spend time without engaging in any specific useful task”, and depending on the context it can be translated as “play”, “have fun”, “spend time”, “hang around”, and so on. Taking the context into account for disambiguation runs counter to recursive transfer, because it requires the processing of larger units (i.e., the context in which the unit to be translated occurs). The nature of disambiguation makes the recursive transfer process awkward. Disambiguation is also a major problem in the analysis phase, which I discuss in the next section.

A major (though hidden) limitation common to CL and linguistics in general is the tendency to view language as an independent, closed system, which avoids problems of understanding that require reference to knowledge or to nonlinguistic context. However, many NLP tasks, including MT, require language expressions to be understood or interpreted in terms of knowledge and context, possibly involving other input modalities such as visual stimuli, sound, and so forth. I return to this point in the section on future research.

4. Grammar Formalisms and Parsing

Background and motivation. While I was engaged in research on MT, new developments took place in CL, namely, feature-based grammar formalisms (Kriege 1993). Transformational grammar in N. Chomsky’s theoretical linguistics assumed, in its early stages, sequential stages of application of tree-transformation rules that related two levels of structure, deep structure and surface structure. The MT community had similar ideas: it assumed that climbing up the hierarchy would involve sequential stages of rule application, each mapping a representation at one level to another representation at the next adjacent level. Because every level required its own geometrical structure, a unified, non-procedural representation in which the representations of all levels coexist was impossible. This view was changed by the emergence of feature-based formalisms, which used directed acyclic graphs (DAGs) to allow reentrancy. Instead of mapping one level to another, they describe the mutual relationships among the different levels of representation in a declarative manner. This view was consistent with our idea of description-based transfer, which used a bundle of features of all the levels for transfer. Moreover, some grammar formalisms of the time emphasized the importance of the lexical head: local structures at all levels are constrained by the head word of a phrase, and these constraints are encoded in the lexicon. This also matched our lexicon-driven transfer.

In the meantime, further significant developments took place in CL. Namely, treebank projects of substantial scale, most notably the Penn Treebank and the Lancaster/IBM treebank, reactivated corpus linguistics and began to exert a major influence on research in CL and NLP (Marcus et al. 1994). [Footnote: From the NLP point of view, this is an overgeneralization. Chomsky’s theoretical linguistics explicitly avoided problems related to interpretation, treating language as a closed system; other linguistic traditions had more relaxed, open attitudes.] [Footnote: Note that transformational grammar considered a set of rules for generating surface structures from deep structures, whereas the “climbing up the hierarchy” model of analysis considered a set of rules that reveal the abstract levels of representation from the surface level. The directions are opposite, and ambiguity does not cause problems in transformational grammar.] The emergence of large treebanks led to the development of powerful tools for disambiguation, namely, probabilistic models. We started research to combine these two trends in order to systematize the analysis phase, namely, parsing based on feature-based grammar formalisms.

Contributions of the research. It is often claimed that ambiguities emerge from a lack of constraints. In the analysis phase of the “climbing up the hierarchy” model, processing at a lower level could not refer to constraints at the higher levels of representation. This was considered a major cause of the combinatorial explosion of ambiguities at the early stages of climbing up the hierarchy: syntactic analysis could not refer to semantic constraints, which meant that ambiguities in syntactic analysis would explode. On the other hand, because feature-based formalisms could describe the constraints of all levels in a single, unified framework, the constraints of all levels could be referred to in order to narrow down the set of possible interpretations. In practice, however, actual grammars remained severely underconstrained, partly because we had no effective means of expressing semantic and pragmatic constraints. Computational linguists were interested in declarative ways of formalizing the relationship between syntax and semantics
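The lexicon-driven, recursive transfer described above can be caricatured in a few lines: translations are built bottom-up, and the lexical entry of each phrase's head dictates how the translated subphrases combine. The toy tree, lexicon, and particle-free Japanese glosses below are invented for illustration and greatly simplify what the MU project's entries actually encoded; real entries also carried contextual tests to choose among target candidates (cf. the “asobu” example above).

```python
# Toy source tree: (head, subphrases); leaves are plain words.
source_tree = ("eat", [("man", ["the"]), ("apple", ["an"])])

# Transfer lexicon: head word -> (target word, recipe for combining the
# translated subphrases).
lexicon = {
    "eat":   ("taberu", lambda kids: [kids[0], kids[1], "taberu"]),  # SOV order
    "man":   ("otoko",  lambda kids: kids + ["otoko"]),
    "apple": ("ringo",  lambda kids: kids + ["ringo"]),
    "the":   ("sono",   None),
    "an":    ("",       None),   # no article in the target language
}

def transfer(node):
    """Lexicon-driven recursive transfer: translate the subphrases
    first (bottom-up), then let the head's entry combine them."""
    if isinstance(node, str):                      # leaf word
        target, _ = lexicon[node]
        return [target] if target else []
    head, subs = node
    kids = [" ".join(transfer(s)) for s in subs]   # translations of subphrases
    _, combine = lexicon[head]
    return [w for w in combine(kids) if w]

# Prints "sono otoko ringo taberu" (particles omitted for simplicity).
print(" ".join(transfer(source_tree)))
```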
{"title":"Natural Language Processing and Computational Linguistics","authors":"Jun'ichi Tsujii","doi":"10.1162/coli_a_00420","DOIUrl":"https://doi.org/10.1162/coli_a_00420","url":null,"abstract":"away other aspects of information, such as the speaker’s empathy, distinction of old/new information, emphasis, and so on. To climb up the hierarchy led to loss of information in lower levels of representation. In Tsujii (1986), instead of mapping at the abstract level, I proposed “transfer based on a bundle of features of all the levels”, in which the transfer would refer to all levels of representation in the source language to produce a corresponding representation in the target language (Figure 4). Because different levels of representation require different geometrical structures (i.e., different tree structures), the realization of this proposal had to wait for development of a clear mathematical formulation of feature-based 6 IS (Interface Structure) is dependent on a specific language. In particular, unlike the interlingual approach, Eurotra did not assume language-independent leximemes in ISs so that the transfer phase between the two ISs (source and target ISs) was indispensable. See footnote 5. 711 D ow naded rom httpdirect.m it.edu/coli/article-p7/1979478/coli_a_00420.pdf by gest on 04 M arch 2022 Computational Linguistics Volume 47, Number 4 Figure 4 Description-based transfer (Tsujii 1986). representation with reentrancy, which allowed multiple levels (i.e., multiple trees) to be represented with their mutual relationships (see the next section). Another idea we adopted to systematize the transfer phase was recursive transfer (Nagao and Tsujii 1986), which was inspired by the idea of compositional semantics in CL. According to the views of linguists at the time, a language is an infinite set of expressions which, in turn, is defined by a finite set of rules. By applying this finite number of rules, one can generate infinitely many grammatical sentences of the language. Compositional semantics claimed that the meaning of a phrase was determined by combining the meanings of its subphrases, using the rules that generated the phrase. Compositional translation applied the same idea to translation. That is, the translation of a phrase was determined by combining the translations of its subphrases. In this way, translations of infinitely many sentences of the source language could be generated. Using the compositional translation approach, the translation of a sentence would be undertaken by recursively tracing a tree structure of a source sentence. The translation of a phrase would then be formulated by combining the translations of its subphrases. That is, translation would be constructed in a bottom up manner, from smaller units of translation to larger units. Furthermore, because the mapping of a phrase from the source to the target would be determined by the lexical head of the phrase, the lexical entry for the head word specified how to map a phrase to the target. In the MU project, we called this lexicondriven, recursive transfer (Nagao and Tsujii 1986) (Figure 5). 
712 D ow naded rom httpdirect.m it.edu/coli/article-p7/1979478/coli_","PeriodicalId":55229,"journal":{"name":"Computational Linguistics","volume":"47 1","pages":"707-727"},"PeriodicalIF":9.3,"publicationDate":"2021-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44009399","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
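The feature-based formalisms with reentrancy discussed in the article rest on unification of attribute-value structures arranged as DAGs. Below is a minimal sketch under strong simplifications (untyped features, naive union-find-style merging; real systems use typed feature logics and far more efficient structure sharing), showing how a shared node lets one constraint propagate to every level that mentions it.

```python
class Node:
    """An attribute-value structure node."""
    def __init__(self, value=None):
        self.value = value    # atomic value, or None for a complex AVS
        self.feats = {}       # feature name -> Node
        self.forward = None   # set when this node is merged into another

    def deref(self):
        n = self
        while n.forward is not None:
            n = n.forward
        return n

def unify(a, b):
    a, b = a.deref(), b.deref()
    if a is b:                # reentrancy: already the very same node
        return a
    if a.value is not None and b.value is not None and a.value != b.value:
        raise ValueError(f"clash: {a.value} vs {b.value}")
    if a.value is None and b.value is not None:
        a, b = b, a           # keep the atomic value on the survivor
    b.forward = a             # merge b into a
    for f, sub in b.feats.items():
        a.feats[f] = unify(a.feats[f], sub) if f in a.feats else sub
    return a

# The sentence's AGR and its subject's AGR are one shared node, so a
# singular/plural clash introduced anywhere is detected everywhere.
agr = Node(); agr.feats["num"] = Node("sg")
sentence, subject = Node(), Node()
sentence.feats["subj"] = subject
subject.feats["agr"] = agr
sentence.feats["agr"] = agr           # reentrant: same node, two paths
plural = Node(); plural.feats["num"] = Node("pl")
try:
    unify(sentence.feats["agr"], plural)
except ValueError as e:
    print(e)                          # -> clash: sg vs pl
```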
{"title":"Natural Language Processing: A Machine Learning Perspective by Yue Zhang and Zhiyang Teng","authors":"Julia Ive","doi":"10.1162/coli_r_00423","DOIUrl":"https://doi.org/10.1162/coli_r_00423","url":null,"abstract":"","PeriodicalId":55229,"journal":{"name":"Computational Linguistics","volume":"48 1","pages":"233-235"},"PeriodicalIF":9.3,"publicationDate":"2021-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42794314","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Abstract The importance and pervasiveness of emotions in our lives makes affective computing a tremendously important and vibrant line of work. Systems for automatic emotion recognition (AER) and sentiment analysis can be facilitators of enormous progress (e.g., in improving public health and commerce) but also enablers of great harm (e.g., for suppressing dissidents and manipulating voters). Thus, it is imperative that the affective computing community actively engage with the ethical ramifications of their creations. In this article, I have synthesized and organized information from AI Ethics and Emotion Recognition literature to present fifty ethical considerations relevant to AER. Notably, this ethics sheet fleshes out assumptions hidden in how AER is commonly framed, and in the choices often made regarding the data, method, and evaluation. Special attention is paid to the implications of AER on privacy and social groups. Along the way, key recommendations are made for responsible AER. The objective of the ethics sheet is to facilitate and encourage more thoughtfulness on why to automate, how to automate, and how to judge success well before the building of AER systems. Additionally, the ethics sheet acts as a useful introductory document on emotion recognition (complementing survey articles).
{"title":"Ethics Sheet for Automatic Emotion Recognition and Sentiment Analysis","authors":"Saif M. Mohammad","doi":"10.1162/coli_a_00433","DOIUrl":"https://doi.org/10.1162/coli_a_00433","url":null,"abstract":"Abstract The importance and pervasiveness of emotions in our lives makes affective computing a tremendously important and vibrant line of work. Systems for automatic emotion recognition (AER) and sentiment analysis can be facilitators of enormous progress (e.g., in improving public health and commerce) but also enablers of great harm (e.g., for suppressing dissidents and manipulating voters). Thus, it is imperative that the affective computing community actively engage with the ethical ramifications of their creations. In this article, I have synthesized and organized information from AI Ethics and Emotion Recognition literature to present fifty ethical considerations relevant to AER. Notably, this ethics sheet fleshes out assumptions hidden in how AER is commonly framed, and in the choices often made regarding the data, method, and evaluation. Special attention is paid to the implications of AER on privacy and social groups. Along the way, key recommendations are made for responsible AER. The objective of the ethics sheet is to facilitate and encourage more thoughtfulness on why to automate, how to automate, and how to judge success well before the building of AER systems. Additionally, the ethics sheet acts as a useful introductory document on emotion recognition (complementing survey articles).","PeriodicalId":55229,"journal":{"name":"Computational Linguistics","volume":"48 1","pages":"239-278"},"PeriodicalIF":9.3,"publicationDate":"2021-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48330840","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
B. Haddow, Rachel Bawden, Antonio Valerio Miceli Barone, Jindřich Helcl, Alexandra Birch
Abstract We present a survey covering the state of the art in low-resource machine translation (MT) research. There are currently around 7,000 languages spoken in the world and almost all language pairs lack significant resources for training machine translation models. There has been increasing interest in research addressing the challenge of producing useful translation models when very little translated training data is available. We present a summary of this topical research field and provide a description of the techniques evaluated by researchers in several recent shared tasks in low-resource MT.
{"title":"Survey of Low-Resource Machine Translation","authors":"B. Haddow, Rachel Bawden, Antonio Valerio Miceli Barone, Jindvrich Helcl, Alexandra Birch","doi":"10.1162/coli_a_00446","DOIUrl":"https://doi.org/10.1162/coli_a_00446","url":null,"abstract":"Abstract We present a survey covering the state of the art in low-resource machine translation (MT) research. There are currently around 7,000 languages spoken in the world and almost all language pairs lack significant resources for training machine translation models. There has been increasing interest in research addressing the challenge of producing useful translation models when very little translated training data is available. We present a summary of this topical research field and provide a description of the techniques evaluated by researchers in several recent shared tasks in low-resource MT.","PeriodicalId":55229,"journal":{"name":"Computational Linguistics","volume":"48 1","pages":"673-732"},"PeriodicalIF":9.3,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43974444","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fernando Alva-Manchego, Carolina Scarton, Lucia Specia
Abstract In order to simplify sentences, several rewriting operations can be performed, such as replacing complex words with simpler synonyms, deleting unnecessary information, and splitting long sentences. Despite this multi-operation nature, evaluation of automatic simplification systems relies on metrics that moderately correlate with human judgments on the simplicity achieved by executing specific operations (e.g., simplicity gain based on lexical replacements). In this article, we investigate how well existing metrics can assess sentence-level simplifications where multiple operations may have been applied and which, therefore, require more general simplicity judgments. For that, we first collect a new and more reliable data set for evaluating the correlation of metrics and human judgments of overall simplicity. Second, we conduct the first meta-evaluation of automatic metrics in Text Simplification, using our new data set (and other existing data) to analyze the variation of the correlation between metrics’ scores and human judgments across three dimensions: the perceived simplicity level, the system type, and the set of references used for computation. We show that these three aspects affect the correlations and, in particular, highlight the limitations of commonly used operation-specific metrics. Finally, based on our findings, we propose a set of recommendations for automatic evaluation of multi-operation simplifications, suggesting which metrics to compute and how to interpret their scores.
{"title":"The (Un)Suitability of Automatic Evaluation Metrics for Text Simplification","authors":"Fernando Alva-Manchego, Carolina Scarton, Lucia Specia","doi":"10.1162/coli_a_00418","DOIUrl":"https://doi.org/10.1162/coli_a_00418","url":null,"abstract":"Abstract In order to simplify sentences, several rewriting operations can be performed, such as replacing complex words per simpler synonyms, deleting unnecessary information, and splitting long sentences. Despite this multi-operation nature, evaluation of automatic simplification systems relies on metrics that moderately correlate with human judgments on the simplicity achieved by executing specific operations (e.g., simplicity gain based on lexical replacements). In this article, we investigate how well existing metrics can assess sentence-level simplifications where multiple operations may have been applied and which, therefore, require more general simplicity judgments. For that, we first collect a new and more reliable data set for evaluating the correlation of metrics and human judgments of overall simplicity. Second, we conduct the first meta-evaluation of automatic metrics in Text Simplification, using our new data set (and other existing data) to analyze the variation of the correlation between metrics’ scores and human judgments across three dimensions: the perceived simplicity level, the system type, and the set of references used for computation. We show that these three aspects affect the correlations and, in particular, highlight the limitations of commonly used operation-specific metrics. Finally, based on our findings, we propose a set of recommendations for automatic evaluation of multi-operation simplifications, suggesting which metrics to compute and how to interpret their scores.","PeriodicalId":55229,"journal":{"name":"Computational Linguistics","volume":"47 1","pages":"861-889"},"PeriodicalIF":9.3,"publicationDate":"2021-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45077149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Abstract The universal generation problem for LFG grammars is the problem of determining whether a given grammar derives any terminal string with a given f-structure. It is known that this problem is decidable for acyclic f-structures. In this brief note, we show that for those f-structures the problem is nonetheless intractable. This holds even for grammars that are off-line parsable.
{"title":"LFG Generation from Acyclic F-Structures is NP-Hard","authors":"Jürgen Wedekind, R. Kaplan","doi":"10.1162/coli_a_00419","DOIUrl":"https://doi.org/10.1162/coli_a_00419","url":null,"abstract":"Abstract The universal generation problem for LFG grammars is the problem of determining whether a given grammar derives any terminal string with a given f-structure. It is known that this problem is decidable for acyclic f-structures. In this brief note, we show that for those f-structures the problem is nonetheless intractable. This holds even for grammars that are off-line parsable.","PeriodicalId":55229,"journal":{"name":"Computational Linguistics","volume":"47 1","pages":"939-946"},"PeriodicalIF":9.3,"publicationDate":"2021-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45076327","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Abstract This article describes an experiment to evaluate the impact of different types of ellipses discussed in theoretical linguistics on Neural Machine Translation (NMT), using English to Hindi/Telugu as source and target languages. Evaluation with manual methods shows that most of the errors made by Google NMT are located in the clause containing the ellipsis, the frequency of such errors is slightly higher in Telugu than in Hindi, and the translation adequacy improves when ellipses are reconstructed with their antecedents. These findings not only confirm the importance of ellipses and their resolution for MT, but also hint toward a possible correlation between the translation of discourse devices like ellipses and the morphological incongruity of the source and target. We also observe that not all ellipses are translated poorly and benefit from reconstruction, advocating for a disparate treatment of different ellipses in MT research.
{"title":"Are Ellipses Important for Machine Translation?","authors":"Payal Khullar","doi":"10.1162/coli_a_00414","DOIUrl":"https://doi.org/10.1162/coli_a_00414","url":null,"abstract":"Abstract This article describes an experiment to evaluate the impact of different types of ellipses discussed in theoretical linguistics on Neural Machine Translation (NMT), using English to Hindi/Telugu as source and target languages. Evaluation with manual methods shows that most of the errors made by Google NMT are located in the clause containing the ellipsis, the frequency of such errors is slightly more in Telugu than Hindi, and the translation adequacy shows improvement when ellipses are reconstructed with their antecedents. These findings not only confirm the importance of ellipses and their resolution for MT, but also hint toward a possible correlation between the translation of discourse devices like ellipses with the morphological incongruity of the source and target. We also observe that not all ellipses are translated poorly and benefit from reconstruction, advocating for a disparate treatment of different ellipses in MT research.","PeriodicalId":55229,"journal":{"name":"Computational Linguistics","volume":"47 1","pages":"927-937"},"PeriodicalIF":9.3,"publicationDate":"2021-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"64495124","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}