Computational Linguistics最新文献_第10页

Deep Learning Approaches to Text Production 文本生成的深度学习方法

IF 9.3 2区计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Computational Linguistics

Pub Date : 2020-10-20 DOI: 10.1162/coli_r_00389

Yue Zhang

Text production (Reiter and Dale 2000; Gatt and Krahmer 2018) is also referred to as natural language generation (NLG). It is a subtask of natural language processing focusing on the generation of natural language text. Although as important as natural language understanding for communication, NLG had received relatively less research attention. Recently, the rise of deep learning techniques has led to a surge of research interest in text production, both in general and for specific applications such as text summarization and dialogue systems. Deep learning allows NLG models to be constructed based on neural representations, thereby enabling end-to-end NLG systems to replace traditional pipeline approaches, which frees us from tedious engineering efforts and improves the output quality. In particular, a neural encoder-decoder structure (Cho et al. 2014; Sutskever, Vinyals, and Le 2014) has been widely used as a basic framework, which computes input representations using a neural encoder, according to which a text sequence is generated token by token using a neural decoder. Very recently, pre-training techniques (Broscheit et al. 2010; Radford 2018; Devlin et al. 2019) have further allowed neural models to collect knowledge from large raw text data, further improving the quality of both encoding and decoding. This book introduces the fundamentals of neural text production, discussing both the mostly investigated tasks and the foundational neural methods. NLG tasks with different types of inputs are introduced, and benchmark datasets are discussed in detail. The encoder-decoder architecture is introduced together with basic neural network components such as convolutional neural network (CNN) (Kim 2014) and recurrent neural network (RNN) (Cho et al. 2014). Elaborations are given on the encoder, the decoder, and task-specific optimization techniques. A contrast is made between the neural solution and traditional solutions to the task. Toward the end of the book, more recent techniques such as self-attention networks (Vaswani et al. 2017) and pre-training are briefly discussed. Throughout the book, figures are given to facilitate understanding and references are provided to enable further reading. Chapter 1 introduces the task of text production, discussing three typical input settings, namely, generation from meaning representations (MR; i.e., realization), generation from data (i.e., data-to-text), and generation from text (i.e., text-to-text). At the end of the chapter, a book outline is given, and the scope, coverage, and notation convention

文本生成（Reiter和Dale 2000；Gatt和Krahmer 2018）也被称为自然语言生成（NLG）。它是自然语言处理的一个子任务，主要关注自然语言文本的生成。尽管NLG与自然语言理解对沟通同样重要，但它受到的研究关注相对较少。最近，深度学习技术的兴起导致了对文本生成的研究兴趣激增，无论是在一般情况下还是在文本摘要和对话系统等特定应用中。深度学习允许基于神经表示构建NLG模型，从而使端到端NLG系统能够取代传统的流水线方法，这使我们摆脱了繁琐的工程工作，并提高了输出质量。特别地，神经编码器-解码器结构（Cho等人，2014；Sutskever、Vinyals和Le 2014）已被广泛用作基本框架，其使用神经编码器计算输入表示，根据该结构，使用神经解码器逐个令牌地生成文本序列。最近，预训练技术（Broscheit等人，2010；Radford 2018；Devlin等人，2019）进一步允许神经模型从大型原始文本数据中收集知识，进一步提高了编码和解码的质量。本书介绍了神经文本生成的基本原理，讨论了主要研究的任务和基本的神经方法。介绍了具有不同类型输入的NLG任务，并详细讨论了基准数据集。编码器-解码器架构与卷积神经网络（CNN）（Kim 2014）和递归神经网络（RNN）（Cho等人，2014）等基本神经网络组件一起介绍。详细阐述了编码器、解码器和特定任务的优化技术。对任务的神经解决方案和传统解决方案进行了对比。在书的最后，简要讨论了最近的技术，如自我注意网络（Vaswani等人，2017）和预训练。在整本书中，给出了数字以便于理解，并提供了参考资料以便于进一步阅读。第一章介绍了文本生成的任务，讨论了三种典型的输入设置，即从意义表示生成（MR；即实现）、从数据生成（即数据到文本）和从文本生成（即文本到文本）。在本章的最后，给出了一本书的大纲，以及范围、覆盖范围和注释惯例

{"title":"Deep Learning Approaches to Text Production","authors":"Yue Zhang","doi":"10.1162/coli_r_00389","DOIUrl":"https://doi.org/10.1162/coli_r_00389","url":null,"abstract":"Text production (Reiter and Dale 2000; Gatt and Krahmer 2018) is also referred to as natural language generation (NLG). It is a subtask of natural language processing focusing on the generation of natural language text. Although as important as natural language understanding for communication, NLG had received relatively less research attention. Recently, the rise of deep learning techniques has led to a surge of research interest in text production, both in general and for specific applications such as text summarization and dialogue systems. Deep learning allows NLG models to be constructed based on neural representations, thereby enabling end-to-end NLG systems to replace traditional pipeline approaches, which frees us from tedious engineering efforts and improves the output quality. In particular, a neural encoder-decoder structure (Cho et al. 2014; Sutskever, Vinyals, and Le 2014) has been widely used as a basic framework, which computes input representations using a neural encoder, according to which a text sequence is generated token by token using a neural decoder. Very recently, pre-training techniques (Broscheit et al. 2010; Radford 2018; Devlin et al. 2019) have further allowed neural models to collect knowledge from large raw text data, further improving the quality of both encoding and decoding. This book introduces the fundamentals of neural text production, discussing both the mostly investigated tasks and the foundational neural methods. NLG tasks with different types of inputs are introduced, and benchmark datasets are discussed in detail. The encoder-decoder architecture is introduced together with basic neural network components such as convolutional neural network (CNN) (Kim 2014) and recurrent neural network (RNN) (Cho et al. 2014). Elaborations are given on the encoder, the decoder, and task-specific optimization techniques. A contrast is made between the neural solution and traditional solutions to the task. Toward the end of the book, more recent techniques such as self-attention networks (Vaswani et al. 2017) and pre-training are briefly discussed. Throughout the book, figures are given to facilitate understanding and references are provided to enable further reading. Chapter 1 introduces the task of text production, discussing three typical input settings, namely, generation from meaning representations (MR; i.e., realization), generation from data (i.e., data-to-text), and generation from text (i.e., text-to-text). At the end of the chapter, a book outline is given, and the scope, coverage, and notation convention","PeriodicalId":55229,"journal":{"name":"Computational Linguistics","volume":"46 1","pages":"899-903"},"PeriodicalIF":9.3,"publicationDate":"2020-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1162/coli_r_00389","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47652735","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

Dual Attention Model for Citation Recommendation 引文推荐的双重注意模型

IF 9.3 2区计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Computational Linguistics

Pub Date : 2020-10-01 DOI: 10.1162/coli_a_00438

Yang Zhang, Qiang Ma

Based on an exponentially increasing number of academic articles, discovering and citing comprehensive and appropriate resources has become a non-trivial task. Conventional citation recommender methods suffer from severe information loss. For example, they do not consider the section of the paper that the user is writing and for which they need to find a citation, the relatedness between the words in the local context (the text span that describes a citation), or the importance on each word from the local context. These shortcomings make such methods insufficient for recommending adequate citations to academic manuscripts. In this study, we propose a novel embedding-based neural network called “dual attention model for citation recommendation (DACR)” to recommend citations during manuscript preparation. Our method adapts embedding of three semantic information: words in the local context, structural contexts, and the section on which a user is working. A neural network model is designed to maximize the similarity between the embedding of the three input (local context words, section and structural contexts) and the target citation appearing in the context. The core of the neural network model is composed of self-attention and additive attention, where the former aims to capture the relatedness between the contextual words and structural context, and the latter aims to learn the importance of them. The experiments on real-world datasets demonstrate the effectiveness of the proposed approach.

在学术论文呈指数级增长的情况下，发现和引用全面、合适的资源已成为一项不容忽视的任务。传统的引文推荐方法存在严重的信息丢失问题。例如，他们不考虑用户正在写的论文的那一部分，他们需要为其找到引文，在本地上下文中单词之间的相关性(描述引文的文本跨度)，或者每个单词在本地上下文中的重要性。这些缺点使得这种方法不足以为学术手稿推荐足够的引文。在本研究中，我们提出了一种新的基于嵌入的神经网络，称为“引文推荐双注意模型(dual attention model for citation recommendation, DACR)”，用于论文准备过程中的引文推荐。我们的方法适应三种语义信息的嵌入:本地上下文中的单词、结构上下文和用户正在工作的部分。设计了一个神经网络模型，使三种输入(局部上下文词、段落和结构上下文)的嵌入与上下文中出现的目标引文之间的相似性最大化。神经网络模型的核心由自注意和加性注意组成，自注意的目的是捕捉上下文词和结构上下文之间的相关性，加性注意的目的是学习它们的重要性。在实际数据集上的实验证明了该方法的有效性。

{"title":"Dual Attention Model for Citation Recommendation","authors":"Yang Zhang, Qiang Ma","doi":"10.1162/coli_a_00438","DOIUrl":"https://doi.org/10.1162/coli_a_00438","url":null,"abstract":"Based on an exponentially increasing number of academic articles, discovering and citing comprehensive and appropriate resources has become a non-trivial task. Conventional citation recommender methods suffer from severe information loss. For example, they do not consider the section of the paper that the user is writing and for which they need to find a citation, the relatedness between the words in the local context (the text span that describes a citation), or the importance on each word from the local context. These shortcomings make such methods insufficient for recommending adequate citations to academic manuscripts. In this study, we propose a novel embedding-based neural network called “dual attention model for citation recommendation (DACR)” to recommend citations during manuscript preparation. Our method adapts embedding of three semantic information: words in the local context, structural contexts, and the section on which a user is working. A neural network model is designed to maximize the similarity between the embedding of the three input (local context words, section and structural contexts) and the target citation appearing in the context. The core of the neural network model is composed of self-attention and additive attention, where the former aims to capture the relatedness between the contextual words and structural context, and the latter aims to learn the importance of them. The experiments on real-world datasets demonstrate the effectiveness of the proposed approach.","PeriodicalId":55229,"journal":{"name":"Computational Linguistics","volume":"48 1","pages":"403-470"},"PeriodicalIF":9.3,"publicationDate":"2020-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41322353","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 9

Syntax Role for Neural Semantic Role Labeling 神经语义角色标注的句法角色

IF 9.3 2区计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Computational Linguistics

Pub Date : 2020-09-12 DOI: 10.1162/coli_a_00408

Z. Li, Hai Zhao, Shexia He, Jiaxun Cai

Semantic role labeling (SRL) is dedicated to recognizing the semantic predicate-argument structure of a sentence. Previous studies in terms of traditional models have shown syntactic information can make remarkable contributions to SRL performance; however, the necessity of syntactic information was challenged by a few recent neural SRL studies that demonstrate impressive performance without syntactic backbones and suggest that syntax information becomes much less important for neural semantic role labeling, especially when paired with recent deep neural network and large-scale pre-trained language models. Despite this notion, the neural SRL field still lacks a systematic and full investigation on the relevance of syntactic information in SRL, for both dependency and both monolingual and multilingual settings. This paper intends to quantify the importance of syntactic information for neural SRL in the deep learning framework. We introduce three typical SRL frameworks (baselines), sequence-based, tree-based, and graph-based, which are accompanied by two categories of exploiting syntactic information: syntax pruning-based and syntax feature-based. Experiments are conducted on the CoNLL-2005, -2009, and -2012 benchmarks for all languages available, and results show that neural SRL models can still benefit from syntactic information under certain conditions. Furthermore, we show the quantitative significance of syntax to neural SRL models together with a thorough empirical survey using existing models.

语义角色标注(SRL)致力于识别句子的语义谓词-论证结构。以往基于传统模型的研究表明，句法信息对SRL的性能有显著的影响;然而，最近的一些神经SRL研究对句法信息的必要性提出了挑战，这些研究表明，在没有句法骨架的情况下，句法信息对神经语义角色标记的重要性大大降低，特别是当与最新的深度神经网络和大规模预训练语言模型配对时。尽管有这个概念，神经SRL领域仍然缺乏对句法信息在SRL中的相关性的系统和充分的研究，无论是依赖关系还是单语和多语设置。本文旨在量化深度学习框架中句法信息对神经SRL的重要性。我们介绍了三种典型的SRL框架(基线):基于序列的、基于树的和基于图的，它们伴随着两类利用语法信息的方法:基于语法修剪的和基于语法特征的。在CoNLL-2005、-2009和-2012基准测试中对所有可用语言进行了实验，结果表明，在一定条件下，神经SRL模型仍然可以从语法信息中获益。此外，我们展示了语法对神经SRL模型的定量意义，并使用现有模型进行了彻底的实证调查。

{"title":"Syntax Role for Neural Semantic Role Labeling","authors":"Z. Li, Hai Zhao, Shexia He, Jiaxun Cai","doi":"10.1162/coli_a_00408","DOIUrl":"https://doi.org/10.1162/coli_a_00408","url":null,"abstract":"Semantic role labeling (SRL) is dedicated to recognizing the semantic predicate-argument structure of a sentence. Previous studies in terms of traditional models have shown syntactic information can make remarkable contributions to SRL performance; however, the necessity of syntactic information was challenged by a few recent neural SRL studies that demonstrate impressive performance without syntactic backbones and suggest that syntax information becomes much less important for neural semantic role labeling, especially when paired with recent deep neural network and large-scale pre-trained language models. Despite this notion, the neural SRL field still lacks a systematic and full investigation on the relevance of syntactic information in SRL, for both dependency and both monolingual and multilingual settings. This paper intends to quantify the importance of syntactic information for neural SRL in the deep learning framework. We introduce three typical SRL frameworks (baselines), sequence-based, tree-based, and graph-based, which are accompanied by two categories of exploiting syntactic information: syntax pruning-based and syntax feature-based. Experiments are conducted on the CoNLL-2005, -2009, and -2012 benchmarks for all languages available, and results show that neural SRL models can still benefit from syntactic information under certain conditions. Furthermore, we show the quantitative significance of syntax to neural SRL models together with a thorough empirical survey using existing models.","PeriodicalId":55229,"journal":{"name":"Computational Linguistics","volume":"47 1","pages":"1-46"},"PeriodicalIF":9.3,"publicationDate":"2020-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45507622","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 16

Multilingual and Interlingual Semantic Representations for Natural Language Processing: A Brief Introduction 自然语言处理的多语言和语际语义表示:简介

IF 9.3 2区计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Computational Linguistics

Pub Date : 2020-06-01 DOI: 10.1162/coli_a_00373

M. Costa-jussà, C. España-Bonet, Pascale Fung, Noah A. Smith

We introduce the Computational Linguistics special issue on Multilingual and Interlingual Semantic Representations for Natural Language Processing. We situate the special issue’s five articles in the context of our fast-changing field, explaining our motivation for this project. We offer a brief summary of the work in the issue, which includes developments on lexical and sentential semantic representations, from symbolic and neural perspectives.

我们介绍了计算语言学专刊《自然语言处理中的多语言和语际语义表示》。我们将特刊的五篇文章置于我们快速变化的领域背景中，解释了我们这个项目的动机。我们从符号和神经的角度对本期的研究工作进行了简要的总结，包括词汇和句子语义表征的发展。

引用次数: 2

CausaLM: Causal Model Explanation Through Counterfactual Language Models 因果LM：通过反事实语言模型解释因果模型

IF 9.3 2区计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Computational Linguistics

Pub Date : 2020-05-27 DOI: 10.1162/coli_a_00404

Amir Feder, Nadav Oved, Uri Shalit, Roi Reichart

Abstract Understanding predictions made by deep neural networks is notoriously difficult, but also crucial to their dissemination. As all machine learning–based methods, they are as good as their training data, and can also capture unwanted biases. While there are tools that can help understand whether such biases exist, they do not distinguish between correlation and causation, and might be ill-suited for text-based models and for reasoning about high-level language concepts. A key problem of estimating the causal effect of a concept of interest on a given model is that this estimation requires the generation of counterfactual examples, which is challenging with existing generation technology. To bridge that gap, we propose CausaLM, a framework for producing causal model explanations using counterfactual language representation models. Our approach is based on fine-tuning of deep contextualized embedding models with auxiliary adversarial tasks derived from the causal graph of the problem. Concretely, we show that by carefully choosing auxiliary adversarial pre-training tasks, language representation models such as BERT can effectively learn a counterfactual representation for a given concept of interest, and be used to estimate its true causal effect on model performance. A byproduct of our method is a language representation model that is unaffected by the tested concept, which can be useful in mitigating unwanted bias ingrained in the data.1

摘要理解深度神经网络的预测是出了名的困难，但对其传播也至关重要。与所有基于机器学习的方法一样，它们与训练数据一样好，也可以捕捉不必要的偏见。虽然有一些工具可以帮助理解是否存在这种偏见，但它们无法区分相关性和因果关系，可能不适合基于文本的模型和对高级语言概念的推理。估计兴趣概念对给定模型的因果影响的一个关键问题是，这种估计需要生成反事实的例子，这对现有的生成技术来说是具有挑战性的。为了弥补这一差距，我们提出了CausaLM，这是一个使用反事实语言表示模型产生因果模型解释的框架。我们的方法基于对深度上下文嵌入模型的微调，该模型具有从问题的因果图派生的辅助对抗性任务。具体来说，我们表明，通过仔细选择辅助对抗性预训练任务，诸如BERT之类的语言表示模型可以有效地学习给定兴趣概念的反事实表示，并用于估计其对模型性能的真实因果影响。我们的方法的一个副产品是一个不受测试概念影响的语言表示模型，它可以帮助减轻数据中根深蒂固的不必要的偏见。1

{"title":"CausaLM: Causal Model Explanation Through Counterfactual Language Models","authors":"Amir Feder, Nadav Oved, Uri Shalit, Roi Reichart","doi":"10.1162/coli_a_00404","DOIUrl":"https://doi.org/10.1162/coli_a_00404","url":null,"abstract":"Abstract Understanding predictions made by deep neural networks is notoriously difficult, but also crucial to their dissemination. As all machine learning–based methods, they are as good as their training data, and can also capture unwanted biases. While there are tools that can help understand whether such biases exist, they do not distinguish between correlation and causation, and might be ill-suited for text-based models and for reasoning about high-level language concepts. A key problem of estimating the causal effect of a concept of interest on a given model is that this estimation requires the generation of counterfactual examples, which is challenging with existing generation technology. To bridge that gap, we propose CausaLM, a framework for producing causal model explanations using counterfactual language representation models. Our approach is based on fine-tuning of deep contextualized embedding models with auxiliary adversarial tasks derived from the causal graph of the problem. Concretely, we show that by carefully choosing auxiliary adversarial pre-training tasks, language representation models such as BERT can effectively learn a counterfactual representation for a given concept of interest, and be used to estimate its true causal effect on model performance. A byproduct of our method is a language representation model that is unaffected by the tested concept, which can be useful in mitigating unwanted bias ingrained in the data.1","PeriodicalId":55229,"journal":{"name":"Computational Linguistics","volume":"47 1","pages":"333-386"},"PeriodicalIF":9.3,"publicationDate":"2020-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43733662","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 97

Interpretability Analysis for Named Entity Recognition to Understand System Predictions and How They Can Improve 命名实体识别的可解释性分析，以理解系统预测以及如何改进

IF 9.3 2区计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Computational Linguistics

Pub Date : 2020-04-09 DOI: 10.1162/coli_a_00397

Oshin Agarwal, Yinfei Yang, Byron C. Wallace, A. Nenkova

Abstract Named entity recognition systems achieve remarkable performance on domains such as English news. It is natural to ask: What are these models actually learning to achieve this? Are they merely memorizing the names themselves? Or are they capable of interpreting the text and inferring the correct entity type from the linguistic context? We examine these questions by contrasting the performance of several variants of architectures for named entity recognition, with some provided only representations of the context as features. We experiment with GloVe-based BiLSTM-CRF as well as BERT. We find that context does influence predictions, but the main factor driving high performance is learning the named tokens themselves. Furthermore, we find that BERT is not always better at recognizing predictive contexts compared to a BiLSTM-CRF model. We enlist human annotators to evaluate the feasibility of inferring entity types from context alone and find that humans are also mostly unable to infer entity types for the majority of examples on which the context-only system made errors. However, there is room for improvement: A system should be able to recognize any named entity in a predictive context correctly and our experiments indicate that current systems may be improved by such capability. Our human study also revealed that systems and humans do not always learn the same contextual clues, and context-only systems are sometimes correct even when humans fail to recognize the entity type from the context. Finally, we find that one issue contributing to model errors is the use of “entangled” representations that encode both contextual and local token information into a single vector, which can obscure clues. Our results suggest that designing models that explicitly operate over representations of local inputs and context, respectively, may in some cases improve performance. In light of these and related findings, we highlight directions for future work.

摘要命名实体识别系统在英语新闻等领域取得了显著的性能。很自然地会问：这些模型实际上是在学习什么来实现这一点的？他们只是自己记名字吗？或者他们能够解释文本并从语言语境中推断出正确的实体类型吗？我们通过对比命名实体识别的几种架构变体的性能来研究这些问题，其中一些仅提供了上下文的表示作为特征。我们用基于GloVe的BiLSTM CRF和BERT进行了实验。我们发现，上下文确实会影响预测，但推动高性能的主要因素是学习命名的令牌本身。此外，我们发现，与BiLSTM CRF模型相比，BERT在识别预测上下文方面并不总是更好。我们招募了人工注释器来评估仅从上下文推断实体类型的可行性，并发现对于仅上下文系统出错的大多数示例，人工也大多无法推断实体类型。然而，还有改进的空间：一个系统应该能够正确识别预测上下文中的任何命名实体，我们的实验表明，当前的系统可以通过这种能力得到改进。我们的人类研究还表明，系统和人类并不总是学习相同的上下文线索，即使人类无法从上下文中识别实体类型，仅上下文的系统有时也是正确的。最后，我们发现导致模型错误的一个问题是使用“纠缠”表示，将上下文和局部标记信息编码到单个向量中，这可能会模糊线索。我们的结果表明，设计分别在局部输入和上下文表示上显式操作的模型，在某些情况下可能会提高性能。根据这些和相关发现，我们强调了未来工作的方向。

{"title":"Interpretability Analysis for Named Entity Recognition to Understand System Predictions and How They Can Improve","authors":"Oshin Agarwal, Yinfei Yang, Byron C. Wallace, A. Nenkova","doi":"10.1162/coli_a_00397","DOIUrl":"https://doi.org/10.1162/coli_a_00397","url":null,"abstract":"Abstract Named entity recognition systems achieve remarkable performance on domains such as English news. It is natural to ask: What are these models actually learning to achieve this? Are they merely memorizing the names themselves? Or are they capable of interpreting the text and inferring the correct entity type from the linguistic context? We examine these questions by contrasting the performance of several variants of architectures for named entity recognition, with some provided only representations of the context as features. We experiment with GloVe-based BiLSTM-CRF as well as BERT. We find that context does influence predictions, but the main factor driving high performance is learning the named tokens themselves. Furthermore, we find that BERT is not always better at recognizing predictive contexts compared to a BiLSTM-CRF model. We enlist human annotators to evaluate the feasibility of inferring entity types from context alone and find that humans are also mostly unable to infer entity types for the majority of examples on which the context-only system made errors. However, there is room for improvement: A system should be able to recognize any named entity in a predictive context correctly and our experiments indicate that current systems may be improved by such capability. Our human study also revealed that systems and humans do not always learn the same contextual clues, and context-only systems are sometimes correct even when humans fail to recognize the entity type from the context. Finally, we find that one issue contributing to model errors is the use of “entangled” representations that encode both contextual and local token information into a single vector, which can obscure clues. Our results suggest that designing models that explicitly operate over representations of local inputs and context, respectively, may in some cases improve performance. In light of these and related findings, we highlight directions for future work.","PeriodicalId":55229,"journal":{"name":"Computational Linguistics","volume":"47 1","pages":"117-140"},"PeriodicalIF":9.3,"publicationDate":"2020-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42716335","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 28

RYANSQL: Recursively Applying Sketch-based Slot Fillings for Complex Text-to-SQL in Cross-Domain Databases RYANSQL:跨域数据库中基于草图的复杂文本槽填充递归应用于SQL

IF 9.3 2区计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Computational Linguistics

Pub Date : 2020-04-07 DOI: 10.1162/coli_a_00403

Donghyun Choi, M. Shin, EungGyun Kim, Dong Ryeol Shin

Abstract Text-to-SQL is the problem of converting a user question into an SQL query, when the question and database are given. In this article, we present a neural network approach called RYANSQL (Recursively Yielding Annotation Network for SQL) to solve complex Text-to-SQL tasks for cross-domain databases. Statement Position Code (SPC) is defined to transform a nested SQL query into a set of non-nested SELECT statements; a sketch-based slot-filling approach is proposed to synthesize each SELECT statement for its corresponding SPC. Additionally, two input manipulation methods are presented to improve generation performance further. RYANSQL achieved competitive result of 58.2% accuracy on the challenging Spider benchmark. At the time of submission (April 2020), RYANSQL v2, a variant of original RYANSQL, is positioned at 3rd place among all systems and 1st place among the systems not using database content with 60.6% exact matching accuracy. The source code is available at https://github.com/kakaoenterprise/RYANSQL.

摘要文本到SQL是在给定问题和数据库的情况下，将用户的问题转换为SQL查询的问题。在本文中，我们提出了一种称为RYANSQL(递归生成SQL注释网络)的神经网络方法来解决跨域数据库的复杂文本到SQL任务。语句位置码(SPC)用于将嵌套的SQL查询转换为一组非嵌套的SELECT语句;提出了一种基于草图的槽填充方法来综合每个SELECT语句对应的SPC。此外，为了进一步提高生成性能，提出了两种输入操作方法。在具有挑战性的Spider基准测试中，RYANSQL获得了58.2%的准确率。在提交时(2020年4月)，原始RYANSQL的变体RYANSQL v2在所有系统中排名第三，在不使用数据库内容的系统中排名第一，精确匹配准确率为60.6%。源代码可从https://github.com/kakaoenterprise/RYANSQL获得。

引用次数: 77

Data-Driven Sentence Simplification: Survey and Benchmark 数据驱动的句子简化：调查与基准

IF 9.3 2区计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Computational Linguistics

Pub Date : 2020-03-01 DOI: 10.1162/coli_a_00370

Fernando Alva-Manchego, Carolina Scarton, Lucia Specia

Sentence Simplification (SS) aims to modify a sentence in order to make it easier to read and understand. In order to do so, several rewriting transformations can be performed such as replacement, reordering, and splitting. Executing these transformations while keeping sentences grammatical, preserving their main idea, and generating simpler output, is a challenging and still far from solved problem. In this article, we survey research on SS, focusing on approaches that attempt to learn how to simplify using corpora of aligned original-simplified sentence pairs in English, which is the dominant paradigm nowadays. We also include a benchmark of different approaches on common data sets so as to compare them and highlight their strengths and limitations. We expect that this survey will serve as a starting point for researchers interested in the task and help spark new ideas for future developments.

句子简化（SS）旨在修改句子，使其更易于阅读和理解。为了做到这一点，可以执行几个重写转换，如替换、重新排序和拆分。执行这些转换，同时保持句子的语法，保留它们的主要思想，并生成更简单的输出，是一个具有挑战性的问题，但仍远未解决。在这篇文章中，我们调查了对SS的研究，重点是试图学习如何使用英语中对齐的原始简化句子对的语料库来简化的方法，这是当今的主流范式。我们还为通用数据集提供了不同方法的基准，以便对它们进行比较，并强调它们的优势和局限性。我们希望这项调查将成为对这项任务感兴趣的研究人员的起点，并有助于激发未来发展的新想法。

引用次数: 85

Multi-SimLex: A Large-Scale Evaluation of Multilingual and Crosslingual Lexical Semantic Similarity Multi-SimLex:多语言和跨语言词汇语义相似度的大规模评价

IF 9.3 2区计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Computational Linguistics

Pub Date : 2020-02-01 DOI: 10.1162/coli_a_00391

Ivan Vulic, Simon Baker, E. Ponti, Ulla Petti, Ira Leviant, Kelly Wing, Olga Majewska, Eden Bar, Matt Malone, T. Poibeau, Roi Reichart, A. Korhonen

Abstract We introduce Multi-SimLex, a large-scale lexical resource and evaluation benchmark covering data sets for 12 typologically diverse languages, including major languages (e.g., Mandarin Chinese, Spanish, Russian) as well as less-resourced ones (e.g., Welsh, Kiswahili). Each language data set is annotated for the lexical relation of semantic similarity and contains 1,888 semantically aligned concept pairs, providing a representative coverage of word classes (nouns, verbs, adjectives, adverbs), frequency ranks, similarity intervals, lexical fields, and concreteness levels. Additionally, owing to the alignment of concepts across languages, we provide a suite of 66 crosslingual semantic similarity data sets. Because of its extensive size and language coverage, Multi-SimLex provides entirely novel opportunities for experimental evaluation and analysis. On its monolingual and crosslingual benchmarks, we evaluate and analyze a wide array of recent state-of-the-art monolingual and crosslingual representation models, including static and contextualized word embeddings (such as fastText, monolingual and multilingual BERT, XLM), externally informed lexical representations, as well as fully unsupervised and (weakly) supervised crosslingual word embeddings. We also present a step-by-step data set creation protocol for creating consistent, Multi-Simlex–style resources for additional languages. We make these contributions—the public release of Multi-SimLex data sets, their creation protocol, strong baseline results, and in-depth analyses which can be helpful in guiding future developments in multilingual lexical semantics and representation learning—available via a Web site that will encourage community effort in further expansion of Multi-Simlex to many more languages. Such a large-scale semantic resource could inspire significant further advances in NLP across languages.

本文介绍了Multi-SimLex，这是一个大型词汇资源和评估基准，涵盖了12种不同类型语言的数据集，包括主要语言(如汉语普通话、西班牙语、俄语)以及资源较少的语言(如威尔士语、斯瓦希里语)。每个语言数据集都针对语义相似性的词汇关系进行了注释，并包含1888个语义对齐的概念对，提供了词类(名词、动词、形容词、副词)、频率等级、相似间隔、词汇字段和具体级别的代表性覆盖。此外，由于跨语言概念的一致性，我们提供了一套66个跨语言语义相似度数据集。由于其广泛的规模和语言覆盖，Multi-SimLex为实验评估和分析提供了全新的机会。在其单语言和跨语言基准上，我们评估和分析了一系列最新的最先进的单语言和跨语言表示模型，包括静态和情境化词嵌入(如fastText，单语言和多语言BERT, XLM)，外部通知词汇表示，以及完全无监督和(弱)监督的跨语言词嵌入。我们还提供了一个逐步的数据集创建协议，用于为其他语言创建一致的、multi - simlex风格的资源。我们通过一个网站提供了这些贡献——Multi-SimLex数据集的公开发布、它们的创建协议、强大的基线结果，以及有助于指导多语言词汇语义和表示学习的未来发展的深入分析，该网站将鼓励社区努力将Multi-SimLex进一步扩展到更多语言。如此大规模的语义资源可以激发跨语言NLP的重大进一步进展。

{"title":"Multi-SimLex: A Large-Scale Evaluation of Multilingual and Crosslingual Lexical Semantic Similarity","authors":"Ivan Vulic, Simon Baker, E. Ponti, Ulla Petti, Ira Leviant, Kelly Wing, Olga Majewska, Eden Bar, Matt Malone, T. Poibeau, Roi Reichart, A. Korhonen","doi":"10.1162/coli_a_00391","DOIUrl":"https://doi.org/10.1162/coli_a_00391","url":null,"abstract":"Abstract We introduce Multi-SimLex, a large-scale lexical resource and evaluation benchmark covering data sets for 12 typologically diverse languages, including major languages (e.g., Mandarin Chinese, Spanish, Russian) as well as less-resourced ones (e.g., Welsh, Kiswahili). Each language data set is annotated for the lexical relation of semantic similarity and contains 1,888 semantically aligned concept pairs, providing a representative coverage of word classes (nouns, verbs, adjectives, adverbs), frequency ranks, similarity intervals, lexical fields, and concreteness levels. Additionally, owing to the alignment of concepts across languages, we provide a suite of 66 crosslingual semantic similarity data sets. Because of its extensive size and language coverage, Multi-SimLex provides entirely novel opportunities for experimental evaluation and analysis. On its monolingual and crosslingual benchmarks, we evaluate and analyze a wide array of recent state-of-the-art monolingual and crosslingual representation models, including static and contextualized word embeddings (such as fastText, monolingual and multilingual BERT, XLM), externally informed lexical representations, as well as fully unsupervised and (weakly) supervised crosslingual word embeddings. We also present a step-by-step data set creation protocol for creating consistent, Multi-Simlex–style resources for additional languages. We make these contributions—the public release of Multi-SimLex data sets, their creation protocol, strong baseline results, and in-depth analyses which can be helpful in guiding future developments in multilingual lexical semantics and representation learning—available via a Web site that will encourage community effort in further expansion of Multi-Simlex to many more languages. Such a large-scale semantic resource could inspire significant further advances in NLP across languages.","PeriodicalId":55229,"journal":{"name":"Computational Linguistics","volume":"46 1","pages":"847-897"},"PeriodicalIF":9.3,"publicationDate":"2020-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1162/coli_a_00391","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45590264","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 60

A Systematic Study of Inner-Attention-Based Sentence Representations in Multilingual Neural Machine Translation 多语言神经机器翻译中基于内注意的句子表征系统研究

IF 9.3 2区计算机科学 Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Computational Linguistics

Pub Date : 2020-01-01 DOI: 10.1162/coli_a_00377

Raúl Vázquez, Alessandro Raganato, Mathias Creutz, J. Tiedemann

Neural machine translation has considerably improved the quality of automatic translations by learning good representations of input sentences. In this article, we explore a multilingual translation model capable of producing fixed-size sentence representations by incorporating an intermediate crosslingual shared layer, which we refer to as attention bridge. This layer exploits the semantics from each language and develops into a language-agnostic meaning representation that can be efficiently used for transfer learning. We systematically study the impact of the size of the attention bridge and the effect of including additional languages in the model. In contrast to related previous work, we demonstrate that there is no conflict between translation performance and the use of sentence representations in downstream tasks. In particular, we show that larger intermediate layers not only improve translation quality, especially for long sentences, but also push the accuracy of trainable classification tasks. Nevertheless, shorter representations lead to increased compression that is beneficial in non-trainable similarity tasks. Similarly, we show that trainable downstream tasks benefit from multilingual models, whereas additional language signals do not improve performance in non-trainable benchmarks. This is an important insight that helps to properly design models for specific applications. Finally, we also include an in-depth analysis of the proposed attention bridge and its ability to encode linguistic properties. We carefully analyze the information that is captured by individual attention heads and identify interesting patterns that explain the performance of specific settings in linguistic probing tasks.

神经网络机器翻译通过学习输入句子的良好表征，大大提高了自动翻译的质量。在本文中，我们探索了一种多语言翻译模型，该模型能够通过合并一个中间的跨语言共享层(我们称之为注意桥)来产生固定大小的句子表示。这一层利用每种语言的语义，并发展成一种与语言无关的意义表示，可以有效地用于迁移学习。我们系统地研究了注意桥大小的影响以及在模型中加入额外语言的影响。与之前的相关工作相比，我们证明了在下游任务中翻译性能与句子表征的使用之间没有冲突。特别是，我们发现更大的中间层不仅提高了翻译质量，特别是对于长句子，而且还提高了可训练分类任务的准确性。然而，较短的表示导致压缩增加，这在不可训练的相似性任务中是有益的。类似地，我们表明可训练的下游任务受益于多语言模型，而额外的语言信号不会提高非可训练基准的性能。这是一个重要的见解，有助于为特定的应用程序正确地设计模型。最后，我们还深入分析了所提出的注意桥及其编码语言属性的能力。我们仔细分析了由个体注意头捕获的信息，并确定了解释语言探测任务中特定设置表现的有趣模式。

{"title":"A Systematic Study of Inner-Attention-Based Sentence Representations in Multilingual Neural Machine Translation","authors":"Raúl Vázquez, Alessandro Raganato, Mathias Creutz, J. Tiedemann","doi":"10.1162/coli_a_00377","DOIUrl":"https://doi.org/10.1162/coli_a_00377","url":null,"abstract":"Neural machine translation has considerably improved the quality of automatic translations by learning good representations of input sentences. In this article, we explore a multilingual translation model capable of producing fixed-size sentence representations by incorporating an intermediate crosslingual shared layer, which we refer to as attention bridge. This layer exploits the semantics from each language and develops into a language-agnostic meaning representation that can be efficiently used for transfer learning. We systematically study the impact of the size of the attention bridge and the effect of including additional languages in the model. In contrast to related previous work, we demonstrate that there is no conflict between translation performance and the use of sentence representations in downstream tasks. In particular, we show that larger intermediate layers not only improve translation quality, especially for long sentences, but also push the accuracy of trainable classification tasks. Nevertheless, shorter representations lead to increased compression that is beneficial in non-trainable similarity tasks. Similarly, we show that trainable downstream tasks benefit from multilingual models, whereas additional language signals do not improve performance in non-trainable benchmarks. This is an important insight that helps to properly design models for specific applications. Finally, we also include an in-depth analysis of the proposed attention bridge and its ability to encode linguistic properties. We carefully analyze the information that is captured by individual attention heads and identify interesting patterns that explain the performance of specific settings in linguistic probing tasks.","PeriodicalId":55229,"journal":{"name":"Computational Linguistics","volume":"46 1","pages":"387-424"},"PeriodicalIF":9.3,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1162/coli_a_00377","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"64495002","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 1