
Latest Publications in Computational Linguistics

Abstractive Text Summarization: Enhancing Sequence-to-Sequence Models Using Word Sense Disambiguation and Semantic Content Generalization
IF 9.3 · Zone 2 (Computer Science) · Q1 Arts and Humanities · Pub Date: 2021-08-05 · DOI: 10.1162/coli_a_00417
P. Kouris, Georgios Alexandridis, A. Stafylopatis
Abstract Nowadays, most research conducted in the field of abstractive text summarization focuses on neural-based models alone, without considering their combination with knowledge-based approaches that could further enhance their efficiency. In this direction, this work presents a novel framework that combines sequence-to-sequence neural-based text summarization along with structure- and semantic-based methodologies. The proposed framework is capable of dealing with the problem of out-of-vocabulary or rare words, improving the performance of the deep learning models. The overall methodology is based on a well-defined theoretical model of knowledge-based content generalization and deep learning predictions for generating abstractive summaries. The framework is composed of three key elements: (i) a pre-processing task, (ii) a machine learning methodology, and (iii) a post-processing task. The pre-processing task is a knowledge-based approach, based on ontological knowledge resources, word sense disambiguation, and named entity recognition, along with content generalization, that transforms ordinary text into a generalized form. A deep learning model of attentive encoder-decoder architecture, which is expanded to enable a copying and coverage mechanism, as well as reinforcement learning and transformer-based architectures, is trained on a generalized version of text-summary pairs, learning to predict summaries in a generalized form. The post-processing task utilizes knowledge resources, word embeddings, word sense disambiguation, and heuristic algorithms based on text similarity methods in order to transform the generalized version of a predicted summary to a final, human-readable form. An extensive experimental procedure on three popular data sets evaluates key aspects of the proposed framework, while the obtained results exhibit promising performance, validating the robustness of the proposed approach.
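To make the three-stage flow concrete, here is a minimal Python sketch of the generalize–predict–de-generalize loop. The toy GENERALIZATIONS table, the seq2seq_summarize stub, and the source-matching heuristic are illustrative assumptions only, not the authors' implementation.

```python
# Minimal sketch of the generalize -> predict -> de-generalize flow described above.
# All names and the toy "ontology" are illustrative assumptions.

GENERALIZATIONS = {          # rare/OOV surface forms mapped to generalized concepts
    "dachshund": "<dog>",
    "labrador": "<dog>",
    "sapporo": "<city>",
}
SPECIALIZATIONS = {"<dog>": ["dachshund", "labrador"], "<city>": ["sapporo"]}


def preprocess(text: str) -> str:
    """Pre-processing: replace rare words with generalized concepts."""
    return " ".join(GENERALIZATIONS.get(tok, tok) for tok in text.split())


def seq2seq_summarize(generalized_text: str) -> str:
    """Stand-in for the attentive encoder-decoder; here it merely truncates."""
    return " ".join(generalized_text.split()[:6])


def postprocess(generalized_summary: str, source_text: str) -> str:
    """Post-processing: map concepts back to surface forms seen in the source."""
    source_tokens = source_text.split()
    out = []
    for tok in generalized_summary.split():
        if tok in SPECIALIZATIONS:
            # Heuristic: pick the candidate that actually occurs in the source text.
            candidates = [c for c in SPECIALIZATIONS[tok] if c in source_tokens]
            out.append(candidates[0] if candidates else tok)
        else:
            out.append(tok)
    return " ".join(out)


if __name__ == "__main__":
    doc = "the dachshund escaped near sapporo and was found two days later"
    generalized = preprocess(doc)           # "the <dog> escaped near <city> ..."
    summary = seq2seq_summarize(generalized)
    print(postprocess(summary, doc))        # concepts restored to source words
```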
Citations: 17
Variational Deep Logic Network for Joint Inference of Entities and Relations
IF 9.3 · Zone 2 (Computer Science) · Q1 Arts and Humanities · Pub Date: 2021-08-05 · DOI: 10.1162/coli_a_00415
Wenya Wang, Sinno Jialin Pan
Abstract Currently, deep learning models have been widely adopted and achieved promising results on various application domains. Despite their intriguing performance, most deep learning models function as black boxes, lacking explicit reasoning capabilities and explanations, which are usually essential for complex problems. Take joint inference in information extraction as an example. This task requires identifying multiple, inter-correlated pieces of structured knowledge from texts, including entities, events, and the relationships between them. Various deep neural networks have been proposed to jointly perform entity extraction and relation prediction, which only propagate information implicitly via representation learning. However, they fail to encode the intensive correlations between entity types and relations to enforce their coexistence. On the other hand, some approaches adopt rules to explicitly constrain certain relational facts, although keeping the rules separate from representation learning usually leaves these approaches prone to error propagation. Moreover, the predefined rules are inflexible and might result in negative effects when data is noisy. To address these limitations, we propose a variational deep logic network that incorporates both representation learning and relational reasoning via the variational EM algorithm. The model consists of a deep neural network to learn high-level features with implicit interactions via the self-attention mechanism and a relational logic network to explicitly exploit target interactions. These two components are trained interactively to bring the best of both worlds. We conduct extensive experiments ranging from fine-grained sentiment terms extraction, end-to-end relation prediction, to end-to-end event extraction to demonstrate the effectiveness of our proposed method.
Citations: 5
Linear-Time Calculation of the Expected Sum of Edge Lengths in Random Projective Linearizations of Trees
IF 9.3 · Zone 2 (Computer Science) · Q1 Arts and Humanities · Pub Date: 2021-07-07 · DOI: 10.1162/coli_a_00442
Lluís Alemany-Puig, R. Ferrer-i-Cancho
Abstract The syntactic structure of a sentence is often represented using syntactic dependency trees. The sum of the distances between syntactically related words has been in the limelight for the past decades. Research on dependency distances led to the formulation of the principle of dependency distance minimization whereby words in sentences are ordered so as to minimize that sum. Numerous random baselines have been defined to carry out related quantitative studies on languages. The simplest random baseline is the expected value of the sum in unconstrained random permutations of the words in the sentence, namely, when all the shufflings of the words of a sentence are allowed and equally likely. Here we focus on a popular baseline: random projective permutations of the words of the sentence, that is, permutations where the syntactic dependency structure is projective, a formal constraint that sentences satisfy often in languages. Thus far, the expectation of the sum of dependency distances in random projective shufflings of a sentence has been estimated approximately with a Monte Carlo procedure whose cost is of the order of Rn, where n is the number of words of the sentence and R is the number of samples; it is well known that the larger R is, the lower the error of the estimation but the larger the time cost. Here we present formulae to compute that expectation without error in time of the order of n. Furthermore, we show that star trees maximize it, and provide an algorithm to retrieve the trees that minimize it.
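As a rough illustration of the O(Rn) Monte Carlo baseline that the paper's exact formulae replace, the sketch below estimates the expected sum of dependency distances under unconstrained random shufflings; sampling projective linearizations, the case actually studied in the paper, would need extra machinery. The function name and the toy tree are assumptions made for illustration.

```python
import random

def expected_sum_of_distances_mc(heads, R=10_000, seed=0):
    """Monte Carlo estimate (cost ~ R*n) of the expected sum of dependency
    distances under unconstrained random shufflings of the sentence.
    heads[i] is the 0-based index of the head of word i, or None for the root.
    """
    rng = random.Random(seed)
    n = len(heads)
    edges = [(i, h) for i, h in enumerate(heads) if h is not None]
    total = 0
    for _ in range(R):
        positions = list(range(n))
        rng.shuffle(positions)  # positions[i] = position of word i in this shuffle
        total += sum(abs(positions[d] - positions[h]) for d, h in edges)
    return total / R

# Toy 4-word star tree rooted at word 1; for the unconstrained baseline the
# exact expectation is (n*n - 1)/3 = 5, which the estimate should approach.
print(expected_sum_of_distances_mc([1, None, 1, 1]))
```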
Citations: 5
Decoding Word Embeddings with Brain-Based Semantic Features
IF 9.3 · Zone 2 (Computer Science) · Q1 Arts and Humanities · Pub Date: 2021-07-05 · DOI: 10.1162/coli_a_00412
Emmanuele Chersoni, Enrico Santus, Chu-Ren Huang, Alessandro Lenci
Word embeddings are vectorial semantic representations built with either counting or predicting techniques aimed at capturing shades of meaning from word co-occurrences. Since their introduction, these representations have been criticized for lacking interpretable dimensions. This property of word embeddings limits our understanding of the semantic features they actually encode. Moreover, it contributes to the “black box” nature of the tasks in which they are used, since the reasons for word embedding performance often remain opaque to humans. In this contribution, we explore the semantic properties encoded in word embeddings by mapping them onto interpretable vectors, consisting of explicit and neurobiologically motivated semantic features (Binder et al. 2016). Our exploration takes into account different types of embeddings, including factorized count vectors and predict models (Skip-Gram, GloVe, etc.), as well as the most recent contextualized representations (i.e., ELMo and BERT). In our analysis, we first evaluate the quality of the mapping in a retrieval task, then we shed light on the semantic features that are better encoded in each embedding type. A large number of probing tasks is finally set to assess how the original and the mapped embeddings perform in discriminating semantic categories. For each probing task, we identify the most relevant semantic features and we show that there is a correlation between the embedding performance and how they encode those features. This study sets itself as a step forward in understanding which aspects of meaning are captured by vector spaces, by proposing a new and simple method to carve human-interpretable semantic representations from distributional vectors.
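A common way to realize such a mapping is a regularized linear map from embedding space to the interpretable feature space. The numpy sketch below shows a generic ridge-regression version under that assumption, with random toy data standing in for real embeddings and Binder-style feature norms; it is not the paper's exact decoding setup.

```python
import numpy as np

def fit_ridge_mapping(E, F, lam=1.0):
    """Learn a linear map W from embeddings E (n_words x d_emb) to interpretable
    feature vectors F (n_words x d_feat), ridge-regularized:
        W = argmin ||E W - F||^2 + lam * ||W||^2   (closed-form solution below)
    """
    d = E.shape[1]
    return np.linalg.solve(E.T @ E + lam * np.eye(d), E.T @ F)

# Toy data: 100 "words", 50-dim embeddings, 14 hypothetical semantic features.
rng = np.random.default_rng(0)
E = rng.normal(size=(100, 50))
F = rng.normal(size=(100, 14))
W = fit_ridge_mapping(E, F)
decoded = E @ W          # predicted interpretable feature vector for each word
print(decoded.shape)     # (100, 14)
```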
Citations: 24
Toward Gender-Inclusive Coreference Resolution: An Analysis of Gender and Bias Throughout the Machine Learning Lifecycle*
IF 9.3 · Zone 2 (Computer Science) · Q1 Arts and Humanities · Pub Date: 2021-07-05 · DOI: 10.1162/coli_a_00413
Yang Trista Cao, Hal Daumé III
Abstract Correctly resolving textual mentions of people fundamentally entails making inferences about those people. Such inferences raise the risk of systematic biases in coreference resolution systems, including biases that can harm binary and non-binary trans and cis stakeholders. To better understand such biases, we foreground nuanced conceptualizations of gender from sociology and sociolinguistics, and investigate where in the machine learning pipeline such biases can enter a coreference resolution system. We inspect many existing data sets for trans-exclusionary biases, and develop two new data sets for interrogating bias in both crowd annotations and in existing coreference resolution systems. Through these studies, conducted on English text, we confirm that without acknowledging and building systems that recognize the complexity of gender, we will build systems that fail for: quality of service, stereotyping, and over- or under-representation, especially for binary and non-binary trans users.
Citations: 27
Universal Dependencies
IF 9.3 · Zone 2 (Computer Science) · Q1 Arts and Humanities · Pub Date: 2021-07-01 · DOI: 10.1162/coli_a_00402
Joakim Nivre, Daniel Zeman, Filip Ginter, Francis M. Tyers
Abstract Universal dependencies (UD) is a framework for morphosyntactic annotation of human language, which to date has been used to create treebanks for more than 100 languages. In this article, we outline the linguistic theory of the UD framework, which draws on a long tradition of typologically oriented grammatical theories. Grammatical relations between words are centrally used to explain how predicate–argument structures are encoded morphosyntactically in different languages while morphological features and part-of-speech classes give the properties of words. We argue that this theory is a good basis for crosslinguistically consistent annotation of typologically diverse languages in a way that supports computational natural language understanding as well as broader linguistic studies.
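As a minimal illustration of the kind of annotation described, the snippet below holds a simplified CoNLL-U fragment (the standard UD file format) for a toy sentence and reads out the part-of-speech tags, morphological features, and grammatical relations; real treebank annotations may differ in detail, and the parsing loop is only a sketch.

```python
# Simplified CoNLL-U fragment for "The dog barked.": columns are
# ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC.
CONLLU = """\
1\tThe\tthe\tDET\t_\tDefinite=Def|PronType=Art\t2\tdet\t_\t_
2\tdog\tdog\tNOUN\t_\tNumber=Sing\t3\tnsubj\t_\t_
3\tbarked\tbark\tVERB\t_\tTense=Past|VerbForm=Fin\t0\troot\t_\t_
4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_
"""

for line in CONLLU.splitlines():
    idx, form, lemma, upos, _, feats, head, deprel, _, _ = line.split("\t")
    # Grammatical relations (deprel) link each word to its head; features and
    # UPOS give the word-level properties discussed in the abstract.
    print(f"{form}: {upos}, {feats}, head={head}, relation={deprel}")
```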
Citations: 36
The Taxonomy of Writing Systems: How to Measure How Logographic a System Is
IF 9.3 · Zone 2 (Computer Science) · Q1 Arts and Humanities · Pub Date: 2021-06-30 · DOI: 10.1162/coli_a_00409
R. Sproat, Alexander Gutkin
Taxonomies of writing systems since Gelb (1952) have classified systems based on what the written symbols represent: if they represent words or morphemes, they are logographic; if syllables, syllabic; if segments, alphabetic; and so forth. Sproat (2000) and Rogers (2005) broke with tradition by splitting the logographic and phonographic aspects into two dimensions, with logography being graded rather than a categorical distinction. A system could be syllabic, and highly logographic; or alphabetic, and mostly non-logographic. This accords better with how writing systems actually work, but neither author proposed a method for measuring logography. In this article we propose a novel measure of the degree of logography that uses an attention-based sequence-to-sequence model trained to predict the spelling of a token from its pronunciation in context. In an ideal phonographic system, the model should need to attend to only the current token in order to compute how to spell it, and this would show in the attention matrix activations. In contrast, with a logographic system, where a given pronunciation might correspond to several different spellings, the model would need to attend to a broader context. The ratio of the activation outside the token to the total activation forms the basis of our measure. We compare this with a simple lexical measure, and an entropic measure, as well as several other neural models, and argue that on balance our attention-based measure accords best with intuition about how logographic various systems are. Our work provides the first quantifiable measure of the notion of logography that accords with linguistic intuition and, we argue, provides better insight into what this notion means.
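A schematic reading of the proposed measure, under the assumption that we have one attention-weight vector over the input tokens for each token being spelled, is sketched below; it simply computes the share of attention mass falling outside the current token. This is an illustration, not the authors' exact formulation.

```python
import numpy as np

def logography_score(attention, target_index):
    """Share of attention mass falling outside the current token.
    Values near 0 suggest the spelling is predictable from the token alone
    (phonographic); values near 1 suggest wide context is needed (logographic).
    """
    attention = np.asarray(attention, dtype=float)
    total = attention.sum()
    outside = total - attention[target_index]
    return outside / total

# Spelling the third token with nearly all attention on that token -> low score.
print(round(logography_score([0.02, 0.03, 0.90, 0.05], target_index=2), 3))  # 0.1
```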
Citations: 7
Understanding Dialogue: Language Use and Social Interaction
IF 9.3 · Zone 2 (Computer Science) · Q1 Arts and Humanities · Pub Date: 2021-06-30 · DOI: 10.1162/coli_r_00411
Rachel Bawden
Understanding Dialogue: Language Use and Social Interaction represents a departure from classic theories in psycholinguistics and cognitive sciences; instead of taking as a starting point the isolated speech of an individual that can be extended to accommodate dialogue, a primary focus is put on developing a model adapted to dialogue itself, bearing in mind important aspects of dialogue as an activity with a heavily cooperative component. As a researcher of natural language processing with a background in linguistics, I find highly intriguing the possibilities provided by the dialogue model presented. Although the book does not itself touch upon the potential for automated dialogue, I am inevitably writing this review from the point of view of a computational linguist with these aspects in mind. Building on numerous previous works, including many of the authors’ own studies and theories, Understanding Dialogue presents the shared workspace framework, a framework for understanding not just dialogue but cooperative activities in general, of which dialogue is viewed as a subtype. Based on Bratman’s (1992) concept of shared cooperative activity, the framework provides a joint environment with which interlocutors can interact, both by contributing to the space (with actions or utterances for example), and by perceiving and processing their own or the other participants’ productions. The authors do not limit their work to linguistic communication: Many of their examples, particularly at the beginning of the book, are non-linguistic (e.g., hand shaking, dancing a tango, playing singles tennis); others are primarily physical, but will most likely also involve linguistic communication (such as jointly constructing flat-pack furniture); and others are purely linguistic (e.g., suggesting which restaurant to go to for lunch). The notion of alignment is highly important to this framework both from a linguistic and non-linguistic perspective, and is one of the main inspirations of the book, having previously been presented in Toward a Mechanistic Theory of Dialogue by the same authors. As individuals interact via the joint space, alignment concerns the equivalence in their representations at a conceptual level, with respect to their goals and relevant props in the shared environment (dialogue model alignment) and linguistic representations shared in the workspace (linguistic alignment). Roughly speaking, in this second (linguistic) case, this may for instance correspond to whether or not the individuals have the same representation of the utterance in terms of phonetics (were the sounds perceived correctly?) or in terms of lexical semantics (do they understand the same reference by the word uttered?). From here can be explained a number of different dialogue
Citations: 0
Embeddings in Natural Language Processing: Theory and Advances in Vector Representations of Meaning
IF 9.3 · Zone 2 (Computer Science) · Q1 Arts and Humanities · Pub Date: 2021-06-30 · DOI: 10.1162/coli_r_00410
Marcos Garcia
Word vector representations have a long tradition in several research fields, such as cognitive science or computational linguistics. They have been used to represent the meaning of various units of natural languages, including, among others, words, phrases, and sentences. Before the deep learning tsunami, count-based vector space models had been successfully used in computational linguistics to represent the semantics of natural languages. However, the rise of neural networks in NLP popularized the use of word embeddings, which are now applied as pre-trained vectors in most machine learning architectures. This book, written by Mohammad Taher Pilehvar and Jose Camacho-Collados, provides a comprehensive and easy-to-read review of the theory and advances in vector models for NLP, focusing specially on semantic representations and their applications. It is a great introduction to different types of embeddings and the background and motivations behind them. In this sense, the authors adequately present the most relevant concepts and approaches that have been used to build vector representations. They also keep track of the most recent advances of this vibrant and fast-evolving area of research, discussing cross-lingual representations and current language models based on the Transformer. Therefore, this is a useful book for researchers interested in computational methods for semantic representations and artificial intelligence. Although some basic knowledge of machine learning may be necessary to follow a few topics, the book includes clear illustrations and explanations, which make it accessible to a wide range of readers. Apart from the preface and the conclusions, the book is organized into eight chapters. In the first two, the authors introduce some of the core ideas of NLP and artificial neural networks, respectively, discussing several concepts that will be useful throughout the book. Then, Chapters 3 to 6 present different types of vector representations at the lexical level (word embeddings, graph embeddings, sense embeddings, and contextualized embeddings), followed by a brief chapter (7) about sentence and document embeddings. For each specific topic, the book includes methods and data sets to assess the quality of the embeddings. Finally, Chapter 8 raises ethical issues involved
Citations: 37
Sequence-Level Training for Non-Autoregressive Neural Machine Translation
IF 9.3 · Zone 2 (Computer Science) · Q1 Arts and Humanities · Pub Date: 2021-06-15 · DOI: 10.1162/coli_a_00421
Chenze Shao, Yang Feng, Jinchao Zhang, Fandong Meng, Jie Zhou
Abstract In recent years, Neural Machine Translation (NMT) has achieved notable results in various translation tasks. However, the word-by-word generation manner determined by the autoregressive mechanism leads to high translation latency of the NMT and restricts its low-latency applications. Non-Autoregressive Neural Machine Translation (NAT) removes the autoregressive mechanism and achieves significant decoding speedup by generating target words independently and simultaneously. Nevertheless, NAT still takes the word-level cross-entropy loss as the training objective, which is not optimal because the output of NAT cannot be properly evaluated due to the multimodality problem. In this article, we propose using sequence-level training objectives to train NAT models, which evaluate the NAT outputs as a whole and correlate well with the real translation quality. First, we propose training NAT models to optimize sequence-level evaluation metrics (e.g., BLEU) based on several novel reinforcement algorithms customized for NAT, which outperform the conventional method by reducing the variance of gradient estimation. Second, we introduce a novel training objective for NAT models, which aims to minimize the Bag-of-N-grams (BoN) difference between the model output and the reference sentence. The BoN training objective is differentiable and can be calculated efficiently without doing any approximations. Finally, we apply a three-stage training strategy to combine these two methods to train the NAT model. We validate our approach on four translation tasks (WMT14 En↔De, WMT16 En↔Ro), which shows that our approach largely outperforms NAT baselines and achieves remarkable performance on all translation tasks. The source code is available at https://github.com/ictnlp/Seq-NAT.
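To illustrate the quantity behind the BoN objective, the sketch below computes a discrete bag-of-n-grams difference between a hypothesis and a reference. The paper's training objective is a differentiable version of this idea; the toy functions and names here are assumptions meant only to show what is being compared.

```python
from collections import Counter

def bag_of_ngrams(tokens, n):
    """Multiset of n-grams of a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bon_difference(hypothesis, reference, n=2):
    """L1 distance between the bags of n-grams of hypothesis and reference."""
    h, r = bag_of_ngrams(hypothesis, n), bag_of_ngrams(reference, n)
    return sum(abs(h[g] - r[g]) for g in set(h) | set(r))

hyp = "the cat sat on mat".split()
ref = "the cat sat on the mat".split()
print(bon_difference(hyp, ref, n=2))  # 3 mismatched bigrams
```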
Citations: 19