
Latest Publications in Transactions of the Association for Computational Linguistics

How to Dissect a Muppet: The Structure of Transformer Embedding Spaces
IF 10.9 · CAS Tier 1, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2022-06-07 · DOI: 10.1162/tacl_a_00501 · Vol. 10, pp. 981-996
Timothee Mickus, Denis Paperno, Mathieu Constant
Pretrained embeddings based on the Transformer architecture have taken the NLP community by storm. We show that they can mathematically be reframed as a sum of vector factors and showcase how to use this reframing to study the impact of each component. We provide evidence that multi-head attentions and feed-forwards are not equally useful in all downstream applications, as well as a quantitative overview of the effects of finetuning on the overall embedding space. This approach allows us to draw connections to a wide range of previous studies, from vector space anisotropy to attention weights.
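The central reframing can be illustrated with a small sketch: because each sublayer writes additively into the residual stream, the final embedding is a sum of vector factors, and each factor's contribution can be measured by projecting it onto the total. The factor values below are toy stand-ins, not outputs of a real model.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def factor_shares(factors):
    """Given the additive vector factors of one token embedding, return
    each factor's share of the final vector, measured by projection onto
    the sum. Shares always total 1, making components comparable."""
    dim = len(factors[0])
    total = [sum(f[i] for f in factors) for i in range(dim)]
    norm_sq = dot(total, total)
    return [dot(f, total) / norm_sq for f in factors]

# Toy 4-dimensional factors standing in for the input embedding, one
# multi-head attention contribution, and one feed-forward contribution.
factors = [
    [1.0, 0.0, 0.0, 0.0],   # input embedding term
    [0.5, 0.5, 0.0, 0.0],   # multi-head attention term
    [0.0, 0.5, 1.0, 0.0],   # feed-forward term
]
shares = factor_shares(factors)
assert abs(sum(shares) - 1.0) < 1e-9
```

Comparing such shares across downstream tasks is one way to quantify, as the abstract suggests, whether attention or feed-forward terms dominate a given embedding.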
Citations: 10
Heterogeneous Supervised Topic Models
IF 10.9 · CAS Tier 1, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2022-06-01 · DOI: 10.1162/tacl_a_00487 · Vol. 10, pp. 732-745
Dhanya Sridhar, Hal Daumé, D. Blei
Researchers in the social sciences are often interested in the relationship between text and an outcome of interest, where the goal is to both uncover latent patterns in the text and predict outcomes for unseen texts. To this end, this paper develops the heterogeneous supervised topic model (HSTM), a probabilistic approach to text analysis and prediction. HSTMs posit a joint model of text and outcomes to find heterogeneous patterns that help with both text analysis and prediction. The main benefit of HSTMs is that they capture heterogeneity in the relationship between text and the outcome across latent topics. To fit HSTMs, we develop a variational inference algorithm based on the auto-encoding variational Bayes framework. We study the performance of HSTMs on eight datasets and find that they consistently outperform related methods, including fine-tuned black-box models. Finally, we apply HSTMs to analyze news articles labeled with pro- or anti-tone. We find evidence of differing language used to signal a pro- and anti-tone.
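A minimal sketch of the heterogeneity idea (illustrative weights, not the actual HSTM, which is fit by variational inference): a word's effect on the outcome is allowed to differ by latent topic, and the prediction mixes per-topic word effects by the document's topic proportions.

```python
def predict_outcome(word_counts, topic_props, base_w, topic_w):
    """word_counts: dict word -> count; topic_props: list of topic weights;
    base_w: word -> base effect; topic_w: list of dicts giving each topic's
    deviation from the base effect. Returns a linear prediction with
    topic-modulated word effects."""
    score = 0.0
    for word, count in word_counts.items():
        effect = base_w.get(word, 0.0)
        for k, pi in enumerate(topic_props):
            effect += pi * topic_w[k].get(word, 0.0)
        score += count * effect
    return score

# "affordable" is positive overall, but more so under topic 0 (say, a
# pricing topic) than topic 1 -- the heterogeneity HSTMs aim to capture.
base = {"affordable": 0.5}
dev = [{"affordable": 0.4}, {"affordable": -0.2}]
doc = {"affordable": 2}
s_pricing = predict_outcome(doc, [1.0, 0.0], base, dev)  # 2 * (0.5 + 0.4)
s_other = predict_outcome(doc, [0.0, 1.0], base, dev)    # 2 * (0.5 - 0.2)
assert s_pricing > s_other
```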
Citations: 4
Uncertainty Estimation and Reduction of Pre-trained Models for Text Regression
IF 10.9 · CAS Tier 1, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2022-06-01 · DOI: 10.1162/tacl_a_00483 · Vol. 10, pp. 680-696
Yuxia Wang, Daniel Beck, Timothy Baldwin, K. Verspoor
State-of-the-art classification and regression models are often not well calibrated, and cannot reliably provide uncertainty estimates, limiting their utility in safety-critical applications such as clinical decision-making. While recent work has focused on calibration of classifiers, there is almost no work in NLP on calibration in a regression setting. In this paper, we quantify the calibration of pre-trained language models for text regression, both intrinsically and extrinsically. We further apply uncertainty estimates to augment training data in low-resource domains. Our experiments on three regression tasks in both self-training and active-learning settings show that uncertainty estimation can be used to increase overall performance and enhance model generalization.
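One common way to obtain such regression uncertainty estimates, sketched here with toy linear predictors rather than pre-trained language models, is the spread of an ensemble's predictions; the estimate can then drive the self-training or active-learning loop the abstract mentions.

```python
import statistics

def ensemble_predict(models, x):
    """Mean and standard deviation over an ensemble's point predictions;
    the standard deviation serves as the uncertainty estimate."""
    preds = [m(x) for m in models]
    return statistics.mean(preds), statistics.pstdev(preds)

# Toy "ensemble": three slightly different linear regressors.
models = [lambda x: 2.0 * x, lambda x: 2.1 * x, lambda x: 1.9 * x]

mu, sigma = ensemble_predict(models, 10.0)
assert abs(mu - 20.0) < 1e-9
assert sigma > 0.0

# Uncertainty can rank an unlabeled pool for active learning: the most
# uncertain inputs are queried (or pseudo-labeled with care) first.
pool = [1.0, 5.0, 10.0]
most_uncertain = max(pool, key=lambda x: ensemble_predict(models, x)[1])
assert most_uncertain == 10.0
```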
Citations: 13
Naturalistic Causal Probing for Morpho-Syntax
IF 10.9 · CAS Tier 1, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2022-05-14 · DOI: 10.1162/tacl_a_00554 · Vol. 11, pp. 384-403
Afra Amini, Tiago Pimentel, Clara Meister, Ryan Cotterell
Probing has become a go-to methodology for interpreting and analyzing deep neural models in natural language processing. However, there is still a lack of understanding of the limitations and weaknesses of various types of probes. In this work, we suggest a strategy for input-level intervention on naturalistic sentences. Using our approach, we intervene on the morpho-syntactic features of a sentence, while keeping the rest of the sentence unchanged. Such an intervention allows us to causally probe pre-trained models. We apply our naturalistic causal probing framework to analyze the effects of grammatical gender and number on contextualized representations extracted from three pre-trained models in Spanish, the multilingual versions of BERT, RoBERTa, and GPT-2. Our experiments suggest that naturalistic interventions lead to stable estimates of the causal effects of various linguistic properties. Moreover, our experiments demonstrate the importance of naturalistic causal probing when analyzing pre-trained models. https://github.com/rycolab/naturalistic-causal-probing
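The intervention logic can be sketched as follows. The toy character-code encoder and the minimal pairs below are illustrative stand-ins, not the authors' released code (linked above); a real study would embed the sentences with BERT, RoBERTa, or GPT-2.

```python
def embed(sentence):
    # Stand-in encoder producing a 2-dimensional vector; enough to show
    # the bookkeeping of paired interventions.
    v = [0.0, 0.0]
    for ch in sentence:
        v[0] += ord(ch) % 7
        v[1] += ord(ch) % 5
    return v

def causal_effect(pairs):
    """pairs: list of (original, intervened) sentences differing only in
    one morpho-syntactic feature. Returns the mean per-dimension shift in
    the representation attributable to the intervention."""
    diffs = [[b - a for a, b in zip(embed(s0), embed(s1))]
             for s0, s1 in pairs]
    n = len(diffs)
    return [sum(d[i] for d in diffs) / n for i in range(len(diffs[0]))]

# Spanish gender intervention: masculine <-> feminine forms, with the
# rest of each sentence held fixed.
pairs = [("el gato pequeño", "la gata pequeña"),
         ("el niño alto", "la niña alta")]
effect = causal_effect(pairs)
assert len(effect) == 2
```

Averaging over many naturalistic minimal pairs is what makes the estimate causal with respect to the intervened feature rather than correlational.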
Citations: 7
Document Summarization with Latent Queries
IF 10.9 · CAS Tier 1, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2022-05-01 · DOI: 10.1162/tacl_a_00480 · Vol. 10, pp. 623-638
Yumo Xu, Mirella Lapata
The availability of large-scale datasets has driven the development of neural models that create generic summaries for single or multiple documents. For query-focused summarization (QFS), labeled training data in the form of queries, documents, and summaries is not readily available. We provide a unified modeling framework for any kind of summarization, under the assumption that all summaries are a response to a query, which is observed in the case of QFS and latent in the case of generic summarization. We model queries as discrete latent variables over document tokens, and learn representations compatible with observed and unobserved query verbalizations. Our framework formulates summarization as a generative process, and jointly optimizes a latent query model and a conditional language model. Despite learning from generic summarization data only, our approach outperforms strong comparison systems across benchmarks, query types, document settings, and target domains.
Citations: 15
A Neighborhood Framework for Resource-Lean Content Flagging
IF 10.9 · CAS Tier 1, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2022-05-01 · DOI: 10.1162/tacl_a_00472 · Vol. 10, pp. 484-502
Sheikh Muhammad Sarwar, Dimitrina Zlatkova, Momchil Hardalov, Yoan Dinkov, Isabelle Augenstein, Preslav Nakov
We propose a novel framework for cross-lingual content flagging with limited target-language data, which significantly outperforms prior work in terms of predictive performance. The framework is based on a nearest-neighbor architecture. It is a modern instantiation of the vanilla k-nearest neighbor model, as we use Transformer representations in all its components. Our framework can adapt to new source-language instances, without the need to be retrained from scratch. Unlike prior work on neighborhood-based approaches, we encode the neighborhood information based on query–neighbor interactions. We propose two encoding schemes and we show their effectiveness using both qualitative and quantitative analysis. Our evaluation results on eight languages from two different datasets for abusive language detection show sizable improvements of up to 9.5 F1 points absolute (for Italian) over strong baselines. On average, we achieve 3.6 absolute F1 points of improvement for the three languages in the Jigsaw Multilingual dataset and 2.14 points for the WUL dataset.
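A minimal sketch of the neighborhood idea (assumed details: the actual system uses Transformer encodings and learned query–neighbor interaction encoders; here we use toy vectors and cosine-weighted voting to show the retrieval shape, including why no retraining is needed when the labeled bank grows):

```python
import math

def cosine(u, v):
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)

def knn_flag(query, bank, k=2):
    """bank: list of (vector, label) labeled source-language examples.
    Score the query by similarity-weighted votes of its k nearest
    neighbors; new examples simply extend `bank`, with no retraining."""
    neighbors = sorted(bank, key=lambda nb: -cosine(query, nb[0]))[:k]
    score = sum(cosine(query, v) * (1 if y else -1) for v, y in neighbors)
    return score > 0

bank = [([1.0, 0.1], True),   # abusive
        ([0.9, 0.2], True),   # abusive
        ([0.0, 1.0], False)]  # benign
assert knn_flag([1.0, 0.0], bank) is True
assert knn_flag([0.1, 1.0], bank) is False
```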
Citations: 2
End-to-end Argument Mining with Cross-corpora Multi-task Learning
IF 10.9 · CAS Tier 1, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2022-05-01 · DOI: 10.1162/tacl_a_00481 · Vol. 10, pp. 639-658
Gaku Morio, Hiroaki Ozaki, Terufumi Morishita, Kohsuke Yanai
Mining an argument structure from text is an important step for tasks such as argument search and summarization. While studies on argument(ation) mining have proposed promising neural network models, they usually suffer from a shortage of training data. To address this issue, we expand the training data with various auxiliary argument mining corpora and propose an end-to-end cross-corpus training method called Multi-Task Argument Mining (MT-AM). To evaluate our approach, we conducted experiments for the main argument mining tasks on several well-established argument mining corpora. The results demonstrate that MT-AM generally outperformed the models trained on a single corpus. Also, the smaller the target corpus was, the better the MT-AM performed. Our extensive analyses suggest that the improvement of MT-AM depends on several factors of transferability among auxiliary and target corpora.
Citations: 5
Visual Spatial Reasoning
IF 10.9 · CAS Tier 1, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2022-04-30 · DOI: 10.1162/tacl_a_00566 · Vol. 11, pp. 635-651
Fangyu Liu, Guy Edward Toh Emerson, Nigel Collier
Spatial relations are a basic part of human cognition. However, they are expressed in natural language in a variety of ways, and previous work has suggested that current vision-and-language models (VLMs) struggle to capture relational information. In this paper, we present Visual Spatial Reasoning (VSR), a dataset containing more than 10k natural text-image pairs with 66 types of spatial relations in English (e.g., under, in front of, facing). While using a seemingly simple annotation format, we show how the dataset includes challenging linguistic phenomena, such as varying reference frames. We demonstrate a large gap between human and model performance: The human ceiling is above 95%, while state-of-the-art models only achieve around 70%. We observe that VLMs’ by-relation performances have little correlation with the number of training examples and the tested models are in general incapable of recognising relations concerning the orientations of objects.
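The dataset's shape and the by-relation breakdown the abstract reports can be sketched as below; the field names are illustrative stand-ins, not the released schema. Each example pairs an image with a caption asserting one of the 66 spatial relations, labeled true or false.

```python
examples = [
    {"caption": "The cat is under the table.",
     "relation": "under", "label": True},
    {"caption": "The dog is in front of the car.",
     "relation": "in front of", "label": False},
    {"caption": "The cup is under the shelf.",
     "relation": "under", "label": False},
]

def by_relation_accuracy(examples, predictions):
    """Aggregate model correctness per relation type, the breakdown used
    to show which relation categories models fail on."""
    totals, hits = {}, {}
    for ex, pred in zip(examples, predictions):
        r = ex["relation"]
        totals[r] = totals.get(r, 0) + 1
        hits[r] = hits.get(r, 0) + (pred == ex["label"])
    return {r: hits[r] / totals[r] for r in totals}

acc = by_relation_accuracy(examples, [True, False, True])
assert acc["in front of"] == 1.0
assert acc["under"] == 0.5
```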
Citations: 35
FaithDial: A Faithful Benchmark for Information-Seeking Dialogue
IF 10.9 · CAS Tier 1, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2022-04-22 · DOI: 10.1162/tacl_a_00529 · Vol. 10, pp. 1473-1490
Nouha Dziri, Ehsan Kamalloo, Sivan Milton, Osmar Zaiane, Mo Yu, E. Ponti, Siva Reddy
The goal of information-seeking dialogue is to respond to seeker queries with natural language utterances that are grounded on knowledge sources. However, dialogue systems often produce unsupported utterances, a phenomenon known as hallucination. To mitigate this behavior, we adopt a data-centric solution and create FaithDial, a new benchmark for hallucination-free dialogues, by editing hallucinated responses in the Wizard of Wikipedia (WoW) benchmark. We observe that FaithDial is more faithful than WoW while also maintaining engaging conversations. We show that FaithDial can serve as training signal for: i) a hallucination critic, which discriminates whether an utterance is faithful or not, and boosts the performance by 12.8 F1 score on the BEGIN benchmark compared to existing datasets for dialogue coherence; ii) high-quality dialogue generation. We benchmark a series of state-of-the-art models and propose an auxiliary contrastive objective that achieves the highest level of faithfulness and abstractiveness based on several automated metrics. Further, we find that the benefits of FaithDial generalize to zero-shot transfer on other datasets, such as CMU-Dog and TopicalChat. Finally, human evaluation reveals that responses generated by models trained on FaithDial are perceived as more interpretable, cooperative, and engaging.
Citations: 39
Chinese Idiom Paraphrasing
IF 10.9 · CAS Tier 1, Computer Science · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2022-04-15 · DOI: 10.1162/tacl_a_00572 · Vol. 11, pp. 740-754
Jipeng Qiang, Yang Li, Chaowei Zhang, Yun Li, Yunhao Yuan, Yi Zhu, Xin Wu
Chinese idioms are a type of idiomatic expression, most consisting of four Chinese characters. Because they are non-compositional and metaphorical in meaning, Chinese idioms are hard for children and non-native speakers to understand. This study proposes a novel task, denoted as Chinese Idiom Paraphrasing (CIP). CIP aims to rephrase idiom-containing sentences into non-idiomatic ones while preserving the original sentence's meaning. Since sentences without idioms are more easily handled by Chinese NLP systems, CIP can be used to pre-process Chinese datasets, thereby facilitating and improving the performance of Chinese NLP tasks, e.g., machine translation systems, Chinese idiom cloze, and Chinese idiom embeddings. We treat the CIP task as a special paraphrase generation task. To circumvent difficulties in acquiring annotations, we first establish a large-scale CIP dataset based on human and machine collaboration, which consists of 115,529 sentence pairs. In addition to three sequence-to-sequence methods as the baselines, we further propose a novel infill-based approach based on text infilling. The results show that the proposed method has better performance than the baselines on the established CIP dataset.
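The infill framing can be illustrated with a toy example. The paper's actual model is a trained infilling generator; the dictionary lookup and the idiom entry below are hypothetical stand-ins used only to show the masking shape: the idiom span becomes a blank, the generator fills the blank with a literal paraphrase, and the rest of the sentence is untouched.

```python
# Hypothetical idiom -> literal paraphrase entry, for illustration only.
PARAPHRASE = {"画蛇添足": "做多余的事"}

def cip_infill(sentence, idiom, generator=PARAPHRASE.get):
    """Replace the idiom with an infill slot, then fill the slot with the
    generator's literal paraphrase, leaving the context unchanged."""
    masked = sentence.replace(idiom, "[BLANK]")
    return masked.replace("[BLANK]", generator(idiom))

src = "他这样做完全是画蛇添足。"
out = cip_infill(src, "画蛇添足")
assert out == "他这样做完全是做多余的事。"
assert "画蛇添足" not in out
```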
摘要习语是汉语中的一种习语,大部分由四个汉字组成。汉语习语具有非复合性和隐喻性的特点,很难被非母语儿童所理解。本研究提出了一个新的任务,称为汉语习语释义(CIP)。CIP的目的是在保留原句含义的前提下,将含有习语的句子改写为非习语的句子。由于汉语NLP系统更容易处理没有成语的句子,CIP可以用于预处理汉语数据集,从而促进和提高汉语NLP任务的性能,例如机器翻译系统、汉语成语完形填空和汉语成语嵌入。在本研究中,我们可以将CIP任务视为一个特殊的转述生成任务。为了避免获取注释的困难,我们首先建立了一个基于人机协作的大规模CIP数据集,该数据集由115529个句子对组成。除了三种序列到序列的方法作为基线外,我们还提出了一种基于文本填充的新的基于填充的方法。结果表明,与基于建立的CIP数据集的基线相比,该方法具有更好的性能。
Transactions of the Association for Computational Linguistics, Vol. 11, pp. 740-754. Pub Date : 2022-04-15
Citations: 2