
Transactions of the Association for Computational Linguistics: Latest Publications

A Cross-Linguistic Pressure for Uniform Information Density in Word Order
IF 10.9 | CAS Region 1, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-06-06 | DOI: 10.1162/tacl_a_00589
T. Clark, Clara Meister, Tiago Pimentel, Michael Hahn, Ryan Cotterell, Richard Futrell, Roger Levy (MIT, ETH Zurich, University of Cambridge, Saarland University, UC Irvine)
While natural languages differ widely in both canonical word order and word order flexibility, their word orders still follow shared cross-linguistic statistical patterns, often attributed to functional pressures. In the effort to identify these pressures, prior work has compared real and counterfactual word orders. Yet one functional pressure has been overlooked in such investigations: the uniform information density (UID) hypothesis, which holds that information should be spread evenly throughout an utterance. Here, we ask whether a pressure for UID may have influenced word order patterns cross-linguistically. To this end, we use computational models to test whether real orders lead to greater information uniformity than counterfactual orders. In our empirical study of 10 typologically diverse languages, we find that: (i) among SVO languages, real word orders consistently have greater uniformity than reverse word orders, and (ii) only linguistically implausible counterfactual orders consistently exceed the uniformity of real orders. These findings are compatible with a pressure for information uniformity in the development and usage of natural languages.
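The notion of information uniformity here can be made concrete with a per-token surprisal computation. Below is a minimal sketch, assuming a causal language model as the probability estimator; the off-the-shelf "gpt2" checkpoint, the variance-based uniformity measure, and the toy sentences are illustrative stand-ins, not the paper's exact models or metrics.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def surprisals(sentence: str) -> torch.Tensor:
    """Per-token surprisal -log p(token_t | tokens_<t), skipping the first token."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    return -log_probs.gather(1, ids[0, 1:].unsqueeze(1)).squeeze(1)

def uid_score(sentence: str) -> float:
    # One common operationalization: variance of surprisals
    # (lower variance = information spread more evenly).
    return surprisals(sentence).var().item()

real = "the dog chased the cat across the yard"
reverse = "yard the across cat the chased dog the"
print(uid_score(real), uid_score(reverse))
```

Under a measure like this, the paper's question becomes whether attested orders yield systematically lower (more uniform) scores than counterfactual ones.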
Citations: 0
Supervised Gradual Machine Learning for Aspect-Term Sentiment Analysis
IF 10.9 | CAS Region 1, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-06-01 | DOI: 10.1162/tacl_a_00571
Yanyan Wang, Qun Chen, Murtadha Ahmed, Zhaoqiang Chen, Jing Su, Wei Pan, Zhanhuai Li
Recent work has shown that Aspect-Term Sentiment Analysis (ATSA) can be effectively performed by Gradual Machine Learning (GML). However, the performance of the current unsupervised solution is limited by inaccurate and insufficient knowledge conveyance. In this paper, we propose a supervised GML approach for ATSA, which can effectively exploit labeled training data to improve knowledge conveyance. It leverages binary polarity relations between instances, which can be either similar or opposite, to enable supervised knowledge conveyance. Besides the explicit polarity relations indicated by discourse structures, it also separately supervises a polarity classification DNN and a binary Siamese network to extract implicit polarity relations. The proposed approach fulfills knowledge conveyance by modeling detected relations as binary features in a factor graph. Our extensive experiments on real benchmark data show that it achieves the state-of-the-art performance across all the test workloads. Our work demonstrates clearly that, in collaboration with DNN for feature extraction, GML outperforms pure DNN solutions.
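As a rough illustration of the binary Siamese component mentioned above, the sketch below classifies a pair of instance encodings as having similar or opposite polarity. The encoder width, the interaction features, and the 768-dimensional inputs are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn

class PolarityRelationSiamese(nn.Module):
    """Score whether two instance encodings have similar or opposite polarity."""
    def __init__(self, in_dim: int = 768, hidden: int = 256):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        # Classify the pair from both encodings plus their absolute difference.
        self.head = nn.Linear(3 * hidden, 2)  # 0 = similar, 1 = opposite

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        ea, eb = self.encoder(a), self.encoder(b)
        return self.head(torch.cat([ea, eb, (ea - eb).abs()], dim=-1))

model = PolarityRelationSiamese()
logits = model(torch.randn(4, 768), torch.randn(4, 768))
print(logits.shape)  # one similar/opposite prediction per instance pair
```

In the full approach, relations predicted this way would become binary features in the factor graph rather than final outputs.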
Citations: 2
OpenFact: Factuality Enhanced Open Knowledge Extraction
IF 10.9 | CAS Region 1, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-06-01 | DOI: 10.1162/tacl_a_00569
Linfeng Song, Ante Wang, Xiaoman Pan, Hongming Zhang, Dian Yu, Lifeng Jin, Haitao Mi, Jinsong Su, Yue Zhang, Dong Yu
We focus on the factuality property during the extraction of an OpenIE corpus named OpenFact, which contains more than 12 million high-quality knowledge triplets. We break down the factuality property into two important aspects—expressiveness and groundedness—and we propose a comprehensive framework to handle both aspects. To enhance expressiveness, we formulate each knowledge piece in OpenFact based on a semantic frame. We also design templates, extra constraints, and adopt human efforts so that most OpenFact triplets contain enough details. For groundedness, we require the main arguments of each triplet to contain linked Wikidata entities. A human evaluation suggests that the OpenFact triplets are much more accurate and contain denser information compared to OPIEC-Linked (Gashteovski et al., 2019), one recent high-quality OpenIE corpus grounded to Wikidata. Further experiments on knowledge base completion and knowledge base question answering show the effectiveness of OpenFact over OPIEC-Linked as supplementary knowledge to Wikidata as the major KG.
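The groundedness requirement lends itself to a simple filtering step. The sketch below keeps only triplets whose main arguments link to Wikidata entities; the lookup-table "linker" and the example triplets are hypothetical stand-ins for a real entity-linking system.

```python
from typing import Optional

# Hypothetical linker output: surface form -> Wikidata QID (or None if unlinkable).
LINKED_ENTITIES = {"Barack Obama": "Q76", "Hawaii": "Q782"}

def link(span: str) -> Optional[str]:
    return LINKED_ENTITIES.get(span)

def is_grounded(triplet: tuple[str, str, str]) -> bool:
    subject, _relation, obj = triplet
    return link(subject) is not None and link(obj) is not None

triplets = [
    ("Barack Obama", "was born in", "Hawaii"),  # both arguments link
    ("He", "was born in", "Hawaii"),            # unresolvable pronoun
]
print([t for t in triplets if is_grounded(t)])  # keeps only the grounded triplet
```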
Citations: 0
Exploring Contrast Consistency of Open-Domain Question Answering Systems on Minimally Edited Questions
IF 10.9 | CAS Region 1, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-05-23 | DOI: 10.1162/tacl_a_00591
Zhihan Zhang, W. Yu, Zheng Ning, Mingxuan Ju, Meng Jiang
Contrast consistency, the ability of a model to make consistently correct predictions in the presence of perturbations, is an essential aspect in NLP. While studied in tasks such as sentiment analysis and reading comprehension, it remains unexplored in open-domain question answering (OpenQA) due to the difficulty of collecting perturbed questions that satisfy factuality requirements. In this work, we collect minimally edited questions as challenging contrast sets to evaluate OpenQA models. Our collection approach combines both human annotation and large language model generation. We find that the widely used dense passage retriever (DPR) performs poorly on our contrast sets, despite fitting the training set well and performing competitively on standard test sets. To address this issue, we introduce a simple and effective query-side contrastive loss with the aid of data augmentation to improve DPR training. Our experiments on the contrast sets demonstrate that DPR’s contrast consistency is improved without sacrificing its accuracy on the standard test sets.
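A query-side contrastive loss of the kind described can be sketched as an InfoNCE objective over a batch that mixes original and minimally edited questions, each paired with its own gold passage and using the rest of the batch as negatives. The tensor shapes, temperature, and pairing scheme below are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def query_side_contrastive_loss(q, q_edit, p, p_edit, tau: float = 0.05):
    """InfoNCE over a batch mixing original and minimally edited questions.

    q, q_edit: (B, d) question embeddings; p, p_edit: (B, d) embeddings of
    each question's own gold passage. In-batch passages act as negatives.
    """
    queries = torch.cat([q, q_edit], dim=0)    # (2B, d)
    passages = torch.cat([p, p_edit], dim=0)   # (2B, d)
    logits = queries @ passages.T / tau        # similarity matrix
    targets = torch.arange(queries.size(0))    # i-th query matches i-th passage
    return F.cross_entropy(logits, targets)

# Toy usage with random unit-norm embeddings:
q, qe, p, pe = (F.normalize(torch.randn(8, 768), dim=-1) for _ in range(4))
print(query_side_contrastive_loss(q, qe, p, pe).item())
```

Because an edited question sits in the same batch as its original, the retriever is pushed to map the two onto different passages rather than collapsing them.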
Citations: 2
Cross-functional Analysis of Generalization in Behavioral Learning
IF 10.9 | CAS Region 1, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-05-22 | DOI: 10.1162/tacl_a_00590
Pedro Henrique Luz de Araujo, Benjamin Roth
In behavioral testing, system functionalities underrepresented in the standard evaluation setting (with a held-out test set) are validated through controlled input-output pairs. Optimizing performance on the behavioral tests during training (behavioral learning) would improve coverage of phenomena not sufficiently represented in the i.i.d. data and could lead to seemingly more robust models. However, there is the risk that the model narrowly captures spurious correlations from the behavioral test suite, leading to overestimation and misrepresentation of model performance—one of the original pitfalls of traditional evaluation. In this work, we introduce BeLUGA, an analysis method for evaluating behavioral learning considering generalization across dimensions of different granularity levels. We optimize behavior-specific loss functions and evaluate models on several partitions of the behavioral test suite controlled to leave out specific phenomena. An aggregate score measures generalization to unseen functionalities (or overfitting). We use BeLUGA to examine three representative NLP tasks (sentiment analysis, paraphrase identification, and reading comprehension) and compare the impact of a diverse set of regularization and domain generalization methods on generalization performance.
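The evaluation protocol described, training on some functionality partitions while holding others out, can be sketched as a leave-one-functionality-out loop whose fold scores are aggregated. The functionality names and the stubbed train/eval routine below are illustrative assumptions, not BeLUGA's actual interface.

```python
import random
from statistics import mean

FUNCTIONALITIES = ["negation", "coreference", "temporal", "paraphrase"]

def train_and_eval(train_funcs: list[str], held_out: str) -> float:
    # Stub: a real run would optimize behavior-specific losses on the
    # `train_funcs` partitions and report accuracy on `held_out`.
    random.seed(held_out)
    return random.uniform(0.5, 0.9)

fold_scores = {}
for held_out in FUNCTIONALITIES:
    fold_scores[held_out] = train_and_eval(
        [f for f in FUNCTIONALITIES if f != held_out], held_out)

# Aggregate score: generalization to functionalities unseen in training.
print(fold_scores, "aggregate:", mean(fold_scores.values()))
```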
Citations: 0
Bridging the Gap between Synthetic and Natural Questions via Sentence Decomposition for Semantic Parsing
IF 10.9 | CAS Region 1, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-05-01 | DOI: 10.1162/tacl_a_00552
Yilin Niu, Fei Huang, W. Liu, Jianwei Cui, Bin Wang, Minlie Huang
Semantic parsing maps natural language questions into logical forms, which can be executed against a knowledge base for answers. In real-world applications, the performance of a parser is often limited by the lack of training data. To facilitate zero-shot learning, data synthesis has been widely studied to automatically generate paired questions and logical forms. However, data synthesis methods can hardly cover the diverse structures in natural languages, leading to a large gap in sentence structure between synthetic and natural questions. In this paper, we propose a decomposition-based method to unify the sentence structures of questions, which benefits the generalization to natural questions. Experiments demonstrate that our method significantly improves the semantic parser trained on synthetic data (+7.9% on KQA and +8.9% on ComplexWebQuestions in terms of exact match accuracy). Extensive analysis demonstrates that our method can better generalize to natural questions with novel text expressions compared with baselines. Besides semantic parsing, our idea potentially benefits other semantic understanding tasks by mitigating the distracting structure features. To illustrate this, we extend our method to the task of sentence embedding learning, and observe substantial improvements on sentence retrieval (+13.1% for Hit@1).
Citations: 0
Directed Acyclic Transformer Pre-training for High-quality Non-autoregressive Text Generation
IF 10.9 | CAS Region 1, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-04-24 | DOI: 10.1162/tacl_a_00582
Fei Huang, Pei Ke, Minlie Huang
Non-AutoRegressive (NAR) text generation models have drawn much attention because of their significantly faster decoding speed and good generation quality in machine translation. However, in a wider range of text generation tasks, existing NAR models lack proper pre-training, making them still far behind the pre-trained autoregressive models. In this paper, we propose Pre-trained Directed Acyclic Transformer (PreDAT) and a novel pre-training task to promote prediction consistency in NAR generation. Experiments on five text generation tasks show that our PreDAT remarkably outperforms existing pre-trained NAR models (+4.2 score on average) and even achieves better results than pre-trained autoregressive baselines in n-gram-based metrics, along with 17 times speedup in throughput. Further analysis shows that PreDAT benefits from the unbiased prediction order that alleviates the error accumulation problem in autoregressive generation, which provides new insights into the advantages of NAR generation.
Citations: 3
Expectations over Unspoken Alternatives Predict Pragmatic Inferences
IF 10.9 | CAS Region 1, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-04-07 | DOI: 10.1162/tacl_a_00579
Jennifer Hu, R. Levy, Judith Degen, Sebastian Schuster
Scalar inferences (SI) are a signature example of how humans interpret language based on unspoken alternatives. While empirical studies have demonstrated that human SI rates are highly variable—both within instances of a single scale, and across different scales—there have been few proposals that quantitatively explain both cross- and within-scale variation. Furthermore, while it is generally assumed that SIs arise through reasoning about unspoken alternatives, it remains debated whether humans reason about alternatives as linguistic forms, or at the level of concepts. Here, we test a shared mechanism explaining SI rates within and across scales: context-driven expectations about the unspoken alternatives. Using neural language models to approximate human predictive distributions, we find that SI rates are captured by the expectedness of the strong scalemate as an alternative. Crucially, however, expectedness robustly predicts cross-scale variation only under a meaning-based view of alternatives. Our results suggest that pragmatic inferences arise from context-driven expectations over alternatives, and these expectations operate at the level of concepts.
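Approximating the expectedness of a strong scalemate with a neural language model can be sketched by comparing the log-probability the model assigns to each alternative as a continuation of the same context. The "gpt2" checkpoint and the some/all example below are illustrative assumptions, not the paper's exact setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def continuation_logprob(context: str, continuation: str) -> float:
    """Log p(continuation | context), summed over continuation tokens."""
    ctx_len = tokenizer(context, return_tensors="pt").input_ids.size(1)
    full = tokenizer(context + continuation, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full).logits
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    cont_ids = full[0, ctx_len:]
    rows = torch.arange(ctx_len - 1, full.size(1) - 1)  # rows predicting cont tokens
    return log_probs[rows].gather(1, cont_ids.unsqueeze(1)).sum().item()

context = "She started the exam an hour ago and has finished"
weak = " some of the questions"
strong = " all of the questions"
# Expectedness of the strong scalemate relative to the weak term:
print(continuation_logprob(context, strong) - continuation_logprob(context, weak))
```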
Citations: 3
Design Choices for Crowdsourcing Implicit Discourse Relations: Revealing the Biases Introduced by Task Design
IF 10.9 | CAS Region 1, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-04-03 | DOI: 10.1162/tacl_a_00586
Valentina Pyatkin, Frances Yung, Merel C. J. Scholman, Reut Tsarfaty, Ido Dagan, Vera Demberg
Disagreement in natural language annotation has mostly been studied from a perspective of biases introduced by the annotators and the annotation frameworks. Here, we propose to analyze another source of bias—task design bias, which has a particularly strong impact on crowdsourced linguistic annotations where natural language is used to elicit the interpretation of lay annotators. For this purpose we look at implicit discourse relation annotation, a task that has repeatedly been shown to be difficult due to the relations’ ambiguity. We compare the annotations of 1,200 discourse relations obtained using two distinct annotation tasks and quantify the biases of both methods across four different domains. Both methods are natural language annotation tasks designed for crowdsourcing. We show that the task design can push annotators towards certain relations and that some discourse relation senses can be better elicited with one or the other annotation approach. We also conclude that this type of bias should be taken into account when training and testing models.
Citations: 1
Compositional Zero-Shot Domain Transfer with Text-to-Text Models
IF 10.9 | CAS Region 1, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-03-23 | DOI: 10.1162/tacl_a_00585
Fangyu Liu, Qianchu Liu, Shruthi Bannur, Fernando Pérez-García, Naoto Usuyama, Shenmin Zhang, Tristan Naumann, A. Nori, Hoifung Poon, J. Alvarez-Valle, O. Oktay, Stephanie L. Hyland
Label scarcity is a bottleneck for improving task performance in specialized domains. We propose a novel compositional transfer learning framework (DoT5) for zero-shot domain transfer. Without access to in-domain labels, DoT5 jointly learns domain knowledge (from masked language modelling of unlabelled in-domain free text) and task knowledge (from task training on more readily available general-domain data) in a multi-task manner. To improve the transferability of task training, we design a strategy named NLGU: We simultaneously train natural language generation (NLG) for in-domain label-to-data generation, which enables data augmentation for self-finetuning and natural language understanding (NLU) for label prediction. We evaluate DoT5 on the biomedical domain and the resource-lean subdomain of radiology, focusing on natural language inference, text summarization, and embedding learning. DoT5 demonstrates the effectiveness of compositional transfer learning through multi-task learning. In particular, DoT5 outperforms the current state-of-the-art in zero-shot transfer by over 7 absolute points in accuracy on RadNLI. We validate DoT5 with ablations and a case study demonstrating its ability to solve challenging NLI examples requiring in-domain expertise.
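The multi-task recipe described, combining masked language modelling on unlabelled in-domain text with task training on general-domain data, can be sketched as an interleaved text-to-text training loop. The "t5-small" checkpoint, the toy radiology and NLI examples, and the unweighted loss sum below are illustrative assumptions, not DoT5's actual configuration.

```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tok = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

def seq2seq_loss(source: str, target: str) -> torch.Tensor:
    batch = tok(source, return_tensors="pt")
    labels = tok(target, return_tensors="pt").input_ids
    return model(**batch, labels=labels).loss

for step in range(2):  # toy loop; a real run iterates over two datasets
    # Domain knowledge: span corruption on unlabelled in-domain text.
    lm_loss = seq2seq_loss(
        "The chest X-ray shows a small <extra_id_0> effusion.",
        "<extra_id_0> pleural <extra_id_1>")
    # Task knowledge: NLI framed text-to-text on general-domain data.
    task_loss = seq2seq_loss(
        "nli premise: A man is eating food. hypothesis: A person eats.",
        "entailment")
    loss = lm_loss + task_loss  # unweighted sum is an assumption
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
print("final combined loss:", loss.item())
```

At inference time, the same model can then be prompted with in-domain task inputs it never saw labeled, which is the compositional zero-shot transfer the abstract describes.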
Citations: 1