
Latest Publications: Transactions of the Association for Computational Linguistics

OpenFact: Factuality Enhanced Open Knowledge Extraction
IF 10.9 · CAS Tier 1 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2023-06-01 · DOI: 10.1162/tacl_a_00569
Linfeng Song, Ante Wang, Xiaoman Pan, Hongming Zhang, Dian Yu, Lifeng Jin, Haitao Mi, Jinsong Su, Yue Zhang, Dong Yu
We focus on the factuality property during the extraction of an OpenIE corpus named OpenFact, which contains more than 12 million high-quality knowledge triplets. We break down the factuality property into two important aspects—expressiveness and groundedness—and we propose a comprehensive framework to handle both aspects. To enhance expressiveness, we formulate each knowledge piece in OpenFact based on a semantic frame. We also design templates, extra constraints, and adopt human efforts so that most OpenFact triplets contain enough details. For groundedness, we require the main arguments of each triplet to contain linked Wikidata entities. A human evaluation suggests that the OpenFact triplets are much more accurate and contain denser information compared to OPIEC-Linked (Gashteovski et al., 2019), a recent high-quality OpenIE corpus grounded to Wikidata. Further experiments on knowledge base completion and knowledge base question answering show that OpenFact is more effective than OPIEC-Linked as supplementary knowledge to Wikidata as the major KG.
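To make the triplet design concrete, here is a minimal sketch of what a frame-based, entity-grounded record could look like; the field names and the `is_grounded` check are illustrative assumptions, not the released OpenFact schema.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Argument:
    """One argument of a triplet, optionally linked to a Wikidata entity."""
    text: str
    wikidata_id: Optional[str] = None  # e.g., "Q76" for Barack Obama

@dataclass
class OpenFactTriplet:
    """A knowledge triplet organized around a semantic frame."""
    frame: str        # semantic frame evoked by the predicate
    subject: Argument
    predicate: str
    obj: Argument

    def is_grounded(self) -> bool:
        # groundedness requires the main arguments to carry linked entities
        return self.subject.wikidata_id is not None and self.obj.wikidata_id is not None

t = OpenFactTriplet(
    frame="Leadership",
    subject=Argument("Barack Obama", "Q76"),
    predicate="served as president of",
    obj=Argument("the United States", "Q30"),
)
assert t.is_grounded()
```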
Citations: 0
Exploring Contrast Consistency of Open-Domain Question Answering Systems on Minimally Edited Questions
IF 10.9 · CAS Tier 1 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2023-05-23 · DOI: 10.1162/tacl_a_00591
Zhihan Zhang, W. Yu, Zheng Ning, Mingxuan Ju, Meng Jiang
Contrast consistency, the ability of a model to make consistently correct predictions in the presence of perturbations, is an essential aspect in NLP. While studied in tasks such as sentiment analysis and reading comprehension, it remains unexplored in open-domain question answering (OpenQA) due to the difficulty of collecting perturbed questions that satisfy factuality requirements. In this work, we collect minimally edited questions as challenging contrast sets to evaluate OpenQA models. Our collection approach combines both human annotation and large language model generation. We find that the widely used dense passage retriever (DPR) performs poorly on our contrast sets, despite fitting the training set well and performing competitively on standard test sets. To address this issue, we introduce a simple and effective query-side contrastive loss with the aid of data augmentation to improve DPR training. Our experiments on the contrast sets demonstrate that DPR’s contrast consistency is improved without sacrificing its accuracy on the standard test sets.
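A rough sketch of what a query-side contrastive objective could look like: each question embedding must score its own gold passage above the passage of its minimally edited twin, which acts as a hard negative. The pairing scheme and temperature are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def query_side_contrastive_loss(q, p_pos, p_neg, tau=0.05):
    """q: [B, d] question embeddings; p_pos: [B, d] gold-passage embeddings;
    p_neg: [B, d] gold passages of the minimally edited twin questions,
    treated as hard negatives for the original questions."""
    q, p_pos, p_neg = (F.normalize(x, dim=-1) for x in (q, p_pos, p_neg))
    pos = (q * p_pos).sum(-1, keepdim=True)       # [B, 1] similarity to own passage
    neg = (q * p_neg).sum(-1, keepdim=True)       # [B, 1] similarity to twin's passage
    logits = torch.cat([pos, neg], dim=-1) / tau  # the positive must win
    labels = torch.zeros(q.size(0), dtype=torch.long)
    return F.cross_entropy(logits, labels)

loss = query_side_contrastive_loss(
    torch.randn(4, 768), torch.randn(4, 768), torch.randn(4, 768))
```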
Citations: 2
Cross-functional Analysis of Generalization in Behavioral Learning
IF 10.9 · CAS Tier 1 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2023-05-22 · DOI: 10.1162/tacl_a_00590
Pedro Henrique Luz de Araujo, Benjamin Roth
In behavioral testing, system functionalities underrepresented in the standard evaluation setting (with a held-out test set) are validated through controlled input-output pairs. Optimizing performance on the behavioral tests during training (behavioral learning) would improve coverage of phenomena not sufficiently represented in the i.i.d. data and could lead to seemingly more robust models. However, there is the risk that the model narrowly captures spurious correlations from the behavioral test suite, leading to overestimation and misrepresentation of model performance—one of the original pitfalls of traditional evaluation. In this work, we introduce BeLUGA, an analysis method for evaluating behavioral learning considering generalization across dimensions of different granularity levels. We optimize behavior-specific loss functions and evaluate models on several partitions of the behavioral test suite controlled to leave out specific phenomena. An aggregate score measures generalization to unseen functionalities (or overfitting). We use BeLUGA to examine three representative NLP tasks (sentiment analysis, paraphrase identification, and reading comprehension) and compare the impact of a diverse set of regularization and domain generalization methods on generalization performance.
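The evaluation bookkeeping amounts to leave-one-phenomenon-out scoring: for each functionality, train on test-suite partitions that exclude it, test on it, and aggregate. A minimal sketch of that loop, with the actual `train_fn` and `eval_fn` supplied by the caller (both are placeholders here, not BeLUGA's implementation):

```python
from statistics import mean
from typing import Callable, Iterable

def leave_one_out_scores(functionalities: Iterable[str],
                         train_fn: Callable[[str], object],
                         eval_fn: Callable[[object, str], float]) -> dict[str, float]:
    """train_fn(excluded) trains on all behavioral-test partitions except `excluded`;
    eval_fn(model, functionality) returns accuracy on the held-out functionality."""
    return {f: eval_fn(train_fn(f), f) for f in functionalities}

def aggregate_generalization_score(heldout_acc: dict[str, float]) -> float:
    # high mean accuracy on unseen functionalities signals generalization,
    # not overfitting to the behavioral test suite
    return mean(heldout_acc.values())

print(aggregate_generalization_score({"negation": 0.72, "typos": 0.88, "coref": 0.65}))
```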
Citations: 0
Bridging the Gap between Synthetic and Natural Questions via Sentence Decomposition for Semantic Parsing
IF 10.9 · CAS Tier 1 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2023-05-01 · DOI: 10.1162/tacl_a_00552
Yilin Niu, Fei Huang, W. Liu, Jianwei Cui, Bin Wang, Minlie Huang
Semantic parsing maps natural language questions into logical forms, which can be executed against a knowledge base for answers. In real-world applications, the performance of a parser is often limited by the lack of training data. To facilitate zero-shot learning, data synthesis has been widely studied to automatically generate paired questions and logical forms. However, data synthesis methods can hardly cover the diverse structures in natural languages, leading to a large gap in sentence structure between synthetic and natural questions. In this paper, we propose a decomposition-based method to unify the sentence structures of questions, which benefits the generalization to natural questions. Experiments demonstrate that our method significantly improves the semantic parser trained on synthetic data (+7.9% on KQA and +8.9% on ComplexWebQuestions in terms of exact match accuracy). Extensive analysis demonstrates that our method can better generalize to natural questions with novel text expressions compared with baselines. Besides semantic parsing, our idea potentially benefits other semantic understanding tasks by mitigating the distracting structure features. To illustrate this, we extend our method to the task of sentence embedding learning, and observe substantial improvements on sentence retrieval (+13.1% for Hit@1).
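As a toy, hand-constructed illustration of the target behavior (not the paper's actual decomposition algorithm): a structurally complex natural question is rewritten into simple clauses that share one unified template, the form synthetic training data already tends to take.

```python
# One hand-written example of the unification that decomposition aims at;
# the placeholder syntax "[country#1]" is purely illustrative.
complex_question = "Which river flows through the country that hosted the 2016 Olympics?"
decomposed = [
    "Which country hosted the 2016 Olympics?",  # sub-question over one clause
    "Which river flows through [country#1]?",   # refers back to the first answer
]
```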
Citations: 0
Directed Acyclic Transformer Pre-training for High-quality Non-autoregressive Text Generation
IF 10.9 · CAS Tier 1 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2023-04-24 · DOI: 10.1162/tacl_a_00582
Fei Huang, Pei Ke, Minlie Huang
Non-AutoRegressive (NAR) text generation models have drawn much attention because of their significantly faster decoding speed and good generation quality in machine translation. However, in a wider range of text generation tasks, existing NAR models lack proper pre-training, making them still far behind the pre-trained autoregressive models. In this paper, we propose Pre-trained Directed Acyclic Transformer (PreDAT) and a novel pre-training task to promote prediction consistency in NAR generation. Experiments on five text generation tasks show that our PreDAT remarkably outperforms existing pre-trained NAR models (+4.2 score on average) and even achieves better results than pre-trained autoregressive baselines in n-gram-based metrics, along with a 17× speedup in throughput. Further analysis shows that PreDAT benefits from the unbiased prediction order that alleviates the error accumulation problem in autoregressive generation, which provides new insights into the advantages of NAR generation.
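The reported speedup follows from the decoding pattern: an autoregressive decoder needs one model call per output token, while a NAR decoder fills every position from a single forward pass. A schematic contrast (plain parallel argmax here, not PreDAT's directed-acyclic-graph decoding):

```python
import torch

def nar_decode(logits: torch.Tensor) -> torch.Tensor:
    """logits: [T, V] from one forward pass; every position is decoded at once."""
    return logits.argmax(dim=-1)

# An autoregressive decoder would instead loop T times, feeding each
# predicted token back in before producing the next one.
tokens = nar_decode(torch.randn(10, 32000))  # 10 positions, one step
```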
Citations: 3
Expectations over Unspoken Alternatives Predict Pragmatic Inferences
IF 10.9 · CAS Tier 1 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2023-04-07 · DOI: 10.1162/tacl_a_00579
Jennifer Hu, R. Levy, Judith Degen, Sebastian Schuster
Scalar inferences (SI) are a signature example of how humans interpret language based on unspoken alternatives. While empirical studies have demonstrated that human SI rates are highly variable—both within instances of a single scale, and across different scales—there have been few proposals that quantitatively explain both cross- and within-scale variation. Furthermore, while it is generally assumed that SIs arise through reasoning about unspoken alternatives, it remains debated whether humans reason about alternatives as linguistic forms, or at the level of concepts. Here, we test a shared mechanism explaining SI rates within and across scales: context-driven expectations about the unspoken alternatives. Using neural language models to approximate human predictive distributions, we find that SI rates are captured by the expectedness of the strong scalemate as an alternative. Crucially, however, expectedness robustly predicts cross-scale variation only under a meaning-based view of alternatives. Our results suggest that pragmatic inferences arise from context-driven expectations over alternatives, and these expectations operate at the level of concepts.
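One way to operationalize "expectedness of the strong scalemate" is the language-model probability of the stronger term in the utterance context. The sketch below scores a continuation with GPT-2; the prompt and scoring choice are an illustrative proxy, not the paper's exact protocol.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def continuation_logprob(context: str, continuation: str) -> float:
    """Sum of log P(token | prefix) over the continuation tokens."""
    ids = tok(context + continuation, return_tensors="pt").input_ids
    ctx_len = tok(context, return_tensors="pt").input_ids.size(1)
    with torch.no_grad():
        logprobs = torch.log_softmax(model(ids).logits, dim=-1)
    # the logits at position i-1 predict the token at position i
    return sum(logprobs[0, i - 1, ids[0, i]].item()
               for i in range(ctx_len, ids.size(1)))

# expectedness of the strong scalemate "all" where the speaker said "some"
print(continuation_logprob("She ate some of the cookies. In fact, she ate",
                           " all of them."))
```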
Citations: 3
Design Choices for Crowdsourcing Implicit Discourse Relations: Revealing the Biases Introduced by Task Design
IF 10.9 · CAS Tier 1 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2023-04-03 · DOI: 10.1162/tacl_a_00586
Valentina Pyatkin, Frances Yung, Merel C. J. Scholman, Reut Tsarfaty, Ido Dagan, Vera Demberg
Disagreement in natural language annotation has mostly been studied from the perspective of biases introduced by the annotators and the annotation frameworks. Here, we propose to analyze another source of bias—task design bias—which has a particularly strong impact on crowdsourced linguistic annotations where natural language is used to elicit the interpretation of lay annotators. For this purpose we look at implicit discourse relation annotation, a task that has repeatedly been shown to be difficult due to the relations’ ambiguity. We compare the annotations of 1,200 discourse relations obtained using two distinct annotation tasks and quantify the biases of both methods across four different domains. Both methods are natural language annotation tasks designed for crowdsourcing. We show that the task design can push annotators towards certain relations and that some discourse relation senses can be better elicited with one or the other annotation approach. We also conclude that this type of bias should be taken into account when training and testing models.
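One generic way to quantify a design-induced bias is to compare the sense distributions the two task designs elicit over the same items, for instance with Jensen-Shannon divergence; the sketch below illustrates that comparison and is not the paper's analysis code.

```python
from collections import Counter
from math import log2

def js_divergence(labels_a: list[str], labels_b: list[str]) -> float:
    """Jensen-Shannon divergence between two empirical label distributions."""
    senses = set(labels_a) | set(labels_b)
    ca, cb = Counter(labels_a), Counter(labels_b)
    p = {s: ca[s] / len(labels_a) for s in senses}
    q = {s: cb[s] / len(labels_b) for s in senses}
    m = {s: (p[s] + q[s]) / 2 for s in senses}  # midpoint distribution
    kl = lambda x, y: sum(x[s] * log2(x[s] / y[s]) for s in senses if x[s] > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# same items annotated under two task designs
print(js_divergence(["cause", "contrast", "cause"],
                    ["contrast", "contrast", "cause"]))
```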
Citations: 1
Compositional Zero-Shot Domain Transfer with Text-to-Text Models
IF 10.9 · CAS Tier 1 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2023-03-23 · DOI: 10.1162/tacl_a_00585
Fangyu Liu, Qianchu Liu, Shruthi Bannur, Fernando Pérez-García, Naoto Usuyama, Shenmin Zhang, Tristan Naumann, A. Nori, Hoifung Poon, J. Alvarez-Valle, O. Oktay, Stephanie L. Hyland
Label scarcity is a bottleneck for improving task performance in specialized domains. We propose a novel compositional transfer learning framework (DoT5) for zero-shot domain transfer. Without access to in-domain labels, DoT5 jointly learns domain knowledge (from masked language modelling of unlabelled in-domain free text) and task knowledge (from task training on more readily available general-domain data) in a multi-task manner. To improve the transferability of task training, we design a strategy named NLGU: We simultaneously train natural language generation (NLG) for in-domain label-to-data generation, which enables data augmentation for self-finetuning and natural language understanding (NLU) for label prediction. We evaluate DoT5 on the biomedical domain and the resource-lean subdomain of radiology, focusing on natural language inference, text summarization, and embedding learning. DoT5 demonstrates the effectiveness of compositional transfer learning through multi-task learning. In particular, DoT5 outperforms the current state-of-the-art in zero-shot transfer by over 7 absolute points in accuracy on RadNLI. We validate DoT5 with ablations and a case study demonstrating its ability to solve challenging NLI examples requiring in-domain expertise.
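At its core, the compositional recipe is one model optimizing a domain objective (masked language modelling on unlabelled in-domain text) together with a task objective (supervised training on general-domain data) in the same step. The weighted combination below is a schematic assumption, not the paper's exact setup.

```python
import torch

def compositional_step(domain_mlm_loss: torch.Tensor,
                       task_loss: torch.Tensor,
                       alpha: float = 0.5) -> torch.Tensor:
    """Joint multi-task objective: domain knowledge plus task knowledge in one
    model. Under NLGU, the task side would also include NLG-generated pseudo
    in-domain data used to self-finetune the NLU label predictor."""
    return alpha * domain_mlm_loss + (1.0 - alpha) * task_loss

loss = compositional_step(torch.tensor(2.3), torch.tensor(1.1))
print(loss)  # backpropagate through this combined scalar during training
```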
Citations: 1
Evaluating Transformer Models and Human Behaviors on Chinese Character Naming
IF 10.9 · CAS Tier 1 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2023-03-22 · DOI: 10.1162/tacl_a_00573
Xiaomeng Ma, Lingyu Gao
Neural network models have been proposed to explain the grapheme-phoneme mapping process in humans for many alphabet languages. These models not only successfully learned the correspondence of the letter strings and their pronunciation, but also captured human behavior in nonce word naming tasks. How would the neural models perform for a non-alphabet language (e.g., Chinese) unknown character task? How well would the model capture human behavior? In this study, we first collect human speakers’ answers on unknown character naming tasks and then evaluate a set of transformer models by comparing their performance with human behaviors on an unknown Chinese character naming task. We found that the models and humans behaved very similarly, that they had similar accuracy distributions for each character, and had a substantial overlap in answers. In addition, the models’ answers are highly correlated with humans’ answers. These results suggested that the transformer models can capture humans’ character naming behavior well.
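The model-human comparison boils down to per-character accuracy and the correlation between the two accuracy profiles; a generic sketch with made-up responses (requires Python 3.10+ for `statistics.correlation`):

```python
from statistics import correlation  # Pearson's r, Python 3.10+

def per_item_accuracy(answers: dict[str, list[str]],
                      gold: dict[str, str]) -> dict[str, float]:
    """Fraction of responses per character that match the gold pronunciation."""
    return {c: sum(a == gold[c] for a in resp) / len(resp)
            for c, resp in answers.items()}

gold  = {"䶮": "yan3", "㼆": "ying2"}
human = {"䶮": ["yan3", "yan3", "long2"], "㼆": ["ying2", "ying2", "ying2"]}
model = {"䶮": ["yan3", "long2", "yan3"], "㼆": ["ying2", "ying2", "ying2"]}

h, m = per_item_accuracy(human, gold), per_item_accuracy(model, gold)
chars = sorted(gold)
print(correlation([h[c] for c in chars], [m[c] for c in chars]))
```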
Citations: 0
Less is More: Mitigate Spurious Correlations for Open-Domain Dialogue Response Generation Models by Causal Discovery
IF 10.9 · CAS Tier 1 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2023-03-02 · DOI: 10.1162/tacl_a_00561
Tao Feng, Lizhen Qu, Gholamreza Haffari
In this paper, we conduct the first study on spurious correlations for open-domain response generation models, based on CGDialog, a corpus we curated ourselves. Current models indeed suffer from spurious correlations and have a tendency to generate irrelevant and generic responses. Inspired by causal discovery algorithms, we propose a novel model-agnostic method for training and inference using a conditional independence classifier. The classifier is trained by a constrained self-training method, coined ConSTrain, to overcome data sparsity. The experimental results based on both human and automatic evaluation show that our method significantly outperforms the competitive baselines in terms of relevance, informativeness, and fluency.
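The inference-time idea can be pictured as scoring each dialogue-history utterance with the conditional-independence classifier and keeping only utterances the response should actually depend on; the scorer below is a toy lexical-overlap stand-in, not the trained ConSTrain classifier.

```python
from typing import Callable, List

def filter_history(history: List[str], query: str,
                   dep_score: Callable[[str, str], float],
                   tau: float = 0.1) -> List[str]:
    """Keep utterances the classifier judges the response to depend on."""
    return [u for u in history if dep_score(u, query) >= tau]

def toy_overlap_score(utterance: str, query: str) -> float:
    # stand-in for the conditional-independence classifier: Jaccard overlap
    norm = lambda s: {w.strip(".,!?").lower() for w in s.split()}
    a, b = norm(utterance), norm(query)
    return len(a & b) / max(len(a | b), 1)

history = ["I adopted a puppy last week.", "The weather was awful yesterday."]
print(filter_history(history, "What breed is your puppy?", toy_overlap_score))
# -> ['I adopted a puppy last week.']
```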
Citations: 0