
Latest Publications in Transactions of the Association for Computational Linguistics

Communication Drives the Emergence of Language Universals in Neural Agents: Evidence from the Word-order/Case-marking Trade-off
IF 10.9 | CAS Zone 1, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-01-30 | DOI: 10.1162/tacl_a_00587
Yuchen Lian, Arianna Bisazza, T. Verhoef
Abstract Artificial learners often behave differently from human learners in the context of neural agent-based simulations of language emergence and change. A common explanation is the lack of appropriate cognitive biases in these learners. However, it has also been proposed that more naturalistic settings of language learning and use could lead to more human-like results. We investigate this latter account, focusing on the word-order/case-marking trade-off, a widely attested language universal that has proven particularly hard to simulate. We propose a new Neural-agent Language Learning and Communication framework (NeLLCom) where pairs of speaking and listening agents first learn a miniature language via supervised learning, and then optimize it for communication via reinforcement learning. Following closely the setup of earlier human experiments, we succeed in replicating the trade-off with the new framework without hard-coding specific biases in the agents. We see this as an essential step towards the investigation of language universals with neural learners.
Citations: 0
Visual Writing Prompts: Character-Grounded Story Generation with Curated Image Sequences
IF 10.9 | CAS Zone 1, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-01-20 | DOI: 10.1162/tacl_a_00553
Xudong Hong, A. Sayeed, K. Mehra, Vera Demberg, B. Schiele
Current work on image-based story generation suffers from the fact that the existing image sequence collections do not have coherent plots behind them. We improve visual story generation by producing a new image-grounded dataset, Visual Writing Prompts (VWP). VWP contains almost 2K selected sequences of movie shots, each including 5-10 images. The image sequences are aligned with a total of 12K stories which were collected via crowdsourcing given the image sequences and a set of grounded characters from the corresponding image sequence. Our new image sequence collection and filtering process has allowed us to obtain stories that are more coherent, diverse, and visually grounded compared to previous work. We also propose a character-based story generation model driven by coherence as a strong baseline. Evaluations show that our generated stories are more coherent, visually grounded, and diverse than stories generated with the current state-of-the-art model. Our code, image features, annotations and collected stories are available at https://vwprompt.github.io/.
Citations: 8
Understanding and Detecting Hallucinations in Neural Machine Translation via Model Introspection
IF 10.9 | CAS Zone 1, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-01-18 | DOI: 10.1162/tacl_a_00563
Weijia Xu, Sweta Agrawal, Eleftheria Briakou, Marianna J. Martindale, Marine Carpuat
Neural sequence generation models are known to “hallucinate”, by producing outputs that are unrelated to the source text. These hallucinations are potentially harmful, yet it remains unclear in what conditions they arise and how to mitigate their impact. In this work, we first identify internal model symptoms of hallucinations by analyzing the relative token contributions to the generation in contrastive hallucinated vs. non-hallucinated outputs generated via source perturbations. We then show that these symptoms are reliable indicators of natural hallucinations, by using them to design a lightweight hallucination detector which outperforms both model-free baselines and strong classifiers based on quality estimation or large pre-trained models on manually annotated English-Chinese and German-English translation test beds.
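The detection idea in this abstract (hallucinated outputs draw unusually little of their generation from the source text) can be illustrated with a toy proxy. This is a minimal sketch, assuming per-token attribution scores are already available; the ratio statistic, the 0.4 threshold, and the function names are hypothetical illustrations, not the paper's actual detector:

```python
def source_contribution_ratio(src_attr, tgt_attr):
    """Fraction of total attribution mass credited to source tokens.

    src_attr / tgt_attr: hypothetical per-token contribution scores for the
    source sentence and the target prefix, e.g. from a relevance method.
    """
    total = sum(src_attr) + sum(tgt_attr)
    return sum(src_attr) / total if total else 0.0

def flag_hallucination(src_attr, tgt_attr, threshold=0.4):
    # A low source contribution suggests the output is detached from the source.
    return source_contribution_ratio(src_attr, tgt_attr) < threshold
```

On this view, a lightweight detector needs only the model's own attributions, without an external quality-estimation system.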
Citations: 13
Tracking Brand-Associated Polarity-Bearing Topics in User Reviews
IF 10.9 | CAS Zone 1, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-01-03 | DOI: 10.1162/tacl_a_00555
Runcong Zhao, Lin Gui, Hanqi Yan, Yulan He
Monitoring online customer reviews is important for business organizations to measure customer satisfaction and better manage their reputations. In this paper, we propose a novel dynamic Brand-Topic Model (dBTM) which is able to automatically detect and track brand-associated sentiment scores and polarity-bearing topics from product reviews organized in temporally ordered time intervals. dBTM models the evolution of the latent brand polarity scores and the topic-word distributions over time by Gaussian state space models. It also incorporates a meta learning strategy to control the update of the topic-word distribution in each time interval in order to ensure smooth topic transitions and better brand score predictions. It has been evaluated on a dataset constructed from MakeupAlley reviews and a hotel review dataset. Experimental results show that dBTM outperforms a number of competitive baselines in brand ranking, achieving a good balance of topic coherence and uniqueness, and extracting well-separated polarity-bearing topics across time intervals.1
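The state-space component can be pictured with a one-dimensional stand-in: a latent brand score that drifts between time intervals and is observed through noisy per-interval sentiment. This is a minimal sketch with assumed variances; dBTM itself models full topic-word distributions, not a single scalar:

```python
def smooth_scores(observations, process_var=0.1, obs_var=0.5):
    """1-D Kalman filter over a Gaussian random walk.

    observations: noisy sentiment readings, one per time interval.
    Returns the filtered estimate of the latent score after each interval.
    """
    mean, var = observations[0], obs_var
    path = [mean]
    for z in observations[1:]:
        var += process_var            # predict: random-walk drift adds uncertainty
        gain = var / (var + obs_var)  # how much to trust the new observation
        mean += gain * (z - mean)     # update toward the observed sentiment
        var *= 1.0 - gain
        path.append(mean)
    return path
```

The filtered trajectory moves gradually toward sustained changes instead of jumping with every noisy interval, which is the kind of smooth transition over time the model aims for.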
Citations: 0
T2-NER: A Two-Stage Span-Based Framework for Unified Named Entity Recognition with Templates
CAS Zone 1, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-01-01 | DOI: 10.1162/tacl_a_00602
Peixin Huang, Xiang Zhao, Minghao Hu, Zhen Tan, Weidong Xiao
Abstract Named Entity Recognition (NER) has so far evolved from traditional flat NER to overlapped and discontinuous NER. These variants have mostly been solved separately, with only a few exceptions that tackle all three tasks with a single model. The current best-performing method formalizes unified NER as word-word relation classification, which barely focuses on mention content learning and fails to detect entity mentions comprising a single word. In this paper, we propose a two-stage span-based framework with templates, namely T2-NER, to resolve the unified NER task. The first stage extracts entity spans, where flat and overlapped entities can be recognized. The second stage classifies all entity span pairs, where discontinuous entities can be recognized. Finally, multi-task learning is used to jointly train the two stages. To improve the efficiency of the span-based model, we design grouped templates and typed templates for the two stages to realize batch computation. We also apply an adjacent packing strategy and a latter packing strategy to model discriminative boundary information and learn better span (pair) representations. Moreover, we introduce syntax information to enhance our span representation. We perform extensive experiments on eight benchmark datasets for flat, overlapped, and discontinuous NER, where our model beats all the current competitive baselines, obtaining the best performance of unified NER.
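The two-stage decomposition (first keep candidate entity spans, then link span pairs into discontinuous entities) can be sketched with placeholder predicates; `span_is_entity` and `pair_is_linked` here are hypothetical stand-ins for the paper's template-based classifiers:

```python
from itertools import combinations

def enumerate_spans(tokens, max_len=4):
    """All (start, end) token spans up to max_len tokens, end index inclusive."""
    return [(i, j)
            for i in range(len(tokens))
            for j in range(i, min(i + max_len, len(tokens)))]

def two_stage_ner(tokens, span_is_entity, pair_is_linked):
    # Stage 1: recognize entity spans (covers flat and overlapped entities).
    spans = [s for s in enumerate_spans(tokens) if span_is_entity(tokens, s)]
    # Stage 2: classify span pairs; linked pairs form discontinuous entities.
    links = [(a, b) for a, b in combinations(spans, 2)
             if pair_is_linked(tokens, a, b)]
    return spans, links
```

With toy predicates marking "severe pain" and "left arm" as fragments of one discontinuous mention, stage 1 returns the two spans and stage 2 links them into a single entity.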
Citations: 0
PASTA: A Dataset for Modeling PArticipant STAtes in Narratives
CAS Zone 1, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-01-01 | DOI: 10.1162/tacl_a_00600
Sayontan Ghosh, Mahnaz Koupaee, Isabella Chen, Francis Ferraro, Nathanael Chambers, Niranjan Balasubramanian
Abstract The events in a narrative are understood as a coherent whole via the underlying states of their participants. Often, these participant states are not explicitly mentioned, instead left to be inferred by the reader. A model that understands narratives should likewise infer these implicit states, and even reason about the impact of changes to these states on the narrative. To facilitate this goal, we introduce a new crowdsourced English-language, Participant States dataset, PASTA. This dataset contains inferable participant states; a counterfactual perturbation to each state; and the changes to the story that would be necessary if the counterfactual were true. We introduce three state-based reasoning tasks that test for the ability to infer when a state is entailed by a story, to revise a story conditioned on a counterfactual state, and to explain the most likely state change given a revised story. Experiments show that today’s LLMs can reason about states to some degree, but there is large room for improvement, especially in problems requiring access and ability to reason with diverse types of knowledge (e.g., physical, numerical, factual).1
Citations: 2
Calibrated Interpretation: Confidence Estimation in Semantic Parsing
CAS Zone 1, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-01-01 | DOI: 10.1162/tacl_a_00598
Elias Stengel-Eskin, Benjamin Van Durme
Abstract Sequence generation models are increasingly being used to translate natural language into programs, i.e., to perform executable semantic parsing. The fact that semantic parsing aims to predict programs that can lead to executed actions in the real world motivates developing safe systems. This in turn makes measuring calibration—a central component to safety—particularly important. We investigate the calibration of popular generation models across four popular semantic parsing datasets, finding that it varies across models and datasets. We then analyze factors associated with calibration error and release new confidence-based challenge splits of two parsing datasets. To facilitate the inclusion of calibration in semantic parsing evaluations, we release a library for computing calibration metrics.1
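Calibration here means a parser's confidence should track its empirical accuracy. One standard summary statistic is expected calibration error (ECE). A minimal sketch, assuming confidences in (0, 1] and binary correctness labels; the released library may compute this and other metrics differently:

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin predictions by confidence; average |bin accuracy - bin confidence|,
    weighted by the fraction of predictions falling in each bin."""
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins   # left-open bins (lo, hi]
        idx = [i for i, c in enumerate(confidences) if lo < c <= hi]
        if not idx:
            continue
        acc = sum(correct[i] for i in idx) / len(idx)
        conf = sum(confidences[i] for i in idx) / len(idx)
        ece += (len(idx) / n) * abs(acc - conf)
    return ece
```

An ECE of 0 means confidence matches accuracy in every occupied bin; a parser that reports 0.9 confidence but is right every time contributes a 0.1 gap for that bin.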
Citations: 6
Improving Multitask Retrieval by Promoting Task Specialization
CAS Zone 1, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-01-01 | DOI: 10.1162/tacl_a_00597
Wenzheng Zhang, Chenyan Xiong, Karl Stratos, Arnold Overwijk
Abstract In multitask retrieval, a single retriever is trained to retrieve relevant contexts for multiple tasks. Despite its practical appeal, naive multitask retrieval lags behind task-specific retrieval, in which a separate retriever is trained for each task. We show that it is possible to train a multitask retriever that outperforms task-specific retrievers by promoting task specialization. The main ingredients are: (1) a better choice of pretrained model—one that is explicitly optimized for multitasking—along with compatible prompting, and (2) a novel adaptive learning method that encourages each parameter to specialize in a particular task. The resulting multitask retriever is highly performant on the KILT benchmark. Upon analysis, we find that the model indeed learns parameters that are more task-specialized compared to naive multitasking without prompting or adaptive learning.1
Citations: 0
Benchmarking the Generation of Fact Checking Explanations
CAS Zone 1, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2023-01-01 | DOI: 10.1162/tacl_a_00601
Daniel Russo, Serra Sinem Tekiroğlu, Marco Guerini
Abstract Fighting misinformation is a challenging, yet crucial, task. Despite the growing number of experts being involved in manual fact-checking, this activity is time-consuming and cannot keep up with the ever-increasing amount of fake news produced daily. Hence, automating this process is necessary to help curb misinformation. Thus far, researchers have mainly focused on claim veracity classification. In this paper, instead, we address the generation of justifications (textual explanation of why a claim is classified as either true or false) and benchmark it with novel datasets and advanced baselines. In particular, we focus on summarization approaches over unstructured knowledge (i.e., news articles) and we experiment with several extractive and abstractive strategies. We employed two datasets with different styles and structures, in order to assess the generalizability of our findings. Results show that in justification production summarization benefits from the claim information, and, in particular, that a claim-driven extractive step improves abstractive summarization performances. Finally, we show that although cross-dataset experiments suffer from performance degradation, a unique model trained on a combination of the two datasets is able to retain style information in an efficient manner.
Cited by: 1
Evaluating a Century of Progress on the Cognitive Science of Adjective Ordering
JCR Zone 1 (Computer Science), Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2023-01-01. DOI: 10.1162/tacl_a_00596
William Dyer, Charles Torres, Gregory Scontras, Richard Futrell
Abstract The literature on adjective ordering abounds with proposals meant to account for why certain adjectives appear before others in multi-adjective strings (e.g., the small brown box). However, these proposals have been developed and tested primarily in isolation and based on English; few researchers have looked at the combined performance of multiple factors in the determination of adjective order, and few have evaluated predictors across multiple languages. The current work approaches both of these objectives by using technologies and datasets from natural language processing to look at the combined performance of existing proposals across 32 languages. Comparing this performance with both random and idealized baselines, we show that the literature on adjective ordering has made significant meaningful progress across its many decades, but there remains quite a gap yet to be explained.
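The evaluation idea in this abstract — scoring ordering proposals against attested multi-adjective strings and against a random baseline — can be sketched minimally as a pairwise-accuracy check for a single predictor. The "subjectivity" scores and adjective pairs below are invented for illustration; the paper's predictors, corpora, and 32-language setup are far richer.

```python
# Hypothetical per-adjective scores: higher = more subjective.
# One common proposal predicts that more subjective adjectives come first.
subjectivity = {
    "lovely": 0.9, "big": 0.6, "small": 0.6,
    "brown": 0.2, "wooden": 0.1, "round": 0.3,
}

# (first adjective, second adjective) pairs as attested in text,
# e.g. "the small wooden box" yields ("small", "wooden").
attested_pairs = [
    ("lovely", "small"), ("big", "brown"), ("small", "wooden"),
    ("big", "round"), ("lovely", "brown"), ("round", "big"),
]

def pairwise_accuracy(pairs, score):
    """Fraction of attested pairs whose first adjective outscores the second."""
    correct = sum(1 for a, b in pairs if score[a] > score[b])
    return correct / len(pairs)

acc = pairwise_accuracy(attested_pairs, subjectivity)
print(f"subjectivity predictor: {acc:.2f} vs. random baseline 0.50")
```

A random guesser is right on half the pairs in expectation, so anything meaningfully above 0.50 indicates the factor carries real ordering signal; combining several such factors, as the paper does, is what closes (part of) the remaining gap.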
Cited by: 0