
Latest Publications in Transactions of the Association for Computational Linguistics

Understanding and Detecting Hallucinations in Neural Machine Translation via Model Introspection
IF 10.9 · CAS Tier 1 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2023-01-18 · DOI: 10.1162/tacl_a_00563
Weijia Xu, Sweta Agrawal, Eleftheria Briakou, Marianna J. Martindale, Marine Carpuat
Neural sequence generation models are known to “hallucinate”, by producing outputs that are unrelated to the source text. These hallucinations are potentially harmful, yet it remains unclear in what conditions they arise and how to mitigate their impact. In this work, we first identify internal model symptoms of hallucinations by analyzing the relative token contributions to the generation in contrastive hallucinated vs. non-hallucinated outputs generated via source perturbations. We then show that these symptoms are reliable indicators of natural hallucinations, by using them to design a lightweight hallucination detector which outperforms both model-free baselines and strong classifiers based on quality estimation or large pre-trained models on manually annotated English-Chinese and German-English translation test beds.
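As an illustrative sketch of the detection idea (not the paper's exact method or scores), one can measure how much of the model's token-level attribution mass falls on the source text rather than the target-side prefix, and flag outputs where that share is low. The attribution matrix layout and the threshold below are hypothetical placeholders.

```python
from typing import List

def source_contribution_ratio(attributions: List[List[float]],
                              n_source_tokens: int) -> float:
    """attributions[t][j]: contribution of input position j (source tokens
    first, then target prefix) to generated token t. Returns the share of
    total attribution mass assigned to source tokens."""
    total = src = 0.0
    for row in attributions:
        src += sum(abs(a) for a in row[:n_source_tokens])
        total += sum(abs(a) for a in row)
    return src / total if total else 0.0

def is_hallucination(attributions: List[List[float]],
                     n_source_tokens: int,
                     threshold: float = 0.4) -> bool:
    """Flag an output whose source contribution falls below a threshold."""
    return source_contribution_ratio(attributions, n_source_tokens) < threshold
```

A low-source-contribution output (most mass on the target prefix) would be flagged, matching the symptom the paper identifies.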
Citations: 13
Tracking Brand-Associated Polarity-Bearing Topics in User Reviews
IF 10.9 · CAS Tier 1 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2023-01-03 · DOI: 10.1162/tacl_a_00555
Runcong Zhao, Lin Gui, Hanqi Yan, Yulan He
Monitoring online customer reviews is important for business organizations to measure customer satisfaction and better manage their reputations. In this paper, we propose a novel dynamic Brand-Topic Model (dBTM) which is able to automatically detect and track brand-associated sentiment scores and polarity-bearing topics from product reviews organized in temporally ordered time intervals. dBTM models the evolution of the latent brand polarity scores and the topic-word distributions over time by Gaussian state space models. It also incorporates a meta learning strategy to control the update of the topic-word distribution in each time interval in order to ensure smooth topic transitions and better brand score predictions. It has been evaluated on a dataset constructed from MakeupAlley reviews and a hotel review dataset. Experimental results show that dBTM outperforms a number of competitive baselines in brand ranking, achieving a good balance of topic coherence and uniqueness, and extracting well-separated polarity-bearing topics across time intervals.1
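A toy numerical illustration of the state-space intuition (not dBTM itself): treat the latent brand score as a Gaussian random walk and combine the previous score with each time interval's observed review sentiment via a precision-weighted (conjugate Gaussian) update. The variances here are placeholder assumptions.

```python
def update_brand_score(prev_score: float, obs_mean: float,
                       prior_var: float = 1.0, obs_var: float = 1.0) -> float:
    """Posterior mean of a Gaussian update: precision-weighted average of
    the prior (previous interval's score) and the observed sentiment."""
    w_prior = 1.0 / prior_var
    w_obs = 1.0 / obs_var
    return (w_prior * prev_score + w_obs * obs_mean) / (w_prior + w_obs)

def track_scores(initial: float, obs_means: list) -> list:
    """Smoothed brand-score trajectory over temporally ordered intervals."""
    scores, s = [], initial
    for m in obs_means:
        s = update_brand_score(s, m)
        scores.append(s)
    return scores
```

With equal variances the score moves halfway toward each interval's observation, giving the smooth transitions the model aims for.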
Citations: 0
T2-NER: A Two-Stage Span-Based Framework for Unified Named Entity Recognition with Templates
CAS Tier 1 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2023-01-01 · DOI: 10.1162/tacl_a_00602
Peixin Huang, Xiang Zhao, Minghao Hu, Zhen Tan, Weidong Xiao
Abstract Named Entity Recognition (NER) has so far evolved from traditional flat NER to overlapped and discontinuous NER. These variants have mostly been solved separately, with only a few exceptions that tackle all three tasks with a single model. The current best-performing method formalizes unified NER as word-word relation classification, which barely focuses on mention content learning and fails to detect entity mentions comprising a single word. In this paper, we propose a two-stage span-based framework with templates, namely T2-NER, to resolve the unified NER task. The first stage extracts entity spans, where flat and overlapped entities can be recognized. The second stage classifies all entity span pairs, where discontinuous entities can be recognized. Finally, multi-task learning is used to jointly train the two stages. To improve the efficiency of the span-based model, we design grouped templates and typed templates for the two stages to realize batch computation. We also apply an adjacent packing strategy and a latter packing strategy to model discriminative boundary information and learn better span (pair) representations. Moreover, we introduce syntax information to enhance our span representation. We perform extensive experiments on eight benchmark datasets for flat, overlapped, and discontinuous NER, where our model beats all current competitive baselines, obtaining the best performance for unified NER.
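A minimal sketch of the two-stage span pipeline the abstract describes, with toy predicate functions standing in for the learned stage-1 span classifier and stage-2 pair classifier:

```python
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # token indices, end-exclusive

def enumerate_spans(n_tokens: int, max_len: int = 4) -> List[Span]:
    """Candidate spans up to max_len tokens long."""
    return [(i, j) for i in range(n_tokens)
            for j in range(i + 1, min(i + max_len, n_tokens) + 1)]

def unified_ner(n_tokens: int,
                span_is_entity: Callable[[Span], bool],
                spans_link: Callable[[Span, Span], bool],
                max_len: int = 4):
    # Stage 1: extract entity spans (covers flat and overlapped mentions).
    spans = [s for s in enumerate_spans(n_tokens, max_len) if span_is_entity(s)]
    # Stage 2: classify span pairs; linked pairs form discontinuous mentions.
    links = [(a, b) for a in spans for b in spans if a < b and spans_link(a, b)]
    return spans, links
```

Overlapped entities fall out of stage 1 because candidate spans may share tokens; discontinuous entities fall out of stage 2 as linked span pairs.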
Citations: 0
PASTA: A Dataset for Modeling PArticipant STAtes in Narratives
CAS Tier 1 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2023-01-01 · DOI: 10.1162/tacl_a_00600
Sayontan Ghosh, Mahnaz Koupaee, Isabella Chen, Francis Ferraro, Nathanael Chambers, Niranjan Balasubramanian
Abstract The events in a narrative are understood as a coherent whole via the underlying states of their participants. Often, these participant states are not explicitly mentioned, instead left to be inferred by the reader. A model that understands narratives should likewise infer these implicit states, and even reason about the impact of changes to these states on the narrative. To facilitate this goal, we introduce a new crowdsourced English-language Participant States dataset, PASTA. This dataset contains inferable participant states; a counterfactual perturbation to each state; and the changes to the story that would be necessary if the counterfactual were true. We introduce three state-based reasoning tasks that test for the ability to infer when a state is entailed by a story, to revise a story conditioned on a counterfactual state, and to explain the most likely state change given a revised story. Experiments show that today’s LLMs can reason about states to some degree, but there is large room for improvement, especially in problems requiring access and ability to reason with diverse types of knowledge (e.g., physical, numerical, factual).1
Citations: 2
Calibrated Interpretation: Confidence Estimation in Semantic Parsing
CAS Tier 1 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2023-01-01 · DOI: 10.1162/tacl_a_00598
Elias Stengel-Eskin, Benjamin Van Durme
Abstract Sequence generation models are increasingly being used to translate natural language into programs, i.e., to perform executable semantic parsing. The fact that semantic parsing aims to predict programs that can lead to executed actions in the real world motivates developing safe systems. This in turn makes measuring calibration—a central component to safety—particularly important. We investigate the calibration of popular generation models across four popular semantic parsing datasets, finding that it varies across models and datasets. We then analyze factors associated with calibration error and release new confidence-based challenge splits of two parsing datasets. To facilitate the inclusion of calibration in semantic parsing evaluations, we release a library for computing calibration metrics.1
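For instance, Expected Calibration Error (ECE), a standard metric of the kind such a library would compute, bins predictions by confidence and takes a weighted average of the per-bin gap between confidence and accuracy. This is a self-contained sketch, not the released library's API.

```python
from typing import List

def expected_calibration_error(confidences: List[float],
                               correct: List[bool],
                               n_bins: int = 10) -> float:
    """ECE: sum over bins of (bin weight) * |avg confidence - accuracy|."""
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        idx = [i for i, c in enumerate(confidences)
               if lo < c <= hi or (b == 0 and c == 0.0)]
        if not idx:
            continue
        avg_conf = sum(confidences[i] for i in idx) / len(idx)
        acc = sum(correct[i] for i in idx) / len(idx)
        ece += (len(idx) / n) * abs(avg_conf - acc)
    return ece
```

A parser that reports 90% confidence but is right 80% of the time in that bin contributes a 0.1 gap, weighted by how often it lands there.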
Citations: 6
Improving Multitask Retrieval by Promoting Task Specialization
CAS Tier 1 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2023-01-01 · DOI: 10.1162/tacl_a_00597
Wenzheng Zhang, Chenyan Xiong, Karl Stratos, Arnold Overwijk
Abstract In multitask retrieval, a single retriever is trained to retrieve relevant contexts for multiple tasks. Despite its practical appeal, naive multitask retrieval lags behind task-specific retrieval, in which a separate retriever is trained for each task. We show that it is possible to train a multitask retriever that outperforms task-specific retrievers by promoting task specialization. The main ingredients are: (1) a better choice of pretrained model—one that is explicitly optimized for multitasking—along with compatible prompting, and (2) a novel adaptive learning method that encourages each parameter to specialize in a particular task. The resulting multitask retriever is highly performant on the KILT benchmark. Upon analysis, we find that the model indeed learns parameters that are more task-specialized compared to naive multitasking without prompting or adaptive learning.1
Citations: 0
Benchmarking the Generation of Fact Checking Explanations
CAS Tier 1 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2023-01-01 · DOI: 10.1162/tacl_a_00601
Daniel Russo, Serra Sinem Tekiroğlu, Marco Guerini
Abstract Fighting misinformation is a challenging, yet crucial, task. Despite the growing number of experts being involved in manual fact-checking, this activity is time-consuming and cannot keep up with the ever-increasing amount of fake news produced daily. Hence, automating this process is necessary to help curb misinformation. Thus far, researchers have mainly focused on claim veracity classification. In this paper, instead, we address the generation of justifications (textual explanations of why a claim is classified as either true or false) and benchmark it with novel datasets and advanced baselines. In particular, we focus on summarization approaches over unstructured knowledge (i.e., news articles) and we experiment with several extractive and abstractive strategies. We employed two datasets with different styles and structures, in order to assess the generalizability of our findings. Results show that in justification production, summarization benefits from the claim information and, in particular, that a claim-driven extractive step improves abstractive summarization performance. Finally, we show that although cross-dataset experiments suffer from performance degradation, a unique model trained on a combination of the two datasets is able to retain style information in an efficient manner.
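The claim-driven extractive step can be pictured with a simple lexical-overlap ranker, a hedged stand-in for the trained extractor: score each article sentence by token overlap with the claim and keep the top-k as input to a downstream abstractive justification model.

```python
from typing import List

def claim_driven_extract(claim: str, sentences: List[str], k: int = 2) -> List[str]:
    """Rank sentences by word overlap with the claim; return the top k."""
    claim_tokens = set(claim.lower().split())

    def overlap(sent: str) -> int:
        return len(claim_tokens & set(sent.lower().split()))

    return sorted(sentences, key=overlap, reverse=True)[:k]
```

Because Python's sort is stable, ties preserve document order, a reasonable default for extractive pipelines.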
Citations: 1
Evaluating a Century of Progress on the Cognitive Science of Adjective Ordering
CAS Tier 1 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2023-01-01 · DOI: 10.1162/tacl_a_00596
William Dyer, Charles Torres, Gregory Scontras, Richard Futrell
Abstract The literature on adjective ordering abounds with proposals meant to account for why certain adjectives appear before others in multi-adjective strings (e.g., the small brown box). However, these proposals have been developed and tested primarily in isolation and based on English; few researchers have looked at the combined performance of multiple factors in the determination of adjective order, and few have evaluated predictors across multiple languages. The current work approaches both of these objectives by using technologies and datasets from natural language processing to look at the combined performance of existing proposals across 32 languages. Comparing this performance with both random and idealized baselines, we show that the literature on adjective ordering has made significant meaningful progress across its many decades, but there remains quite a gap yet to be explained.
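One family of proposals in this literature, subjectivity-based ordering, can be illustrated with a toy scorer: more subjective adjectives are placed farther from the noun. The numeric scores below are invented for the example.

```python
from typing import Callable, List

def predict_order(adjectives: List[str],
                  score: Callable[[str], float]) -> List[str]:
    """Order prenominal adjectives: higher (more subjective) score first,
    i.e., farther from the noun."""
    return sorted(adjectives, key=score, reverse=True)
```

With hypothetical scores {"small": 0.8, "brown": 0.3}, the predicted order recovers the attested "the small brown box" rather than "the brown small box".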
Citations: 0
Introduction to Mathematical Language Processing: Informal Proofs, Word Problems, and Supporting Tasks
CAS Tier 1 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2023-01-01 · DOI: 10.1162/tacl_a_00594
Jordan Meadows, André Freitas
Abstract Automating discovery in mathematics and science will require sophisticated methods of information extraction and abstract reasoning, including models that can convincingly process relationships between mathematical elements and natural language, to produce problem solutions of real-world value. We analyze mathematical language processing methods across five strategic sub-areas (identifier-definition extraction, formula retrieval, natural language premise selection, math word problem solving, and informal theorem proving) from recent years, highlighting prevailing methodologies, existing limitations, overarching trends, and promising avenues for future research.
Citations: 1
T3L: Translate-and-Test Transfer Learning for Cross-Lingual Text Classification
CAS Tier 1 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2023-01-01 · DOI: 10.1162/tacl_a_00593
Inigo Jauregi Unanue, Gholamreza Haffari, Massimo Piccardi
Abstract Cross-lingual text classification leverages text classifiers trained in a high-resource language to perform text classification in other languages with no or minimal fine-tuning (zero/few-shot cross-lingual transfer). Nowadays, cross-lingual text classifiers are typically built on large-scale, multilingual language models (LMs) pretrained on a variety of languages of interest. However, the performance of these models varies significantly across languages and classification tasks, suggesting that the superposition of the language modelling and classification tasks is not always effective. For this reason, in this paper we propose revisiting the classic “translate-and-test” pipeline to neatly separate the translation and classification stages. The proposed approach couples 1) a neural machine translator translating from the targeted language to a high-resource language, with 2) a text classifier trained in the high-resource language, but the neural machine translator generates “soft” translations to permit end-to-end backpropagation during fine-tuning of the pipeline. Extensive experiments have been carried out over three cross-lingual text classification datasets (XNLI, MLDoc, and MultiEURLEX), with the results showing that the proposed approach has significantly improved performance over a competitive baseline.
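The "soft" translation trick can be sketched as follows: instead of committing to a discrete token at each decoding step, pass the translator's vocabulary distribution to the classifier as an expected embedding, so gradients can flow end-to-end. Shapes and values here are toy examples in pure Python, not the paper's implementation.

```python
from typing import List

def soft_embedding(token_probs: List[List[float]],
                   embeddings: List[List[float]]) -> List[List[float]]:
    """token_probs: one P(vocab) row per decoding step;
    embeddings: vocab_size x dim embedding table.
    Returns the probability-weighted (expected) embedding per step."""
    out = []
    dim = len(embeddings[0])
    for probs in token_probs:
        vec = [sum(p * embeddings[v][d] for v, p in enumerate(probs))
               for d in range(dim)]
        out.append(vec)
    return out
```

A hard translation is the special case of a one-hot distribution; anything softer blends embeddings, which is what keeps the pipeline differentiable.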
{"title":"T3L: Translate-and-Test Transfer Learning for Cross-Lingual Text Classification","authors":"Inigo Jauregi Unanue, Gholamreza Haffari, Massimo Piccardi","doi":"10.1162/tacl_a_00593","DOIUrl":"https://doi.org/10.1162/tacl_a_00593","url":null,"abstract":"Abstract Cross-lingual text classification leverages text classifiers trained in a high-resource language to perform text classification in other languages with no or minimal fine-tuning (zero/ few-shots cross-lingual transfer). Nowadays, cross-lingual text classifiers are typically built on large-scale, multilingual language models (LMs) pretrained on a variety of languages of interest. However, the performance of these models varies significantly across languages and classification tasks, suggesting that the superposition of the language modelling and classification tasks is not always effective. For this reason, in this paper we propose revisiting the classic “translate-and-test” pipeline to neatly separate the translation and classification stages. The proposed approach couples 1) a neural machine translator translating from the targeted language to a high-resource language, with 2) a text classifier trained in the high-resource language, but the neural machine translator generates “soft” translations to permit end-to-end backpropagation during fine-tuning of the pipeline. 
Extensive experiments have been carried out over three cross-lingual text classification datasets (XNLI, MLDoc, and MultiEURLEX), with the results showing that the proposed approach has significantly improved performance over a competitive baseline.","PeriodicalId":33559,"journal":{"name":"Transactions of the Association for Computational Linguistics","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135596945","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
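The "soft translation" idea in the T3L abstract above can be illustrated with a minimal NumPy sketch. Instead of taking the argmax token at each target position (which is non-differentiable), the classifier consumes the expected embedding under the translator's output distribution, so gradients can flow back through the translation step. All names, shapes, and values below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical tiny setup: the translator emits one logit vector per
# target position over a vocabulary shared with the classifier.
vocab_size, embed_dim, seq_len = 6, 4, 3
rng = np.random.default_rng(0)
translator_logits = rng.normal(size=(seq_len, vocab_size))
classifier_embeddings = rng.normal(size=(vocab_size, embed_dim))

# Hard translation: argmax picks discrete tokens, so gradients
# cannot propagate back into the translator.
hard_tokens = translator_logits.argmax(axis=-1)        # (seq_len,)
hard_input = classifier_embeddings[hard_tokens]        # (seq_len, embed_dim)

# Soft translation: the classifier's input is the probability-weighted
# mixture of embeddings, a differentiable relaxation of token lookup.
probs = softmax(translator_logits)                     # rows sum to 1
soft_input = probs @ classifier_embeddings             # (seq_len, embed_dim)
```

In a real pipeline both components would be neural networks in an autodiff framework, and `probs @ classifier_embeddings` is exactly the operation through which the end-to-end fine-tuning gradient would flow.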