Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing: Latest Articles, Page 5

Topic-Aware Response Generation in Task-Oriented Dialogue with Unstructured Knowledge Access

Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing

Pub Date : 2022-12-10 DOI: 10.48550/arXiv.2212.05373

Yue Feng, Gerasimos Lampouras, Ignacio Iacobacci

To alleviate the problem of structured databases' limited coverage, recent task-oriented dialogue systems incorporate external unstructured knowledge to guide the generation of system responses. However, these systems usually use word- or sentence-level similarities to detect the relevant knowledge context, which only partially capture topic-level relevance. In this paper, we examine how to better integrate topical information in knowledge-grounded task-oriented dialogue and propose "Topic-Aware Response Generation" (TARG), an end-to-end response generation model. TARG incorporates multiple topic-aware attention mechanisms to derive an importance weighting scheme over dialogue utterances and external knowledge sources, towards a better understanding of the dialogue history. Experimental results indicate that TARG achieves state-of-the-art performance in knowledge selection and response generation, outperforming the previous state of the art by 3.2, 3.6, and 4.2 points in EM, F1, and BLEU-4 respectively on Doc2Dial, and performing comparably with previous work on DSTC9; both are knowledge-grounded task-oriented dialogue datasets.


Pages: 7199-7211

Citations: 1

Successive Prompting for Decomposing Complex Questions

Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing

Pub Date : 2022-12-08 DOI: 10.48550/arXiv.2212.04092

Dheeru Dua, Shivanshu Gupta, Sameer Singh, Matt Gardner

Answering complex questions that require making latent decisions is a challenging task, especially when limited supervision is available. Recent works leverage the capabilities of large language models (LMs) to perform complex question answering in a few-shot setting by demonstrating how to output intermediate rationalizations while solving the complex question in a single pass. We introduce “Successive Prompting”, where we iteratively break down a complex task into a simple task, solve it, and then repeat the process until we get the final solution. Successive prompting decouples the supervision for decomposing complex questions from the supervision for answering simple questions, allowing us to (1) have multiple opportunities to query in-context examples at each reasoning step, (2) learn question decomposition separately from question answering, including using synthetic data, and (3) use bespoke (fine-tuned) components for reasoning steps where a large LM does not perform well. The intermediate supervision is typically manually written, which can be expensive to collect. We introduce a way to generate a synthetic dataset which can be used to bootstrap a model’s ability to decompose and answer intermediate questions. Our best model (with successive prompting) achieves an improvement in F1 of ~5% when compared with a state-of-the-art model with synthetic augmentations and a few-shot version of the DROP dataset.


Pages: 1251-1265

Citations: 46
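
The iterative loop described in the Successive Prompting abstract above (decompose, answer, repeat until done) can be sketched roughly as follows. This is only an illustration of the control flow: the `decompose` and `answer` callables, the `max_steps` cap, and the toy question plan are all hypothetical stand-ins, not the authors' prompts or models.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical interfaces (not from the paper): the decomposer proposes the next
# simple question, or None once the complex question is resolved; the answerer
# answers a single simple question.
History = List[Tuple[str, str]]
Decomposer = Callable[[str, History], Optional[str]]
Answerer = Callable[[str], str]


def successive_prompting(question: str, decompose: Decomposer, answer: Answerer,
                         max_steps: int = 10) -> Tuple[str, History]:
    """Alternate decomposition and answering until no sub-question remains."""
    history: History = []
    for _ in range(max_steps):
        sub_question = decompose(question, history)
        if sub_question is None:  # the decomposer signals that we are done
            break
        history.append((sub_question, answer(sub_question)))
    # By assumption, the answer to the last sub-question is the final answer.
    final = history[-1][1] if history else ""
    return final, history


# Toy stand-ins for the two components, for illustration only.
TOY_PLAN = [
    ("How many field goals did team A kick?", "3"),
    ("How many field goals did team B kick?", "2"),
    ("What is 3 + 2?", "5"),
]


def toy_decompose(question: str, history: History) -> Optional[str]:
    return TOY_PLAN[len(history)][0] if len(history) < len(TOY_PLAN) else None


def toy_answer(sub_question: str) -> str:
    return dict(TOY_PLAN)[sub_question]


final_answer, steps = successive_prompting(
    "How many field goals were kicked in total?", toy_decompose, toy_answer)
```

Keeping the decomposer and the answerer as separate callables mirrors the decoupling the abstract describes: either one could be swapped for an in-context LM call or a fine-tuned component without touching the loop.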

Better Hit the Nail on the Head than Beat around the Bush: Removing Protected Attributes with a Single Projection

Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing

Pub Date : 2022-12-08 DOI: 10.48550/arXiv.2212.04273

P. Haghighatkhah, Antske Fokkens, Pia Sommerauer, B. Speckmann, Kevin Verbeek

Bias elimination and recent probing studies attempt to remove specific information from embedding spaces. Here it is important to remove as much of the target information as possible, while preserving any other information present. INLP is a popular recent method which removes specific information through iterative nullspace projections. Multiple iterations, however, increase the risk that information other than the target is negatively affected. We introduce two methods that find a single targeted projection: Mean Projection (MP, more efficient) and Tukey Median Projection (TMP, with theoretical guarantees). Our comparison between MP and INLP shows that (1) one MP projection removes linear separability based on the target and (2) MP has less impact on the overall space. Further analysis shows that applying random projections after MP leads to the same overall effects on the embedding space as the multiple projections of INLP. Applying one targeted (MP) projection hence is methodologically cleaner than applying multiple (INLP) projections that introduce random effects.


Pages: 8395-8416

Citations: 2
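
The abstract above does not spell out how Mean Projection is constructed. One plausible reading, assumed here, is that MP projects every embedding onto the hyperplane orthogonal to the vector joining the two class means, so that the means coincide afterwards. The sketch below implements that assumption in plain Python; the function names and the toy 2-D data are illustrative, not taken from the paper.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def mean(vectors: Sequence[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def mean_projection(class_a: Sequence[Vector],
                    class_b: Sequence[Vector]) -> Callable[[Vector], Vector]:
    """Return a function that removes the direction joining the two class means."""
    mu_a, mu_b = mean(class_a), mean(class_b)
    diff = [x - y for x, y in zip(mu_a, mu_b)]
    norm = dot(diff, diff) ** 0.5
    if norm == 0.0:
        raise ValueError("class means already coincide; nothing to project out")
    unit = [d / norm for d in diff]

    def project(v: Vector) -> Vector:
        # Subtract the component of v along the mean-difference direction.
        scale = dot(v, unit)
        return [x - scale * u for x, u in zip(v, unit)]

    return project


# Toy 2-D embeddings whose classes differ only along the first axis.
class_a = [[2.0, 1.0], [4.0, 3.0]]
class_b = [[0.0, 1.0], [2.0, 3.0]]
project = mean_projection(class_a, class_b)
projected_a = [project(v) for v in class_a]
projected_b = [project(v) for v in class_b]
```

A single projection of this kind removes exactly one direction, which is the contrast the abstract draws with INLP's repeated nullspace projections.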

ConsistTL: Modeling Consistency in Transfer Learning for Low-Resource Neural Machine Translation

Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing

Pub Date : 2022-12-08 DOI: 10.48550/arXiv.2212.04262

Zhao Li, Xuebo Liu, Derek F. Wong, Lidia S. Chao, Min Zhang

Transfer learning is a simple and powerful method that can be used to boost model performance of low-resource neural machine translation (NMT). Existing transfer learning methods for NMT are static: they simply transfer knowledge from a parent model to a child model once via parameter initialization. In this paper, we propose a novel transfer learning method for NMT, namely ConsistTL, which can continuously transfer knowledge from the parent model during the training of the child model. Specifically, for each training instance of the child model, ConsistTL constructs the semantically equivalent instance for the parent model and encourages prediction consistency between the parent and child for this instance, which is equivalent to the child model learning each instance under the guidance of the parent model. Experimental results on five low-resource NMT tasks demonstrate that ConsistTL results in significant improvements over strong transfer learning baselines, with a gain of up to 1.7 BLEU over the existing back-translation model on the widely used WMT17 Turkish-English benchmark. Further analysis reveals that ConsistTL can improve the inference calibration of the child model. Code and scripts are freely available at https://github.com/NLP2CT/ConsistTL.


Pages: 8383-8394

Citations: 6
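
The ConsistTL abstract above says the child is encouraged to be consistent with the parent's prediction on a semantically equivalent instance, but it does not name the consistency measure. The sketch below assumes a KL divergence from the parent's output distribution to the child's, added to the usual cross-entropy with a hypothetical weight `alpha`; it works on a single token position with plain lists standing in for model logits.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: List[float], q: List[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions of the same length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def consistency_training_loss(child_logits: List[float],
                              parent_logits: List[float],
                              gold_index: int,
                              alpha: float = 1.0) -> float:
    """Cross-entropy on the gold token plus an assumed KL consistency term."""
    child = softmax(child_logits)
    parent = softmax(parent_logits)  # parent's prediction on its equivalent instance
    cross_entropy = -math.log(child[gold_index])
    consistency = kl_divergence(parent, child)
    return cross_entropy + alpha * consistency


# When parent and child agree, only the cross-entropy term remains.
loss_agree = consistency_training_loss([2.0, 0.5, 0.1], [2.0, 0.5, 0.1], gold_index=0)
# When the parent prefers a different token, the consistency term adds a penalty.
loss_disagree = consistency_training_loss([2.0, 0.5, 0.1], [0.1, 0.5, 2.0], gold_index=0)
```

In a real training setup the parent distribution would be computed without gradient flow, so that only the child is updated; that detail is also an assumption here rather than something the abstract states.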

Neural Machine Translation with Contrastive Translation Memories

Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing

Pub Date : 2022-12-06 DOI: 10.48550/arXiv.2212.03140

Xin Cheng, Shen Gao, Lemao Liu, Dongyan Zhao, Rui Yan

Retrieval-augmented Neural Machine Translation models have been successful in many translation scenarios. Different from previous works that make use of mutually similar but redundant translation memories (TMs), we propose a new retrieval-augmented NMT to model contrastively retrieved translation memories that are holistically similar to the source sentence while individually contrastive to each other, providing maximal information gain, in three phases. First, in the TM retrieval phase, we adopt a contrastive retrieval algorithm to avoid redundancy and uninformativeness of similar translation pieces. Second, in the memory encoding stage, given a set of TMs, we propose a novel Hierarchical Group Attention module to gather both the local context of each TM and the global context of the whole TM set. Finally, in the training phase, a Multi-TM contrastive learning objective is introduced to learn the salient features of each TM with respect to the target sentence. Experimental results show that our framework obtains substantial improvements over strong baselines on the benchmark dataset.


Pages: 3591-3601

Citations: 7
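
The abstract above asks for translation memories that are similar to the source sentence yet different from one another, without giving the retrieval algorithm. A common way to get that behaviour is a greedy, maximal-marginal-relevance style selection, which is what the sketch below assumes. The Jaccard token overlap, the `penalty` weight, and the example sentences are illustrative choices, not details from the paper.

```python
from typing import List


def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two whitespace-tokenized sentences."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


def contrastive_retrieve(source: str, candidates: List[str],
                         k: int = 2, penalty: float = 0.7) -> List[str]:
    """Greedily pick k memories: reward similarity to the source and
    penalize similarity to memories that were already selected."""
    selected: List[str] = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def score(candidate: str) -> float:
            redundancy = max((jaccard(candidate, s) for s in selected), default=0.0)
            return jaccard(source, candidate) - penalty * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


source = "the cat sat on the mat"
memories = [
    "the cat sat on the rug",    # closest to the source
    "the cat sat on a rug",      # near-duplicate of the first memory
    "a dog lay on the mat",      # less similar, but adds new information
]
picked = contrastive_retrieve(source, memories, k=2)
```

With these toy sentences the second pick skips the near-duplicate and takes the sentence that covers "mat", which is the kind of redundancy avoidance the abstract motivates.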

POQue: Asking Participant-specific Outcome Questions for a Deeper Understanding of Complex Events

Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing

Pub Date : 2022-12-05 DOI: 10.48550/arXiv.2212.02629

Sai Vallurupalli, Sayontan Ghosh, K. Erk, Niranjan Balasubramanian, Francis Ferraro

Knowledge about outcomes is critical for complex event understanding but is hard to acquire. We show that by pre-identifying a participant in a complex event, crowdworkers are able to (1) infer the collective impact of salient events that make up the situation, (2) annotate the volitional engagement of participants in causing the situation, and (3) ground the outcome of the situation in state changes of the participants. By creating a multi-step interface and a careful quality control strategy, we collect a high-quality annotated dataset of 8K short newswire narratives and ROCStories with high inter-annotator agreement (0.74-0.96 weighted Fleiss Kappa). Our dataset, POQue (Participant Outcome Questions), enables the exploration and development of models that address multiple aspects of semantic understanding. Experimentally, we show that current language models lag behind human performance in subtle ways through our task formulations that target abstract and specific comprehension of a complex event, its outcome, and a participant’s influence over the event culmination.


Pages: 8674-8697

Citations: 1
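
The POQue abstract above reports agreement as a weighted Fleiss Kappa, without stating the weighting scheme. For orientation, the sketch below computes the standard unweighted Fleiss' kappa from a table of per-item category counts; it is not the weighted variant used in the paper, and the two small rating tables are made up.

```python
from typing import List


def fleiss_kappa(counts: List[List[int]]) -> float:
    """Unweighted Fleiss' kappa.

    counts[i][j] is the number of raters who assigned item i to category j;
    every item must be rated by the same number of raters (at least two).
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2 or any(sum(row) != n_raters for row in counts):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    # Observed agreement: share of agreeing rater pairs, averaged over items.
    per_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    observed = sum(per_item) / n_items

    # Chance agreement from the overall category proportions.
    n_categories = len(counts[0])
    proportions = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    expected = sum(p * p for p in proportions)
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one category is ever used")
    return (observed - expected) / (1.0 - expected)


# Three raters, two categories, four items.
perfect = [[3, 0], [0, 3], [3, 0], [0, 3]]   # raters always agree
mixed = [[2, 1], [1, 2], [2, 1], [1, 2]]     # raters split on every item
kappa_perfect = fleiss_kappa(perfect)
kappa_mixed = fleiss_kappa(mixed)
```

Values near 1 indicate agreement well above chance, while values at or below 0 indicate agreement no better than chance.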

Pair-Based Joint Encoding with Relational Graph Convolutional Networks for Emotion-Cause Pair Extraction

Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing

Pub Date : 2022-12-04 DOI: 10.48550/arXiv.2212.01844

Junlong Liu, Xichen Shang, Qianli Ma

Emotion-cause pair extraction (ECPE) aims to extract emotion clauses and corresponding cause clauses, and has recently received growing attention. Previous methods sequentially encode features in a specified order. They first encode the emotion and cause features for clause extraction and then combine them for pair extraction. This leads to an imbalance in inter-task feature interaction, where features extracted later have no direct contact with the former. To address this issue, we propose a novel Pair-Based Joint Encoding (PBJE) network, which generates pair and clause features simultaneously in a joint feature encoding manner to model the causal relationship in clauses. PBJE can balance the information flow among emotion clauses, cause clauses and pairs. From a multi-relational perspective, we construct a heterogeneous undirected graph and apply the Relational Graph Convolutional Network (RGCN) to capture the multiplex relationship between clauses and the relationship between pairs and clauses. Experimental results show that PBJE achieves state-of-the-art performance on the Chinese benchmark corpus.


Pages: 5339-5351

Citations: 3

Constructing Highly Inductive Contexts for Dialogue Safety through Controllable Reverse Generation

Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing

Pub Date : 2022-12-04 DOI: 10.48550/arXiv.2212.01810

Zhexin Zhang, Jiale Cheng, Hao Sun, Jiawen Deng, Fei Mi, Yasheng Wang, Lifeng Shang, Minlie Huang

Large pretrained language models can easily produce toxic or biased content, which is prohibitive for practical use. In order to detect such toxic generations, existing methods rely on templates, real-world data extraction, crowdsourcing workers, or automatic generation to construct adversarial contexts that are likely to induce toxic generations. However, what type of context is more likely to induce unsafe responses is still under-explored. In this paper, we identify that context toxicity and context category (e.g., profanity, insult, drugs, etc.) are two important factors that cause safety issues in response generation. Hence, we propose a method called reverse generation to construct adversarial contexts conditioned on a given response, with the flexibility to control category, toxicity level, and inductivity of the generated contexts. Via reverse generation, we augment the existing BAD dataset and construct a new dataset BAD+ which contains more than 120K diverse and highly inductive contexts in 12 categories. We test three popular pretrained dialogue models (Blender, DialoGPT, and Plato2) and find that BAD+ can largely expose their safety problems. Furthermore, we show that BAD+ can greatly enhance the safety of generation and reveal the key factors of safety improvement. Our code and dataset are available at https://github.com/thu-coai/Reverse_Generation.


Pages: 3684-3697

Citations: 3

T-STAR: Truthful Style Transfer using AMR Graph as Intermediate Representation

Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing

Pub Date : 2022-12-03 DOI: 10.48550/arXiv.2212.01667

Anubhav Jangra, Preksha Nema, A. Raghuveer

The unavailability of parallel corpora for training text style transfer (TST) models is a very challenging yet common scenario. TST models also implicitly need to preserve content while transforming a source sentence into the target style. To tackle these problems, an intermediate representation is often constructed that is devoid of style while still preserving the meaning of the source sentence. In this work, we study the usefulness of the Abstract Meaning Representation (AMR) graph as the intermediate style-agnostic representation. We posit that semantic notations like AMR are a natural choice for an intermediate representation. Hence, we propose T-STAR: a model comprising two components, a text-to-AMR encoder and an AMR-to-text decoder. We propose several modeling improvements to enhance the style agnosticity of the generated AMR. To the best of our knowledge, T-STAR is the first work that uses AMR as an intermediate representation for TST. Through thorough experimental evaluation, we show that T-STAR significantly outperforms state-of-the-art techniques, achieving on average 15.2% higher content preservation with negligible loss (~3%) in style accuracy. Through detailed human evaluation with 90,000 ratings, we also show that T-STAR has up to 50% fewer hallucinations than state-of-the-art TST models.

Pages: 8805-8825

Citations: 2

Modeling Label Correlations for Ultra-Fine Entity Typing with Neural Pairwise Conditional Random Field

Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing

Pub Date : 2022-12-03 DOI: 10.48550/arXiv.2212.01581

Chengyue Jiang, Yong Jiang, Weiqi Wu, Pengjun Xie, Kewei Tu

Ultra-fine entity typing (UFET) aims to predict a wide range of type phrases that correctly describe the categories of a given entity mention in a sentence. Most recent works infer each entity type independently, ignoring the correlations between types; e.g., when an entity is inferred to be a president, it should also be a politician and a leader. To this end, we formulate the UFET problem with an undirected graphical model called a pairwise conditional random field (PCRF), in which each type variable is influenced not only by the input (unary potentials) but also by every other type variable (pairwise potentials). We use various modern backbones for entity typing to compute unary potentials, and derive pairwise potentials from type phrase representations that both capture prior semantic information and facilitate accelerated inference. We use mean-field variational inference for efficient type inference on very large type sets and unfold it as a neural network module to enable end-to-end training. Experiments on UFET show that Neural-PCRF consistently outperforms its backbones at little cost and performs competitively with the cross-encoder-based state of the art while being thousands of times faster. We also find Neural-PCRF effective on a widely used fine-grained entity typing dataset with a smaller type set. We package Neural-PCRF as a network module that can easily be plugged into multi-label type classifiers and release it in .
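
The mean-field step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes binary type variables, a simplified pairwise potential that only scores two types being active together, and a fixed number of parallel updates standing in for the unfolded iterations. The function name `mean_field`, the example scores, and the type list are all hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def mean_field(unary, pairwise, num_iters=3):
    """Approximate q[i] = P(type i is active) in a binary pairwise CRF.

    unary[i]       -- logit for type i (assumed to come from a backbone typer)
    pairwise[i][j] -- symmetric score added when types i and j are both active
    num_iters      -- number of parallel updates (the "unfolded" iterations)
    """
    n = len(unary)
    # Start from the independent (unary-only) prediction.
    q = [sigmoid(u) for u in unary]
    for _ in range(num_iters):
        # Each type's logit is its unary score plus the expected pairwise
        # score under the current beliefs about all other types.
        q = [
            sigmoid(unary[i] + sum(pairwise[i][j] * q[j] for j in range(n) if j != i))
            for i in range(n)
        ]
    return q


# Hypothetical scores: the backbone is confident about "president" only.
types = ["president", "politician", "leader"]
unary = [2.0, -0.5, -0.5]
pairwise = [
    [0.0, 3.0, 3.0],
    [3.0, 0.0, 3.0],
    [3.0, 3.0, 0.0],
]
marginals = mean_field(unary, pairwise)
independent = [sigmoid(u) for u in unary]
```

With positive pairwise scores, the belief in "politician" and "leader" rises above the independent prediction once "president" is likely, which is the label-correlation effect the abstract motivates. Because every update is a differentiable expression, a fixed number of iterations can in principle be treated as network layers and trained end to end.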

Pages: 6836-6847

Citations: 5