
Proceedings of the Conference on Empirical Methods in Natural Language Processing — Latest Publications

Find Someone Who: Visual Commonsense Understanding in Human-Centric Grounding
Haoxuan You, Rui Sun, Zhecan Wang, Kai-Wei Chang, Shih-Fu Chang
From a visual scene containing multiple people, humans are able to distinguish each individual given context descriptions about what happened before, their mental/physical states, or intentions. This ability relies heavily on human-centric commonsense knowledge and reasoning. For example, if asked to identify the "person who needs healing" in an image, we first need to know that such a person usually has injuries or a suffering expression, then find the corresponding visual clues, before finally grounding the person. We present a new commonsense task, Human-centric Commonsense Grounding, that tests models' ability to ground individuals given context descriptions about what happened before, and their mental/physical states or intentions. We further create a benchmark, HumanCog, a dataset with 130k grounded commonsensical descriptions annotated on 67k images, covering diverse types of commonsense and visual scenes. We set up a context-object-aware method as a strong baseline that outperforms previous pre-trained and non-pre-trained models. Further analysis demonstrates that rich visual commonsense and powerful integration of multi-modal commonsense are essential, which sheds light on future work. Data and code will be available at https://github.com/Hxyou/HumanCog.
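To make the grounding setup concrete, here is a minimal, hypothetical scoring sketch: given per-person visual features and an embedding of the context description, pick the best-matching individual. This is only a toy illustration of the task, not the paper's context-object-aware baseline; the feature shapes and cosine-similarity scoring are assumptions.

```python
import torch
import torch.nn.functional as F

def ground_person(person_feats: torch.Tensor, context_emb: torch.Tensor) -> int:
    """Toy human-centric grounding: score each detected person against a
    context description and return the index of the best match.

    person_feats: (num_people, d) visual features, one row per detected person
                  (e.g., from an off-the-shelf detector plus a visual encoder).
    context_emb:  (d,) embedding of the context description
                  (e.g., "the person who needs healing").
    """
    scores = F.cosine_similarity(person_feats, context_emb.unsqueeze(0), dim=-1)
    return int(scores.argmax().item())

# Example with random features: 5 detected people, 256-dim embeddings.
people = torch.randn(5, 256)
context = torch.randn(256)
print(ground_person(people, context))
```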
{"title":"Find Someone Who: Visual Commonsense Understanding in Human-Centric Grounding","authors":"Haoxuan You, Rui Sun, Zhecan Wang, Kai-Wei Chang, Shih-Fu Chang","doi":"10.48550/arXiv.2212.06971","DOIUrl":"https://doi.org/10.48550/arXiv.2212.06971","url":null,"abstract":"From a visual scene containing multiple people, human is able to distinguish each individual given the context descriptions about what happened before, their mental/physical states or intentions, etc. Above ability heavily relies on human-centric commonsense knowledge and reasoning. For example, if asked to identify the\"person who needs healing\"in an image, we need to first know that they usually have injuries or suffering expressions, then find the corresponding visual clues before finally grounding the person. We present a new commonsense task, Human-centric Commonsense Grounding, that tests the models' ability to ground individuals given the context descriptions about what happened before, and their mental/physical states or intentions. We further create a benchmark, HumanCog, a dataset with 130k grounded commonsensical descriptions annotated on 67k images, covering diverse types of commonsense and visual scenes. We set up a context-object-aware method as a strong baseline that outperforms previous pre-trained and non-pretrained models. Further analysis demonstrates that rich visual commonsense and powerful integration of multi-modal commonsense are essential, which sheds light on future works. Data and code will be available https://github.com/Hxyou/HumanCog.","PeriodicalId":74540,"journal":{"name":"Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing","volume":"152 1","pages":"5444-5454"},"PeriodicalIF":0.0,"publicationDate":"2022-12-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86337788","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Improving Generalization of Pre-trained Language Models via Stochastic Weight Averaging
Peng Lu, I. Kobyzev, Mehdi Rezagholizadeh, Ahmad Rashid, A. Ghodsi, P. Langlais
Knowledge Distillation (KD) is a commonly used technique for improving the generalization of compact Pre-trained Language Models (PLMs) on downstream tasks. However, such methods impose the additional burden of training a separate teacher model for every new dataset. Alternatively, one may directly work on the improvement of the optimization procedure of the compact model toward better generalization. Recent works observe that the flatness of the local minimum correlates well with better generalization. In this work, we adapt Stochastic Weight Averaging (SWA), a method encouraging convergence to a flatter minimum, to fine-tuning PLMs. We conduct extensive experiments on various NLP tasks (text classification, question answering, and generation) and different model architectures and demonstrate that our adaptation improves the generalization without extra computation cost. Moreover, we observe that this simple optimization technique is able to outperform the state-of-the-art KD methods for compact models.
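As a concrete reference, PyTorch's built-in SWA utilities (torch.optim.swa_utils) can reproduce the basic recipe when fine-tuning a classifier. The sketch below is a generic SWA fine-tuning loop under assumed hyperparameters (optimizer, learning rates, swa_start), not the authors' exact training setup.

```python
import torch
from torch.optim.swa_utils import AveragedModel, SWALR

def finetune_with_swa(model, train_loader, loss_fn, epochs=10, swa_start=6):
    """Generic SWA fine-tuning loop: after `swa_start` epochs, keep a running
    average of the weights and switch to a constant SWA learning rate."""
    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
    swa_model = AveragedModel(model)            # holds the running weight average
    swa_scheduler = SWALR(optimizer, swa_lr=1e-5)

    for epoch in range(epochs):
        for batch, labels in train_loader:
            optimizer.zero_grad()
            loss = loss_fn(model(batch), labels)
            loss.backward()
            optimizer.step()
        if epoch >= swa_start:
            swa_model.update_parameters(model)  # accumulate the average
            swa_scheduler.step()

    # swa_model wraps the averaged weights; evaluate with swa_model, not model.
    return swa_model
```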
{"title":"Improving Generalization of Pre-trained Language Models via Stochastic Weight Averaging","authors":"Peng Lu, I. Kobyzev, Mehdi Rezagholizadeh, Ahmad Rashid, A. Ghodsi, P. Langlais","doi":"10.48550/arXiv.2212.05956","DOIUrl":"https://doi.org/10.48550/arXiv.2212.05956","url":null,"abstract":"Knowledge Distillation (KD) is a commonly used technique for improving the generalization of compact Pre-trained Language Models (PLMs) on downstream tasks. However, such methods impose the additional burden of training a separate teacher model for every new dataset. Alternatively, one may directly work on the improvement of the optimization procedure of the compact model toward better generalization. Recent works observe that the flatness of the local minimum correlates well with better generalization. In this work, we adapt Stochastic Weight Averaging (SWA), a method encouraging convergence to a flatter minimum, to fine-tuning PLMs. We conduct extensive experiments on various NLP tasks (text classification, question answering, and generation) and different model architectures and demonstrate that our adaptation improves the generalization without extra computation cost. Moreover, we observe that this simple optimization technique is able to outperform the state-of-the-art KD methods for compact models.","PeriodicalId":74540,"journal":{"name":"Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing","volume":"158 1","pages":"4948-4954"},"PeriodicalIF":0.0,"publicationDate":"2022-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80612040","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Information-Theoretic Text Hallucination Reduction for Video-grounded Dialogue
Sunjae Yoon, Eunseop Yoon, Hee Suk Yoon, Junyeong Kim, Changdong Yoo
Video-grounded Dialogue (VGD) aims to decode an answer sentence to a question regarding a given video and dialogue context. Despite the recent success of multi-modal reasoning in generating answer sentences, existing dialogue systems still suffer from a text hallucination problem, which denotes indiscriminate text-copying from input texts without an understanding of the question. This stems from learning spurious correlations: answer sentences in the dataset usually include words from the input texts, so the VGD system relies excessively on copying words from the input texts in the hope that they overlap with the ground-truth texts. Hence, we design the Text Hallucination Mitigating (THAM) framework, which incorporates a Text Hallucination Regularization (THR) loss derived from the proposed information-theoretic text hallucination measurement approach. Applying THAM to current dialogue systems validates its effectiveness on VGD benchmarks (i.e., AVSD@DSTC7 and AVSD@DSTC8) and shows enhanced interpretability.
{"title":"Information-Theoretic Text Hallucination Reduction for Video-grounded Dialogue","authors":"Sunjae Yoon, Eunseop Yoon, Hee Suk Yoon, Junyeong Kim, Changdong Yoo","doi":"10.48550/arXiv.2212.05765","DOIUrl":"https://doi.org/10.48550/arXiv.2212.05765","url":null,"abstract":"Video-grounded Dialogue (VGD) aims to decode an answer sentence to a question regarding a given video and dialogue context. Despite the recent success of multi-modal reasoning to generate answer sentences, existing dialogue systems still suffer from a text hallucination problem, which denotes indiscriminate text-copying from input texts without an understanding of the question. This is due to learning spurious correlations from the fact that answer sentences in the dataset usually include the words of input texts, thus the VGD system excessively relies on copying words from input texts by hoping those words to overlap with ground-truth texts. Hence, we design Text Hallucination Mitigating (THAM) framework, which incorporates Text Hallucination Regularization (THR) loss derived from the proposed information-theoretic text hallucination measurement approach. Applying THAM with current dialogue systems validates the effectiveness on VGD benchmarks (i.e., AVSD@DSTC7 and AVSD@DSTC8) and shows enhanced interpretability.","PeriodicalId":74540,"journal":{"name":"Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing","volume":"1 1","pages":"4182-4193"},"PeriodicalIF":0.0,"publicationDate":"2022-12-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90639904","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
FastClass: A Time-Efficient Approach to Weakly-Supervised Text Classification
Tingyu Xia, Yue Wang, Yuan Tian, Yi Chang
Weakly-supervised text classification aims to train a classifier using only class descriptions and unlabeled data. Recent research shows that keyword-driven methods can achieve state-of-the-art performance on various tasks. However, these methods not only rely on carefully-crafted class descriptions to obtain class-specific keywords but also require a substantial amount of unlabeled data and take a long time to train. This paper proposes FastClass, an efficient weakly-supervised classification approach. It uses dense text representations to retrieve class-relevant documents from an external unlabeled corpus and selects an optimal subset to train a classifier. Compared to keyword-driven methods, our approach is less reliant on initial class descriptions as it no longer needs to expand each class description into a set of class-specific keywords. Experiments on a wide range of classification tasks show that the proposed approach frequently outperforms keyword-driven models in terms of classification accuracy and often enjoys orders-of-magnitude faster training speed.
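The retrieve-then-train recipe can be sketched with off-the-shelf components. The example below assumes a sentence-transformers encoder ("all-MiniLM-L6-v2"), cosine-similarity retrieval, and a logistic-regression classifier; it pseudo-labels the top-k nearest documents per class rather than implementing FastClass's actual subset-selection objective.

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # assumed encoder choice
from sklearn.linear_model import LogisticRegression

def retrieve_then_train(class_descriptions, unlabeled_docs, top_k=200):
    """Rough illustration of the recipe: embed class descriptions and
    unlabeled documents, pseudo-label the nearest documents for each class,
    and fit a lightweight classifier on that retrieved subset."""
    encoder = SentenceTransformer("all-MiniLM-L6-v2")
    class_emb = encoder.encode(class_descriptions, normalize_embeddings=True)
    doc_emb = encoder.encode(unlabeled_docs, normalize_embeddings=True)

    sims = doc_emb @ class_emb.T                   # cosine similarity (normalized)
    feats, labels = [], []
    for c in range(len(class_descriptions)):
        nearest = np.argsort(-sims[:, c])[:top_k]  # top-k documents for class c
        feats.extend(doc_emb[nearest])
        labels.extend([c] * len(nearest))

    clf = LogisticRegression(max_iter=1000).fit(np.vstack(feats), labels)
    return encoder, clf                            # classify new text via encoder + clf
```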
{"title":"FastClass: A Time-Efficient Approach to Weakly-Supervised Text Classification","authors":"Tingyu Xia, Yue Wang, Yuan Tian, Yi Chang","doi":"10.48550/arXiv.2212.05506","DOIUrl":"https://doi.org/10.48550/arXiv.2212.05506","url":null,"abstract":"Weakly-supervised text classification aims to train a classifier using only class descriptions and unlabeled data. Recent research shows that keyword-driven methods can achieve state-of-the-art performance on various tasks. However, these methods not only rely on carefully-crafted class descriptions to obtain class-specific keywords but also require substantial amount of unlabeled data and takes a long time to train. This paper proposes FastClass, an efficient weakly-supervised classification approach. It uses dense text representation to retrieve class-relevant documents from external unlabeled corpus and selects an optimal subset to train a classifier. Compared to keyword-driven methods, our approach is less reliant on initial class descriptions as it no longer needs to expand each class description into a set of class-specific keywords.Experiments on a wide range of classification tasks show that the proposed approach frequently outperforms keyword-driven models in terms of classification accuracy and often enjoys orders-of-magnitude faster training speed.","PeriodicalId":74540,"journal":{"name":"Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing","volume":"14 1","pages":"4746-4758"},"PeriodicalIF":0.0,"publicationDate":"2022-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86119901","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Topic-Aware Response Generation in Task-Oriented Dialogue with Unstructured Knowledge Access
Yue Feng, Gerasimos Lampouras, Ignacio Iacobacci
To alleviate the problem of structured databases' limited coverage, recent task-oriented dialogue systems incorporate external unstructured knowledge to guide the generation of system responses. However, these usually use word- or sentence-level similarities to detect the relevant knowledge context, which only partially capture topic-level relevance. In this paper, we examine how to better integrate topical information in knowledge-grounded task-oriented dialogue and propose "Topic-Aware Response Generation" (TARG), an end-to-end response generation model. TARG incorporates multiple topic-aware attention mechanisms to derive an importance weighting scheme over dialogue utterances and external knowledge sources towards a better understanding of the dialogue history. Experimental results indicate that TARG achieves state-of-the-art performance in knowledge selection and response generation, outperforming the previous state-of-the-art by 3.2, 3.6, and 4.2 points in EM, F1, and BLEU-4 respectively on Doc2Dial, and performing comparably with previous work on DSTC9; both are knowledge-grounded task-oriented dialogue datasets.
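A minimal sketch of topic-aware attention pooling is shown below: dialogue-history and knowledge vectors are weighted by their affinity to a topic representation before being pooled. The module layout and dimensions are assumptions for illustration, not the paper's multi-attention architecture.

```python
import torch
import torch.nn as nn

class TopicAwareAttention(nn.Module):
    """Toy topic-aware attention: weight dialogue-history and knowledge
    vectors by their affinity to a topic vector, then pool them."""
    def __init__(self, dim: int):
        super().__init__()
        self.query = nn.Linear(dim, dim)  # projects the topic representation
        self.key = nn.Linear(dim, dim)    # projects utterance/knowledge vectors

    def forward(self, topic: torch.Tensor, items: torch.Tensor) -> torch.Tensor:
        # topic: (batch, dim); items: (batch, n_items, dim)
        q = self.query(topic).unsqueeze(1)               # (batch, 1, dim)
        k = self.key(items)                              # (batch, n_items, dim)
        weights = torch.softmax((q * k).sum(-1) / k.size(-1) ** 0.5, dim=-1)
        return (weights.unsqueeze(-1) * items).sum(1)    # (batch, dim) pooled context

# Example: pool 8 utterance vectors under a topic vector.
pooled = TopicAwareAttention(128)(torch.randn(2, 128), torch.randn(2, 8, 128))
```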
{"title":"Topic-Aware Response Generation in Task-Oriented Dialogue with Unstructured Knowledge Access","authors":"Yue Feng, Gerasimos Lampouras, Ignacio Iacobacci","doi":"10.48550/arXiv.2212.05373","DOIUrl":"https://doi.org/10.48550/arXiv.2212.05373","url":null,"abstract":"To alleviate the problem of structured databases' limited coverage, recent task-oriented dialogue systems incorporate external unstructured knowledge to guide the generation of system responses. However, these usually use word or sentence level similarities to detect the relevant knowledge context, which only partially capture the topical level relevance. In this paper, we examine how to better integrate topical information in knowledge grounded task-oriented dialogue and propose ``Topic-Aware Response Generation'' (TARG), an end-to-end response generation model. TARG incorporates multiple topic-aware attention mechanisms to derive the importance weighting scheme over dialogue utterances and external knowledge sources towards a better understanding of the dialogue history. Experimental results indicate that TARG achieves state-of-the-art performance in knowledge selection and response generation, outperforming previous state-of-the-art by 3.2, 3.6, and 4.2 points in EM, F1 and BLEU-4 respectively on Doc2Dial, and performing comparably with previous work on DSTC9; both being knowledge-grounded task-oriented dialogue datasets.","PeriodicalId":74540,"journal":{"name":"Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing","volume":"25 1","pages":"7199-7211"},"PeriodicalIF":0.0,"publicationDate":"2022-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75280442","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Successive Prompting for Decomposing Complex Questions
Dheeru Dua, Shivanshu Gupta, Sameer Singh, Matt Gardner
Answering complex questions that require making latent decisions is a challenging task, especially when limited supervision is available. Recent works leverage the capabilities of large language models (LMs) to perform complex question answering in a few-shot setting by demonstrating how to output intermediate rationalizations while solving the complex question in a single pass. We introduce "Successive Prompting", where we iteratively break down a complex task into a simple task, solve it, and then repeat the process until we get the final solution. Successive prompting decouples the supervision for decomposing complex questions from the supervision for answering simple questions, allowing us to (1) have multiple opportunities to query in-context examples at each reasoning step, (2) learn question decomposition separately from question answering, including using synthetic data, and (3) use bespoke (fine-tuned) components for reasoning steps where a large LM does not perform well. The intermediate supervision is typically manually written, which can be expensive to collect. We introduce a way to generate a synthetic dataset which can be used to bootstrap a model's ability to decompose and answer intermediate questions. Our best model (with successive prompting) achieves an improvement in F1 of ~5% when compared with a state-of-the-art model with synthetic augmentations and a few-shot version of the DROP dataset.
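The control flow of successive prompting can be sketched as a simple loop around any LM completion function; the `generate` callable and the prompt templates below are hypothetical stand-ins, not the paper's prompts or fine-tuned components.

```python
def successive_prompting(question: str, generate, max_steps: int = 8) -> str:
    """Toy successive-prompting loop: repeatedly ask the LM for the next
    simple sub-question, answer it, and append the QA pair to the context,
    until the LM signals it is ready to produce the final answer.

    `generate(prompt) -> str` is a stand-in for any LM completion call.
    """
    context = f"Complex question: {question}\n"
    for _ in range(max_steps):
        sub_q = generate(context + "Next simple question (or 'DONE'):").strip()
        if sub_q == "DONE":
            break
        sub_a = generate(context + f"Q: {sub_q}\nA:").strip()
        context += f"Q: {sub_q}\nA: {sub_a}\n"   # accumulate intermediate QA pairs
    return generate(context + "Final answer:").strip()
```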
{"title":"Successive Prompting for Decomposing Complex Questions","authors":"Dheeru Dua, Shivanshu Gupta, Sameer Singh, Matt Gardner","doi":"10.48550/arXiv.2212.04092","DOIUrl":"https://doi.org/10.48550/arXiv.2212.04092","url":null,"abstract":"Answering complex questions that require making latent decisions is a challenging task, especially when limited supervision is available. Recent works leverage the capabilities of large language models (LMs) to perform complex question answering in a few-shot setting by demonstrating how to output intermediate rationalizations while solving the complex question in a single pass. We introduce “Successive Prompting” where, we iteratively break down a complex task into a simple task, solve it, and then repeat the process until we get the final solution. Successive prompting decouples the supervision for decomposing complex questions from the supervision for answering simple questions, allowing us to (1) have multiple opportunities to query in-context examples at each reasoning step (2) learn question decomposition separately from question answering, including using synthetic data, and (3) use bespoke (fine-tuned) components for reasoning steps where a large LM does not perform well. The intermediate supervision is typically manually written, which can be expensive to collect. We introduce a way to generate synthetic dataset which can be used to bootstrap model’s ability to decompose and answer intermediate questions. Our best model (with successive prompting) achieves an improvement in F1 of ~5% when compared with a state-of-the-art model with synthetic augmentations and few-shot version of the DROP dataset.","PeriodicalId":74540,"journal":{"name":"Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing","volume":"44 1","pages":"1251-1265"},"PeriodicalIF":0.0,"publicationDate":"2022-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80951785","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 46
Better Hit the Nail on the Head than Beat around the Bush: Removing Protected Attributes with a Single Projection
P. Haghighatkhah, Antske Fokkens, Pia Sommerauer, B. Speckmann, Kevin Verbeek
Bias elimination and recent probing studies attempt to remove specific information from embedding spaces. Here it is important to remove as much of the target information as possible, while preserving any other information present. INLP is a popular recent method which removes specific information through iterative nullspace projections. Multiple iterations, however, increase the risk that information other than the target is negatively affected. We introduce two methods that find a single targeted projection: Mean Projection (MP, more efficient) and Tukey Median Projection (TMP, with theoretical guarantees). Our comparison between MP and INLP shows that (1) one MP projection removes linear separability based on the target and (2) MP has less impact on the overall space. Further analysis shows that applying random projections after MP leads to the same overall effects on the embedding space as the multiple projections of INLP. Applying one targeted (MP) projection hence is methodologically cleaner than applying multiple (INLP) projections that introduce random effects.
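A minimal sketch of the mean-projection idea for a binary protected attribute: remove the direction connecting the two group means with a single projection. This is one common reading of MP, written for illustration; the Tukey Median Projection variant is not shown.

```python
import numpy as np

def mean_projection(embeddings: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Remove a binary protected attribute with a single projection: compute
    the unit direction between the two group means and project it out of
    every embedding (i.e., keep only the orthogonal complement)."""
    mu0 = embeddings[labels == 0].mean(axis=0)
    mu1 = embeddings[labels == 1].mean(axis=0)
    d = mu1 - mu0
    d /= np.linalg.norm(d)                              # unit direction between means
    return embeddings - np.outer(embeddings @ d, d)     # subtract the component along d

# After projection, the two groups are no longer linearly separable along d.
X = np.random.randn(100, 50)
y = np.random.randint(0, 2, size=100)
X_clean = mean_projection(X, y)
```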
{"title":"Better Hit the Nail on the Head than Beat around the Bush: Removing Protected Attributes with a Single Projection","authors":"P. Haghighatkhah, Antske Fokkens, Pia Sommerauer, B. Speckmann, Kevin Verbeek","doi":"10.48550/arXiv.2212.04273","DOIUrl":"https://doi.org/10.48550/arXiv.2212.04273","url":null,"abstract":"Bias elimination and recent probing studies attempt to remove specific information from embedding spaces. Here it is important to remove as much of the target information as possible, while preserving any other information present. INLP is a popular recent method which removes specific information through iterative nullspace projections.Multiple iterations, however, increase the risk that information other than the target is negatively affected.We introduce two methods that find a single targeted projection: Mean Projection (MP, more efficient) and Tukey Median Projection (TMP, with theoretical guarantees). Our comparison between MP and INLP shows that (1) one MP projection removes linear separability based on the target and (2) MP has less impact on the overall space.Further analysis shows that applying random projections after MP leads to the same overall effects on the embedding space as the multiple projections of INLP. Applying one targeted (MP) projection hence is methodologically cleaner than applying multiple (INLP) projections that introduce random effects.","PeriodicalId":74540,"journal":{"name":"Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing","volume":"53 1","pages":"8395-8416"},"PeriodicalIF":0.0,"publicationDate":"2022-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86175184","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
ConsistTL: Modeling Consistency in Transfer Learning for Low-Resource Neural Machine Translation
Zhao Li, Xuebo Liu, Derek F. Wong, Lidia S. Chao, Min Zhang
Transfer learning is a simple and powerful method that can be used to boost model performance of low-resource neural machine translation (NMT). Existing transfer learning methods for NMT are static, which simply transfer knowledge from a parent model to a child model once via parameter initialization. In this paper, we propose a novel transfer learning method for NMT, namely ConsistTL, which can continuously transfer knowledge from the parent model during the training of the child model. Specifically, for each training instance of the child model, ConsistTL constructs the semantically-equivalent instance for the parent model and encourages prediction consistency between the parent and child for this instance, which is equivalent to the child model learning each instance under the guidance of the parent model. Experimental results on five low-resource NMT tasks demonstrate that ConsistTL results in significant improvements over strong transfer learning baselines, with a gain up to 1.7 BLEU over the existing back-translation model on the widely-used WMT17 Turkish-English benchmark. Further analysis reveals that ConsistTL can improve the inference calibration of the child model. Code and scripts are freely available at https://github.com/NLP2CT/ConsistTL.
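The parent-guides-child pattern can be sketched as a composite loss: cross-entropy on the reference translation plus a KL term pulling the child's token distribution toward the parent's prediction on the semantically-equivalent instance. The tensor shapes, weighting coefficient alpha, and padding id are assumptions; this is a generic consistency objective, not necessarily the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def consistency_loss(child_logits, parent_logits, target_ids, alpha=0.5, pad_id=0):
    """Toy consistency objective for transfer learning in NMT.

    child_logits:  (batch, seq, vocab) logits from the child model.
    parent_logits: (batch, seq, vocab) logits from the parent model on the
                   semantically-equivalent instance (kept fixed).
    target_ids:    (batch, seq) reference token ids.
    """
    ce = F.cross_entropy(
        child_logits.view(-1, child_logits.size(-1)),
        target_ids.view(-1),
        ignore_index=pad_id,
    )
    kl = F.kl_div(
        F.log_softmax(child_logits, dim=-1),
        F.softmax(parent_logits.detach(), dim=-1),  # parent acts as a fixed guide
        reduction="batchmean",
    )
    return ce + alpha * kl
```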
{"title":"ConsistTL: Modeling Consistency in Transfer Learning for Low-Resource Neural Machine Translation","authors":"Zhao Li, Xuebo Liu, Derek F. Wong, Lidia S. Chao, Min Zhang","doi":"10.48550/arXiv.2212.04262","DOIUrl":"https://doi.org/10.48550/arXiv.2212.04262","url":null,"abstract":"Transfer learning is a simple and powerful method that can be used to boost model performance of low-resource neural machine translation (NMT). Existing transfer learning methods for NMT are static, which simply transfer knowledge from a parent model to a child model once via parameter initialization. In this paper, we propose a novel transfer learning method for NMT, namely ConsistTL, which can continuously transfer knowledge from the parent model during the training of the child model. Specifically, for each training instance of the child model, ConsistTL constructs the semantically-equivalent instance for the parent model and encourages prediction consistency between the parent and child for this instance, which is equivalent to the child model learning each instance under the guidance of the parent model. Experimental results on five low-resource NMT tasks demonstrate that ConsistTL results in significant improvements over strong transfer learning baselines, with a gain up to 1.7 BLEU over the existing back-translation model on the widely-used WMT17 Turkish-English benchmark. Further analysis reveals that ConsistTL can improve the inference calibration of the child model. Code and scripts are freely available at https://github.com/NLP2CT/ConsistTL.","PeriodicalId":74540,"journal":{"name":"Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing","volume":"7 1 1","pages":"8383-8394"},"PeriodicalIF":0.0,"publicationDate":"2022-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76888784","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
Neural Machine Translation with Contrastive Translation Memories
Xin Cheng, Shen Gao, Lemao Liu, Dongyan Zhao, Rui Yan
Retrieval-augmented Neural Machine Translation models have been successful in many translation scenarios. Different from previous works that make use of mutually similar but redundant translation memories (TMs), we propose a new retrieval-augmented NMT that models contrastively retrieved translation memories which are holistically similar to the source sentence while individually contrastive to each other, providing maximal information gain, across three phases. First, in the TM retrieval phase, we adopt a contrastive retrieval algorithm to avoid redundancy and uninformativeness among similar translation pieces. Second, in the memory encoding stage, given a set of TMs we propose a novel Hierarchical Group Attention module to gather both the local context of each TM and the global context of the whole TM set. Finally, in the training phase, a Multi-TM contrastive learning objective is introduced to learn the salient features of each TM with respect to the target sentence. Experimental results show that our framework obtains substantial improvements over strong baselines on the benchmark dataset.
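Contrastive retrieval of TMs resembles maximal-marginal-relevance (MMR) selection: each newly chosen memory should be similar to the source but dissimilar to the memories already picked. The MMR-style sketch below is a stand-in under the assumption of unit-normalized embeddings, not the paper's exact retrieval algorithm.

```python
import numpy as np

def contrastive_tm_retrieval(src_emb, tm_embs, k=3, lam=0.5):
    """MMR-style stand-in for contrastive TM retrieval: greedily pick k
    memories, trading off similarity to the source against redundancy with
    the memories already selected. Assumes unit-normalized embeddings so
    dot products behave like cosine similarities."""
    sims_src = tm_embs @ src_emb            # similarity of every TM to the source
    selected = []
    for _ in range(k):
        best, best_score = None, -np.inf
        for i in range(len(tm_embs)):
            if i in selected:
                continue
            redundancy = max((tm_embs[i] @ tm_embs[j] for j in selected), default=0.0)
            score = lam * sims_src[i] - (1 - lam) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return selected                         # indices of the chosen TMs
```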
{"title":"Neural Machine Translation with Contrastive Translation Memories","authors":"Xin Cheng, Shen Gao, Lemao Liu, Dongyan Zhao, Rui Yan","doi":"10.48550/arXiv.2212.03140","DOIUrl":"https://doi.org/10.48550/arXiv.2212.03140","url":null,"abstract":"Retrieval-augmented Neural Machine Translation models have been successful in many translation scenarios. Different from previous works that make use of mutually similar but redundant translation memories (TMs), we propose a new retrieval-augmented NMT to model contrastively retrieved translation memories that are holistically similar to the source sentence while individually contrastive to each other providing maximal information gain in three phases. First, in TM retrieval phase, we adopt contrastive retrieval algorithm to avoid redundancy and uninformativeness of similar translation pieces. Second, in memory encoding stage, given a set of TMs we propose a novel Hierarchical Group Attention module to gather both local context of each TM and global context of the whole TM set. Finally, in training phase, a Multi-TM contrastive learning objective is introduced to learn salient feature of each TM with respect to target sentence. Experimental results show that our framework obtains substantial improvements over strong baselines in the benchmark dataset.","PeriodicalId":74540,"journal":{"name":"Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing","volume":"185 1","pages":"3591-3601"},"PeriodicalIF":0.0,"publicationDate":"2022-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76048311","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
POQue: Asking Participant-specific Outcome Questions for a Deeper Understanding of Complex Events
Sai Vallurupalli, Sayontan Ghosh, K. Erk, Niranjan Balasubramanian, Francis Ferraro
Knowledge about outcomes is critical for complex event understanding but is hard to acquire. We show that by pre-identifying a participant in a complex event, crowdworkers are able to (1) infer the collective impact of salient events that make up the situation, (2) annotate the volitional engagement of participants in causing the situation, and (3) ground the outcome of the situation in state changes of the participants. By creating a multi-step interface and a careful quality control strategy, we collect a high-quality annotated dataset of 8K short newswire narratives and ROCStories with high inter-annotator agreement (0.74-0.96 weighted Fleiss Kappa). Our dataset, POQue (Participant Outcome Questions), enables the exploration and development of models that address multiple aspects of semantic understanding. Experimentally, we show that current language models lag behind human performance in subtle ways through our task formulations that target abstract and specific comprehension of a complex event, its outcome, and a participant's influence over the event culmination.
{"title":"POQue: Asking Participant-specific Outcome Questions for a Deeper Understanding of Complex Events","authors":"Sai Vallurupalli, Sayontan Ghosh, K. Erk, Niranjan Balasubramanian, Francis Ferraro","doi":"10.48550/arXiv.2212.02629","DOIUrl":"https://doi.org/10.48550/arXiv.2212.02629","url":null,"abstract":"Knowledge about outcomes is critical for complex event understanding but is hard to acquire.We show that by pre-identifying a participant in a complex event, crowdworkers are ableto (1) infer the collective impact of salient events that make up the situation, (2) annotate the volitional engagement of participants in causing the situation, and (3) ground theoutcome of the situation in state changes of the participants. By creating a multi-step interface and a careful quality control strategy, we collect a high quality annotated dataset of8K short newswire narratives and ROCStories with high inter-annotator agreement (0.74-0.96weighted Fleiss Kappa). Our dataset, POQUe (Participant Outcome Questions), enables theexploration and development of models that address multiple aspects of semantic understanding. Experimentally, we show that current language models lag behind human performance in subtle ways through our task formulations that target abstract and specific comprehension of a complex event, its outcome, and a participant’s influence over the event culmination.","PeriodicalId":74540,"journal":{"name":"Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing","volume":"16 1","pages":"8674-8697"},"PeriodicalIF":0.0,"publicationDate":"2022-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89406141","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1