Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing: Latest Articles

Generating Multiple-Length Summaries via Reinforcement Learning for Unsupervised Sentence Summarization

Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing

Pub Date : 2022-12-21 DOI: 10.48550/arXiv.2212.10843

Dongmin Hyun, Xiting Wang, Chanyoung Park, Xing Xie, Hwanjo Yu

Sentence summarization shortens a given text while preserving its core content. Unsupervised approaches have been studied to summarize texts without human-written summaries. However, recent unsupervised models are extractive: they remove words from the text and are therefore less flexible than abstractive summarization. In this work, we devise an abstractive model based on reinforcement learning without ground-truth summaries. We formulate unsupervised summarization based on a Markov decision process with rewards representing the summary quality. To further enhance the summary quality, we develop a multi-summary learning mechanism that generates multiple summaries of varying lengths for a given text, while making the summaries mutually enhance each other. Experimental results show that the proposed model substantially outperforms both abstractive and extractive models while frequently generating new words not contained in the input texts.

Citations: 4

Cross-Linguistic Syntactic Difference in Multilingual BERT: How Good is It and How Does It Affect Transfer?

Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing

Pub Date : 2022-12-21 DOI: 10.48550/arXiv.2212.10879

Ningyu Xu, Tao Gui, Ruotian Ma, Qi Zhang, Jingting Ye, Menghan Zhang, Xuanjing Huang

Multilingual BERT (mBERT) has demonstrated considerable cross-lingual syntactic ability, whereby it enables effective zero-shot cross-lingual transfer of syntactic knowledge. The transfer is more successful between some languages than others, but it is not well understood what leads to this variation and whether it fairly reflects differences between languages. In this work, we investigate the distributions of grammatical relations induced from mBERT in the context of 24 typologically different languages. We demonstrate that the distance between the distributions of different languages is highly consistent with the syntactic difference in terms of linguistic formalisms. Such difference, learnt via self-supervision, plays a crucial role in the zero-shot transfer performance and can be predicted by variation in morphosyntactic properties between languages. These results suggest that mBERT properly encodes languages in a way consistent with linguistic diversity and provide insights into the mechanism of cross-lingual transfer.

Citations: 3

Azimuth: Systematic Error Analysis for Text Classification

Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing

Pub Date : 2022-12-16 DOI: 10.48550/arXiv.2212.08216

Gabrielle Gauthier Melançon, Orlando Marquez Ayala, Lindsay D. Brin, Chris Tyler, Frederic Branchaud-Charron, Joseph Marinier, Karine Grande, Dieu-Thu Le

We present Azimuth, an open-source and easy-to-use tool to perform error analysis for text classification. Compared to other stages of the ML development cycle, such as model training and hyper-parameter tuning, the process and tooling for the error analysis stage are less mature. However, this stage is critical for the development of reliable and trustworthy AI systems. To make error analysis more systematic, we propose an approach comprising dataset analysis and model quality assessment, which Azimuth facilitates. We aim to help AI practitioners discover and address areas where the model does not generalize by leveraging and integrating a range of ML techniques, such as saliency maps, similarity, uncertainty, and behavioral analyses, all in one tool. Our code and documentation are available at github.com/servicenow/azimuth.

Citations: 2

Dense Feature Memory Augmented Transformers for COVID-19 Vaccination Search Classification

Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing

Pub Date : 2022-12-16 DOI: 10.48550/arXiv.2212.13898

Jai Gupta, Yi Tay, C. Kamath, Vinh Q. Tran, Donald Metzler, S. Bavadekar, Mimi Sun, E. Gabrilovich

With the devastating outbreak of COVID-19, vaccines are one of the crucial lines of defense against mass infection in this global pandemic. Given the protection they provide, vaccines are becoming mandatory in certain social and professional settings. This paper presents a classification model for detecting COVID-19 vaccination-related search queries, a machine learning model that is used to generate search insights for COVID-19 vaccinations. The proposed method combines modern state-of-the-art (SOTA) natural language understanding (NLU) techniques, such as pretrained Transformers, with traditional dense features. We propose a novel approach of treating dense features as memory tokens that the model can attend to. We show that this new modeling approach yields a significant improvement on the Vaccine Search Insights (VSI) task, improving on a strong, well-established gradient-boosting baseline by a relative +15% in F1 score and +14% in precision.
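The idea of treating dense features as memory tokens can be illustrated with a minimal sketch. The abstract does not say how features are mapped to tokens, so everything here is an assumption for illustration: the helper names `dense_to_memory_tokens` and `build_input`, the one-projection-row-per-feature mapping, and the choice to prepend the memory tokens to the token embeddings are all hypothetical.

```python
from typing import List

Vector = List[float]


def dense_to_memory_tokens(features: List[float], proj: List[Vector]) -> List[Vector]:
    """Map each scalar dense feature to one token-sized vector.

    Assumption: each feature has its own (learned) projection row, and the
    memory token is simply the feature value times that row.
    """
    if len(features) != len(proj):
        raise ValueError("need one projection row per dense feature")
    return [[f * w for w in row] for f, row in zip(features, proj)]


def build_input(token_embs: List[Vector], features: List[float], proj: List[Vector]) -> List[Vector]:
    """Prepend memory tokens so that self-attention can attend to them."""
    return dense_to_memory_tokens(features, proj) + token_embs


if __name__ == "__main__":
    token_embs = [[1.0, 1.0], [2.0, 2.0]]   # toy query-token embeddings
    features = [0.5, 2.0]                   # toy dense features
    proj = [[1.0, 0.0], [0.0, 1.0]]         # hypothetical projection rows
    print(build_input(token_embs, features, proj))
```

The resulting sequence would then be fed to a Transformer encoder in place of the plain token embeddings; the encoder itself is omitted here.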

Citations: 0

ReCo: Reliable Causal Chain Reasoning via Structural Causal Recurrent Neural Networks

Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing

Pub Date : 2022-12-16 DOI: 10.48550/arXiv.2212.08322

Kai Xiong, Xiao Ding, Zhongyang Li, L. Du, Bing Qin, Yi Zheng, Baoxing Huai

Causal chain reasoning (CCR) is an essential ability for many decision-making AI systems, which requires the model to build reliable causal chains by connecting causal pairs. However, CCR suffers from two main transitive problems: threshold effect and scene drift. In other words, the causal pairs to be spliced may have a conflicting threshold boundary or scenario. To address these issues, we propose a novel Reliable Causal chain reasoning framework (ReCo), which introduces exogenous variables to represent the threshold and scene factors of each causal pair within the causal chain, and estimates the threshold and scene contradictions across exogenous variables via structural causal recurrent neural networks (SRNN). Experiments show that ReCo outperforms a series of strong baselines on both Chinese and English CCR datasets. Moreover, by injecting reliable causal chain knowledge distilled by ReCo, BERT can achieve better performance on four downstream causal-related tasks than BERT models enhanced by other kinds of knowledge.

Citations: 0

Injecting Domain Knowledge in Language Models for Task-oriented Dialogue Systems

Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing

Pub Date : 2022-12-15 DOI: 10.48550/arXiv.2212.08120

Denis Emelin, Daniele Bonadiman, Sawsan Alqahtani, Yi Zhang, Saab Mansour

Pre-trained language models (PLMs) have advanced the state of the art across NLP applications, but lack domain-specific knowledge that does not naturally occur in pre-training data. Previous studies augmented PLMs with symbolic knowledge for different downstream NLP tasks. However, knowledge bases (KBs) utilized in these studies are usually large-scale and static, in contrast to small, domain-specific, and modifiable knowledge bases that are prominent in real-world task-oriented dialogue (TOD) systems. In this paper, we showcase the advantages of injecting domain-specific knowledge prior to fine-tuning on TOD tasks. To this end, we utilize light-weight adapters that can be easily integrated with PLMs and serve as a repository for facts learned from different KBs. To measure the efficacy of the proposed knowledge injection methods, we introduce Knowledge Probing using Response Selection (KPRS), a probe designed specifically for TOD models. Experiments on KPRS and the response generation task show improvements of knowledge injection with adapters over strong baselines.

Citations: 7

Find Someone Who: Visual Commonsense Understanding in Human-Centric Grounding

Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing

Pub Date : 2022-12-14 DOI: 10.48550/arXiv.2212.06971

Haoxuan You, Rui Sun, Zhecan Wang, Kai-Wei Chang, Shih-Fu Chang

From a visual scene containing multiple people, humans are able to distinguish each individual given context descriptions about what happened before, their mental/physical states or intentions, etc. This ability relies heavily on human-centric commonsense knowledge and reasoning. For example, if asked to identify the "person who needs healing" in an image, we need to first know that they usually have injuries or suffering expressions, then find the corresponding visual clues before finally grounding the person. We present a new commonsense task, Human-centric Commonsense Grounding, that tests the models' ability to ground individuals given context descriptions about what happened before, and their mental/physical states or intentions. We further create a benchmark, HumanCog, a dataset with 130k grounded commonsensical descriptions annotated on 67k images, covering diverse types of commonsense and visual scenes. We set up a context-object-aware method as a strong baseline that outperforms previous pre-trained and non-pretrained models. Further analysis demonstrates that rich visual commonsense and powerful integration of multi-modal commonsense are essential, which sheds light on future work. Data and code will be available at https://github.com/Hxyou/HumanCog.

Citations: 2

Improving Generalization of Pre-trained Language Models via Stochastic Weight Averaging

Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing

Pub Date : 2022-12-12 DOI: 10.48550/arXiv.2212.05956

Peng Lu, I. Kobyzev, Mehdi Rezagholizadeh, Ahmad Rashid, A. Ghodsi, P. Langlais

Knowledge Distillation (KD) is a commonly used technique for improving the generalization of compact Pre-trained Language Models (PLMs) on downstream tasks. However, such methods impose the additional burden of training a separate teacher model for every new dataset. Alternatively, one may directly work on the improvement of the optimization procedure of the compact model toward better generalization. Recent works observe that the flatness of the local minimum correlates well with better generalization. In this work, we adapt Stochastic Weight Averaging (SWA), a method encouraging convergence to a flatter minimum, to fine-tuning PLMs. We conduct extensive experiments on various NLP tasks (text classification, question answering, and generation) and different model architectures and demonstrate that our adaptation improves the generalization without extra computation cost. Moreover, we observe that this simple optimization technique is able to outperform the state-of-the-art KD methods for compact models.
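The core operation in Stochastic Weight Averaging is a running mean over model weights collected along the training trajectory. The sketch below shows only that generic averaging step on toy weight dictionaries; it does not reproduce the paper's adaptation to fine-tuning (for example, when averaging starts or which learning-rate schedule is used), and the helper names `update_swa` and `average_checkpoints` are hypothetical.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]


def update_swa(swa: Weights, current: Weights, n_averaged: int) -> Weights:
    """Fold `current` into the running mean `swa` built from `n_averaged` checkpoints."""
    return {
        name: [
            (avg * n_averaged + cur) / (n_averaged + 1)
            for avg, cur in zip(swa[name], current[name])
        ]
        for name in swa
    }


def average_checkpoints(checkpoints: List[Weights]) -> Weights:
    """Average a list of checkpoints one at a time, as would happen during training."""
    swa = {name: list(values) for name, values in checkpoints[0].items()}
    for n, ckpt in enumerate(checkpoints[1:], start=1):
        swa = update_swa(swa, ckpt, n)
    return swa


if __name__ == "__main__":
    # Toy checkpoints standing in for weights saved at different training steps.
    ckpts = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}, {"w": [5.0, 6.0]}]
    print(average_checkpoints(ckpts))
```

Because the mean is updated incrementally, only one extra copy of the weights needs to be kept in memory, which is in line with the abstract's claim of no extra computation cost.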

Citations: 5

Information-Theoretic Text Hallucination Reduction for Video-grounded Dialogue

Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing

Pub Date : 2022-12-12 DOI: 10.48550/arXiv.2212.05765

Sunjae Yoon, Eunseop Yoon, Hee Suk Yoon, Junyeong Kim, Changdong Yoo

Video-grounded Dialogue (VGD) aims to decode an answer sentence to a question regarding a given video and dialogue context. Despite the recent success of multi-modal reasoning in generating answer sentences, existing dialogue systems still suffer from a text hallucination problem, which denotes indiscriminate text-copying from input texts without an understanding of the question. This is due to learning spurious correlations from the fact that answer sentences in the dataset usually include words from the input texts; the VGD system thus relies excessively on copying words from input texts in the hope that those words overlap with the ground-truth texts. Hence, we design the Text Hallucination Mitigating (THAM) framework, which incorporates a Text Hallucination Regularization (THR) loss derived from the proposed information-theoretic text hallucination measurement approach. Applying THAM to current dialogue systems validates its effectiveness on VGD benchmarks (i.e., AVSD@DSTC7 and AVSD@DSTC8) and shows enhanced interpretability.

Citations: 1

FastClass: A Time-Efficient Approach to Weakly-Supervised Text Classification

Proceedings of the Conference on Empirical Methods in Natural Language Processing. Conference on Empirical Methods in Natural Language Processing

Pub Date : 2022-12-11 DOI: 10.48550/arXiv.2212.05506

Tingyu Xia, Yue Wang, Yuan Tian, Yi Chang

Weakly-supervised text classification aims to train a classifier using only class descriptions and unlabeled data. Recent research shows that keyword-driven methods can achieve state-of-the-art performance on various tasks. However, these methods not only rely on carefully crafted class descriptions to obtain class-specific keywords but also require a substantial amount of unlabeled data and take a long time to train. This paper proposes FastClass, an efficient weakly-supervised classification approach. It uses dense text representation to retrieve class-relevant documents from an external unlabeled corpus and selects an optimal subset to train a classifier. Compared to keyword-driven methods, our approach is less reliant on initial class descriptions as it no longer needs to expand each class description into a set of class-specific keywords. Experiments on a wide range of classification tasks show that the proposed approach frequently outperforms keyword-driven models in terms of classification accuracy and often enjoys orders-of-magnitude faster training speed.
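The retrieval step described above can be sketched as a nearest-neighbor search between class-description embeddings and document embeddings. This is only an illustration under stated assumptions: the embeddings are toy vectors rather than outputs of a real encoder, cosine similarity and a fixed top-k cut-off are assumed choices, the helper names `cosine` and `retrieve_pseudo_labeled` are hypothetical, and the abstract's "optimal subset" selection is not specified there and is not implemented here.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve_pseudo_labeled(
    class_vecs: Dict[str, Sequence[float]],
    doc_vecs: List[Sequence[float]],
    k: int,
) -> List[Tuple[int, str]]:
    """For each class, return the k most similar documents as (doc_index, label) pairs."""
    pairs: List[Tuple[int, str]] = []
    for label, cvec in class_vecs.items():
        ranked = sorted(
            range(len(doc_vecs)),
            key=lambda i: cosine(cvec, doc_vecs[i]),
            reverse=True,
        )
        pairs.extend((i, label) for i in ranked[:k])
    return pairs


if __name__ == "__main__":
    # Toy 2-d embeddings standing in for dense representations of text.
    classes = {"sports": [1.0, 0.0], "politics": [0.0, 1.0]}
    docs = [[0.9, 0.1], [0.1, 0.9], [0.7, 0.3], [0.2, 0.8]]
    print(retrieve_pseudo_labeled(classes, docs, k=1))
```

The retrieved pairs would serve as pseudo-labeled training data for an ordinary supervised classifier.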

弱监督文本分类旨在仅使用类描述和未标记的数据来训练分类器。最近的研究表明，关键字驱动的方法可以在各种任务上达到最先进的性能。然而，这些方法不仅依赖于精心设计的类描述来获得特定于类的关键字，而且还需要大量未标记的数据，并且需要很长时间来训练。本文提出了一种高效的弱监督分类方法FastClass。它使用密集文本表示从外部未标记语料库中检索类相关文档，并选择最优子集来训练分类器。与关键字驱动的方法相比，我们的方法较少依赖于初始类描述，因为它不再需要将每个类描述扩展为一组特定于类的关键字。在广泛的分类任务上的实验表明，所提出的方法在分类精度方面经常优于关键词驱动模型，并且通常具有数量级的训练速度。

Pages: 4746-4758

Citations: 1
