
First Workshop on Insights from Negative Results in NLP: Latest Publications

Combining Extraction and Generation for Constructing Belief-Consequence Causal Links
Pub Date: 2022 DOI: 10.18653/v1/2022.insights-1.22
M. Alexeeva, Allegra A. Beal, M. Surdeanu
In this paper, we introduce and justify a new task, causal link extraction based on beliefs, and conduct a qualitative analysis of the ability of a large language model (InstructGPT-3) to generate implicit consequences of beliefs. Because the model-generated consequences are promising but not consistent, we propose directions for future work, including data collection, explicit consequence extraction using rule-based and language-modeling-based approaches, and using explicitly stated consequences of beliefs to fine-tune or prompt the language model to produce outputs suitable for the task.
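The generation half of the task can be illustrated with a short prompting sketch. This is not the authors' setup, only a minimal reconstruction: the `complete` helper is hypothetical and stands in for whatever client the model provider offers (the paper probed InstructGPT-3), and the prompt wording is invented for illustration.

```python
# Minimal sketch of prompting a large language model for the implicit
# consequence of a belief. `complete` is a hypothetical stand-in for a
# real LLM API client.

def complete(prompt: str) -> str:
    """Hypothetical LLM call; replace with a real API client."""
    raise NotImplementedError

def consequence_of(belief: str) -> str:
    prompt = (
        "State one likely consequence of holding the following belief.\n"
        f"Belief: {belief}\n"
        "Consequence:"
    )
    return complete(prompt).strip()

# consequence_of("Vitamin C cures colds") might yield, for example, that
# the holder treats colds with vitamin C instead of seeing a doctor.
```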
Citations: 1
Evaluating the Practical Utility of Confidence-score based Techniques for Unsupervised Open-world Classification
Pub Date: 2022 DOI: 10.18653/v1/2022.insights-1.3
Sopan Khosla, Rashmi Gangadharaiah
Open-world classification in dialog systems requires models to detect open intents while ensuring the quality of in-domain (ID) intent classification. In this work, we revisit methods that leverage distance-based statistics for unsupervised out-of-domain (OOD) detection. We show that despite their superior performance on threshold-independent metrics such as test-set AUROC, threshold values chosen based on performance on a validation set do not generalize well to the test set, resulting in substantially lower ID and OOD detection accuracy and F1-scores. Our analysis shows that this lack of generalizability can be successfully mitigated by setting aside a hold-out set from the validation data for threshold selection (sometimes achieving relative gains as high as 100%). Extensive experiments on seven benchmark datasets show that this fix puts the performance of these methods on par with, or sometimes even above, the current state-of-the-art OOD detection techniques.
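The fix described above, selecting the detection threshold on a hold-out split rather than the tuning split, is easy to sketch. The snippet below is an illustrative reconstruction, not the authors' code; it assumes you already have distance-based OOD scores (higher means more likely OOD) and binary OOD labels for a hold-out set carved out of the validation data.

```python
import numpy as np

def pick_threshold(scores: np.ndarray, is_ood: np.ndarray, grid: int = 200) -> float:
    """Pick the score threshold that maximizes OOD-detection F1 on a
    held-out split, instead of reusing the tuning/validation split."""
    candidates = np.linspace(scores.min(), scores.max(), grid)

    def f1(t: float) -> float:
        pred = scores >= t  # higher distance-based score => predict OOD
        tp = np.sum(pred & is_ood)
        fp = np.sum(pred & ~is_ood)
        fn = np.sum(~pred & is_ood)
        return 2 * tp / max(2 * tp + fp + fn, 1)

    return float(max(candidates, key=f1))

# Usage with hypothetical arrays: freeze the threshold on the hold-out
# split, then apply it unchanged to the test set.
# t = pick_threshold(holdout_scores, holdout_is_ood)
# test_pred_ood = test_scores >= t
```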
Citations: 4
Pre-trained language models evaluating themselves - A comparative study
Pub Date: 2022 DOI: 10.18653/v1/2022.insights-1.25
Philipp Koch, M. Aßenmacher, C. Heumann
Evaluating generated text has received new attention with the introduction of model-based metrics in recent years. These new metrics correlate better with human judgments and seemingly overcome many issues of earlier n-gram-based metrics from the symbolic age. In this work, we examine the recently introduced metrics BERTScore, BLEURT, NUBIA, MoverScore, and Mark-Evaluate (Petersen). We investigate their sensitivity to different types of semantic deterioration (part-of-speech drop and negation), word-order perturbations, word drop, and the common problem of repetition. No metric showed appropriate behaviour for negation, and none of them was sensitive to the other issues overall.
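A single case of the negation probe can be reproduced with public tooling. The sketch below uses the bert-score package as one example metric, with invented sentences; the paper's finding is that scores barely move under negation.

```python
# Probe a model-based metric's sensitivity to negation: score a faithful
# candidate and a negated one against the same reference and compare.
# Requires: pip install bert-score
from bert_score import score

reference = ["The drug improved patient outcomes."]
faithful  = ["The drug improved patient outcomes."]
negated   = ["The drug did not improve patient outcomes."]

_, _, f_faithful = score(faithful, reference, lang="en")
_, _, f_negated  = score(negated, reference, lang="en")

# A negation-sensitive metric should score the negated candidate clearly
# lower; the paper reports that none of the tested metrics behave this way.
print(f"faithful: {f_faithful.item():.3f}  negated: {f_negated.item():.3f}")
```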
Citations: 1
Challenges in including extra-linguistic context in pre-trained language models
Pub Date: 2022 DOI: 10.18653/v1/2022.insights-1.18
Ionut-Teodor Sorodoc, Laura Aina, Gemma Boleda
To successfully account for language, computational models need to take into account both the linguistic context (the content of the utterances) and the extra-linguistic context (for instance, the participants in a dialogue). We focus on a referential task that asks models to link entity mentions in a TV show to the corresponding characters, and design an architecture that attempts to account for both kinds of context. In particular, our architecture combines a previously proposed specialized module (an “entity library”) for character representation with transfer learning from a pre-trained language model. We find that, although the model does improve linguistic contextualization, it fails to successfully integrate extra-linguistic information about the participants in the dialogue. Our work shows that it is very challenging to incorporate extra-linguistic information into pre-trained language models.
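One way to read the described architecture is sketched below in PyTorch. This is a schematic reconstruction, not the authors' implementation: a learned embedding table plays the role of the "entity library", one vector per character, and a mention encoded by a pre-trained language model is scored against every entry.

```python
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class EntityLibraryLinker(nn.Module):
    """Schematic linker: score an encoded entity mention against a
    learned per-character 'entity library'."""

    def __init__(self, num_characters: int, lm_name: str = "bert-base-uncased"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(lm_name)  # transfer learning
        dim = self.encoder.config.hidden_size
        self.entity_library = nn.Embedding(num_characters, dim)  # one vector per character

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        query = out.last_hidden_state[:, 0]           # [CLS] view of the mention in context
        return query @ self.entity_library.weight.T  # (batch, num_characters) scores

# tok = AutoTokenizer.from_pretrained("bert-base-uncased")
# batch = tok(["I saw Chandler at the coffee shop."], return_tensors="pt")
# logits = EntityLibraryLinker(num_characters=6)(batch["input_ids"], batch["attention_mask"])
```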
Citations: 0