First Workshop on Insights from Negative Results in NLP: Latest Articles

Do Transformers Dream of Inference, or Can Pretrained Generative Models Learn Implicit Inferential Rules?

First Workshop on Insights from Negative Results in NLP

Pub Date : 2020-11-01 DOI: 10.18653/v1/2020.insights-1.12

Zhengzhong Liang, M. Surdeanu

Large pretrained language models (LMs) have been used successfully for multi-hop question answering. However, most of these approaches are not interpretable, as they do not make explicit the inference hops necessary to explain a candidate answer. In this work, we investigate the capability of a state-of-the-art transformer LM to generate explicit inference hops, i.e., to infer a new statement necessary to answer a question given some premise input statements. Our analysis shows that such LMs can generate new statements for some simple inference types, but performance remains poor for complex, real-world inference types such as those that require monotonicity, composition, and commonsense knowledge.

Citations: 1

Domain adaptation challenges of BERT in tokenization and sub-word representations of Out-of-Vocabulary words

First Workshop on Insights from Negative Results in NLP

Pub Date : 2020-11-01 DOI: 10.18653/v1/2020.insights-1.1

Anmol Nayak, Hariprasad Timmapathini, Karthikeyan Ponnalagu, Vijendran Gopalan Venkoparao

The BERT model (Devlin et al., 2019) has achieved significant progress in several Natural Language Processing (NLP) tasks by leveraging the multi-head self-attention mechanism (Vaswani et al., 2017) in its architecture. However, several research challenges remain poorly addressed for the domain-specific corpora found in industry. In this paper, we highlight these problems through detailed experiments that analyze the attention scores and dynamic word embeddings of the BERT-Base-Uncased model. Our experiments led to the following findings: 1) the largest substring from the left that is found in the vocabulary (in-vocab) is always chosen at every sub-word unit, which can lead to suboptimal tokenization choices; 2) the semantic meaning of a vocabulary word deteriorates when it appears as a substring in an Out-Of-Vocabulary (OOV) word; and 3) minor misspellings in words are inadequately handled. We believe that tackling these challenges would significantly help the domain adaptation of BERT.
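The first finding concerns greedy longest-match-first sub-word splitting. The following is a minimal sketch of that matching rule, not the authors' code and not BERT's actual tokenizer implementation; the function name `wordpiece_tokenize` and the toy vocabulary `TOY_VOCAB` are hypothetical and chosen only to show how the longest left prefix can win over a more meaningful split.

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of a single word into sub-word units."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate from the right until it is found in the vocabulary.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry a "##" prefix
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No prefix of the remainder is in the vocabulary: give up on the word.
            return [unk]
        pieces.append(match)
        start = end
    return pieces


# Hypothetical toy vocabulary (an assumption for illustration only).
TOY_VOCAB = {"un", "una", "##affable", "##ff", "##able"}

print(wordpiece_tokenize("unaffable", TOY_VOCAB))
```

With this toy vocabulary the longer prefix "una" is matched first, so the split "un" + "##affable" is never considered, even though both of its pieces are in the vocabulary.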

Citations: 22

Label Propagation-Based Semi-Supervised Learning for Hate Speech Classification

First Workshop on Insights from Negative Results in NLP

Pub Date : 2020-11-01 DOI: 10.18653/v1/2020.insights-1.8

Ashwin Geet D'Sa, I. Illina, D. Fohr, D. Klakow, Dana Ruiter

Research on hate speech classification has received increased attention. In real-life scenarios, only a small amount of labeled hate speech data is available to train a reliable classifier. Semi-supervised learning takes advantage of a small amount of labeled data and a large amount of unlabeled data. In this paper, label propagation-based semi-supervised learning is explored for the task of hate speech classification. The quality of labeling the unlabeled set depends on the input representations. In this work, we show that pre-trained representations are label agnostic and yield poor results when used with label propagation. Neural network-based fine-tuning can be adopted to learn task-specific representations using a small amount of labeled data. We show that fully fine-tuned representations may not always be the best representations for label propagation, and that intermediate representations may perform better in a semi-supervised setup.
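To make the dependence on input representations concrete, here is a minimal sketch of graph-based label propagation in plain Python. It is a generic textbook-style variant (RBF affinities, labeled points clamped at every step), not the authors' setup; the names `propagate_labels`, `EMBEDDINGS`, and `SEED_LABELS`, the 2-D toy vectors, and the `sigma` value are all assumptions for illustration.

```python
import math


def rbf_affinity(a, b, sigma):
    """Similarity between two vectors; closer points get a weight nearer to 1."""
    squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-squared_distance / (2 * sigma ** 2))


def propagate_labels(points, labels, sigma=1.0, iterations=50):
    """labels[i] is a class id for labeled points and None for unlabeled ones."""
    n = len(points)
    classes = sorted({y for y in labels if y is not None})
    weights = [[0.0 if i == j else rbf_affinity(points[i], points[j], sigma)
                for j in range(n)] for i in range(n)]

    def one_hot(label):
        return [1.0 if label == c else 0.0 for c in classes]

    uniform = [1.0 / len(classes)] * len(classes)
    scores = [one_hot(y) if y is not None else list(uniform) for y in labels]

    for _ in range(iterations):
        updated = []
        for i in range(n):
            if labels[i] is not None:
                # Clamp labeled points to their known class.
                updated.append(one_hot(labels[i]))
                continue
            total = sum(weights[i])
            # Unlabeled points take the affinity-weighted average of their neighbors.
            updated.append([
                sum(weights[i][j] * scores[j][k] for j in range(n)) / total
                for k in range(len(classes))
            ])
        scores = updated
    return [classes[row.index(max(row))] for row in scores]


# Hypothetical 2-D "embeddings": two well-separated groups, one seed label each.
EMBEDDINGS = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 5.0), (4.9, 5.2)]
SEED_LABELS = [0, None, None, 1, None, None]

print(propagate_labels(EMBEDDINGS, SEED_LABELS))
```

Because labels spread along affinities computed from the vectors, the outcome is only as good as the geometry of the representation: if points of different classes sit close together, their labels mix.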

Citations: 6

Can Knowledge Graph Embeddings Tell Us What Fact-checked Claims Are About?

First Workshop on Insights from Negative Results in NLP

Pub Date : 2020-11-01 DOI: 10.18653/v1/2020.insights-1.11

Valentina Beretta, S. Harispe, K. Boland, Luke Lo Seen, Konstantin Todorov, Andon Tchechmedjiev

The web offers a wealth of discourse data that help researchers from various fields analyze debates about current societal issues and gauge the effects on society of important phenomena such as misinformation spread. Such analyses often revolve around claims made by people about a given topic of interest. Fact-checking portals offer partially structured information that can assist such analysis. However, exploiting the network structure of such online discourse data is as yet under-explored. We study the effectiveness of using neural-graph embedding features for claim topic prediction and their complementarity with text embeddings. We show that graph embeddings are modestly complementary with text embeddings, but the low performance of graph embedding features alone indicates that the model fails to capture topological features pertinent to the topic prediction task.

Citations: 1

How Effectively Can Machines Defend Against Machine-Generated Fake News? An Empirical Study

First Workshop on Insights from Negative Results in NLP

Pub Date : 2020-11-01 DOI: 10.18653/v1/2020.insights-1.7

Meghana Moorthy Bhat, S. Parthasarathy

We empirically study the effectiveness of machine-generated fake news detectors by understanding the model’s sensitivity to different synthetic perturbations during test time. The current machine-generated fake news detectors rely on provenance to determine the veracity of news. Our experiments find that the success of these detectors can be limited since they are rarely sensitive to semantic perturbations and are very sensitive to syntactic perturbations. Also, we would like to open-source our code and believe it could be a useful diagnostic tool for evaluating models aimed at fighting machine-generated fake news.

Citations: 12

An Analysis of Capsule Networks for Part of Speech Tagging in High- and Low-resource Scenarios

First Workshop on Insights from Negative Results in NLP

Pub Date : 2020-11-01 DOI: 10.18653/v1/2020.insights-1.10

Andrew Zupon, Faiz Rafique, M. Surdeanu

Neural networks are a common tool in NLP, but it is not always clear which architecture to use for a given task. Different tasks, different languages, and different training conditions can all affect how a neural network will perform. Capsule Networks (CapsNets) are a relatively new architecture in NLP. Due to their novelty, CapsNets are being used more and more in NLP tasks. However, their usefulness is still mostly untested. In this paper, we compare three neural network architectures (LSTM, CNN, and CapsNet) on a part-of-speech tagging task. We compare these architectures in both high- and low-resource training conditions and find that no architecture consistently performs the best. Our analysis shows that our CapsNet performs nearly as well as a more complex LSTM under certain training conditions, but not others, and that our CapsNet almost always outperforms our CNN. We also find that our CapsNet implementation shows faster prediction times than the LSTM for Scottish Gaelic but not for Spanish, highlighting the effect that the choice of languages can have on the models.

Citations: 2

Layout-Aware Text Representations Harm Clustering Documents by Type

First Workshop on Insights from Negative Results in NLP

Pub Date : 2020-11-01 DOI: 10.18653/v1/2020.insights-1.9

Catherine Finegan-Dollak, Ashish Verma

Clustering documents by type (grouping invoices with invoices and articles with articles) is a desirable first step for organizing large collections of document scans. Humans approaching this task use both the semantics of the text and the document layout to assist in grouping like documents. LayoutLM (Xu et al., 2019), a layout-aware transformer built on top of BERT with state-of-the-art performance on document-type classification, could reasonably be expected to outperform regular BERT (Devlin et al., 2018) for document-type clustering. However, we find experimentally that BERT significantly outperforms LayoutLM on this task (p < 0.001). We analyze clusters to show where layout awareness is an asset and where it is a liability.

Citations: 4

Q. Can Knowledge Graphs be used to Answer Boolean Questions? A. It’s complicated!

First Workshop on Insights from Negative Results in NLP

Pub Date : 2020-11-01 DOI: 10.18653/v1/2020.insights-1.2

Daria Dzendzik, Carl Vogel, Jennifer Foster

In this paper we explore the problem of machine reading comprehension, focusing on the BoolQ dataset of Yes/No questions. We carry out an error analysis of a BERT-based machine reading comprehension model on this dataset, revealing issues such as unstable model behaviour and some noise within the dataset itself. We then experiment with two approaches for integrating information from knowledge graphs: (i) concatenating knowledge graph triples to text passages and (ii) encoding knowledge with a Graph Neural Network. Neither of these approaches shows a clear improvement, and we hypothesize that this may be due to a combination of inaccuracies in the knowledge graph, imprecision in entity linking, and the models’ inability to capture additional information from knowledge graphs.
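Approach (i) can be pictured as plain string construction. The sketch below is an assumption about what such concatenation might look like; the abstract does not specify the authors' format. The helper names `linearize_triple` and `build_model_input`, the `[SEP]` separator, and the cap on triples are all hypothetical.

```python
def linearize_triple(triple):
    """Turn a (subject, relation, object) triple into a short text fragment."""
    subject, relation, obj = triple
    return f"{subject} {relation.replace('_', ' ')} {obj}"


def build_model_input(question, passage, triples, sep=" [SEP] ", max_triples=3):
    """Append linearized triples to the passage, then pair it with the question."""
    facts = ". ".join(linearize_triple(t) for t in triples[:max_triples])
    augmented_passage = passage if not facts else f"{passage} {facts}."
    return question + sep + augmented_passage


example = build_model_input(
    "is paris in france",
    "Paris is a city.",
    [("Paris", "capital_of", "France")],
)
print(example)
```

A cap such as `max_triples` matters in practice because every appended triple consumes part of the model's limited input length, and noisy triples from imperfect entity linking are appended just as readily as correct ones.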

Citations: 2

The Extraordinary Failure of Complement Coercion Crowdsourcing

First Workshop on Insights from Negative Results in NLP

Pub Date : 2020-10-12 DOI: 10.18653/v1/2020.insights-1.17

Yanai Elazar, Victoria Basmov, Shauli Ravfogel, Yoav Goldberg, Reut Tsarfaty

Crowdsourcing has eased and scaled up the collection of linguistic annotation in recent years. In this work, we follow known methodologies of collecting labeled data for the complement coercion phenomenon. These are constructions with an implied action — e.g., “I started a new book I bought last week”, where the implied action is reading. We aim to collect annotated data for this phenomenon by reducing it to either of two known tasks: Explicit Completion and Natural Language Inference. However, in both cases, crowdsourcing resulted in low agreement scores, even though we followed the same methodologies as in previous work. Why does the same process fail to yield high agreement scores? We specify our modeling schemes, highlight the differences with previous work and provide some insights about the task and possible explanations for the failure. We conclude that specific phenomena require tailored solutions, not only in specialized algorithms, but also in data collection methods.

Citations: 6

On Task-Level Dialogue Composition of Generative Transformer Model

First Workshop on Insights from Negative Results in NLP

Pub Date : 2020-10-09 DOI: 10.18653/v1/2020.insights-1.6

Prasanna Parthasarathi, Arvind Neelakantan, Sharan Narang

Task-oriented dialogue systems help users accomplish tasks such as booking a movie ticket and ordering food via conversation. Generative models parameterized by a deep neural network are widely used for next-turn response generation in such systems. It is natural for users of the system to want to accomplish multiple tasks within the same conversation, but the ability of generative models to compose multiple tasks is not well studied. In this work, we begin by studying the effect of training on human-human task-oriented dialogues towards improving the ability of Transformer generative models to compose multiple tasks. To that end, we propose and explore two solutions: (1) creating synthetic multiple-task dialogue data for training from human-human single-task dialogues and (2) forcing the encoder representation to be invariant to single- and multiple-task dialogues using an auxiliary loss. The results from our experiments highlight the difficulty that even a sophisticated variant of the Transformer model has in learning to compose multiple tasks from single-task dialogues.
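One simple way to picture solution (1) is to stitch single-task dialogues together. The sketch below assumes plain concatenation of one sampled dialogue per task; the abstract does not say how the authors build their synthetic data, so the function `compose_multitask_dialogue`, the data layout, and the example turns are all hypothetical.

```python
import random


def compose_multitask_dialogue(dialogues_by_task, num_tasks=2, rng=None):
    """Sample distinct tasks and concatenate one single-task dialogue from each."""
    rng = rng or random.Random(0)
    tasks = rng.sample(sorted(dialogues_by_task), num_tasks)
    turns = []
    for task in tasks:
        # Each dialogue is a list of (speaker, utterance) turns.
        turns.extend(rng.choice(dialogues_by_task[task]))
    return tasks, turns


# Hypothetical single-task dialogues (illustrative data, not from any corpus).
SINGLE_TASK = {
    "movie": [[("user", "Book two tickets for tonight."), ("system", "Booked.")]],
    "restaurant": [[("user", "Order a pizza."), ("system", "Order placed.")]],
}

tasks, turns = compose_multitask_dialogue(SINGLE_TASK)
print(tasks, len(turns))
```

Concatenation of this kind yields abrupt task switches with no shared context between segments, which is one reason synthetic data may differ from how people actually combine tasks in a conversation.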

Citations: 1
