AACL Bioflux: Latest Publications

Underspecification in Scene Description-to-Depiction Tasks

Q3 Environmental Science

AACL Bioflux

Pub Date : 2022-10-11 DOI: 10.48550/arXiv.2210.05815

B. Hutchinson, Jason Baldridge, Vinodkumar Prabhakaran

Questions regarding implicitness, ambiguity and underspecification are crucial for understanding the task validity and ethical concerns of multimodal image+text systems, yet have received little attention to date. This position paper maps out a conceptual framework to address this gap, focusing on systems which generate images depicting scenes from scene descriptions. In doing so, we account for how texts and images convey meaning differently. We outline a set of core challenges concerning textual and visual ambiguity, as well as risks that may be amplified by ambiguous and underspecified elements. We propose and discuss strategies for addressing these challenges, including generating visually ambiguous images, and generating a set of diverse images.

Citations: 21

CSS: Combining Self-training and Self-supervised Learning for Few-shot Dialogue State Tracking

Q3 Environmental Science

AACL Bioflux

Pub Date : 2022-10-11 DOI: 10.48550/arXiv.2210.05146

Haoning Zhang, Junwei Bao, Haipeng Sun, Huaishao Luo, Wenye Li, Shuguang Cui

Few-shot dialogue state tracking (DST) is a realistic problem setting in which a DST model is trained with limited labeled data. Existing few-shot methods mainly transfer knowledge learned from external labeled dialogue data (e.g., from question answering, dialogue summarization, and machine reading comprehension tasks) into DST, whereas collecting a large amount of external labeled data is laborious, and the external data may not contribute effectively to the DST-specific task. In this paper, we propose a few-shot DST framework called CSS, which Combines Self-training and Self-supervised learning methods. The unlabeled data of the DST task is incorporated into the self-training iterations, where pseudo labels are predicted by a DST model trained in advance on the limited labeled data. Besides, a contrastive self-supervised method is used to learn better representations, where the data is augmented by the dropout operation to train the model. Experimental results on the MultiWOZ dataset show that our proposed CSS achieves competitive performance in several few-shot scenarios.
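The self-training loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the CSS implementation: a toy nearest-centroid classifier over 2-D points stands in for the DST model, and the confidence cut-off (`threshold`) and number of rounds (`rounds`) are hypothetical parameters; the abstract does not say whether CSS filters pseudo labels by confidence. The contrastive dropout component is not shown.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Example = Tuple[Point, str]


def fit_centroids(data: Sequence[Example]) -> Dict[str, Point]:
    """'Train' the toy model (assumption, not a DST architecture): one mean vector per label."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for (x, y), label in data:
        s = sums.setdefault(label, [0.0, 0.0])
        s[0] += x
        s[1] += y
        counts[label] = counts.get(label, 0) + 1
    return {lab: (s[0] / counts[lab], s[1] / counts[lab]) for lab, s in sums.items()}


def predict(model: Dict[str, Point], p: Point) -> Tuple[str, float]:
    """Return (label, confidence) from a softmax over negative centroid distances."""
    scores = {lab: -math.dist(p, c) for lab, c in model.items()}
    top = max(scores.values())
    exps = {lab: math.exp(s - top) for lab, s in scores.items()}
    best = max(exps, key=exps.get)
    return best, exps[best] / sum(exps.values())


def self_train(
    labeled: Sequence[Example],
    unlabeled: Sequence[Point],
    threshold: float = 0.9,  # hypothetical confidence cut-off
    rounds: int = 3,  # hypothetical number of self-training iterations
) -> Tuple[Dict[str, Point], List[Example]]:
    train = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        model = fit_centroids(train)  # teacher trained on the current training set
        confident: List[Example] = []
        rest: List[Point] = []
        for p in pool:
            label, conf = predict(model, p)
            if conf >= threshold:
                confident.append((p, label))  # accept as a pseudo-labeled example
            else:
                rest.append(p)  # leave unlabeled; retry in a later round
        if not confident:
            break  # no new pseudo labels, so further rounds would change nothing
        train.extend(confident)
        pool = rest
    return fit_centroids(train), train
```

For example, with seeds `((0.0, 0.0), "cheap")` and `((10.0, 10.0), "expensive")` and unlabeled points `(1.0, 0.5)`, `(9.0, 9.5)` and `(5.0, 5.0)`, the first two points are pseudo-labeled in round one, while the midpoint stays equidistant from both centroids (confidence 0.5) and is never added.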

Citations: 1

BanglaParaphrase: A High-Quality Bangla Paraphrase Dataset

Q3 Environmental Science

AACL Bioflux

Pub Date : 2022-10-11 DOI: 10.48550/arXiv.2210.05109

Ajwad Akil, Najrin Sultana, Abhik Bhattacharjee, Rifat Shahriyar

In this work, we present BanglaParaphrase, a high-quality synthetic Bangla paraphrase dataset curated by a novel filtering pipeline. We aim to take a step towards alleviating the low-resource status of the Bangla language in the NLP domain through the introduction of BanglaParaphrase, which ensures quality by preserving both semantics and diversity, making it particularly useful for enhancing other Bangla datasets. We present a detailed comparative analysis of our dataset, and the models trained on it, against other existing works to establish the viability of our synthetic paraphrase data generation pipeline. We are making the dataset and models publicly available at https://github.com/csebuetnlp/banglaparaphrase to further the state of Bangla NLP.

Citations: 4

Transformer-based Localization from Embodied Dialog with Large-scale Pre-training

Q3 Environmental Science

AACL Bioflux

Pub Date : 2022-10-10 DOI: 10.48550/arXiv.2210.04864

Meera Hahn, James M. Rehg

We address the challenging task of Localization via Embodied Dialog (LED). Given a dialog from two agents, an Observer navigating through an unknown environment and a Locator who is attempting to identify the Observer’s location, the goal is to predict the Observer’s final location in a map. We develop a novel LED-Bert architecture and present an effective pretraining strategy. We show that a graph-based scene representation is more effective than the top-down 2D maps used in prior works. Our approach outperforms previous baselines.

Citations: 1

CrowdChecked: Detecting Previously Fact-Checked Claims in Social Media

Q3 Environmental Science

AACL Bioflux

Pub Date : 2022-10-10 DOI: 10.48550/arXiv.2210.04447

Momchil Hardalov, Anton Chernyavskiy, Ivan Koychev, Dmitry I. Ilvovsky, Preslav Nakov

While there has been substantial progress in developing systems to automate fact-checking, they still lack credibility in the eyes of users. Thus, an interesting approach has emerged: performing automatic fact-checking by verifying whether an input claim has previously been fact-checked by professional fact-checkers, and returning an article that explains their decision. This is a sensible approach, as people trust manual fact-checking and many claims are repeated multiple times. Yet a major issue when building such systems is the small number of known tweet–verifying article pairs available for training. Here, we aim to bridge this gap by making use of crowd fact-checking, i.e., mining claims in social media to which users have responded with a link to a fact-checking article. In particular, we mine a large-scale collection of 330,000 tweets, each paired with a corresponding fact-checking article. We further propose an end-to-end framework, based on modified self-adaptive training, to learn from this noisy data in a distant supervision scenario. Our experiments on the CLEF’21 CheckThat! test set show an absolute improvement of two points over the state of the art. Our code and datasets are available at https://github.com/mhardalov/crowdchecked-claims
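The abstract says the framework is based on modified self-adaptive training but does not describe the modification. As a hedged sketch of the generic, unmodified idea only: each training pair keeps a soft target that starts at its (possibly noisy) distant label and is moved toward the model's own prediction by an exponential moving average, and the largest entry of that target can then weight the pair's loss. The momentum `alpha = 0.9` and the two-class "matching / not matching" framing are assumptions for illustration.

```python
from typing import List


def update_soft_target(target: List[float], probs: List[float], alpha: float = 0.9) -> List[float]:
    """Move the soft target toward the model prediction (exponential moving average).

    alpha is a hypothetical momentum value; larger alpha trusts the original label longer.
    """
    if len(target) != len(probs):
        raise ValueError("target and probs must have the same length")
    return [alpha * t + (1.0 - alpha) * p for t, p in zip(target, probs)]


def sample_weight(target: List[float]) -> float:
    """Weight a training pair by its most confident class, so diffuse targets count less."""
    return max(target)


# A noisy distant label says "matching pair" (index 0),
# but the model keeps predicting "not matching" with probability 0.8.
target = [1.0, 0.0]
for _ in range(3):
    target = update_soft_target(target, [0.2, 0.8])
# After k updates, target[0] = 0.2 + 0.8 * alpha**k, i.e. 0.7832 here.
```

Because each update is a convex combination of two probability vectors, the target stays a valid distribution while it drifts from the distant label toward the model's belief.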

Pages: 266-285

Citations: 4

Named Entity Recognition in Twitter: A Dataset and Analysis on Short-Term Temporal Shifts

Q3 Environmental Science

AACL Bioflux

Pub Date : 2022-10-07 DOI: 10.48550/arXiv.2210.03797

Asahi Ushio, Leonardo Neves, Vítor Silva, Francesco Barbieri, José Camacho-Collados

Recent progress in language model pre-training has led to important improvements in Named Entity Recognition (NER). Nonetheless, this progress has mainly been tested on well-formatted documents such as news, Wikipedia, or scientific articles. Social media is different: its noisy and dynamic nature adds another layer of complexity. In this paper, we focus on NER in Twitter, one of the largest social media platforms, and construct a new NER dataset, TweetNER7, which contains seven entity types annotated over 11,382 tweets from September 2019 to August 2021. The dataset was constructed by carefully distributing the tweets over time and taking representative trends as a basis. Along with the dataset, we provide a set of language model baselines and analyze their performance on the task, with particular attention to the impact of different time periods. Specifically, we focus on three important temporal aspects in our analysis: short-term degradation of NER models over time, strategies to fine-tune a language model over different periods, and self-labeling as an alternative when recently labeled data is lacking. TweetNER7 is released publicly (https://huggingface.co/datasets/tner/tweetner7) along with the models fine-tuned on it (NER models have been integrated into TweetNLP and can be found at https://github.com/asahi417/tner/tree/master/examples/tweetner7_paper).

Pages: 309-319

Citations: 7

The Lifecycle of “Facts”: A Survey of Social Bias in Knowledge Graphs

Q3 Environmental Science

AACL Bioflux

Pub Date : 2022-10-07 DOI: 10.48550/arXiv.2210.03353

Angelie Kraft, Ricardo Usbeck

Knowledge graphs are increasingly used in a plethora of downstream tasks or in the augmentation of statistical models to improve factuality. However, social biases are engraved in these representations and propagate downstream. We conducted a critical analysis of literature concerning biases at different steps of a knowledge graph lifecycle. We investigated factors introducing bias, as well as the biases that are rendered by knowledge graphs and their embedded versions afterward. Limitations of existing measurement and mitigation strategies are discussed and paths forward are proposed.

Citations: 3

Missing Modality meets Meta Sampling (M3S): An Efficient Universal Approach for Multimodal Sentiment Analysis with Missing Modality

Q3 Environmental Science

AACL Bioflux

Pub Date : 2022-10-07 DOI: 10.48550/arXiv.2210.03428

Haozhe Chi, Minghua Yang, Junhao Zhu, Guanhong Wang, Gaoang Wang

Multimodal sentiment analysis (MSA) is an important way of observing mental activities with the help of data captured from multiple modalities. However, due to recording or transmission errors, some modalities may include incomplete data. Most existing works that address missing modalities assume that a particular modality is completely missing and seldom consider a mixture of missing data across multiple modalities. In this paper, we propose a simple yet effective meta-sampling approach for multimodal sentiment analysis with missing modalities, namely Missing Modality-based Meta Sampling (M3S). Specifically, M3S incorporates a missing-modality sampling strategy into the model-agnostic meta-learning (MAML) framework. M3S can be treated as an efficient add-on training component for existing models and significantly improves their performance on multimodal data with a mixture of missing modalities. We conduct experiments on the IEMOCAP, SIMS and CMU-MOSI datasets and achieve superior performance compared with recent state-of-the-art methods.
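To make "a mixture of missing modalities" concrete, the sketch below samples per-example missing patterns and zero-fills the dropped features. It is a hypothetical illustration, not the M3S sampling strategy: the modality names, the drop probability `p_missing`, the rule of always keeping at least one modality, and zero-filling as the placeholder for missing data are all assumptions, and the MAML inner and outer updates that M3S wraps around such samples are not shown.

```python
import random
from typing import Dict, List

# Hypothetical modality set for illustration.
MODALITIES = ("text", "audio", "vision")


def sample_missing_mask(rng: random.Random, p_missing: float = 0.3) -> Dict[str, bool]:
    """Drop each modality independently with probability p_missing (True = present).

    If every modality was dropped, restore one at random so the sample stays usable.
    """
    mask = {m: rng.random() >= p_missing for m in MODALITIES}
    if not any(mask.values()):
        mask[rng.choice(MODALITIES)] = True
    return mask


def apply_mask(features: Dict[str, List[float]], mask: Dict[str, bool]) -> Dict[str, List[float]]:
    """Replace the features of missing modalities with zeros (an assumed placeholder)."""
    return {m: (list(v) if mask[m] else [0.0] * len(v)) for m, v in features.items()}


def sample_task(rng: random.Random, n: int, p_missing: float = 0.3) -> List[Dict[str, bool]]:
    """Draw n masks; different examples get different missing patterns, i.e. a mixture."""
    return [sample_missing_mask(rng, p_missing) for _ in range(n)]
```

Sampling independently per example, rather than removing one whole modality from the dataset, is what produces mixed patterns such as text-only, audio+vision, or all three within the same batch.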

Citations: 0

Hate Speech and Offensive Language Detection in Bengali

Q3 Environmental Science

AACL Bioflux

Pub Date : 2022-10-07 DOI: 10.48550/arXiv.2210.03479

Mithun Das, Somnath Banerjee, Punyajoy Saha, Animesh Mukherjee

Social media often serves as a breeding ground for various hateful and offensive content. Identifying such content on social media is crucial because of its impact with respect to race, gender, or religion in an unprejudiced society. However, while there is extensive research on hate speech detection in English, there is a gap in hateful content detection for low-resource languages like Bengali. Besides, a current trend on social media is the use of Romanized Bengali for regular interactions. To overcome the limitations of existing research, in this study we develop an annotated dataset of 10K Bengali posts consisting of 5K actual and 5K Romanized Bengali tweets. We implement several baseline models for classifying such hateful posts. We further explore the interlingual transfer mechanism to boost classification performance. Finally, we perform an in-depth error analysis by examining the posts the models misclassified. When training on the actual and Romanized datasets separately, we observe that XLM-RoBERTa performs best. Further, we observe that in joint training and few-shot training, MuRIL outperforms other models by interpreting the semantic expressions better. We make our code and dataset publicly available.

Pages: 286-296

Citations: 10

Not another Negation Benchmark: The NaN-NLI Test Suite for Sub-clausal Negation

Q3 Environmental Science

AACL Bioflux

Pub Date : 2022-10-06 DOI: 10.48550/arXiv.2210.03256

Thinh Hung Truong, Yulia Otmakhova, Tim Baldwin, Trevor Cohn, Karin M. Verspoor, Jey Han Lau

Negation is poorly captured by current language models, although the extent of this problem is not widely understood. We introduce a natural language inference (NLI) test suite to enable probing the capabilities of NLP methods, with the aim of understanding sub-clausal negation. The test suite contains premise–hypothesis pairs where the premise contains sub-clausal negation and the hypothesis is constructed by making minimal modifications to the premise in order to reflect different possible interpretations. Aside from adopting standard NLI labels, our test suite is systematically constructed under a rigorous linguistic framework. It includes annotation of negation types and constructions grounded in linguistic theory, as well as the operations used to construct hypotheses. This facilitates fine-grained analysis of model performance. We conduct experiments using pre-trained language models to demonstrate that our test suite is more challenging than existing benchmarks focused on negation, and show how our annotation supports a deeper understanding of the current NLI capabilities in terms of negation and quantification.

Pages: 883-894

Citations: 11
