
Latest articles from Transactions of the Association for Computational Linguistics

On the Role of Negative Precedent in Legal Outcome Prediction
IF 10.9 | CAS Tier 1, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2022-08-17 | DOI: 10.1162/tacl_a_00532 | Vol. 11, pp. 34-48
Josef Valvoda, Ryan Cotterell, Simone Teufel
Every legal case sets a precedent by developing the law in one of the following two ways. It either expands its scope, in which case it sets positive precedent, or it narrows it, in which case it sets negative precedent. Legal outcome prediction, the prediction of positive outcome, is an increasingly popular task in AI. In contrast, we turn our focus to negative outcomes here, and introduce a new task of negative outcome prediction. We discover an asymmetry in existing models’ ability to predict positive and negative outcomes. Where the state-of-the-art outcome prediction model we used predicts positive outcomes at 75.06 F1, it predicts negative outcomes at only 10.09 F1, worse than a random baseline. To address this performance gap, we develop two new models inspired by the dynamics of a court process. Our first model significantly improves positive outcome prediction score to 77.15 F1 and our second model more than doubles the negative outcome prediction performance to 24.01 F1. Despite this improvement, shifting focus to negative outcomes reveals that there is still much room for improvement for outcome prediction models. https://github.com/valvoda/Negative-Precedent-in-Legal-Outcome-Prediction
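The asymmetry the abstract reports is a gap between per-class F1 scores. A minimal sketch of per-class F1 on toy labels (not the paper's ECtHR data) shows how a model that over-predicts the positive class can score well on positive outcomes while scoring zero on negative ones:

```python
# Hedged sketch: per-class F1 on toy data, illustrating why positive- and
# negative-outcome scores can diverge sharply. Labels are illustrative,
# not the paper's experiments.

def f1(gold, pred, cls):
    tp = sum(1 for g, p in zip(gold, pred) if g == cls and p == cls)
    fp = sum(1 for g, p in zip(gold, pred) if g != cls and p == cls)
    fn = sum(1 for g, p in zip(gold, pred) if g == cls and p != cls)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# "pos" = claimed violation upheld (positive outcome), "neg" = claim rejected.
gold = ["pos", "pos", "pos", "neg", "neg", "pos"]
pred = ["pos", "pos", "pos", "pos", "pos", "pos"]  # model that never predicts "neg"

print(f1(gold, pred, "pos"))  # high (0.8)
print(f1(gold, pred, "neg"))  # 0.0 - the kind of asymmetry the paper measures
```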
Citations: 10
MENLI: Robust Evaluation Metrics from Natural Language Inference
IF 10.9 | CAS Tier 1, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2022-08-15 | DOI: 10.1162/tacl_a_00576 | Vol. 11, pp. 804-825
Yanran Chen, Steffen Eger
Recently proposed BERT-based evaluation metrics for text generation perform well on standard benchmarks but are vulnerable to adversarial attacks, e.g., relating to information correctness. We argue that this stems (in part) from the fact that they are models of semantic similarity. In contrast, we develop evaluation metrics based on Natural Language Inference (NLI), which we deem a more appropriate modeling choice. We design a preference-based adversarial attack framework and show that our NLI-based metrics are much more robust to the attacks than the recent BERT-based metrics. On standard benchmarks, our NLI-based metrics outperform existing summarization metrics, but perform below SOTA MT metrics. However, when combining existing metrics with our NLI metrics, we obtain both higher adversarial robustness (15%–30%) and higher quality metrics as measured on standard benchmarks (+5% to 30%).
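The combination step the abstract reports can be pictured as interpolating two metric scores. A hedged sketch, with illustrative scores and weight (the real MENLI components come from trained BERT and NLI models):

```python
# Hedged sketch of the metric-combination idea: mix a similarity-based score
# with an NLI-based score via a weight w. Scores and w are toy values.

def combined_score(sim_score, nli_score, w=0.5):
    """Linear interpolation of two metric scores, each assumed in [0, 1]."""
    return w * nli_score + (1 - w) * sim_score

# A hypothesis that copies surface words (high similarity) but contradicts
# the reference would be penalized by the NLI component:
print(combined_score(sim_score=0.9, nli_score=0.1, w=0.8))
```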
Citations: 10
Multi-task Active Learning for Pre-trained Transformer-based Models
IF 10.9 | CAS Tier 1, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2022-08-10 | DOI: 10.1162/tacl_a_00515 | Vol. 10, pp. 1209-1228
Guy Rotman, Roi Reichart
Multi-task learning, in which several tasks are jointly learned by a single model, allows NLP models to share information from multiple annotations and may facilitate better predictions when the tasks are inter-related. This technique, however, requires annotating the same text with multiple annotation schemes, which may be costly and laborious. Active learning (AL) has been demonstrated to optimize annotation processes by iteratively selecting unlabeled examples whose annotation is most valuable for the NLP model. Yet, multi-task active learning (MT-AL) has not been applied to state-of-the-art pre-trained Transformer-based NLP models. This paper aims to close this gap. We explore various multi-task selection criteria in three realistic multi-task scenarios, reflecting different relations between the participating tasks, and demonstrate the effectiveness of multi-task compared to single-task selection. Our results suggest that MT-AL can be used effectively to minimize annotation efforts for multi-task NLP models.
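The iterative selection loop the abstract describes can be sketched for the single-task case with uncertainty sampling; MT-AL extends the selection criterion across tasks. A minimal sketch, where the "model confidence" is a toy lookup rather than a Transformer:

```python
# Hedged sketch of an active-learning loop with uncertainty sampling.
# The confidence values are random stand-ins for model probabilities.

import random

def select_batch(unlabeled, confidence, k=2):
    """Pick the k examples the model is least confident about."""
    return sorted(unlabeled, key=confidence)[:k]

random.seed(0)
pool = [f"ex{i}" for i in range(10)]
conf = {x: random.random() for x in pool}  # toy per-example confidence

labeled = []
for _ in range(3):  # three acquisition rounds
    remaining = [x for x in pool if x not in labeled]
    labeled.extend(select_batch(remaining, conf.get))
print(sorted(labeled))  # the six least-confident examples, acquired in rounds
```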
Citations: 10
Compositional Evaluation on Japanese Textual Entailment and Similarity
IF 10.9 | CAS Tier 1, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2022-08-09 | DOI: 10.1162/tacl_a_00518 | Vol. 10, pp. 1266-1284
Hitomi Yanaka, K. Mineshima
Natural Language Inference (NLI) and Semantic Textual Similarity (STS) are widely used benchmark tasks for compositional evaluation of pre-trained language models. Despite growing interest in linguistic universals, most NLI/STS studies have focused almost exclusively on English. In particular, there are no available multilingual NLI/STS datasets in Japanese, which is typologically different from English and can shed light on the currently controversial behavior of language models in matters such as sensitivity to word order and case particles. Against this background, we introduce JSICK, a Japanese NLI/STS dataset that was manually translated from the English dataset SICK. We also present a stress-test dataset for compositional inference, created by transforming syntactic structures of sentences in JSICK to investigate whether language models are sensitive to word order and case particles. We conduct baseline experiments on different pre-trained language models and compare the performance of multilingual models when applied to Japanese and other languages. The results of the stress-test experiments suggest that the current pre-trained language models are insensitive to word order and case marking.
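The word-order stress test boils down to constructing minimal pairs whose tokens are identical but reordered, then checking whether a model's score changes. A hedged sketch of the perturbation step only (whitespace English tokenization for illustration; JSICK operates on Japanese sentences and also manipulates case particles):

```python
# Hedged sketch of a word-order perturbation for a stress-test pair.
# The sentence and tokenization are illustrative, not from JSICK.

import random

def scramble(sentence, seed=0):
    """Return the same tokens in a shuffled order (deterministic per seed)."""
    tokens = sentence.split()
    random.Random(seed).shuffle(tokens)
    return " ".join(tokens)

original = "a man is playing a guitar"
perturbed = scramble(original)
print((original, perturbed))
# A word-order-insensitive model would score both sides nearly identically.
```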
Citations: 8
Abstractive Meeting Summarization: A Survey
IF 10.9 | CAS Tier 1, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2022-08-08 | DOI: 10.1162/tacl_a_00578 | Vol. 11, pp. 861-884
Virgile Rennard, Guokan Shang, Julie Hunter, M. Vazirgiannis
A system that could reliably identify and sum up the most important points of a conversation would be valuable in a wide variety of real-world contexts, from business meetings to medical consultations to customer service calls. Recent advances in deep learning, and especially the invention of encoder-decoder architectures, have significantly improved language generation systems, opening the door to improved forms of abstractive summarization, a form of summarization particularly well-suited for multi-party conversation. In this paper, we provide an overview of the challenges raised by the task of abstractive meeting summarization and of the data sets, models, and evaluation metrics that have been used to tackle the problems.
Citations: 5
Template-based Abstractive Microblog Opinion Summarization
IF 10.9 | CAS Tier 1, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2022-08-08 | DOI: 10.1162/tacl_a_00516 | Vol. 10, pp. 1229-1248
I. Bilal, Bo Wang, A. Tsakalidis, Dong Nguyen, R. Procter, M. Liakata
We introduce the task of microblog opinion summarization (MOS) and share a dataset of 3100 gold-standard opinion summaries to facilitate research in this domain. The dataset contains summaries of tweets spanning a 2-year period and covers more topics than any other public Twitter summarization dataset. Summaries are abstractive in nature and have been created by journalists skilled in summarizing news articles following a template separating factual information (main story) from author opinions. Our method differs from previous work on generating gold-standard summaries from social media, which usually involves selecting representative posts and thus favors extractive summarization models. To showcase the dataset’s utility and challenges, we benchmark a range of abstractive and extractive state-of-the-art summarization models and achieve good performance, with the former outperforming the latter. We also show that fine-tuning is necessary to improve performance and investigate the benefits of using different sample sizes.
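The "selecting representative posts" approach the abstract contrasts with can be sketched as picking the post most similar, on average, to the rest of the cluster. A hedged toy sketch using Jaccard token overlap (the toy tweets and similarity measure are illustrative; MOS summaries themselves are written abstractively):

```python
# Hedged sketch of an extractive baseline: choose the post with the highest
# total token overlap with its cluster. Posts are invented examples.

def overlap(a, b):
    """Jaccard similarity over lowercased whitespace tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / max(1, len(ta | tb))

def most_representative(posts):
    return max(posts, key=lambda p: sum(overlap(p, q) for q in posts if q is not p))

posts = [
    "the new policy was announced today",
    "policy announced today big changes",
    "no details on the new policy yet",
    "pasta for lunch",
]
print(most_representative(posts))  # the post most central to the cluster
```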
Citations: 3
Multilingual Coreference Resolution in Multiparty Dialogue
IF 10.9 | CAS Tier 1, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2022-08-02 | DOI: 10.1162/tacl_a_00581 | Vol. 11, pp. 922-940
Boyuan Zheng, Patrick Xia, M. Yarmohammadi, Benjamin Van Durme
Existing multiparty dialogue datasets for entity coreference resolution are nascent, and many challenges are still unaddressed. We create a large-scale dataset, Multilingual Multiparty Coref (MMC), for this task based on TV transcripts. Due to the availability of gold-quality subtitles in multiple languages, we propose reusing the annotations to create silver coreference resolution data in other languages (Chinese and Farsi) via annotation projection. On the gold (English) data, off-the-shelf models perform relatively poorly on MMC, suggesting that MMC has broader coverage of multiparty coreference than prior datasets. On the silver data, we find success both using it for data augmentation and training from scratch, which effectively simulates the zero-shot cross-lingual setting.
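Annotation projection, as used here, maps gold mention spans from a source-language subtitle onto its target-language counterpart through a word alignment. A hedged sketch of just the projection step, with a toy alignment dictionary (MMC derives its alignments from parallel subtitles):

```python
# Hedged sketch of annotation projection for one mention span.
# The alignment is an invented example, not derived from MMC.

def project_span(span, alignment):
    """span: (start, end) inclusive token indices in the source sentence.
    alignment: dict mapping source token index -> target token index."""
    targets = [alignment[i] for i in range(span[0], span[1] + 1) if i in alignment]
    if not targets:
        return None  # mention has no aligned tokens; drop it from silver data
    return (min(targets), max(targets))

# Source mention "the old man" spans tokens 0-2; a hypothetical alignment
# places those tokens at positions 1-3 in the target sentence:
alignment = {0: 1, 1: 2, 2: 3}
print(project_span((0, 2), alignment))  # (1, 3)
```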
Citations: 0
Efficient Long-Text Understanding with Short-Text Models
IF 10.9 | CAS Tier 1, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2022-08-01 | DOI: 10.1162/tacl_a_00547 | Vol. 11, pp. 284-299
Maor Ivgi, Uri Shaham, Jonathan Berant
Transformer-based pretrained language models (LMs) are ubiquitous across natural language understanding, but cannot be applied to long sequences such as stories, scientific articles, and long documents due to their quadratic complexity. While a myriad of efficient transformer variants have been proposed, they are typically based on custom implementations that require expensive pretraining from scratch. In this work, we propose SLED: SLiding-Encoder and Decoder, a simple approach for processing long sequences that re-uses and leverages battle-tested short-text pretrained LMs. Specifically, we partition the input into overlapping chunks, encode each with a short-text LM encoder and use the pretrained decoder to fuse information across chunks (fusion-in-decoder). We illustrate through controlled experiments that SLED offers a viable strategy for long text understanding and evaluate our approach on SCROLLS, a benchmark with seven datasets across a wide range of language understanding tasks. We find that SLED is competitive with specialized models that are up to 50x larger and require a dedicated and expensive pretraining step.
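The partitioning step SLED describes, splitting the input into overlapping chunks so each fits a short-text encoder, can be sketched in a few lines. Chunk and overlap sizes below are illustrative, not the paper's settings:

```python
# Hedged sketch of overlapping chunking for long inputs. Each chunk would be
# fed to a short-text encoder; the decoder then fuses the chunk encodings.

def chunk(tokens, size=8, overlap=2):
    """Split tokens into windows of `size` that overlap by `overlap`."""
    stride = size - overlap
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # last window already reaches the end of the sequence
    return chunks

tokens = list(range(20))
for c in chunk(tokens):
    print(c)  # three windows: 0-7, 6-13, 12-19
```

The overlap gives each token some left and right context in at least one window, which is what lets a short-context encoder produce usable contextualized representations for the decoder to fuse.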
Citations: 21
Aggretriever: A Simple Approach to Aggregate Textual Representations for Robust Dense Passage Retrieval
IF 10.9 | CAS Tier 1, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2022-07-31 | DOI: 10.1162/tacl_a_00556 | Vol. 11, pp. 436-452
Sheng-Chieh Lin, Minghan Li, Jimmy Lin
Pre-trained language models have been successful in many knowledge-intensive NLP tasks. However, recent work has shown that models such as BERT are not “structurally ready” to aggregate textual information into a [CLS] vector for dense passage retrieval (DPR). This “lack of readiness” results from the gap between language model pre-training and DPR fine-tuning. Previous solutions call for computationally expensive techniques such as hard negative mining, cross-encoder distillation, and further pre-training to learn a robust DPR model. In this work, we instead propose to fully exploit knowledge in a pre-trained language model for DPR by aggregating the contextualized token embeddings into a dense vector, which we call agg★. By concatenating vectors from the [CLS] token and agg★, our Aggretriever model substantially improves the effectiveness of dense retrieval models on both in-domain and zero-shot evaluations without introducing substantial training overhead. Code is available at https://github.com/castorini/dhr.
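The shape arithmetic behind "aggregate token embeddings into a dense vector, then concatenate with [CLS]" can be shown with plain lists. This is a hedged toy: it uses max-pooling over tiny vectors purely to illustrate pooling plus concatenation, not Aggretriever's actual agg★ construction:

```python
# Hedged sketch: pool per-token vectors into one vector, then concatenate
# with the [CLS] vector to form the final passage representation.
# Dimensions and values are toy; the paper's agg* is built differently.

def max_pool(token_vecs):
    """Element-wise max over a list of equal-length vectors."""
    return [max(dims) for dims in zip(*token_vecs)]

cls_vec = [0.1, 0.2, 0.3]                      # toy [CLS] embedding (3-dim)
token_vecs = [[0.5, 0.1, 0.0], [0.2, 0.9, 0.4]]  # toy contextualized tokens

agg = max_pool(token_vecs)   # [0.5, 0.9, 0.4]
dense = cls_vec + agg        # concatenation -> 6-dim passage vector
print(dense)
```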
Citations: 8
Unit Testing for Concepts in Neural Networks
IF 10.9 | CAS Tier 1, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2022-07-28 | DOI: 10.1162/tacl_a_00514 | Vol. 10, pp. 1193-1208
Charles Lovering, Elizabeth-Jane Pavlick
Many complex problems are naturally understood in terms of symbolic concepts. For example, our concept of “cat” is related to our concepts of “ears” and “whiskers” in a non-arbitrary way. Fodor (1998) proposes one theory of concepts, which emphasizes symbolic representations related via constituency structures. Whether neural networks are consistent with such a theory is open for debate. We propose unit tests for evaluating whether a system’s behavior is consistent with several key aspects of Fodor’s criteria. Using a simple visual concept learning task, we evaluate several modern neural architectures against this specification. We find that models succeed on tests of groundedness, modularity, and reusability of concepts, but that important questions about causality remain open. Resolving these will require new methods for analyzing models’ internal states.
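A behavioral "unit test" of the kind the abstract proposes can be sketched as an invariance check: the system's judgment about a concept should not change when an irrelevant attribute does. A hedged toy sketch, with a stand-in classifier rather than any architecture from the paper:

```python
# Hedged sketch of a behavioral unit test for a concept. The classifier is
# an invented stand-in; the test pattern (invariance to an irrelevant
# attribute) is the point being illustrated.

def toy_classifier(features):
    # Toy "cat" concept: depends on ears and whiskers, not on color.
    return features.get("ears") == "pointed" and features.get("whiskers", 0) > 10

def test_color_invariance():
    a = {"ears": "pointed", "whiskers": 12, "color": "black"}
    b = dict(a, color="white")  # change only the irrelevant attribute
    assert toy_classifier(a) == toy_classifier(b)

test_color_invariance()
print("invariance test passed")
```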
{"title":"Unit Testing for Concepts in Neural Networks","authors":"Charles Lovering, Elizabeth-Jane Pavlick","doi":"10.1162/tacl_a_00514","DOIUrl":"https://doi.org/10.1162/tacl_a_00514","url":null,"abstract":"Abstract Many complex problems are naturally understood in terms of symbolic concepts. For example, our concept of “cat” is related to our concepts of “ears” and “whiskers” in a non-arbitrary way. Fodor (1998) proposes one theory of concepts, which emphasizes symbolic representations related via constituency structures. Whether neural networks are consistent with such a theory is open for debate. We propose unit tests for evaluating whether a system’s behavior is consistent with several key aspects of Fodor’s criteria. Using a simple visual concept learning task, we evaluate several modern neural architectures against this specification. We find that models succeed on tests of groundedness, modularity, and reusability of concepts, but that important questions about causality remain open. Resolving these will require new methods for analyzing models’ internal states.","PeriodicalId":33559,"journal":{"name":"Transactions of the Association for Computational Linguistics","volume":"10 1","pages":"1193-1208"},"PeriodicalIF":10.9,"publicationDate":"2022-07-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48124078","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 16
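In the spirit of the unit tests described above — checking that a composite concept behaves consistently with its constituents — here is a toy, hand-built illustration. The predicates below are invented for exposition; the paper's actual tests probe trained neural networks on a visual concept learning task, not symbolic rules like these:

```python
# Toy "unit test for a concept": is_cat is a composite concept defined via
# its constituents ("constituency structure" in Fodor's sense). The names
# and predicates are illustrative, not from the paper.

def has_ears(x):
    return x.get("ears", False)

def has_whiskers(x):
    return x.get("whiskers", False)

def is_cat(x):
    # Composite concept built from constituent concepts.
    return has_ears(x) and has_whiskers(x)

def test_concept_modularity():
    cat = {"ears": True, "whiskers": True}
    dog = {"ears": True, "whiskers": False}
    assert is_cat(cat)        # composite fires on a positive example
    assert not is_cat(dog)    # ...and not on a near-miss
    assert has_ears(dog)      # constituent remains usable on its own

test_concept_modularity()
print("concept unit tests passed")
```

For a neural model, the analogous tests ask whether internal representations of the constituents ("ears", "whiskers") are grounded, modular, and reusable when the composite ("cat") is learned — which is exactly where the abstract reports mixed results.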