
Latest Publications in Computational Linguistics

The Role of Typological Feature Prediction in NLP and Linguistics
IF 9.3 · Q2 Computer Science · Pub Date: 2023-11-20 · DOI: 10.1162/coli_a_00498
Johannes Bjerva
Computational typology has gained traction in the field of Natural Language Processing (NLP) in recent years, as evidenced by the increasing number of papers on the topic and the establishment of a Special Interest Group on the topic (SIGTYP), including the organization of successful workshops and shared tasks. A considerable amount of work in this sub-field is concerned with prediction of typological features, e.g., for databases such as the World Atlas of Language Structures (WALS) or Grambank. Prediction is argued to be useful either because (1) it allows for obtaining feature values for relatively undocumented languages, alleviating the sparseness in WALS, in turn argued to be useful for both NLP and linguistics; and (2) it allows us to probe models to see whether or not these typological features are encapsulated in, e.g., language representations. In this article, we present a critical stance concerning prediction of typological features, investigating to what extent this line of research is aligned with purported needs—both from the perspective of NLP practitioners, and perhaps more importantly, from the perspective of linguists specialized in typology and language documentation. We provide evidence that this line of research in its current state suffers from a lack of interdisciplinary alignment. Based on an extensive survey of the linguistic typology community, we present concrete recommendations for future research in order to improve this alignment between linguists and NLP researchers, beyond the scope of typological feature prediction.
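As a hedged illustration of the prediction setup this article critiques, the sketch below imputes a missing WALS-style feature value from per-language vectors with an off-the-shelf classifier. The language vectors, feature labels, and choice of "undocumented" languages are invented placeholders, not the article's data or method.

```python
# Minimal sketch of typological feature prediction: given per-language
# vectors, predict a WALS-style feature for languages missing it.
# All vectors and labels below are hypothetical placeholders; real work
# would use embeddings from a multilingual model and gold labels from
# WALS or Grambank.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical 16-dim language embeddings for a handful of ISO 639-3 codes.
lang_vecs = {code: rng.normal(size=16) for code in
             ["eng", "deu", "tur", "jpn", "fin", "swa", "quz", "grn"]}

# Hypothetical labels for WALS feature 81A (order of S, O, V);
# "quz" and "grn" are treated as undocumented.
gold = {"eng": "SVO", "deu": "SVO", "tur": "SOV",
        "jpn": "SOV", "fin": "SVO", "swa": "SVO"}

X = np.stack([lang_vecs[lang] for lang in gold])
y = list(gold.values())

clf = LogisticRegression(max_iter=1000).fit(X, y)

# Impute values for the "undocumented" languages.
for lang in ["quz", "grn"]:
    pred = clf.predict(lang_vecs[lang][None, :])[0]
    print(f"{lang}: predicted 81A = {pred}")
```

Whether such imputed values actually serve linguists or NLP practitioners is precisely the question the article raises.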
Citations: 1
On the Role of Morphological Information for Contextual Lemmatization
IF 9.3 · Q2 Computer Science · Pub Date: 2023-11-15 · DOI: 10.1162/coli_a_00497
Olia Toporkov, Rodrigo Agerri
Lemmatization is a natural language processing (NLP) task which consists of producing, from a given inflected word, its canonical form or lemma. Lemmatization is one of the basic tasks that facilitate downstream NLP applications, and is of particular importance for high-inflected languages. Given that the process to obtain a lemma from an inflected word can be explained by looking at its morphosyntactic category, including fine-grained morphosyntactic information to train contextual lemmatizers has become common practice, without considering whether that is the optimum in terms of downstream performance. In order to address this issue, in this paper we empirically investigate the role of morphological information to develop contextual lemmatizers in six languages within a varied spectrum of morphological complexity: Basque, Turkish, Russian, Czech, Spanish and English. Furthermore, and unlike the vast majority of previous work, we also evaluate lemmatizers in out-of-domain settings, which constitutes, after all, their most common application use. The results of our study are rather surprising. It turns out that providing lemmatizers with fine-grained morphological features during training is not that beneficial, not even for agglutinative languages. In fact, modern contextual word representations seem to implicitly encode enough morphological information to obtain competitive contextual lemmatizers without seeing any explicit morphological signal. Moreover, our experiments suggest that the best lemmatizers out-of-domain are those using simple UPOS tags or those trained without morphology and, finally, that current evaluation practices for lemmatization are not adequate to clearly discriminate between models.
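For concreteness, here is a minimal sketch of the input contrast the study varies: the same token in context can be serialized for a contextual lemmatizer with no morphology, with a coarse UPOS tag, or with a fine-grained feature bundle. The sentence, tags, and serialization format below are hand-made assumptions for illustration, not the paper's actual pipeline.

```python
# Three granularities of morphological signal for the same training token.
# Sentence, tags, and UD-style feature strings are hand-written examples.
sentence = ["She", "was", "running", "home"]
upos     = ["PRON", "AUX", "VERB", "ADV"]
feats    = ["Case=Nom|Gender=Fem|Number=Sing|Person=3|PronType=Prs",
            "Mood=Ind|Number=Sing|Person=3|Tense=Past|VerbForm=Fin",
            "Tense=Pres|VerbForm=Part",
            "_"]

def make_input(i, granularity):
    """Serialize token i with its sentence context and optional morphology."""
    context = " ".join(sentence)
    token = sentence[i]
    if granularity == "none":
        tag = ""
    elif granularity == "upos":
        tag = f" <{upos[i]}>"
    else:  # fine-grained morphosyntactic features
        tag = f" <{upos[i]}|{feats[i]}>"
    return f"{context} [SEP] {token}{tag}"

for g in ("none", "upos", "fine"):
    print(g, "->", make_input(2, g))
```

The paper's finding is that the richest variant of this signal is often unnecessary, since contextual representations already encode much of it.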
Citations: 0
Language Model Behavior: A Comprehensive Survey
IF 9.3 · Q2 Computer Science · Pub Date: 2023-11-15 · DOI: 10.1162/coli_a_00492
Tyler A. Chang, Benjamin K. Bergen
Transformer language models have received widespread public attention, yet their generated text is often surprising even to NLP researchers. In this survey, we discuss over 250 recent studies of English language model behavior before task-specific fine-tuning. Language models possess basic capabilities in syntax, semantics, pragmatics, world knowledge, and reasoning, but these capabilities are sensitive to specific inputs and surface features. Despite dramatic increases in generated text quality as models scale to hundreds of billions of parameters, the models are still prone to unfactual responses, commonsense errors, memorized text, and social biases. Many of these weaknesses can be framed as over-generalizations or under-generalizations of learned patterns in text. We synthesize recent results to highlight what is currently known about large language model capabilities, thus providing a resource for applied work and for research in adjacent fields that use language models.
Citations: 0
Rethinking the Exploitation of Monolingual Data for Low-Resource Neural Machine Translation
IF 9.3 · Q2 Computer Science · Pub Date: 2023-11-15 · DOI: 10.1162/coli_a_00496
Jianhui Pang, Derek Fai Wong, Dayiheng Liu, Jun Xie, Baosong Yang, Yu Wan, Lidia Sam Chao
The utilization of monolingual data has been shown to be a promising strategy for addressing low-resource machine translation problems. Previous studies have demonstrated the effectiveness of techniques such as Back-Translation and self-supervised objectives, including Masked Language Modeling, Causal Language Modeling, and Denoise Autoencoding, in improving the performance of machine translation models. However, the manner in which these methods contribute to the success of machine translation tasks and how they can be effectively combined remains an under-researched area. In this study, we carry out a systematic investigation of the effects of these techniques on linguistic properties through the use of probing tasks, including source language comprehension, bilingual word alignment, and translation fluency. We further evaluate the impact of Pre-Training, Back-Translation, and Multi-Task Learning on bitexts of varying sizes. Our findings inform the design of more effective pipelines for leveraging monolingual data in extremely low-resource and low-resource machine translation tasks. Experiment results show consistent performance gains in seven translation directions, which provide further support for our conclusions and understanding of the role of monolingual data in machine translation.
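A minimal back-translation sketch, for readers unfamiliar with the technique: target-side monolingual text is translated into the source language with a reverse model, and the synthetic pairs are added to the training data. The MarianMT checkpoint named below is an assumption of this sketch (a public Helsinki-NLP model), not one of the systems evaluated in the paper.

```python
# Back-translation sketch for a hypothetical de->en system: English
# monolingual text is translated into German with a reverse (en->de)
# model, and each synthetic German source is paired with its real
# English target as extra training data.
from transformers import pipeline

reverse_mt = pipeline("translation", model="Helsinki-NLP/opus-mt-en-de")

monolingual_en = [
    "The committee approved the proposal.",
    "Rain is expected later this week.",
]

synthetic_pairs = []
for tgt in monolingual_en:
    src = reverse_mt(tgt, max_length=128)[0]["translation_text"]
    synthetic_pairs.append((src, tgt))  # (synthetic de source, real en target)

for src, tgt in synthetic_pairs:
    print(f"SRC (synthetic): {src}")
    print(f"TGT (real):      {tgt}")
```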
Citations: 0
How is a “Kitchen Chair” like a “Farm Horse”? Exploring the Representation of Noun-Noun Compound Semantics in Transformer-based Language Models
IF 9.3 · Q2 Computer Science · Pub Date: 2023-11-15 · DOI: 10.1162/coli_a_00495
Mark Ormerod, Barry Devereux, Jesús Martínez del Rincón
Despite the success of Transformer-based language models in a wide variety of natural language processing tasks, our understanding of how these models process a given input in order to represent task-relevant information remains incomplete. In this work, we focus on semantic composition and examine how Transformer-based language models represent semantic information related to the meaning of English noun-noun compounds. We probe Transformer-based language models for their knowledge of the thematic relations that link the head nouns and modifier words of compounds (e.g., KITCHEN CHAIR: a chair located in a kitchen). Firstly, using a dataset featuring groups of compounds with shared lexical or semantic features, we find that token representations of six Transformer-based language models distinguish between pairs of compounds based on whether they use the same thematic relation. Secondly, we utilize fine-grained vector representations of compound semantics derived from human annotations, and find that token vectors from several models elicit a strong signal of the semantic relations used in the compounds. In a novel ‘compositional probe’ setting, where we compare the semantic relation signal in mean-pooled token vectors of compounds to mean-pooled token vectors when the two constituent words appear in separate sentences, we find that the Transformer-based language models that best represent the semantics of noun-noun compounds also do so substantially better than in the control condition where the two constituent words are processed separately. Overall, our results shed light on the ability of Transformer-based language models to support compositional semantic processes in representing the meaning of noun-noun compounds.
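The 'compositional probe' contrast can be sketched roughly as follows: mean-pool a model's token vectors for a compound appearing in one sentence, then compare against the mean of its constituents pooled from separate sentences. The carrier sentences, model choice, and pooling details below are simplifications under stated assumptions, not the paper's exact protocol.

```python
# Rough sketch of a compound-vs-split-context comparison with BERT.
# Carrier sentences are hand-made; the paper uses controlled datasets.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def mean_vec(words, targets):
    """Mean-pool last-layer vectors of the subword tokens of target words."""
    enc = tok(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]
    target_idx = {i for i, w in enumerate(words) if w in targets}
    rows = [j for j, wid in enumerate(enc.word_ids()) if wid in target_idx]
    return hidden[rows].mean(dim=0)

# Compound in a single sentence vs. constituents in separate sentences.
compound = mean_vec("the kitchen chair creaked loudly".split(),
                    {"kitchen", "chair"})
apart = (mean_vec("she repainted the kitchen".split(), {"kitchen"}) +
         mean_vec("he bought a chair".split(), {"chair"})) / 2

cos = torch.nn.functional.cosine_similarity(compound, apart, dim=0)
print(f"compound vs. split-context similarity: {cos.item():.3f}")
```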
Citations: 0
Analyzing Semantic Faithfulness of Language Models via Input Intervention on Question Answering
IF 9.3 · Q2 Computer Science · Pub Date: 2023-11-15 · DOI: 10.1162/coli_a_00493
Akshay Chaturvedi, Soumadeep Saha, Nicholas Asher, Swarnadeep Bhar, Utpal Garain
Transformer-based language models have been shown to be highly effective for several NLP tasks. In this paper, we consider three transformer models, BERT, RoBERTa, and XLNet, in both small and large versions, and investigate how faithful their representations are with respect to the semantic content of texts. We formalize a notion of semantic faithfulness, in which the semantic content of a text should causally figure in a model's inferences in question answering. We then test this notion by observing a model's behavior on answering questions about a story after performing two novel semantic interventions—deletion intervention and negation intervention. While transformer models achieve high performance on standard question answering tasks, we show that they fail to be semantically faithful once we perform these interventions for a significant number of cases (∼ 50% for deletion intervention, and ∼ 20% drop in accuracy for negation intervention). We then propose an intervention-based training regime that can mitigate the undesirable effects for deletion intervention by a significant margin (from ∼ 50% to ∼ 6%). We analyze the inner-workings of the models to better understand the effectiveness of intervention-based training for deletion intervention. But we show that this training does not attenuate other aspects of semantic unfaithfulness such as the models' inability to deal with negation intervention or to capture the predicate-argument structure of texts. We also test InstructGPT, via prompting, for its ability to handle the two interventions and to capture predicate-argument structure. While InstructGPT models do achieve very high performance on predicate-argument structure task, they fail to respond adequately to our deletion and negation interventions.
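As a rough sketch of a deletion intervention, the snippet below removes the sentence carrying the answer-relevant fact and checks whether an extractive QA model changes its answer; a semantically faithful model should no longer recover the deleted fact. The story, question, and checkpoint are illustrative stand-ins, not the paper's datasets or models.

```python
# Deletion-intervention sketch: delete the fact-bearing sentence and
# compare the QA model's answers before and after.
from transformers import pipeline

qa = pipeline("question-answering",
              model="distilbert-base-cased-distilled-squad")

story = ("Mia left her umbrella at the cafe. "
         "She walked home in the rain. "
         "Her coat was soaked by the time she arrived.")
question = "Where did Mia leave her umbrella?"

# Intervention: drop the sentence that states the answer.
intervened = story.replace("Mia left her umbrella at the cafe. ", "")

for name, ctx in [("original", story), ("deleted", intervened)]:
    out = qa(question=question, context=ctx)
    print(f"{name}: {out['answer']!r} (score {out['score']:.2f})")
```

An extractive model is forced to answer from the remaining text, so a confident answer after deletion is one symptom of the semantic unfaithfulness the paper measures.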
Citations: 0
Universal Generation for Optimality Theory Is PSPACE-Complete
IF 9.3 · Q2 Computer Science · Pub Date: 2023-11-15 · DOI: 10.1162/coli_a_00494
Sophie Hao
This paper shows that the universal generation problem (Heinz, Kobele, and Riggle 2009) for Optimality Theory (OT, Prince and Smolensky 1993, 2004) is PSPACE-complete. While prior work has shown that universal generation is at least NP-hard (Eisner 1997, 2000b; Wareham 1998; Idsardi 2006) and at most EXPSPACE-hard (Riggle 2004), our results place universal generation in between those two classes, assuming that NP ≠ PSPACE. We additionally show that when the number of constraints is bounded in advance, universal generation is at least NL-hard and at most NP^NP-hard. Our proofs rely on a close connection between OT and the intersection non-emptiness problem for finite automata, which is PSPACE-complete in general (Kozen 1977) and NL-complete when the number of automata is bounded (Jones 1975). Our analysis shows that constraint interaction is the main contributor to the complexity of OT: the ability to factor transformations into simple, interacting constraints allows OT to furnish compact descriptions of intricate phonological phenomena.
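The automaton problem underlying the proof can be made concrete: intersection non-emptiness asks whether k finite automata accept a common string, and the standard decision procedure searches the implicit product automaton, whose state space grows as |Q|^k; that exponential product is where the PSPACE behavior comes from. The toy deterministic automata below are illustrative only, not drawn from the paper.

```python
# Intersection non-emptiness via lazy BFS over the product automaton.
from collections import deque

def intersect_nonempty(automata, alphabet):
    """Each automaton is (start_state, accepting_set, delta), where delta
    is a dict mapping (state, symbol) -> state (deterministic, may be partial)."""
    start = tuple(a[0] for a in automata)
    queue, seen = deque([start]), {start}
    while queue:
        states = queue.popleft()
        if all(s in a[1] for s, a in zip(states, automata)):
            return True  # some string is accepted by all automata
        for sym in alphabet:
            nxt = tuple(a[2].get((s, sym)) for s, a in zip(states, automata))
            if None not in nxt and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

# A1 accepts strings with an even number of a's; A2 accepts strings ending in b.
A1 = (0, {0}, {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1})
A2 = (0, {1}, {(0, "a"): 0, (1, "a"): 0, (0, "b"): 1, (1, "b"): 1})
print(intersect_nonempty([A1, A2], "ab"))  # True: e.g. "b" is accepted by both
```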
Citations: 0
Language Embeddings Sometimes Contain Typological Generalizations
Q2 Computer Science · Pub Date: 2023-09-29 · DOI: 10.1162/coli_a_00491
Robert Östling, Murathan Kurfalı
To what extent can neural network models learn generalizations about language structure, and how do we find out what they have learned? We explore these questions by training neural models for a range of natural language processing tasks on a massively multilingual dataset of Bible translations in 1,295 languages. The learned language representations are then compared to existing typological databases as well as to a novel set of quantitative syntactic and morphological features obtained through annotation projection. We conclude that some generalizations are surprisingly close to traditional features from linguistic typology, but that most of our models, as well as those of previous work, do not appear to have made linguistically meaningful generalizations. Careful attention to details in the evaluation turns out to be essential to avoid false positives. Furthermore, to encourage continued work in this field, we release several resources covering most or all of the languages in our data: (1) multiple sets of language representations, (2) multilingual word embeddings, (3) projected and predicted syntactic and morphological features, (4) software to provide linguistically sound evaluations of language representations.
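One evaluation idea mentioned above can be sketched as a leave-one-out nearest-neighbour test: do languages whose learned vectors are close share a typological feature value more often than chance? The vectors and labels below are random placeholders standing in for the released resources, and a real evaluation would, as the authors stress, need careful controls (e.g., for genealogical relatedness) to avoid false positives.

```python
# Leave-one-out nearest-neighbour probe of language vectors against a
# typological label. Vectors are random placeholders, so accuracy here
# is only chance level; real language vectors would replace them.
import numpy as np

rng = np.random.default_rng(1)
langs = ["eng", "deu", "nld", "tur", "azj", "jpn"]
vecs = {lang: rng.normal(size=8) for lang in langs}
label = {"eng": "SVO", "deu": "SVO", "nld": "SVO",
         "tur": "SOV", "azj": "SOV", "jpn": "SOV"}

correct = 0
for lang in langs:
    others = [m for m in langs if m != lang]
    nearest = min(others, key=lambda m: np.linalg.norm(vecs[lang] - vecs[m]))
    correct += label[nearest] == label[lang]
print(f"leave-one-out word-order accuracy: {correct}/{len(langs)}")
```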
Citations: 0
Grammatical Error Correction: A Survey of the State of the Art
IF 9.3 · Q2 Computer Science · Pub Date: 2023-09-01 · DOI: 10.1162/coli_a_00478
Christopher Bryant, Zheng Yuan, Muhammad Reza Qorib, Hannan Cao, Hwee Tou Ng, Ted Briscoe

Grammatical Error Correction (GEC) is the task of automatically detecting and correcting errors in text. The task not only includes the correction of grammatical errors, such as missing prepositions and mismatched subject–verb agreement, but also orthographic and semantic errors, such as misspellings and word choice errors, respectively. The field has seen significant progress in the last decade, motivated in part by a series of five shared tasks, which drove the development of rule-based methods, statistical classifiers, statistical machine translation, and finally neural machine translation systems, which represent the current dominant state of the art. In this survey paper, we condense the field into a single article and first outline some of the linguistic challenges of the task, introduce the most popular datasets that are available to researchers (for both English and other languages), and summarize the various methods and techniques that have been developed with a particular focus on artificial error generation. We next describe the many different approaches to evaluation as well as concerns surrounding metric reliability, especially in relation to subjective human judgments, before concluding with an overview of recent progress and suggestions for future work and remaining challenges. We hope that this survey will serve as a comprehensive resource for researchers who are new to the field or who want to be kept apprised of recent developments.
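As a hedged sketch of the edit-based evaluation the survey discusses, the snippet below scores hypothesis edits against gold edits with span-level precision, recall, and the precision-weighted F0.5 conventional in GEC. The edit tuples are hand-made examples, not the output of a real aligner such as ERRANT.

```python
# Span-level GEC scoring: compare hypothesis edits to gold edits.
def f_beta(p, r, beta=0.5):
    """F-beta; GEC conventionally uses beta=0.5 to weight precision."""
    if p == r == 0:
        return 0.0
    return (1 + beta**2) * p * r / (beta**2 * p + r)

# Edits as (start, end, replacement) over the source token sequence.
gold_edits = {(1, 2, "went"), (4, 4, "the")}  # two gold corrections
hyp_edits  = {(1, 2, "went"), (5, 6, "")}     # one correct, one spurious

tp = len(hyp_edits & gold_edits)
p = tp / len(hyp_edits) if hyp_edits else 1.0
r = tp / len(gold_edits) if gold_edits else 1.0
print(f"P={p:.2f} R={r:.2f} F0.5={f_beta(p, r):.2f}")
```

The survey's point about metric reliability is easy to see even here: the score depends entirely on how edits are extracted and aligned, which is why different tools can disagree on the same system output.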

Citations: 30
Obituary: Yorick Wilks
Q2 Computer Science · Pub Date: 2023-08-10 · DOI: 10.1162/coli_a_00485
John Tait, Robert Gaizauskas, Kalina Bontcheva
Yorick was a great friend of Natural Language Engineering. He was a member of the founding editorial board, but more to the point was a sage and encouraging advisor to the Founding Editors Roberto Garigliano, John Tait, and Branimir Boguraev right from the genesis of the project.

At the time of his death, Yorick was one of, if not the, doyen of computational linguists. He had been continuously active in the field since 1962. Having graduated in philosophy, he took up a position in Margaret Masterman’s Cambridge Language Research Unit, an eccentric and somewhat informal organisation which started the careers of many pioneers of artificial intelligence and natural language engineering including Karen Spärck Jones, Martin Kay, Margaret Boden, and Roger Needham (thought by some to be the originator of machine learning, as well as much else in computing).

Yorick was awarded a PhD in 1968 for work on the use of interlingua in machine translation. His PhD thesis stands out not least for its bright yellow binding (Wilks, 1968). Wilks’ effective PhD supervisor was Margaret Masterman, a student of Wittgenstein’s, although his work was formally directed by the distinguished philosopher Richard Braithwaite, Masterman’s husband, as she lacked an appropriate established position in the University of Cambridge.

Inevitably, given the puny computers of the time, Yorick’s PhD work falls well short of the scientific standards of the 21st Century. Despite its shortcomings, his pioneering work influenced many people who have ultimately contributed to the now widespread practical use of machine translation and other automatic language processing systems. In particular, it would be reasonable to surmise that the current success of deep learning systems is based on inferring or inducing a hidden interlingua of the sort Wilks and colleagues tried to handcraft in the 1960s and 1970s. Furthermore, all probabilistic language systems are based on selecting a better or more likely interpretation of a fragment of language over a less likely one, a development of the preference semantics notion originally invented and popularised by Wilks (1973, 1975). As a result, his early work continues to be worth studying, not least for the very deep insights careful reading often reveals.

Underlying this early work was an interest in metaphor, which Yorick recognised as a pervasive feature of language. This was a topic to which Yorick returned repeatedly throughout his life. Wilks (1978) began to develop his approach, with Barnden (2007) providing a useful summary of work to that date. However, there is much later work – for example Wilks et al. (2013).

Wilks was an important figure in the attempt to utilise existing, published dictionaries as a knowledge source for automatic natural language processing systems (Wilks, Slator, and Guthrie, 1996). This endeavour ultimately foundered on the differing interests of commercial dictionary publishers and developers of natural language processing
{"title":"Obituary: Yorick Wilks","authors":"John Tait, Robert Gaizauskas, Kalina Bontcheva","doi":"10.1162/coli_a_00485","DOIUrl":"https://doi.org/10.1162/coli_a_00485","url":null,"abstract":"Yorick was a great friend of Natural Language Engineering. He was a member of the founding editorial board, but more to the point was a sage and encouraging advisor to the Founding Editors Roberto Garigliano, John Tait, and Branimir Boguraev right from the genesis of the project. At the time of his death, Yorick was one of, if not the, doyen of computational linguists. He had been continuously active in the field since 1962. Having graduated in philosophy, he took up a position in Margaret Masterman’s Cambridge Language Research Unit, an eccentric and somewhat informal organisation which started the careers of many pioneers of artificial intelligence and natural language engineering including Karen Spärck Jones, Martin Kay, Margaret Boden, and Roger Needham (thought by some to be the originator of machine learning, as well as much else in computing). Yorick was awarded a PhD in 1968 for work on the use of interlingua in machine translation. His PhD thesis stands out not least for its bright yellow binding (Wilks, 1968). Wilks’ effective PhD supervisor was Margaret Masterman, a student of Wittgenstein’s, although his work was formally directed by the distinguished philosopher Richard Braithwaite, Masterman’s husband, as she lacked an appropriate established position in the University of Cambridge. Inevitably, given the puny computers of the time, Yorick’s PhD work falls well short of the scientific standards of the 21st Century. Despite its shortcomings, his pioneering work influenced many people who have ultimately contributed to the now widespread practical use of machine translation and other automatic language processing systems. In particular, it would be reasonable to surmise that the current success of deep learning systems is based on inferring or inducing a hidden interlingua of the sort Wilks and colleagues tried to handcraft in the 1960s and 1970s. Furthermore, all probabilistic language systems are based on selecting a better or more likely interpretation of a fragment of language over a less likely one, a development of the preference semantics notion originally invented and popularised byWillks (1973, 1975). As a result, his early work continues to be worth studying, not least for the very deep insights careful reading often reveals. Underlying this early work was an interest in metaphor, which Yorick recognised as a pervasive feature of language. This was a topic to which Yorick returned repeatedly throughout his life. Wilks (1978) began to develop his approach, with Barnden (2007) providing a useful summary of work to that date. However, there is much later work – for example Wilks et al. (2013). Wilks was an important figure in the attempt to utilise existing, published dictionaries as a knowledge source for automatic natural language processing systems (Wilks, Slator, and Guthrie, 1996). 
This endeavour ultimately foundered on the differing interests of commercial dictionary publishers and developers of natural language processing","PeriodicalId":49089,"journal":{"name":"Computational Linguistics","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135492967","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0