The Role of Typological Feature Prediction in NLP and Linguistics
Johannes Bjerva. Computational Linguistics. doi:10.1162/coli_a_00498 (2023-11-20).

Computational typology has gained traction in Natural Language Processing (NLP) in recent years, as evidenced by the growing number of papers on the topic and the establishment of a dedicated Special Interest Group (SIGTYP), which has organized successful workshops and shared tasks. A considerable amount of work in this sub-field is concerned with the prediction of typological features, e.g., for databases such as the World Atlas of Language Structures (WALS) or Grambank. Prediction is argued to be useful either because (1) it allows for obtaining feature values for relatively undocumented languages, alleviating the sparseness of WALS, which in turn is argued to be useful for both NLP and linguistics; or (2) it allows us to probe models to see whether these typological features are encoded in, e.g., language representations. In this article, we present a critical stance on the prediction of typological features, investigating to what extent this line of research is aligned with purported needs, both from the perspective of NLP practitioners and, perhaps more importantly, from the perspective of linguists specialized in typology and language documentation. We provide evidence that this line of research in its current state suffers from a lack of interdisciplinary alignment. Based on an extensive survey of the linguistic typology community, we present concrete recommendations for future research to improve this alignment between linguists and NLP researchers, beyond the scope of typological feature prediction.
On the Role of Morphological Information for Contextual Lemmatization
Olia Toporkov, Rodrigo Agerri. Computational Linguistics. doi:10.1162/coli_a_00497 (2023-11-15).

Lemmatization is a natural language processing (NLP) task that consists of producing, from a given inflected word, its canonical form, or lemma. Lemmatization is one of the basic tasks that facilitate downstream NLP applications, and is of particular importance for highly inflected languages. Given that the process of obtaining a lemma from an inflected word can be explained by looking at its morphosyntactic category, it has become common practice to include fine-grained morphosyntactic information when training contextual lemmatizers, without considering whether that is optimal in terms of downstream performance. To address this issue, in this paper we empirically investigate the role of morphological information in developing contextual lemmatizers for six languages spanning a varied spectrum of morphological complexity: Basque, Turkish, Russian, Czech, Spanish, and English. Furthermore, and unlike the vast majority of previous work, we also evaluate lemmatizers in out-of-domain settings, which constitute, after all, their most common application scenario. The results of our study are rather surprising. It turns out that providing lemmatizers with fine-grained morphological features during training is not particularly beneficial, not even for agglutinative languages. In fact, modern contextual word representations seem to implicitly encode enough morphological information to obtain competitive contextual lemmatizers without seeing any explicit morphological signal. Moreover, our experiments suggest that the best lemmatizers out-of-domain are those using simple UPOS tags or those trained without morphology at all and, finally, that current evaluation practices for lemmatization are not adequate to clearly discriminate between models.
Language Model Behavior: A Comprehensive Survey
Tyler A. Chang, Benjamin K. Bergen. Computational Linguistics. doi:10.1162/coli_a_00492 (2023-11-15).

Transformer language models have received widespread public attention, yet their generated text is often surprising even to NLP researchers. In this survey, we discuss over 250 recent studies of English language model behavior before task-specific fine-tuning. Language models possess basic capabilities in syntax, semantics, pragmatics, world knowledge, and reasoning, but these capabilities are sensitive to specific inputs and surface features. Despite dramatic increases in generated text quality as models scale to hundreds of billions of parameters, the models are still prone to non-factual responses, commonsense errors, memorized text, and social biases. Many of these weaknesses can be framed as over-generalizations or under-generalizations of learned patterns in text. We synthesize recent results to highlight what is currently known about large language model capabilities, providing a resource for applied work and for research in adjacent fields that use language models.
Rethinking the Exploitation of Monolingual Data for Low-Resource Neural Machine Translation
Jianhui Pang, Derek Fai Wong, Dayiheng Liu, Jun Xie, Baosong Yang, Yu Wan, Lidia Sam Chao. Computational Linguistics. doi:10.1162/coli_a_00496 (2023-11-15).

The utilization of monolingual data has been shown to be a promising strategy for addressing low-resource machine translation problems. Previous studies have demonstrated the effectiveness of techniques such as Back-Translation and self-supervised objectives, including Masked Language Modeling, Causal Language Modeling, and Denoising Autoencoding, in improving the performance of machine translation models. However, how these methods contribute to the success of machine translation tasks, and how they can be effectively combined, remains an under-researched area. In this study, we carry out a systematic investigation of the effects of these techniques on linguistic properties through the use of probing tasks, covering source language comprehension, bilingual word alignment, and translation fluency. We further evaluate the impact of Pre-Training, Back-Translation, and Multi-Task Learning on bitexts of varying sizes. Our findings inform the design of more effective pipelines for leveraging monolingual data in extremely low-resource and low-resource machine translation tasks. Experimental results show consistent performance gains in seven translation directions, providing further support for our conclusions and our understanding of the role of monolingual data in machine translation.
How is a “Kitchen Chair” like a “Farm Horse”? Exploring the Representation of Noun-Noun Compound Semantics in Transformer-based Language Models
Mark Ormerod, Barry Devereux, Jesús Martínez del Rincón. Computational Linguistics. doi:10.1162/coli_a_00495 (2023-11-15).

Despite the success of Transformer-based language models in a wide variety of natural language processing tasks, our understanding of how these models process a given input in order to represent task-relevant information remains incomplete. In this work, we focus on semantic composition and examine how Transformer-based language models represent semantic information related to the meaning of English noun-noun compounds. We probe Transformer-based language models for their knowledge of the thematic relations that link the head nouns and modifier words of compounds (e.g., KITCHEN CHAIR: a chair located in a kitchen). First, using a dataset featuring groups of compounds with shared lexical or semantic features, we find that the token representations of six Transformer-based language models distinguish between pairs of compounds based on whether they use the same thematic relation. Second, drawing on fine-grained vector representations of compound semantics derived from human annotations, we find that token vectors from several models elicit a strong signal of the semantic relations used in the compounds. In a novel ‘compositional probe’ setting, where we compare the semantic relation signal in mean-pooled token vectors of compounds to mean-pooled token vectors obtained when the two constituent words appear in separate sentences, we find that the Transformer-based language models that best represent the semantics of noun-noun compounds also do so substantially better than in the control condition where the two constituent words are processed separately. Overall, our results shed light on the ability of Transformer-based language models to support compositional semantic processes when representing the meaning of noun-noun compounds.
Analyzing Semantic Faithfulness of Language Models via Input Intervention on Question Answering
Akshay Chaturvedi, Soumadeep Saha, Nicholas Asher, Swarnadeep Bhar, Utpal Garain. Computational Linguistics. doi:10.1162/coli_a_00493 (2023-11-15).

Transformer-based language models have been shown to be highly effective for several NLP tasks. In this paper, we consider three transformer models, BERT, RoBERTa, and XLNet, in both small and large versions, and investigate how faithful their representations are with respect to the semantic content of texts. We formalize a notion of semantic faithfulness, in which the semantic content of a text should causally figure in a model's inferences in question answering. We then test this notion by observing a model's behavior on answering questions about a story after performing two novel semantic interventions: deletion intervention and negation intervention. While transformer models achieve high performance on standard question answering tasks, we show that they fail to be semantically faithful once we perform these interventions in a significant number of cases (∼50% of cases for deletion intervention, and a ∼20% drop in accuracy for negation intervention). We then propose an intervention-based training regime that can mitigate the undesirable effects of deletion intervention by a significant margin (from ∼50% to ∼6%). We analyze the inner workings of the models to better understand the effectiveness of intervention-based training for deletion intervention. However, we show that this training does not attenuate other aspects of semantic unfaithfulness, such as the models' inability to deal with negation intervention or to capture the predicate-argument structure of texts. We also test InstructGPT, via prompting, for its ability to handle the two interventions and to capture predicate-argument structure. While InstructGPT models achieve very high performance on the predicate-argument structure task, they fail to respond adequately to our deletion and negation interventions.
Universal Generation for Optimality Theory Is PSPACE-Complete
Sophie Hao. Computational Linguistics. doi:10.1162/coli_a_00494 (2023-11-15).

This paper shows that the universal generation problem (Heinz, Kobele, and Riggle 2009) for Optimality Theory (OT; Prince and Smolensky 1993, 2004) is PSPACE-complete. While prior work has shown that universal generation is at least NP-hard (Eisner 1997, 2000b; Wareham 1998; Idsardi 2006) and at most EXPSPACE-hard (Riggle 2004), our results place universal generation between those two classes, assuming that NP ≠ PSPACE. We additionally show that when the number of constraints is bounded in advance, universal generation is at least NL-hard and at most NP^NP-hard. Our proofs rely on a close connection between OT and the intersection non-emptiness problem for finite automata, which is PSPACE-complete in general (Kozen 1977) and NL-complete when the number of automata is bounded (Jones 1975). Our analysis shows that constraint interaction is the main contributor to the complexity of OT: the ability to factor transformations into simple, interacting constraints allows OT to furnish compact descriptions of intricate phonological phenomena.
Language Embeddings Sometimes Contain Typological Generalizations
Robert Östling, Murathan Kurfalı. Computational Linguistics. doi:10.1162/coli_a_00491 (2023-09-29).

To what extent can neural network models learn generalizations about language structure, and how do we find out what they have learned? We explore these questions by training neural models for a range of natural language processing tasks on a massively multilingual dataset of Bible translations in 1,295 languages. The learned language representations are then compared to existing typological databases as well as to a novel set of quantitative syntactic and morphological features obtained through annotation projection. We conclude that some generalizations are surprisingly close to traditional features from linguistic typology, but that most of our models, as well as those of previous work, do not appear to have made linguistically meaningful generalizations. Careful attention to details in the evaluation turns out to be essential to avoid false positives. Furthermore, to encourage continued work in this field, we release several resources covering most or all of the languages in our data: (1) multiple sets of language representations, (2) multilingual word embeddings, (3) projected and predicted syntactic and morphological features, and (4) software to provide linguistically sound evaluations of language representations.
Grammatical Error Correction: A Survey of the State of the Art
Christopher Bryant, Zheng Yuan, Muhammad Reza Qorib, Hannan Cao, Hwee Tou Ng, Ted Briscoe. Computational Linguistics. doi:10.1162/coli_a_00478 (2023-09-01).

Grammatical Error Correction (GEC) is the task of automatically detecting and correcting errors in text. The task not only includes the correction of grammatical errors, such as missing prepositions and mismatched subject–verb agreement, but also orthographic and semantic errors, such as misspellings and word choice errors, respectively. The field has seen significant progress in the last decade, motivated in part by a series of five shared tasks, which drove the development of rule-based methods, statistical classifiers, statistical machine translation, and finally neural machine translation systems, which represent the current dominant state of the art. In this survey paper, we condense the field into a single article and first outline some of the linguistic challenges of the task, introduce the most popular datasets that are available to researchers (for both English and other languages), and summarize the various methods and techniques that have been developed with a particular focus on artificial error generation. We next describe the many different approaches to evaluation as well as concerns surrounding metric reliability, especially in relation to subjective human judgments, before concluding with an overview of recent progress and suggestions for future work and remaining challenges. We hope that this survey will serve as a comprehensive resource for researchers who are new to the field or who want to be kept apprised of recent developments.
Obituary: Yorick Wilks
John Tait, Robert Gaizauskas, Kalina Bontcheva. Computational Linguistics. doi:10.1162/coli_a_00485 (2023-08-10).

Yorick was a great friend of Natural Language Engineering. He was a member of the founding editorial board, but more to the point was a sage and encouraging advisor to the Founding Editors Roberto Garigliano, John Tait, and Branimir Boguraev right from the genesis of the project. At the time of his death, Yorick was one of, if not the, doyen of computational linguists. He had been continuously active in the field since 1962. Having graduated in philosophy, he took up a position in Margaret Masterman’s Cambridge Language Research Unit, an eccentric and somewhat informal organisation which started the careers of many pioneers of artificial intelligence and natural language engineering, including Karen Spärck Jones, Martin Kay, Margaret Boden, and Roger Needham (thought by some to be the originator of machine learning, as well as much else in computing).

Yorick was awarded a PhD in 1968 for work on the use of interlingua in machine translation. His PhD thesis stands out not least for its bright yellow binding (Wilks, 1968). Wilks’ effective PhD supervisor was Margaret Masterman, a student of Wittgenstein’s, although his work was formally directed by the distinguished philosopher Richard Braithwaite, Masterman’s husband, as she lacked an appropriate established position in the University of Cambridge. Inevitably, given the puny computers of the time, Yorick’s PhD work falls well short of the scientific standards of the 21st century. Despite its shortcomings, his pioneering work influenced many people who have ultimately contributed to the now widespread practical use of machine translation and other automatic language processing systems. In particular, it would be reasonable to surmise that the current success of deep learning systems is based on inferring or inducing a hidden interlingua of the sort Wilks and colleagues tried to handcraft in the 1960s and 1970s. Furthermore, all probabilistic language systems are based on selecting a better or more likely interpretation of a fragment of language over a less likely one, a development of the preference semantics notion originally invented and popularised by Wilks (1973, 1975). As a result, his early work continues to be worth studying, not least for the very deep insights careful reading often reveals.

Underlying this early work was an interest in metaphor, which Yorick recognised as a pervasive feature of language. This was a topic to which Yorick returned repeatedly throughout his life. Wilks (1978) began to develop his approach, with Barnden (2007) providing a useful summary of work to that date. However, there is much later work, for example Wilks et al. (2013). Wilks was an important figure in the attempt to utilise existing, published dictionaries as a knowledge source for automatic natural language processing systems (Wilks, Slator, and Guthrie, 1996). This endeavour ultimately foundered on the differing interests of commercial dictionary publishers and developers of natural language processing systems.