As more written language data become available, interest in written language mixing/codeswitching (LM/CS) is increasing (Sebba, Mahootian & Jonsson 2012; Sebba 2013). LM/CS in non-naturalistic (e.g., literary) texts raises two issues: (1) gauging the authenticity and representativity of a textual corpus, and (2) deciding whether categories and mechanisms of spoken LM/CS apply to written LM/CS. We focus on Guarani-Spanish LM/CS (Jopara) as represented in the Paraguayan novel Ramona Quebranto (RQ). We apply the framework of Muysken (1997; 2000; 2013), developed as a taxonomy of spoken LM/CS. Our contribution extends its applicability to written LM/CS. We show that Jopara has a mix of insertional and backflagging strategies, with infrequent alternations.
{"title":"Analyzing the structure of code-switched written texts","authors":"Bruno Estigarribia, Zachary Wilkins","doi":"10.1075/LV.00007.EST","DOIUrl":"https://doi.org/10.1075/LV.00007.EST","journal":{"name":"Romance Parsed Corpora"},"publicationDate":"2018-07-13","publicationTypes":"Journal Article"}
Barbara E. Bullock, Jacqueline Serigos, Almeida Jacqueline Toribio, Arthur Wendorf
This article describes efforts to collect, process, and automatically annotate a corpus of Spanish as spoken in Texas. It elaborates the protocols for the development of the corpus and the procedures for automatic annotation, illustrating the common pitfalls of language identification in bilingual corpora and potential methods for circumventing them. The benefits of a comparative corpus approach to contact varieties are illustrated by a case study of a putative verbal calque from the Spanish in Texas data. It is demonstrated that the relative frequency of the verb is much higher than in its source Mexican variety and that the verb selects different complements in Texas than it does in other varieties. The article concludes with a discussion of how computational tools might be fruitfully exploited to resolve long-standing debates about language variation in contact settings.
{"title":"The challenges and benefits of annotating oral bilingual corpora","authors":"Barbara E. Bullock, Jacqueline Serigos, Almeida Jacqueline Toribio, Arthur Wendorf","doi":"10.1075/LV.00006.BUL","DOIUrl":"https://doi.org/10.1075/LV.00006.BUL","journal":{"name":"Romance Parsed Corpora"},"publicationDate":"2018-07-13","publicationTypes":"Journal Article"}
This article introduces the Tycho Brahe Corpus (TBC), a parsed corpus of Historical Portuguese built on the model of the Penn-York Corpora of English. As an illustration of the usefulness of the TBC, the article presents research on the evolution of the position and interpretation of subjects in Portuguese from the 16th to the 19th century. Two main claims emerge, in response to questions that have largely remained unanswered until now, due to the paucity of available data. One is that the texts of the classical period instantiate verb-movement to Comp in matrix clauses, reflecting a V2 grammar. The other is that quantitative and qualitative changes appearing in the texts of the authors born from the beginning of the 18th century on indicate that, at this period, verb-movement to Comp was lost and the modern SVO grammar emerged.
{"title":"The Tycho Brahe Corpus of Historical Portuguese","authors":"Charlotte Galves","doi":"10.1075/LV.00004.GAL","DOIUrl":"https://doi.org/10.1075/LV.00004.GAL","journal":{"name":"Romance Parsed Corpora"},"publicationDate":"2018-07-13","publicationTypes":"Journal Article"}
The argument DP hypothesis, adopted by many syntactic analyses, claims that nominal arguments are introduced by a determiner (D), which may be covert or overt. While overt D is obligatory in Modern French (consistent with the argument DP hypothesis), it was not obligatory in earlier stages of French. We explore the factors that contributed to this change – including semantic class, syntactic function, number, and definiteness – focusing on a shift that occurred in the D-paradigm in two Anglo-Norman texts of the 12th century. Quantitative analysis (Goldvarb) yields two major findings. First, the effect of syntactic function remains constant: subject position favours overt D, but object position inhibits it. Second, there is a change in the effect of semantic class: count nouns increasingly favour overt D, but non-count (mass and abstract) nouns increasingly inhibit it. More generally, the gradual disappearance of bare Ns in French reflects the emergence of paradigmatically conditioned D.
{"title":"The variable use of determiners in Old French and the argument DP hypothesis","authors":"Monique Dufresne, Mire-ô B. Tremblay, R. Déchaîne","doi":"10.1075/LV.00003.DUF","DOIUrl":"https://doi.org/10.1075/LV.00003.DUF","journal":{"name":"Romance Parsed Corpora"},"publicationDate":"2018-07-13","publicationTypes":"Journal Article"}
This contribution presents two syntactically annotated corpora of Old French, Modéliser le changement: les voies du français (MCVF) and the Syntactic Reference Corpus of Medieval French (SRCMF). The focus is on how the underlying syntactic theory (constituency vs. dependency) influences the grammar model and how this choice is reflected in the syntactic annotations of the corpora. The comparison relates to the most relevant general properties of the corpora as well as to two phenomena, null subjects and cleft constructions. Null subjects highlight possible conflicts between syntactic annotation models and syntactic theory, and the information-structural properties of cleft constructions pose a particular problem for the interpretation and annotation of historical corpora. Both phenomena are major instances of diachronic variation in French. The study is relevant for corpus users working on diachronic syntax, as well as for corpus builders wishing to design a grammar model for annotation.
{"title":"Diachronic syntax based on constituency and dependency annotated corpora","authors":"A. Stein","doi":"10.1075/LV.00005.STE","DOIUrl":"https://doi.org/10.1075/LV.00005.STE","journal":{"name":"Romance Parsed Corpora"},"publicationDate":"2018-07-13","publicationTypes":"Journal Article"}
In this paper we argue that verb clusters in Dutch varieties are merged and linearized in fully ascending (1-2-3) or fully descending (3-2-1) orders. We argue that verb clusters that deviate from these orders involve non-verbal material: adjectival participles or nominal infinitives. As a result, our approach does not involve any unmotivated movements that are specific to verb clusters. Support for our analysis comes from (i) the interpretation of verb clusters; (ii) the fact that order variation depends on the types of verbs involved, which can be explained by selectional requirements of the verbs; and (iii) the geographic co-occurrence patterns of various orders. First, the 1-3-2 and 3-1-2 orders are argued to be ascending orders with a non-verbal 3. Indeed, these orders occur in grammars that have ascending, rather than descending, verb clusters. Secondly, the 1-3-2 order is argued to be an interrupted V1-V2 cluster with a non-verbal 3. Indeed, this order is most common in the region where non-verbal material can interrupt the verb cluster. Our analysis of word order variation in verb clusters in terms of principles of grammar is further supported by an experiment in which we asked a large number of speakers distributed over the Dutch language area to rank all logically possible orders, including orders that are not common in their own variety of Dutch. The results demonstrate that speakers apply their syntactic knowledge to rank verb cluster orders that they do not use themselves. We argue that this knowledge cannot be due to familiarity with the various orders.
{"title":"Merging verb cluster variation","authors":"S. Barbiers, H. J. Bennis, L. Dros-Hendriks","doi":"10.1075/LV.00008.BAR","DOIUrl":"https://doi.org/10.1075/LV.00008.BAR","journal":{"name":"Romance Parsed Corpora"},"publicationDate":"2018-07-13","publicationTypes":"Journal Article"}