This study analyses the discourse functions of tense in the New York Times Corpus under a three-dimensional framework – the three dimensions being tense, verb and textual position. Each text is divided into ten sub-sections of 10 percent each, and three distributions along textual positions are calculated to examine the features of tense use in news reports: the distribution of the tenses, the distribution of verb categories and the distribution of tense-verb constructions. The association between tense and verb categories is calculated using WordSmith log-likelihood statistics. Quantitative distribution analysis of the tenses reveals their distribution patterns. The distribution of the present and the present perfect follows a multi-peaked curve while there is a steady increase of the preterit from the beginning to the end. The association between tense and verb shows that different tenses attract different verb categories. The present tense attracts state verbs, the past tense attracts achievement verbs, and the present perfect prefers achievement and activity verbs. Analysis of tense-verb constructions along textual positions reveals that tense-verb constructions have localised functions – within different textual positions, tense-verb constructions take on various features and focus on different functions. All these findings constitute the stylistic use of tenses in news reports and reveal modern news values in the journalistic community.
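The positional binning and the tense-verb association measure described in this abstract can be sketched as follows. This is a minimal illustration, not the study's actual pipeline: the function names, the `(token_position, tense_label)` input format and the 2x2 contingency layout are all assumptions, and the log-likelihood shown is the standard G2 statistic rather than a confirmed reproduction of what WordSmith computes.

```python
import math
from collections import Counter


def section_index(position, text_length, bins=10):
    """Map a 0-based token position to one of `bins` equal-width text sections."""
    return min(position * bins // text_length, bins - 1)


def tense_by_section(tagged_verbs, text_length, bins=10):
    """Count tense labels per section.

    tagged_verbs: iterable of (token_position, tense_label) pairs
    (a hypothetical input format chosen for this sketch).
    """
    counts = [Counter() for _ in range(bins)]
    for position, tense in tagged_verbs:
        counts[section_index(position, text_length, bins)][tense] += 1
    return counts


def log_likelihood(a, b, c, d):
    """G2 statistic for a 2x2 table of tense by verb category.

    a: target tense with target verb category
    b: target tense with other verb categories
    c: other tenses with target verb category
    d: other tenses with other verb categories
    """
    total = a + b + c + d
    if total == 0:
        return 0.0
    observed = [a, b, c, d]
    expected = [
        (a + b) * (a + c) / total,
        (a + b) * (b + d) / total,
        (c + d) * (a + c) / total,
        (c + d) * (b + d) / total,
    ]
    # Cells with an observed count of zero contribute nothing to the sum.
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)
```

A G2 value above 3.84 corresponds to p < 0.05 at one degree of freedom; the statistic itself does not show whether the association is an attraction or a repulsion, so the observed count has to be compared with the expected one.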
{"title":"A corpus-based study of the discourse functions of English tense: the co-occurrence of tense and lexical aspect at various textual positions of news reports","authors":"Liying Zhang","doi":"10.3366/cor.2023.0282","DOIUrl":"https://doi.org/10.3366/cor.2023.0282","url":null,"abstract":"This study analyses the discourse functions of tense in the New York Times Corpus under a three-dimensional framework – the three dimensions being tense, verb and textual position. The texts are divided into ten 10 percent sub-sections and the distribution of the tenses along textual positions, the distribution of verb categories along textual positions as well as the distribution of tense-verb construction along textual positions are calculated to examine the features of tense use in news reports. The association between tense and verb categories is calculated using WordSmith log-likelihood statistics. Quantitative distribution analysis of the tenses reveals their distribution patterns. The distribution of the present and the present perfect follows a multi-peaked curve while there is a steady increase of preterit from the beginning to the end. The association between tense and verb shows that different tenses have attractions for different verb categories. The present tense attracts state verbs, the past tense attracts achievement verbs, and the present perfect prefers achievement and activity verbs. Analysis of tense-verb constructions along textual positions reveals that tense-verb constructions have localised functions – within different textual positions, tense-verb constructions take on various features and focus on different functions. All these findings constitute the stylistic use of tenses in news reports and reveal modern news values in the journalistic community.","PeriodicalId":44933,"journal":{"name":"Corpora","volume":null,"pages":null},"PeriodicalIF":0.5,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46920870","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Corpus of Historical Mapudungun (CHM), which I present here, is a lemmatised, part-of-speech and grapho-phonologically parsed collection of texts in the ancestral language of the Mapuche people. This paper gives an overview of the corpus materials (spanning 1606 to 1930), their processing and search capabilities. The TEI XML tags at the word and morpheme levels are shown to be suitable to account for the abundant agglutinative morphology of the language. The advantages of visualising sound–spelling equivalences across the various spelling systems in the corpus are also emphasised. Some uses and limitations of the corpus are surveyed too, with a particular emphasis on the contribution of typologically diverse languages to understanding language change and the importance of making heritage materials available to native speaker communities for revitalisation purposes.
{"title":"The Corpus of Historical Mapudungun: morpho-phonological parsing and the history of a Native American language","authors":"Benjamin Molineaux","doi":"10.3366/cor.2023.0281","DOIUrl":"https://doi.org/10.3366/cor.2023.0281","url":null,"abstract":"The Corpus of Historical Mapudungun (CHM), which I present here, is a lemmatised, part-of-speech and grapho-phonologically parsed collection of texts in the ancestral language of the Mapuche people. This paper gives an overview of the corpus materials (spanning 1606 to 1930), their processing and search capabilities. The TEI XML tags at the word and morpheme levels are shown to be suitable to account for the abundant agglutinative morphology of the language. The advantages of visualising sound–spelling equivalences across the various spelling systems in the corpus are also emphasised. Some uses and limitations of the corpus are surveyed too, with a particular emphasis on the contribution of typologically diverse languages to understanding language change and the importance of making heritage materials available to native speaker communities for revitalisation purposes.","PeriodicalId":44933,"journal":{"name":"Corpora","volume":null,"pages":null},"PeriodicalIF":0.5,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43046650","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper reports on a comparative investigation into the differences and similarities in the use of phrasal verbs (PVs) by L1 English and L1 Chinese scholars (ESs and CSs) in academic English writing. Using a corpus of research articles from the fields of Physics, Computer Science, Linguistics and Management written by ESs and CSs, we present data to reveal that: (i) PVs are used in both CSs’ and ESs’ research articles across disciplines; (ii) there are significant differences in the use of PVs between CSs and ESs, with CSs employing PVs less frequently than ESs in both types and tokens; (iii) disciplinary variations have been detected – research articles in soft science disciplines (Linguistics and Management) deploy significantly more PVs and the tendency is particularly so in ESs’ research articles; (iv) both CSs and ESs use the ‘Verb + Adverbial particle + NP’ or ‘Verb + NP + Adverbial particle’ pattern and the ‘Verb + Preposition + NP’ pattern most frequently; and (v) the majority of the most frequent PVs are shared by CSs and ESs and used in their metaphorical senses. Qualitative analyses of the four selected items demonstrate that the co-selection between the collocating nouns and the structural patterns of PVs decides the senses being realised. These findings shed light on teaching academic writing and provide writers with some guidance on verb choices.
{"title":"A comparable corpus-based study of phrasal verbs in academic writing by English and Chinese scholars across disciplines","authors":"Xianwei Gao","doi":"10.3366/cor.2023.0283","DOIUrl":"https://doi.org/10.3366/cor.2023.0283","url":null,"abstract":"This paper reports on a comparative investigation into the differences and similarities in the use of phrasal verbs (PVs) by L1 English and L1 Chinese scholars (ESs and CSs) in academic English writing. Using a corpus of research articles from the fields of Physics, Computer Science, Linguistics and Management written by ESs and CSs, we present data to reveal that: (i) PVs are used in both CSs’ and ESs’ research articles across disciplines; (ii) there are significant differences in the use of PVs between CSs and ESs, with CSs employing PVs less frequently than ESs in both types and tokens; (iii) disciplinary variations have been detected – research articles in soft science disciplines (Linguistics and Management) deploy significantly more PVs and the tendency is particularly so in ESs’ research articles; (iv) both CSs and ESs use the ‘Verb + Adverbial particle + NP’ or ‘Verb + NP + Adverbial particle’ pattern and the ‘Verb + Preposition + NP’ pattern most frequently; and (v) the majority of the most frequent PVs are shared by CSs and ESs and used in their metaphorical senses. Qualitative analyses of the four selected items demonstrate that the co-selection between the collocating nouns and the structural patterns of PVs decides the senses being realised. These findings shed light on teaching academic writing and provide writers with some guidance on verb choices.","PeriodicalId":44933,"journal":{"name":"Corpora","volume":null,"pages":null},"PeriodicalIF":0.5,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43988076","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The original meaning of words or phrases is often in dispute in Founding Era legislation, especially the US Constitution. The Corpus of Founding Era American English (COFEA) accurately provides evidence for the meaning of contested terms during the Founding Era. COFEA consists of 126,394 texts and over 136 million words. This corpus has been and is being used by legal researchers and interpreters in scholarly research as well as various courts, including the Supreme Court. This paper describes the motivation for the creation of COFEA and describes the process of designing and collecting the corpus.
{"title":"Corpus of Founding Era American English: designing a corpus for interpreting the United States Constitution","authors":"Brett Hashimoto","doi":"10.3366/cor.2023.0270","DOIUrl":"https://doi.org/10.3366/cor.2023.0270","url":null,"abstract":"The original meaning of words or phrases is often in dispute in Founding Era legislation, especially the US Constitution. The Corpus of Founding Era American English (COFEA) accurately provides evidence for the meaning of contested terms during the Founding Era. COFEA consists of 126,394 texts and over 136 million words. This corpus has been and is being used by legal researchers and interpreters in scholarly research as well as various courts, including the Supreme Court. This paper describes the motivation for the creation of COFEA and describes the process of designing and collecting the corpus.","PeriodicalId":44933,"journal":{"name":"Corpora","volume":null,"pages":null},"PeriodicalIF":0.5,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47861140","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
To date, corpus-based methods for comparing language varieties have fallen into one of two camps: (1) MD analysis – a complicated multi-variate approach based on analysis of functionally motivated linguistic features in each text of a corpus, or (2) keyword/key POS analysis – simple, univariate techniques to identify any feature with a statistically skewed distribution in a corpus. In this paper, we introduce a complementary technique – key feature analysis – which is a simple quantitative approach to compare the texts in two varieties with respect to a set of functionally motivated lexico-grammatical features. We introduce the methods of key feature analysis, contrast them with other approaches for comparing text varieties, and present case studies from the domains of online registers and US presidential debates.
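A text-level comparison of two varieties over a fixed feature set, as outlined in this abstract, might look like the sketch below. The abstract does not name the comparison statistic, so the use of Cohen's d (a standardised mean difference) is an assumption of this example, as are the function names and the input format: a dictionary mapping each feature to a list of per-text normalised rates.

```python
import math
from statistics import mean, stdev


def cohens_d(target_rates, reference_rates):
    """Standardised mean difference between two samples of per-text rates.

    Each sample needs at least two texts so that a standard deviation exists.
    """
    n1, n2 = len(target_rates), len(reference_rates)
    s1, s2 = stdev(target_rates), stdev(reference_rates)
    pooled = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    diff = mean(target_rates) - mean(reference_rates)
    if pooled == 0:
        # No variation in either sample: d is undefined unless the means match.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / pooled


def key_features(target, reference):
    """Rank features shared by both varieties by absolute effect size.

    target, reference: dict mapping feature name -> list of per-text rates
    (a hypothetical input format chosen for this sketch).
    """
    scores = {
        feature: cohens_d(target[feature], reference[feature])
        for feature in target
        if feature in reference
    }
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)
```

Positive scores mark features that are more frequent in the target variety, negative scores those more frequent in the reference variety. Because the unit of observation is the text rather than the whole corpus, a handful of unusual texts weighs less than it would in a corpus-level keyword comparison.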
{"title":"Key feature analysis: a simple, yet powerful method for comparing text varieties","authors":"Jesse Egbert, D. Biber","doi":"10.3366/cor.2023.0275","DOIUrl":"https://doi.org/10.3366/cor.2023.0275","url":null,"abstract":"To date, corpus-based methods for comparing language varieties have fallen into one of two camps: (1) MD analysis – a complicated multi-variate approach based on analysis of functionally motivated linguistic features in each text of a corpus, or (2) keyword/key POS analysis – simple, univariate techniques to identify any feature with a statistically skewed distribution in a corpus. In this paper, we introduce a complementary technique – key feature analysis – which is a simple quantitative approach to compare the texts in two varieties with respect to a set of functionally motivated lexico-grammatical features. We introduce the methods of key feature analysis, contrast them with other approaches for comparing text varieties, and present case studies from the domains of online registers and US presidential debates.","PeriodicalId":44933,"journal":{"name":"Corpora","volume":null,"pages":null},"PeriodicalIF":0.5,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44470337","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Review: Egbert, Larsson and Biber. 2020. Doing Linguistics with a Corpus: Methodological Considerations for the Everyday User","authors":"Veysel Altunel","doi":"10.3366/cor.2023.0277","DOIUrl":"https://doi.org/10.3366/cor.2023.0277","url":null,"abstract":"","PeriodicalId":44933,"journal":{"name":"Corpora","volume":null,"pages":null},"PeriodicalIF":0.5,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49277866","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Family violence is an enduring social problem with devastating impacts. The Victorian Government (Australia) Royal Commission (state inquiry) into Family Violence (RCFV) noted that language is implicated in the under-reporting and under-recording of violence, and emphasised the importance of agencies having ‘a common language’ and ‘shared understanding’ of family violence. Our analyses examine written submissions to the RCFV for frequencies and collocations, focussed on the construction and roles of human referents. We utilised corpus-assisted discourse analysis to explore whether community service and law-based professional bodies do have common vocabularies and if these represent shared ideas, responding directly to agendas set by those involved. Our analyses show key differences but also uncover a shared lack of agency given to victims and a loss of focus on the role of those who inflict these forms of violence. We argue for the utility of corpus linguistic methods to show empirically how language is used to construct conceptualisations of family violence across key sectors of the service system. We intend this research as a starting point for discussion between professionals working to improve cross-sector communication, by bringing linguistic insights to this deep-rooted social issue.
{"title":"A common language and shared understanding of family violence? Corpus-based approaches in support of system responses to family violence","authors":"Cara Penry Williams, Tonya N. Stebbins","doi":"10.3366/cor.2023.0273","DOIUrl":"https://doi.org/10.3366/cor.2023.0273","url":null,"abstract":"Family violence is an enduring social problem with devastating impacts. The Victorian Government (Australia) Royal Commission (state inquiry) into Family Violence (RCFV) noted that language is implicated in the under-reporting and under-recording of violence, and emphasised the importance of agencies having ‘a common language’ and ‘shared understanding’ of family violence. Our analyses examine written submissions to the RCFV for frequencies and collocations, focussed on the construction and roles of human referents. We utilised corpus-assisted discourse analysis to explore whether community service and law-based professional bodies do have common vocabularies and if these represent shared ideas, responding directly to agendas set by those involved. Our analyses show key differences but also uncover a shared lack of agency given to victims and a loss of focus on the role of those who inflict these forms of violence. We argue for the utility of corpus linguistic methods to show empirically how language is used to construct conceptualisations of family violence across key sectors of the service system. We intend this research as a starting point for discussion between professionals working to improve cross-sector communication, by bringing linguistic insights to this deep-rooted social issue.","PeriodicalId":44933,"journal":{"name":"Corpora","volume":null,"pages":null},"PeriodicalIF":0.5,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47502070","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mood is one of the most widely studied topics in Spanish grammar, and mood alternation can occur even when the structure consists of just an adverb and a verb. Although many studies have addressed this subject, most focus on verbs. Some studies have discussed mood alternation with adverbs of doubt, but only a few use statistical methods. In addition, the existing research on speakers’ social factors is insufficient. Therefore, the objectives of this study are to reveal differences between the adverbs of doubt using statistical methods and to elucidate the relationship between mood alternation and social factors. To achieve these objectives, this corpus study extracts data from CORPES XXI (2020) and PRESEEA-Madrid (distrito de Salamanca) and analyses 1,587 examples containing posiblemente, probablemente, quizá, quizás, tal vez, a lo mejor, igual and seguramente using multiple correspondence analysis and decision tree analysis. The results show that adverbs of doubt can be divided into two groups. One includes posiblemente, probablemente, quizá, quizás and tal vez, which are more likely to co-occur with the subjunctive. The other includes a lo mejor, igual and seguramente, which tend to co-occur with the indicative. Furthermore, among the social factors, male speakers, educated speakers and Peninsular Spanish are more closely associated with the subjunctive.
{"title":"Quantitative-statistic corpus analysis about mood variation in Spanish based on linguistic and social variables","authors":"Harunobu Hirota","doi":"10.3366/cor.2023.0274","DOIUrl":"https://doi.org/10.3366/cor.2023.0274","url":null,"abstract":"Mood is one of the most popular topics in Spanish and mood alternation occurs when the structure is just an adverb and a verb. Although many studies have addressed this subject, most focus on verbs. Some studies have discussed mood alternation with adverbs of doubt, but only a few use statistical methods. In addition, the existing research on speakers’ social factors is insufficient. Therefore, the objectives of this study are to reveal differences between the adverbs of doubt using statistical methods and to elucidate the relationship between mood alternation and social factors. To achieve these objectives, this corpus study extracts data from CORPES XXI (2020) and PRESEEA-Madrid (distrito de Salamanca) and analyses 1,587 examples containing posiblemente, probablemente, quizá, quizás, tal vez, a lo mejor, igual and seguramente using multiple correspondence analysis and decision tree analysis. The results show that adverbs of doubt can be divided into two groups. One includes posiblemente, probablemente, quizá, quizás and tal vez which are more likely to co-occur with the subjunctive. The other includes a lo mejor, igual and seguramente which have tendencies to co-occur with the indicative. Furthermore, for social factors, male, educated people, and peninsular Spanish relates more to the subjunctive.","PeriodicalId":44933,"journal":{"name":"Corpora","volume":null,"pages":null},"PeriodicalIF":0.5,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46232491","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}