Semantic prosody of Slovene adverb–verb collocations: introducing the top-down approach
P. Jurko. Corpora, 2022-04-01. doi:10.3366/cor.2022.0234

This paper presents a corpus-driven Sinclairian analysis of five high-frequency Slovene verbs covering the lexical paradigm ‘to express orally’ in combination with their premodifying adverbs of manner. One of the main goals of the paper is to establish how frequent the phenomenon of semantic prosody actually is among high-frequency lexical items (here, adv–v pairs). A methodology designed to answer this question is proposed, featuring a top-down approach (i.e., examining items in order of decreasing frequency of occurrence). It involves setting the widest possible parameters for identifying so-called ‘extended units of meaning’ and their semantic prosody amongst the most frequent lexical patterns in a language. A total of twenty-six adv–v pairs have been examined. Results indicate a strong correlation between the frequency of multi-word lexical items and their tendency to develop semantic prosodies: high-frequency collocations are thus more likely to have semantic prosodies than their lower-frequency counterparts. Overall, the results also corroborate the tendency of semantic prosody to be found mainly with negative meanings and, to a lesser extent, with neutral meanings, while no positive semantic prosody was identified in this study.
On the status of statistical reporting versus linguistic description in corpus linguistics: a ten-year perspective
Tove Larsson, Jesse Egbert, D. Biber. Corpora, 2022-04-01. doi:10.3366/cor.2022.0238

This study investigates (i) whether there has been a shift towards an increased statistical focus in corpus linguistic research articles and, if so, (ii) whether this has had any repercussions for the attention paid to linguistic description. We investigate this through an analysis of the relative focus on statistical reporting versus linguistic description in the way results are reported and discussed in research articles published in four major corpus linguistics journals in 2009 and 2019. The results display a marked change: in 2009, a clear majority of the articles exhibit a preference for linguistic description over statistical reporting; in 2019, the exact opposite is true. The number of different statistical techniques employed has also gone up. Whilst the increased statistical focus may reflect increased methodological sophistication, our results show that it has come at a cost: a diminished focus on linguistic description, evident, for example, in fewer text excerpts and linguistic examples, which appears to be symptomatic of an increasing distance from the language that is the object of study. We discuss these shifts and suggest some ways of employing sophisticated statistical techniques without sacrificing a focus on language.
Review: Egbert and Baker (eds). 2020. Using Corpus Methods to Triangulate Linguistic Analysis
Xiaoli Fu. Corpora, 2021-11-01. doi:10.3366/cor.2021.0230

Previous research on methodological triangulation, such as Baker and Egbert (2016), has mainly focussed on triangulation within corpus linguistics (CL). This timely volume presents triangulation between corpus linguistic methods and other linguistic methodologies through nine empirical studies in discourse analysis, applied linguistics and psycholinguistics. The volume consists of an introduction, nine chapters grouped into three sections, and a ‘Synthesis and Conclusion’. In the Introduction, the editors briefly introduce CL and methodological triangulation. A brief review of previous literature on triangulation between CL and other linguistic methods in discourse analysis, applied linguistics and psycholinguistics is then presented, and the Introduction ends with a sequential overview of the nine studies in the volume. Part I (Chapters 2 to 4) falls within the area of discourse analysis. In Chapter 2, Erin Schnur and Eniko Csomay employ manual/automatic segmentation and qualitative/quantitative analysis to analyse text structure in a corpus of twenty-four academic lectures. The first approach involves manual segmentation using Mechanical Turk (MT) and qualitative coding of the 1,056 segments identified, based on eight functions; the analysis here focusses on the distribution of segment functions in the texts. In the second approach, 769 Vocabulary-Based Discourse Units are automatically identified with TextTiler and then subjected to quantitative analysis, identifying four text-types of segments with similar linguistic features. Thus, the second case study focusses on the distribution of linguistic patterns in text structure to illustrate the association between language variation and pedagogical purpose. In Chapter 3, Tony McEnery, Helen Baker and Carmen Dayrell rely on an historical newspaper corpus to explore the reality of droughts in nineteenth-century Britain. To control the potential errors in the digitised […]
Pinning down the gap: gender and the online representation of professional tennis players
A. Yip. Corpora, 2021-11-01. doi:10.3366/cor.2021.0227

Sport is a powerful social institution in which hegemonic masculinity is constantly constructed and naturalised through the positioning of physicality and athleticism alongside maleness. Female athletes continue to be subordinated by means of under-representation and trivialising gender discourses. So far, the extensive discussion of gendered language in sports media has primarily focussed on identifying manifestations of gender bias in traditional news media; there has been little endeavour to explore the language of online media and tournament organisers. This study addresses that gap by comparing online gender representations of tennis players during the Wimbledon Championships 2018 on five online news websites and the tournament website. It also contributes to the existing literature by providing corpus evidence of gender bias in sports media. The corpus consists of 1,622 articles (1,076,475 tokens). Findings from frequency, collocation and concordance analysis indicate that, despite some instances of gender-neutral representation, female players are prone to gender marking and gender-bland sexism on all websites. I argue that the challenges women face relate to the tension between femininity and athleticism, and to the misguided belief that women need to, but can never, eliminate the muscle gap.
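The study above reports findings from frequency, collocation and concordance analysis but, naturally, does not show the mechanics in the abstract. As an illustration of the concordance step only, here is a minimal keyword-in-context (KWIC) sketch; the sample sentence, node word and window width are invented, not taken from the study:

```python
import re

def kwic(text, node, width=40):
    """Return keyword-in-context lines for a node word (case-insensitive,
    whole-word match), with the node aligned in a fixed-width column."""
    lines = []
    for m in re.finditer(r'\b' + re.escape(node) + r'\b', text, re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        lines.append(f"{left:>{width}} {m.group(0)} {right}")
    return lines

# Invented sample text for demonstration
sample = ("The female player fought back bravely. Critics said the player "
          "lacked power, but the player proved them wrong.")
for line in kwic(sample, "player", width=25):
    print(line)
```

Real concordancers add sentence-boundary handling, tokenisation and sorting by left or right co-text, but the aligned-column output above is the core of the format analysts read.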
Separatism: a cross-linguistic corpus-assisted study of word-meaning development in a time of conflict
Tatyana Karpenko-Seccombe. Corpora, 2021-11-01. doi:10.3366/cor.2021.0228

This paper considers the role of historical context in initiating shifts in word meaning. The study focusses on two words – the translation equivalents separatist and separatism – in the discourses of Russian and Ukrainian parliamentary debates before and during the Russian–Ukrainian conflict which emerged at the beginning of 2014. The paper employs a cross-linguistic corpus-assisted discourse analysis to investigate the way the wider socio-political context affects word usage and meaning. To allow a comparison of discourses around separatism between the two parliaments, four corpora were compiled covering the debates in both parliaments before and during the conflict. Keywords, collocations and n-grams were studied and compared, followed by qualitative analysis of concordance lines, co-text and the larger context in which these words occurred. The results show how the originally close meanings of the translation equivalents began to diverge and manifest noticeable changes in their connotative, affective and, to an extent, denotative meanings at a time of conflict, in line with the dominant ideologies of the parliaments as well as the political affiliations of individuals.
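The keyword comparison mentioned in the abstract is typically computed with a keyness statistic; a common choice in corpus-assisted discourse studies is Dunning's log-likelihood (G²). The sketch below shows the two-corpus case; the frequencies for a word like separatist are invented for illustration and are not figures from the study:

```python
import math

def log_likelihood(a, b, c, d):
    """Dunning's G2 keyness statistic.
    a, b = observed frequencies of the word in corpus 1 and corpus 2;
    c, d = total token counts of corpus 1 and corpus 2."""
    e1 = c * (a + b) / (c + d)  # expected frequency in corpus 1
    e2 = d * (a + b) / (c + d)  # expected frequency in corpus 2
    g2 = 0.0
    if a:
        g2 += a * math.log(a / e1)
    if b:
        g2 += b * math.log(b / e2)
    return 2 * g2

# Invented counts: 150 hits per 1m tokens in the conflict-period corpus
# vs. 30 per 1m tokens in the pre-conflict corpus
print(round(log_likelihood(150, 30, 1_000_000, 1_000_000), 2))
```

A G² above 3.84 is conventionally taken as significant at p < 0.05 (one degree of freedom), so a word over-represented in one period relative to the other surfaces as a keyword for that period.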
Exploring and categorising the Arabic copula and auxiliary kāna through enhanced part-of-speech tagging
A. Hardie, Wesam M. A. Ibrahim. Corpora, 2021-11-01. doi:10.3366/cor.2021.0225

Arabic syntax has yet to be studied in detail from a corpus-based perspective. The Arabic copula kāna (‘be’) also functions as an auxiliary, creating periphrastic tense–aspect constructions, but the literature on these functions is far from exhaustive. To analyse kāna within the one-million-word Corpus of Contemporary Arabic, part-of-speech tagging (using novel, targeted enhancements to a previously described program that improves the accessibility for linguistic analysis of the output of Habash et al.’s [2012] mada disambiguator for the Buckwalter Arabic morphological analyser) is applied to disambiguate copula and auxiliary at a high rate of accuracy. Concordances of both are extracted, and 10 percent samples (499 instances of copula kāna and 387 of auxiliary kāna) are analysed manually to identify surface-level grammatical patterns and meanings. This raw analysis is then systematised according to the main parameters of variation of the more general patterns; special descriptions are developed for specific, apparently fixed-form expressions (including two phraseologies which afford the expression of verbal and adjectival modality). Overall, we uncover substantial new detail not mentioned in existing grammars (e.g., the quantitative predominance of the past imperfect construction over other uses of auxiliary kāna). There is notable potential for these corpus-based findings to inform and enhance not only grammatical descriptions but also the pedagogy of Arabic as a first or second/foreign language.
Expanding lindsei to spoken learner English from several L1s across cefr levels
Lan-fen Huang, Tomáš Gráf. Corpora, 2021-08-19. doi:10.3366/cor.2021.0220

Learner corpus studies typically investigate the language of second-language learners with a different first language (L1) or with proficiency levels inferred from external criteria (e.g., the Louvain International Database of Spoken English Interlanguage, lindsei; Gilquin et al., 2010). This paper reports the process of expanding the original Czech (Gráf, 2017) and Taiwanese (Huang, 2014) sub-corpora (predominantly at B2 and C1; Huang et al., 2018) with samples from learners of other L1s across cefr levels. In addition to sixty interviews conducted by the German, Finnish and Norwegian lindsei teams, another eighty-three interviews with university students in Taiwan and Finland were held. The data collection and transcription procedures were adapted from the lindsei guidelines to ensure comparability. Each fourteen-minute interview was anonymised using Audacity, then orthographically transcribed and aligned by means of exmaralda. The levels of speaking proficiency in the supplemented data were assessed by two expert raters. The expanded learner corpus, containing 243 interviews, will be of considerable value for studying the development of learner English.
An algorithm to identify periods of establishment and obsolescence of linguistic items in a diachronic corpus
Evandro Cunha, S. Wichmann. Corpora, 2021-08-19. doi:10.3366/cor.2021.0218

When exploring diachronic corpora, it is often beneficial for linguists to pinpoint not only the first or the last attestation dates of certain linguistic items, but also the moments in which they become more strongly established in the corpus or, conversely, the moments in which they, despite still being part of the language, become obsolete. In this paper, we propose an algorithm to assist the identification of such periods based on the frequency of items in a corpus. Our simple and generalisable algorithm can be used for the investigation of any linguistic item in any corpus which is divided into time-frames. We also demonstrate the applicability of our method using lexical data from the Corpus of Historical American English (coha), providing case studies on the statistics and characteristics of words that appear in or disappear from this corpus in different periods.
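The abstract above does not reproduce the algorithm itself, so the following is only a minimal sketch of one plausible frequency-threshold approach to the same problem: an item counts as established once its relative frequency stays above a threshold for k consecutive time-frames, and as obsolete once it stays below that threshold for k frames after having been attested. The function names, thresholds and data are all invented for illustration:

```python
def establishment_period(freqs, threshold=1e-6, k=3):
    """Index of the first time-frame opening a run of at least k
    consecutive frames with relative frequency >= threshold,
    or None if the item never becomes established."""
    run = 0
    for i, f in enumerate(freqs):
        run = run + 1 if f >= threshold else 0
        if run == k:
            return i - k + 1
    return None

def obsolescence_period(freqs, threshold=1e-6, k=3):
    """Mirror image: index of the first frame opening a run of k
    consecutive frames below threshold, after the item has been
    attested at least once; None if the item never fades out."""
    seen = False
    run = 0
    for i, f in enumerate(freqs):
        if f >= threshold:
            seen = True
            run = 0
        else:
            run += 1
            if seen and run == k:
                return i - k + 1
    return None

# Invented relative frequencies per decade: the item rises, then disappears
series = [0, 0, 2e-6, 3e-6, 5e-6, 4e-6, 0, 0, 0, 0]
print(establishment_period(series))
print(obsolescence_period(series))
```

For the series above, establishment is flagged at frame 2 (the start of the sustained run) and obsolescence at frame 6 (the start of the final gap). The consecutive-run requirement keeps a single noisy frame from triggering either boundary, which is the practical difficulty the paper's first/last-attestation contrast points at.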