
Applied Corpus Linguistics: Latest Articles

Classification and identification level ambiguity in error annotation
Pub Date : 2022-12-01 DOI: 10.1016/j.acorp.2022.100035
Alexandros Tantos, Nikolaos Amvrazis

The vast majority of corpus annotation projects go through a piloting phase in which the annotation scheme is gradually shaped through iterative annotation cycles until its final version is produced and applied to the collected data. The differences in annotators’ choices are usually recorded and reflected by the ‘Inter-annotator Agreement’ (IAA), which serves as a proxy for understanding and resolving the issues raised. However, little has been reported on how to formulate a systematic approach to (i) tracing the source of the differences in the annotators’ choices and (ii) providing attainable solutions that would considerably increase IAA. In this paper, the ‘Greek Learner Corpus II’ (GLCII), the largest online Greek learner corpus, serves as a basis for shedding light on two commonly encountered types of ambiguity in error annotation that are closely related to target languages in which syncretism is ubiquitous in the grammar (e.g., Greek and Romanian): classification-level ambiguity and identification-level ambiguity.
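
The abstract does not say which agreement statistic the authors report, so the following is only a minimal sketch of one widely used IAA measure, Cohen's kappa for two annotators. The function name, the tag labels, and the six example items are hypothetical and are not drawn from GLCII.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: derived from each annotator's own label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical error tags for six learner tokens (illustration only).
annotator_1 = ["case", "case", "gender", "case", "number", "gender"]
annotator_2 = ["case", "gender", "gender", "case", "number", "case"]
print(round(cohen_kappa(annotator_1, annotator_2), 3))
```

A low kappa on a single error category can point to the kind of classification-level disagreement the abstract describes, although locating its source still requires inspecting the disputed items by hand.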

Applied Corpus Linguistics, Volume 2, Issue 3, Article 100035. Journal Article.
Citations: 1
A tutorial on norming linguistic stimuli for clinical populations
Pub Date : 2022-12-01 DOI: 10.1016/j.acorp.2022.100022
Oliver Delgaram-Nejad, Gerasimos Chatzidamianos, Dawn Archer, Alex Bartha, Louise Robinson

Stimuli norming (the process of controlling experimental items to minimise bias) is important for the validity of psycholinguistic experiments. Survey norming (asking large numbers of people to rate or otherwise define the items) is typically used for this purpose but requires large samples. Clinical populations are not always large, nor easy to reach. Clinical participants often have ongoing symptomatology, and some cohorts experience language and communication difficulties. We present a corpus-linguistic method suitable for clinical populations for which survey norming is difficult or inappropriate. We also include the experiment generated, which measures metaphor-creation behaviour in schizophrenia to test Cognitive Constraint Theory (CCT) in clinical and nonclinical populations (see S2.1). We describe the design rationale before outlining the design stages in tutorial form. This allows us to show readers why the approach was needed and support them to consider and respond to the challenges that we encountered. We conclude that it is easier to consider norming and design practices in parallel when experimental units are defined linguistically. Corpus stimuli norming provides a versatile alternative when survey norming is prohibitive, especially in speech pathology.

Applied Corpus Linguistics, Volume 2, Issue 3, Article 100022. Journal Article. Open-access PDF: https://www.sciencedirect.com/science/article/pii/S2666799122000077/pdfft?md5=40b8aaab346c1faa805c35598a6254f4&pid=1-s2.0-S2666799122000077-main.pdf
Citations: 0
Nicole Mockler (2022), Constructing Teacher Identities: How the Print Media Define and Represent Teachers and Their Work
Pub Date : 2022-12-01 DOI: 10.1016/j.acorp.2022.100034
Jamie McKeown
Applied Corpus Linguistics, Volume 2, Issue 3, Article 100034. Journal Article.
Citations: 0
Comparing the situational and linguistic characteristics of first year writing and engineering writing
Pub Date : 2022-12-01 DOI: 10.1016/j.acorp.2022.100031
Shelley Staples, Ashley JoEtta

First year writing (FYW) courses aim to prepare students for disciplinary writing. However, research suggests that FYW often fails to provide sufficient preparation for writing across genres and disciplines (Leki, 2007). A register-functional approach to corpus linguistics has elucidated key differences across disciplines and genres for both published and student academic writing (Biber and Gray, 2016; Staples et al., 2016; Staples and Reppen, 2016). To date, however, no studies have compared these features across FYW and First Year Engineering (FYE) writing.

This research uses a corpus of FYE and FYW texts developed by the authors. The subset for this study includes papers written by undergraduate students majoring in Engineering and taking FYE and FYW courses in the same semester. Technical Briefs (TB) and Design Reports (DR) were selected from the FYE corpus and Rhetorical Analysis (RA) and Research Reports (RR) from the FYW corpus. We investigated the situational context and normed frequencies of linguistic features hypothesized to show similarities and differences.

Our situational analysis shows key differences in characteristics of the RA and TB, particularly regarding audiences (clients for the TB, and instructors for the RA) and the object of analysis (advertisements for the RA and mathematical models for the TB). There were more similarities between the RR and DR, including a shared focus on a solution to a problem and the presence of both a methods and results section. Results from the linguistic analysis show the impact of the situational characteristics. For example, conditional clauses and premodifying nouns were used at similar rates of occurrence in the DR and RR, reflecting their inclusion of research questions and their sharing detailed information about the problem and solution. Implications of these findings for teaching in these contexts will be discussed.
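
The normed frequencies mentioned in this abstract can be illustrated with a small sketch. The abstract does not state the norming basis, so the per-1,000-words default, the function name, and all counts below are assumptions made purely for illustration.

```python
def normed_rate(feature_count, total_words, per=1000):
    """Occurrences of a feature per `per` words of running text."""
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return feature_count / total_words * per


# Hypothetical counts of conditional clauses in two sub-corpora.
design_reports = normed_rate(feature_count=30, total_words=15_000)
research_reports = normed_rate(feature_count=44, total_words=20_000)
print(round(design_reports, 2), round(research_reports, 2))
```

Norming in this way lets texts of different lengths be compared on the same scale, which is what makes a statement such as "similar rates of occurrence in the DR and RR" meaningful.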

Applied Corpus Linguistics, Volume 2, Issue 3, Article 100031. Journal Article. Open-access PDF: https://www.sciencedirect.com/science/article/pii/S2666799122000168/pdfft?md5=495e055e62e32825e71ff86704ea1eec&pid=1-s2.0-S2666799122000168-main.pdf
Citations: 0
A corpus-assisted ecolinguistic analysis of the representations of tree/s and forest/s in US discourse from 1820-2019
Pub Date : 2022-12-01 DOI: 10.1016/j.acorp.2022.100036
Robert Poole, Marco A. Micalay-Hurtado

This study presents a corpus-assisted ecolinguistic analysis of the evolving discursive representations of tree/s and forest/s in US American discourse from 1820 to 2019 in the approximately 475-million word Corpus of Historical American English (Davies, 2010). To explore these entities and their depictions in prevailing discourse, this study performs a diachronic collocation analysis of adjectives occurring with the terms across the span of the corpus. The analysis identified the 100 most frequent adjective collocates appearing with the singular and plural forms of tree/s and forest/s and calculated Kendall's tau correlation coefficient scores using decade-by-decade per million use rates in order to empirically assess the strength of trends in language use. The findings indicate a divergence in broadly positive and negative representations over the time span as adjectives construing poor health and lack of vitality are rising while adjectives conveying positive attributes of size, beauty, and wellbeing are declining. In addition, adjectives reflecting experiential engagement with tree/s and forest/s have progressively been replaced by a discourse of scientific identification and governmental dominion.
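
The trend measure described above can be sketched as follows. The abstract does not say which variant of Kendall's tau was used or how ties were handled, so this sketch assumes the simple tau-a form (tied pairs count as neither concordant nor discordant). The function name and the per-million rates are hypothetical, not values from the Corpus of Historical American English.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences of length >= 2")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        direction = (x2 - x1) * (y2 - y1)
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
        # direction == 0 is a tie and contributes to neither count
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical per-million rates of one adjective collocate by decade.
decades = [1820, 1830, 1840, 1850, 1860]
rates = [4.1, 3.8, 3.9, 3.0, 2.2]
print(kendall_tau_a(decades, rates))
```

A value near -1 indicates a collocate whose use falls steadily across decades and a value near +1 one that rises steadily, which is the sense in which the study speaks of the strength of a trend.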

Applied Corpus Linguistics, Volume 2, Issue 3, Article 100036. Journal Article.
Citations: 1
Vocabulary in digital science resources for middle school learners
Pub Date : 2022-12-01 DOI: 10.1016/j.acorp.2022.100023
Rebeca Arndt

This corpus-based study examined the vocabulary in a 2.7-million-token corpus composed of digital science resources for middle school (grades 6–8) students in the United States. The findings show that, to reach the 95% and 98% lexical coverage thresholds of the Digital Science Corpus (DSC) that are conventionally deemed to facilitate minimal and optimal reading comprehension (Laufer, 2020), middle school (MS) students in grades 6–8 must recognize the first 6,000 and 14,000 most frequent word families in the BNC/COCA (Nation, 2012), respectively, plus proper nouns and marginal words. The results of the lexical analysis across the three sub-corpora in the DSC suggest that the Life Science sub-corpus has a considerably larger vocabulary load than the Physical Science and Earth and Space Science sub-corpora. Additionally, while 98.60% of the most frequent 1,000 BNC/COCA word families occurred at least six times in the DSC, the 2,000–7,000 BNC/COCA word families provided significantly fewer opportunities for repeated occurrence. Since more than half of the words in the 5,000–7,000 BNC/COCA bands occurred five times or fewer in the overall corpus, most words across these bands are not frequent enough in the digital science resources for MS students to learn them incidentally from reading those texts. Several pedagogically relevant suggestions for middle school science teachers are discussed.
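
The coverage thresholds in this abstract rest on a cumulative calculation that can be sketched as below. This is an illustration under stated assumptions, not the study's procedure: the function name is hypothetical, the token counts are invented, and tokens outside the listed bands are simply treated as uncovered (the study additionally counts proper nouns and marginal words toward coverage).

```python
def bands_needed(tokens_per_band, total_tokens, threshold):
    """Smallest number of 1,000-word-family bands whose cumulative share
    of the corpus tokens reaches `threshold`; None if it is never reached."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    covered = 0
    for band, tokens in enumerate(tokens_per_band, start=1):
        covered += tokens
        if covered / total_tokens >= threshold:
            return band
    return None


# Hypothetical token counts for bands 1K..7K in a 1,000-token toy corpus;
# the remaining 15 tokens fall outside the listed bands.
toy_bands = [800, 90, 40, 25, 15, 12, 3]
print(bands_needed(toy_bands, 1000, 0.95))
print(bands_needed(toy_bands, 1000, 0.98))
```

In the toy data the first band alone covers 80% of tokens while each later band adds progressively less, which mirrors why the step from 95% to 98% coverage demands so many additional word families.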

Applied Corpus Linguistics, Volume 2, Issue 3, Article 100023. Journal Article.
Citations: 0
Studying children's writing development with a corpus
Pub Date : 2022-12-01 DOI: 10.1016/j.acorp.2022.100026
Philip Durrant

One of Randi Reppen's major contributions has been her pioneering corpus research into school children's writing. In this paper, I will discuss how such research can contribute to both theory and educational practice. I will then look at two sets of unresolved methodological issues in this area: the issue of defining appropriate linguistic and textual categories, and the issue of drawing valid developmental inferences.

The issue of categories arises because corpus analysis depends on abstracting away from specific instances of language use in specific texts to make claims about the use of linguistic categories (e.g., noun phrases, low-frequency vocabulary) in textual categories (e.g., stories, science reports). Such abstraction enables researchers to draw out patterns of language variation that are difficult to spot by other means. But it also raises the problem of how to define categories that are reliably operationalizable, that capture consistent developmental patterns, and that are theoretically and educationally informative.

The issue of drawing valid inferences stems from the fact that corpus data record the products of complex, contextually contingent writing processes, involving the interaction of many variables. Capturing the combined outcomes of these complex processes promotes ecological validity. However, it also creates challenges for researchers who want to draw conclusions about specific aspects of the writing process, such as writers’ knowledge of vocabulary or grammar, or their emerging awareness of audience.

This paper will discuss these issues in detail, illustrating their impact and suggesting ways forward for educationally informative corpus research.

Applied Corpus Linguistics, Volume 2, Issue 3, Article 100026. Journal Article. Open-access PDF: https://www.sciencedirect.com/science/article/pii/S2666799122000119/pdfft?md5=fb51c2a7b368f3ae872ed91f7cbfef1b&pid=1-s2.0-S2666799122000119-main.pdf
Citations: 1
Compiling and analysing a large corpus of online discussions to explore users’ interactions
Pub Date : 2022-08-01 DOI: 10.1016/j.acorp.2022.100017
Shi Min CHUA

This methodology-focused paper reports how I compiled and analysed a 12-million-word corpus of threaded online discussions by employing the Corpus Workbench tool (CWB, Evert & Hardie, 2011) and combining corpus analysis with micro-analysis drawing on the principles of digital Conversation Analysis. The tool not only affords efficient retrieval and analysis of a large dataset but, more importantly, also facilitates exploration of a corpus of online discussions based on different variables (e.g., topics of discussions, role of internet users, types of postings) and units of analysis (e.g., subforums, threads, postings). Examples are presented to illustrate how I used this tool to investigate various aspects of online discussions and to extract threads surrounding a particular topic or language practice for micro-analysis. I propose that internet users’ interactions in online discussions can be further explored in the field of corpus linguistics by using this tool and a synergy of corpus linguistics and an interactional approach.

Applied Corpus Linguistics, Volume 2, Issue 2, Article 100017. Journal Article. Open-access PDF: https://www.sciencedirect.com/science/article/pii/S266679912200003X/pdfft?md5=bc9ad1325dae08c713ab8180e4a3e150&pid=1-s2.0-S266679912200003X-main.pdf
Citations: 2
Donkey discourse: Corpus linguistics and charity communications for improved animal welfare
Pub Date : 2022-08-01 DOI: 10.1016/j.acorp.2022.100019
Emma McClaughlin, Cara Clancy, Fiona Cooke

A corpus linguistic approach has been applied to examine the representation of donkeys in public discourses for an international equid welfare charity (The Donkey Sanctuary) with a view to improving the British public's understanding of the roles of donkeys, in Britain and worldwide. By increasing understanding of public perceptions, this study aims to support improvements in donkey welfare through targeted education.

The study explored patterning in public discourses about donkeys (online and print news and social media) using corpus linguistic (CL) techniques and tools, supplemented with methods from discourse analysis. The findings highlight key representations of donkeys in public and media discourses that are not present in informed discourses about the animals. In this paper, we examine the results of the corpus study from the perspective of one current aim from The Donkey Sanctuary's public engagement strategy: to promote understanding of donkeys as sentient beings with the capacity to experience a wide range of emotional responses to events or situations.

We found that donkey experience is more subtly represented in the discourses than other aspects of donkey lives, such as actions and behaviours, which have more obvious, overt representations. The results demonstrate the value of applying the CL framework for researchers and practitioners involved in textual analysis for charity communications and public awareness campaigns. We discuss the implications that our findings have for donkey welfare—and animal welfare more generally—as well as what such a methodology could offer other organisations providing public education and/or relying on philanthropic support from the public.

Applied Corpus Linguistics, volume 2, issue 2, Article 100019. Published 2022-08-01. Journal Article.
Citations: 3
Finding social (mis)alignment in older adult and opioid health policy implementation with corpus-assisted discourse analysis
Pub Date : 2022-08-01 DOI: 10.1016/j.acorp.2022.100020
Brett A. Diaz

The effective implementation of health policies addressing opioid addiction may be jeopardized by the complex and sometimes mismatched beliefs and discourses held by policymakers, agency administrators, case managers, and ultimately target populations. Policies must be “aligned” socially across service levels, but misalignment by well-meaning stakeholders becomes a potential hindrance to implementation at different administrative levels. This observation motivates the study to ask, what do health policies and agents actually say? Data for this study come from policy documents (n = 100; words = 571,481) and ethnographic interviews (n = 29; words = 171,492) collected from rural, older adult health service offices. Results and analysis focus on comparing linguistic features, keywords and collocations, between policy texts and agents’ talk. Findings show a complex, socially mediated relationship between priorities and stances in official documents and the enacting agents, especially regarding the causes and effects of the opioid epidemic.

Applied Corpus Linguistics, volume 2, issue 2, Article 100020. Published 2022-08-01. Journal Article.
Citations: 1