
Applied Corpus Linguistics: Latest Articles

Exploring early L2 writing development through the lens of grammatical complexity
Pub Date: 2023-10-30 DOI: 10.1016/j.acorp.2023.100077
Tove Larsson, Tony Berber Sardinha, Bethany Gray, Douglas Biber

The present study explores the development of grammatical complexity in L2 English writing at the beginner, lower intermediate, and upper intermediate levels to see (i) to what extent the developmental stages proposed in Biber et al. (2011) are evident in low-proficiency L2 writing, and if so, what the patterns of progression are, and (ii) whether students gradually move away from speech-like production toward more advanced written production. We use data from COBRA, a corpus of L1 Brazilian Portuguese learner production, along with BR-ICLE and BR-LINDSEI. All the data were tagged using the Biber tagger (Biber, 1988) and the Developmental Complexity tagger (Gray et al., 2019), and subsequently analyzed using a technique developed in Staples et al. (2022) to quantify developmental profiles across levels. The technique considers not only overall change in frequency across levels, but also the incremental variation across each adjacent level (based on % frequency changes). The results show that the features were infrequent overall, with a majority of both clausal and phrasal features exhibiting an increase in frequency across the levels, albeit to varying degrees. This general pattern is contrary to predictions based on findings from previous studies, which found phrasal features increasing in use and clausal features decreasing in use. Nonetheless, for the features associated with each developmental stage, the frequencies generally increased, becoming more similar to advanced written production and more dissimilar to spoken production, as hypothesized in Biber et al. (2011).
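The idea of tracking both the overall change and the change between each pair of adjacent levels can be illustrated with a minimal sketch. This is not the procedure of Staples et al. (2022), whose details the abstract does not give; the function names and the sample rates are hypothetical, and the rates are assumed to be normalized frequencies (for example, per 1,000 words).

```python
def adjacent_percent_changes(rates):
    """Percent change between each pair of adjacent proficiency levels."""
    changes = []
    for prev, curr in zip(rates, rates[1:]):
        if prev == 0:
            changes.append(None)  # percent change is undefined from a zero base
        else:
            changes.append((curr - prev) / prev * 100.0)
    return changes


def overall_percent_change(rates):
    """Percent change from the lowest level to the highest level."""
    first, last = rates[0], rates[-1]
    return None if first == 0 else (last - first) / first * 100.0


# Hypothetical rates for one feature at beginner, lower intermediate,
# and upper intermediate levels (made-up numbers, not from the study).
rates = [2.0, 3.0, 4.5]
print(adjacent_percent_changes(rates))  # [50.0, 50.0]
print(overall_percent_change(rates))    # 125.0
```

Reporting the stepwise values next to the overall value shows whether growth is spread evenly across levels or concentrated in one transition.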

Citations: 0
Effective corpus use in second language learning: A meta-analytic approach
Pub Date: 2023-10-21 DOI: 10.1016/j.acorp.2023.100076
Shotaro Ueno, Osamu Takeuchi

Data-driven learning (DDL) refers to the use of corpora by second and foreign language (L2) learners to explore and inductively discover patterns of their target language use from authentic language data without interventions from others. Although previous meta-analyses have demonstrated the positive effects of DDL on L2 learning (Boulton and Cobb, 2017), the number of empirical studies has been increasing since then. Therefore, this study included more recent studies and used meta-analyses to examine the extent to which: (1) DDL exerts an effect on L2 learning; and (2) moderator variables affect DDL's influence on L2 learning. The results demonstrated small to medium effect sizes for experimental/control group comparisons and pre/post and pre/delayed designs. Moreover, the moderator analyses found that moderator variables, such as publication types, learners’ factors, and research designs, influence the magnitude of DDL effectiveness in L2 learning.
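For readers unfamiliar with effect sizes in experimental/control comparisons, the following is a minimal sketch of a standardized mean difference. The abstract does not state which effect size metric the authors used, so the choice of Cohen's d with the Hedges small-sample correction is an assumption, and all names and numbers are hypothetical.

```python
import math


def cohens_d(m1, s1, n1, m2, s2, n2):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(pooled_var)


def hedges_g(m1, s1, n1, m2, s2, n2):
    """Cohen's d multiplied by the usual small-sample correction factor."""
    df = n1 + n2 - 2
    correction = 1.0 - 3.0 / (4.0 * df - 1.0)
    return correction * cohens_d(m1, s1, n1, m2, s2, n2)


# Hypothetical post-test scores: a DDL group versus a control group.
d = cohens_d(75.0, 10.0, 20, 70.0, 10.0, 20)  # d is 0.5 for these made-up numbers
g = hedges_g(75.0, 10.0, 20, 70.0, 10.0, 20)  # slightly smaller than d
print(d, g)
```

A meta-analysis would then combine such per-study values, usually weighting each by its precision; that step is omitted here.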

Citations: 0
Using corpus linguistics to create tasks for teaching and assessing Aeronautical English
Pub Date: 2023-10-11 DOI: 10.1016/j.acorp.2023.100075
Aline Pacheco, Angela Carolina de Moraes Garcia, Ana Lúcia Tavares Monteiro, Malila Carvalho de Almeida Prado, Patrícia Tosqui-Lucks

This article presents the theoretical basis for corpus linguistics applied to Aeronautical English teaching and assessment followed by practical examples on how to use corpora to develop tasks for both purposes. It originates from the design of two webinars held remotely at the end of 2020, and promoted by the International Civil Aviation English Association. The webinars were targeted at Aeronautical English teachers, material designers, and test developers with little or no previous knowledge of corpus linguistics with the aim of guiding the audience in preparing step–by–step tasks using corpora. We share the work involved in the task design suggested, bridging the gap between research and practice. We conclude by outlining limitations, and suggesting prospects for future research.

Citations: 0
Lexical change and stability in 100 years of English in US newspapers
Pub Date: 2023-09-08 DOI: 10.1016/j.acorp.2023.100073
Robert Poole, Qudus Ayinde Adebayo

This study explores diachronic variation across approximately one hundred years of the newspaper register in US American English from 1920 to 2019 as captured in the Corpus of Historical American English (Davies, 2010). Informed by a similar study of lexical change in British English (Baker, 2011), the analysis identified high-frequency words exhibiting the greatest increases and decreases in use as well as those words demonstrating stability across the four sampling periods: 1920–29, 1950–59, 1980–89, 2010–19. The process to identify words of change and stability began first with the application of a cumulative frequency threshold; coefficient of variance and Kendall's Tau correlation coefficient were then calculated to aid in identification. In other words, the process targeted high-frequency words whose use has demonstrated the greatest change or stability. The discussion presents the three resulting word lists (increasing, decreasing, stable) and reports concordance and collocation analysis of select words from each list to gain insight into the underlying factors informing lexical change and stability.
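The two statistics named in the abstract can be sketched as follows. This is an illustration only, not the authors' code: it assumes that "coefficient of variance" means the coefficient of variation (standard deviation divided by the mean), uses the population standard deviation, and computes Kendall's tau-a without a tie correction. The function names and frequencies are hypothetical.

```python
from statistics import mean, pstdev


def coefficient_of_variation(freqs):
    """Dispersion of a word's frequency across periods, relative to its mean."""
    m = mean(freqs)
    return pstdev(freqs) / m if m else float("nan")


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) pairs over all pairs."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (xs[i] - xs[j]) * (ys[i] - ys[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


periods = [1, 2, 3, 4]  # the four sampling periods, in order
rising = [10, 20, 30, 40]   # hypothetical per-million frequencies
falling = [40, 30, 20, 10]
stable = [25, 26, 25, 26]

print(kendall_tau_a(periods, rising))   # 1.0
print(kendall_tau_a(periods, falling))  # -1.0
print(coefficient_of_variation(stable) < coefficient_of_variation(rising))  # True
```

Under these assumptions, a tau near +1 or -1 flags a steadily increasing or decreasing word, while a low coefficient of variation flags a stable one.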

Citations: 0
Data-driven Learning Meets Generative AI: Introducing the Framework of Metacognitive Resource Use
Pub Date: 2023-09-07 DOI: 10.1016/j.acorp.2023.100074
Atsushi Mizumoto

This paper explores the intersection of data-driven learning (DDL) and generative AI (GenAI), represented by technologies like ChatGPT, in the realm of language learning and teaching. It presents two complementary perspectives on how to integrate these approaches. The first viewpoint advocates for a blended methodology that synergizes DDL and GenAI, capitalizing on their complementary strengths while offsetting their individual limitations. The second introduces the Metacognitive Resource Use (MRU) framework, a novel paradigm that positions DDL within an expansive ecosystem of language resources, which also includes GenAI tools. Anchored in the foundational principles of metacognition, the MRU framework centers on two pivotal dimensions: metacognitive knowledge and metacognitive regulation. The paper proposes pedagogical recommendations designed to enable learners to strategically utilize a wide range of language resources, from corpora to GenAI technologies, guided by their self-awareness, the specifics of the task, and relevant strategies. The paper concludes by highlighting promising avenues for future research, notably the empirical assessment of both the integrated DDL-GenAI approach and the MRU framework.

Citations: 0
Stative verbs and perceptions of intensity: The case of ‘believe’ in simple and progressive aspect
Pub Date: 2023-09-04 DOI: 10.1016/j.acorp.2023.100072
Naoko Taguchi, Marianna Gracheva

This study assessed the validity of descriptive findings from corpus linguistics research by analyzing human participants’ performance and perception data. While the stative verb believe usually occurs in the simple aspect, a corpus-based analysis has revealed that believe also occurs in the progressive form in communicative situations conveying a heightened degree of intensity and marked with specific linguistic features such as intensifying adjectives, adverbs of certainty, direct addresses, and others (Gracheva, in press). This study adopted an experimental approach to further assess the link between the progressive form in situations of use conducive to assertive stance and emotional involvement and its surrounding linguistic characteristics. Eighty-six native English speakers were presented with 24 naturally occurring texts from corpora. Half of the texts involved linguistic features of intensity (progressive aspect condition), while half involved no such features (simple aspect condition). Participants read the texts and selected the form of believe (simple or progressive aspect) which they thought was appropriate in each text. Results showed that participants selected the progressive aspect 47% of the time for the texts featuring language of intensity, while their selection of that aspect was less than 3% in the simple condition texts. Follow-up interviews revealed that participants sensed the intensity conveyed by the texts (e.g., strong emotion, urgency, emphasis), leading to their choice of the progressive over the simple aspect.

Citations: 0
The DAIS-C: A small, specialised, spoken, schizophrenia corpus
Pub Date: 2023-08-23 DOI: 10.1016/j.acorp.2023.100069
Oliver Delgaram-Nejad, Dawn Archer, Gerasimos Chatzidamianos, Louise Robinson, Alex Bartha

This paper describes the design and development of the DAIS-C (Discussing Abstract Ideas in Schizophrenia Corpus), a small, specialised corpus of spoken language in which speakers with a diagnosis of schizophrenia and those with no self-reported psychiatric or neuroleptic history were interviewed on the same topics. The corpus was constructed to allow for comparative analyses of speech behaviour in relation to linguistic creativity and formal thought disorder (FTD), but additional steps were taken to ensure that the corpus could be of use to other researchers and research questions. The present paper covers design decisions relevant to the construction of clinical corpora alongside information about the corpus of potential use to researchers interested in its use.

Citations: 0
Development of word count data corpus for Hindi and Marathi literature
Pub Date: 2023-08-23 DOI: 10.1016/j.acorp.2023.100070
Vivek Belhekar, Radhika Bhargava

India has a huge diversity of languages, and Hindi and Marathi are the most spoken languages in the northern and western parts of India. Hindi and Marathi have more than 528 million and 83 million speakers, respectively. The paper describes the development of the Hindi Word Corpus (Hindi WordCorp) and the Marathi Word Corpus (Marathi WordCorp), reporting the frequency of single words (1-gram) used in written texts of the respective languages using the bag-of-words model (BoW). The word frequencies are provided for eleven decades (pre-1920, 1920 to 2020). These texts include books (fiction, non-fiction, history, autobiographies, etc.) and magazines. Academic and reference books were not used. The Hindi WordCorp and Marathi WordCorp used 640 and 712 texts, respectively. An analysis was employed to check whether the texts used were enough to stabilize the rank-order of the total frequencies of the words. Zipf's and Heaps’ law coefficients indicated the sufficiency of the texts. Researchers in various areas like linguistics, social sciences, text mining, machine learning, etc., can use the dataset to answer research questions about language and culture. Some demonstrative examples are provided for using the datasets in the two languages. The dataset is made available on an open data repository. The paper is an account of dataset creation for Hindi and Marathi WordCorp. Hence, no empirical results or conclusions are made based on the data created. A WebApp named Indian Languages Word Corpus (ILWC) has been developed for users. Future directions for text mining and language models are discussed.
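A bag-of-words 1-gram count and a Zipf's law coefficient can be sketched as below. This is an illustration under stated assumptions, not the authors' pipeline: tokens are split on whitespace only (real Hindi or Marathi text would also need punctuation such as the danda handled), and the Zipf exponent is estimated by ordinary least squares on log rank versus log frequency. All function names and data are hypothetical.

```python
import math
from collections import Counter


def unigram_counts(texts):
    """Bag-of-words 1-gram frequencies over a list of texts (whitespace tokens)."""
    counts = Counter()
    for text in texts:
        counts.update(text.split())
    return counts


def zipf_exponent(counts):
    """Least-squares slope of log(frequency) on log(rank), sign-flipped."""
    freqs = sorted(counts.values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two word types")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(freq) for freq in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx


# Hypothetical counts that follow frequency = 120 / rank exactly,
# so the estimated exponent should be very close to 1.
ideal = Counter({"w1": 120, "w2": 60, "w3": 40, "w4": 30, "w5": 24})
print(round(zipf_exponent(ideal), 6))  # 1.0
```

Recomputing the exponent as texts are added and watching it settle is one way to probe whether a sample is large enough, which is the kind of sufficiency check the abstract describes.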

Citations: 0
The identification of YouTube videos that feature the linguistic features of English informal speech
Pub Date: 2023-08-20 DOI: 10.1016/j.acorp.2023.100068
Christopher R. Cooper

YouTube is becoming an increasingly popular entertainment platform, with videos catering to a wide range of interests. If L2 users are to become proficient in the primary form of language, conversation, then the affordances created by YouTube videos containing informal speech could be very useful. In the current study a near-random corpus of 2602 YouTube video transcripts was compiled and 200 randomly selected texts from the Spoken BNC2014 (Love et al., 2017) were used as a reference corpus representing informal spoken English. The texts were tagged with 67 linguistic features as part of an additive multi-dimensional analysis. The dimension scores for each text were used in a cluster analysis to investigate which texts clustered with the Spoken BNC2014 texts. A two-cluster solution was chosen with 666 YouTube texts and 171 Spoken BNC2014 texts in one cluster, and the remaining texts in the other cluster. A small sample of texts from each cluster was analysed in detail. It is shown that this method has the potential to identify videos featuring informal speech and that some videos with similar categories have a very different linguistic style.
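The dimension scores that feed the cluster analysis can be pictured with a minimal sketch. It assumes the common multi-dimensional analysis convention of summing standardized rates of positively loading features and subtracting those of negatively loading features. An additive analysis would typically standardize against norms from an earlier reference study; here the sample itself supplies the means and standard deviations, which is a simplification. The feature names and rates are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize a list of rates; a constant feature contributes zeros."""
    m, sd = mean(values), pstdev(values)
    return [0.0 if sd == 0 else (v - m) / sd for v in values]


def dimension_scores(rates_by_feature, positive, negative):
    """One score per text: sum of positive-feature z-scores minus negative ones."""
    z = {feature: z_scores(rates) for feature, rates in rates_by_feature.items()}
    n_texts = len(next(iter(rates_by_feature.values())))
    return [
        sum(z[f][i] for f in positive) - sum(z[f][i] for f in negative)
        for i in range(n_texts)
    ]


# Two hypothetical texts with rates per 1,000 words for two features.
rates = {"contractions": [10.0, 30.0], "nouns": [300.0, 200.0]}
print(dimension_scores(rates, positive=["contractions"], negative=["nouns"]))
# [-2.0, 2.0]
```

Texts with similar score profiles across dimensions would then be grouped by whatever clustering method is chosen; the abstract does not name the algorithm, so that step is left out.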

Citations: 1
Review of McEnery and Brezina (2022) Fundamental Principles of Corpus Linguistics
Pub Date : 2023-08-01 DOI: 10.1016/j.acorp.2023.100055
Rickey Lu
Citations: 0