
Applied Corpus Linguistics: Latest Articles

Conventionalized phrases and disability policy: A corpus analysis of 2-year and 4-year public colleges in California
Pub Date: 2024-10-28 DOI: 10.1016/j.acorp.2024.100113
Stephen Eyman
This corpus-based study analyzes the use of conventionalized phrases in disability policy. Specifically, it focuses on three phrases made common by the Americans with Disabilities Act (ADA): qualified individual with a disability, reasonable accommodations, and interactive process. These phrases are analyzed in the context of disability policy at 2-year and 4-year public colleges in California. A corpus of disability policies was created for each of these contexts and analyzed to better understand how the implementation of conventionalized phrases varies across contexts. The study finds that the three ADA phrases have diffused across higher education disability policies in the corpora and are highly conventionalized in these contexts. Additionally, the phrases can be used with slightly different valences depending on the context. These differences in use appear to be directly related to the relationship among the three phrases themselves, and they mirror debates in disability policy, such as the debate around the modal ‘may’ in relation to whether or not an institution implements an interactive process. Furthermore, institutional differences in the implementation of these phrases are potentially related to the stances institutions take towards disability and disability policy.
Applied Corpus Linguistics, Volume 4, Issue 3, Article 100113.
Citations: 0
The effects of teacher, peer and self-feedback on error correction with corpus use
Pub Date: 2024-10-23 DOI: 10.1016/j.acorp.2024.100114
Yoshiho Satake
Although the strengths of corpora in language learning have often been noted, few studies have explored the effects of feedback on error correction in data-driven learning (DDL), an approach in which learners use corpora to learn language patterns inductively. Therefore, this study examines the effects of feedback on second language (L2) error correction with corpus use. The author hypothesizes that seeing many corpus example sentences containing the target word(s) is useful in correcting L2 errors and that different sources of feedback have different effects on error correction. To test the hypotheses, the effects of teacher feedback on 55 participants’ error correction using the Corpus of Contemporary American English (COCA) were compared with those of peer feedback and self-feedback. The results show that teacher feedback worked especially well for correcting omission errors and agreement errors. The strength of teacher feedback was identifying correctable errors. The results suggest that efficient corpus use for error correction requires teachers to consider appropriate combinations of feedback and error types (e.g., teacher feedback for omission errors and agreement errors).
Applied Corpus Linguistics, Volume 4, Issue 3, Article 100114.
Citations: 0
Investigating the visual content of a commercialized academic listening test: Implications for validity
Pub Date: 2024-10-18 DOI: 10.1016/j.acorp.2024.100109
Zhuohan Hou, Vahid Aryadoust, Azrifah Zakaria
As the incorporation of visual modes in listening tests gradually gains traction in second language (L2) assessment, the inclusion of such visuals raises questions about the role of visual modes in meaning-making during listening and about test validity. In this study, we investigated the visual features of the International English Language Testing System (IELTS) listening test by applying the social semiotic multimodal framework. Our corpus comprised 300 visuals from 256 academic listening testlets published between 1996 and 2022. Unlike past social semiotic multimodal analyses, which relied on qualitative methods, our study adopted a series of visualizations and quantitative statistical analyses of frequency and dispersion measures, using the general linear model to examine the visuals from a social semiotic multimodal perspective. The results revealed significant variation in the visual structures of the testlets. Drawing on a post-hoc analysis, we proposed recommendations for further research on multimodal materials in listening assessment and discussed the implications of the observed variation for the validity of the IELTS listening test. This study may be considered the first attempt to examine L2 listening assessment from a corpus-based social semiotic multimodal perspective, which may inspire more investigations of multimodal listening.
Applied Corpus Linguistics, Volume 4, Issue 3, Article 100109.
Citations: 0
Corpus linguistics will benefit from greater adoption of pre-registration: A novice-friendly split-corpus approach to pre-registration
Pub Date: 2024-10-11 DOI: 10.1016/j.acorp.2024.100111
Matthew H.C. Mak
In this brief article, I contend that the field of corpus linguistics stands to gain significantly from an increased adoption of pre-registration. Pre-registration serves to constrain the almost infinite degree of analytic freedom inherent in corpus analysis, thereby enhancing the transparency, reliability, and potential impact of corpus research. While pre-registration is increasingly popular in fields such as psychology and medicine, its uptake in corpus linguistics remains notably limited. To facilitate the transition toward pre-registration, I describe a straightforward split-corpus approach, ideally suited for corpus linguists new to pre-registration and for both hypothesis-testing and exploratory research. This method involves dividing a corpus into an exploratory set (20–40% of the corpus) and a confirmatory set (the remaining 60–80%). The exploratory set allows researchers to freely generate hypotheses and develop analysis plans, while the confirmatory set is then used for a more structured and objective analysis according to the pre-specified protocols. By employing this approach, corpus linguists can effectively balance exploratory flexibility with the rigour of confirmatory analysis, boosting the reliability of corpus findings. An increased uptake of pre-registration may not only bolster recognition of corpus linguistics as a robust empirical field, but it may also encourage a stronger emphasis on the building of cumulative knowledge.
Applied Corpus Linguistics, Volume 4, Issue 3, Article 100111.
Citations: 0
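The split-corpus approach summarized in the abstract above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract gives the 20–40% / 60–80% proportions but does not say how texts are assigned, so random assignment at the document level, the function name `split_corpus`, the default share of 0.3, and the fixed seed are all hypothetical choices rather than the author's procedure.

```python
import random


def split_corpus(doc_ids, exploratory_share=0.3, seed=2024):
    """Split document IDs into an exploratory and a confirmatory set.

    Assumption: texts are assigned at random; the abstract only states the
    proportions (20-40% exploratory, the remaining 60-80% confirmatory).
    """
    if not 0.2 <= exploratory_share <= 0.4:
        raise ValueError("exploratory share should lie between 0.2 and 0.4")
    ids = sorted(doc_ids)             # fixed starting order for reproducibility
    random.Random(seed).shuffle(ids)  # seeded shuffle, so the split can be reported
    cut = round(len(ids) * exploratory_share)
    return ids[:cut], ids[cut:]


# Hypothetical corpus of 100 text files.
docs = [f"text_{i:03d}" for i in range(100)]
exploratory, confirmatory = split_corpus(docs)
print(len(exploratory), len(confirmatory))  # 30 70
```

Fixing and reporting the seed is one way to make the split itself auditable: the confirmatory set can then be left untouched until the analysis plan developed on the exploratory set has been registered.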
Breach of pacta sunt servanda: A corpus-assisted analysis of newspaper discourse on the AUKUS agreement
Pub Date: 2024-10-05 DOI: 10.1016/j.acorp.2024.100108
Radoslava Trnavac, Encarnacion Hidalgo Tenorio
The AUKUS agreement, a strategic pact between Australia, the United Kingdom, and the United States, primarily aimed to facilitate Australia's acquisition of eight nuclear-powered submarines from the US and Britain. This agreement led to the abrupt termination of a previous contract with France's state-owned Naval Group. This article examines the language used in media coverage of the AUKUS agreement in newspapers from various Anglophone and Asian countries. Employing a combination of Sentiment Analysis (Crossley et al., 2017) and Corpus-Assisted Discourse Studies (Partington, 2013; Gillings et al., 2023), we focus on identifying key linguistic patterns, themes, and the sentiment embedded in the discourse. Our findings indicate a general positive assessment of AUKUS in the Anglophone media, contrasted with negative portrayals in Chinese publications. Moreover, the analysis of linguistic components such as adjectives, nouns, and verbs reveals underlying complexities and conflicting viewpoints within the Anglophone discourse itself. By applying Corpus-Assisted Discourse Studies, we uncover the contextual and linguistic factors that shape these diverse perspectives.
Applied Corpus Linguistics, Volume 4, Issue 3, Article 100108.
Citations: 0
Identifying ChatGPT-generated texts in EFL students’ writing: Through comparative analysis of linguistic fingerprints
Pub Date: 2024-09-26 DOI: 10.1016/j.acorp.2024.100106
Atsushi Mizumoto, Sachiko Yasuda, Yu Tamura
The emergence of generative AI (GenAI) poses new challenges for L2 writing teachers. This study investigates the distinguishability of essays written by Japanese EFL learners from those generated by ChatGPT. Partially replicating Herbold et al. (2023), 140 first-year university students wrote essays and completed a survey on ChatGPT use. Among them, 125 wrote independently, 13 used ChatGPT for proofreading, and two asked ChatGPT to write the entire essay. To create a comparative dataset, 123 additional essays were generated by ChatGPT, imitating the two texts. The resulting 263 essays were then analyzed using natural language processing (NLP) techniques, including automated linguistic analysis and machine learning classification with a random forest. The results reveal significant differences between human-written and ChatGPT-generated essays across all linguistic features, with the latter being easily identifiable. This study emphasizes the need for clear guidelines on the ethical use of AI in L2 writing, highlighting the potential risk of inappropriate AI use and the importance of fostering a shared understanding with learners of responsible AI integration in academic work.
Applied Corpus Linguistics, Volume 4, Issue 3, Article 100106.
Citations: 0
English podcasts for schoolchildren and their vocabulary demands
Pub Date: 2024-09-20 DOI: 10.1016/j.acorp.2024.100107
Emily Casaletto, Irina Kerimova, Ulugbek Nurmukhamedov
This exploratory study examines the vocabulary demands of English children's podcasts. A 359,153-word podcast corpus was created using the written transcripts of episodes from the following popular children's podcasts: But Why, Circle Round, KidNuz, Smash Boom Best, and Wow in the World. The corpus was analyzed to determine the vocabulary size necessary to know 95% and 98% of the words in the podcasts. The results showed that knowledge of the most frequent 4,000 word families plus proper nouns (PN), marginal words (MW), transparent compounds (TC), and acronyms (AC) provided 95.69% coverage of the corpus, and that 7,000 word families plus PN, MW, TC, and AC reached 98.10% coverage, indicating that podcasts designed for children require a larger vocabulary size than general-audience podcasts designed for adults.
Applied Corpus Linguistics, Volume 4, Issue 3, Article 100107.
Citations: 0
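The coverage figures in the abstract above (95.69% at 4,000 word families, 98.10% at 7,000) come from a cumulative lexical coverage calculation. The sketch below shows the general arithmetic only, not the authors' tooling: `band_of` is a hypothetical mapping from word forms to 1,000-word-family frequency bands, `supplementary` stands in for the PN, MW, TC and AC lists, and the tiny example sentence is invented.

```python
from collections import Counter


def cumulative_coverage(tokens, band_of, supplementary=frozenset()):
    """Return {band: cumulative % of tokens covered up to that band}.

    band_of: hypothetical dict mapping a word form to its frequency band
             (1 = most frequent 1,000 word families, 2 = next 1,000, ...).
    supplementary: words assumed known regardless of band (PN, MW, TC, AC).
    """
    if not tokens:
        raise ValueError("need at least one token")
    counts = Counter()
    for token in tokens:
        word = token.lower()
        if word in supplementary:
            counts[0] += 1                  # band 0: supplementary lists
        elif word in band_of:
            counts[band_of[word]] += 1
        # anything else is off-list and never counts as covered
    covered = counts[0]
    coverage = {}
    for band in range(1, max(band_of.values(), default=0) + 1):
        covered += counts[band]
        coverage[band] = 100 * covered / len(tokens)
    return coverage


def bands_needed(coverage, target):
    """Smallest band whose cumulative coverage reaches the target %, else None."""
    for band in sorted(coverage):
        if coverage[band] >= target:
            return band
    return None


# Invented toy data, for illustration only.
tokens = "why do volcanoes erupt in hawaii wow".split()
band_of = {"why": 1, "do": 1, "in": 1, "erupt": 4, "volcanoes": 5}
coverage = cumulative_coverage(tokens, band_of, supplementary={"hawaii", "wow"})
print(bands_needed(coverage, 95))  # 5
```

Counting the supplementary lists as known before the first band mirrors the "plus PN, MW, TC and AC" wording of the abstract; a real analysis would need full word-family lists rather than a hand-built dictionary.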
Investigating spoken classroom interactions in linguistically heterogeneous learning groups – An interdisciplinary approach to process video-based data in second language acquisition classrooms
Pub Date: 2024-09-15 DOI: 10.1016/j.acorp.2024.100104
Moritz Sahlender, Stefanie Helbert, Inga ten Hagen, Anastasia Knaus, Zarah Weiss
Speaking the local language is central to successful integration into society. The teacher's language in second language (L2) classrooms serves as a crucial tool in language learning. Heterogeneity in learners’ language proficiency levels challenges teachers to adapt their language and the accompanying instructional behavior. We offer an approach to studying language acquisition processes and how teachers adapt their instructional language. This article presents our language-independent guidelines for processing video-based data of classroom interactions and demonstrates their reliability in a German as Second Language (GSL) classroom. These guidelines enable transcriptions of spoken language in noisy environments and detailed annotations of non-verbal classroom behavior. We outline research avenues at the intersection of empirical education research and linguistics that become feasible through these resources, focusing on the study of teachers' (non-)verbal adaptation strategies for learners at different proficiency levels. Our work directly fosters the interdisciplinary study of teacher-learner interactions, teacher competencies, and language acquisition.
Applied Corpus Linguistics, Volume 4, Issue 3, Article 100104.
Citations: 0
Capturing chronological variation in L2 speech through lexical measurements and regression analysis 通过词汇测量和回归分析捕捉 L2 言语中的年代变化
Pub Date : 2024-09-15 DOI: 10.1016/j.acorp.2024.100105
Mariko Abe, Yuichiro Kobayashi, Yusuke Kondo

This study aims to bridge gaps in current research by analyzing a longitudinal spoken learner corpus of low-proficiency English learners. We investigated the chronological variation in lexical measurements in second language (L2) speaking production, focusing on data from 104 low-proficiency learners elicited eight times over 23 months. Our findings show that measures such as the number of different words and type-token ratio are effective indicators of L2 speaking development, whereas the use of sophisticated vocabulary was not significantly correlated with learning duration. These results suggest that in the early stages of L2 acquisition, speaking skills are influenced primarily by lexical variation. This finding underscores the importance of lexical variation as a key factor in novice-level L2 speaking proficiency.
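For readers unfamiliar with the two measures named in the abstract, the number of different words (types) and the type-token ratio can be computed roughly as follows. This is only an illustrative sketch: the tokenizer, the function name `lexical_measures`, and the sample sentence are assumptions made here, not the authors' procedure or data.

```python
import re


def lexical_measures(transcript: str) -> dict:
    """Return token count, type count, and type-token ratio for a transcript.

    Assumption: a token is any run of ASCII letters or apostrophes, compared
    case-insensitively. Real learner-corpus studies usually apply more careful
    tokenization and may lemmatize first.
    """
    tokens = re.findall(r"[a-z']+", transcript.lower())
    types = set(tokens)
    # Guard against empty transcripts to avoid division by zero.
    ttr = len(types) / len(tokens) if tokens else 0.0
    return {"tokens": len(tokens), "types": len(types), "ttr": ttr}


if __name__ == "__main__":
    # Hypothetical sample utterance, not taken from the study's corpus.
    print(lexical_measures("I like dogs and I like cats"))
```

Because the raw type-token ratio tends to fall as a sample gets longer, comparisons across sessions are usually made on samples of similar length or with a length-adjusted variant.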

Citations: 0
FreeTxt: A corpus-based bilingual free-text survey and questionnaire data analysis toolkit
Pub Date : 2024-08-23 DOI: 10.1016/j.acorp.2024.100103
Dawn Knight, Nouran Khallaf, Paul Rayson, Mahmoud El-Haj, Ignatius Ezeani, Steve Morris

Qualitative free-text responses (e.g. from questionnaires and surveys) pose a challenge to many companies and institutions which lack the expertise to analyse such data with ease. While a range of sophisticated tools for the analysis of text do exist, these are often expensive, difficult to use and/or inaccessible to non-expert users. These tools also lack support for the analysis of English and Welsh text, which can be a particular challenge in the bilingual context of Wales. This paper details the key functionalities of the first corpus-based ‘FreeTxt’ toolkit which has been designed to support the systematic analysis and visualisation of free-text data, as a direct response to these two key needs. This paper demonstrates how, by working in partnership, software engineers, natural language processing (NLP) experts and corpus linguists can collaborate with end-users and beneficiaries to provide effective solutions to real world problems. Through the development of FreeTxt (www.freetxt.app), we aimed to empower end-users to direct and lead their own analyses of both small-scale and more extensive datasets to maximise the reach and potential impact generated. The approaches reported here, and the bilingual toolkit developed, can be replicated and extended for use in other language contexts and across a range of public and professional sectors. FreeTxt is now available for the analysis of Welsh and/or English, for use by anyone in any sector in Wales and beyond.

Citations: 0