
Applied Corpus Linguistics: Latest Articles

SCILIC and SCAWL: Developing a smart cities corpus and academic word list
IF 2.1 Pub Date : 2026-01-16 DOI: 10.1016/j.acorp.2026.100191
Abdulaziz B Sanosi
The rapid growth of smart city initiatives over the past two decades has led to a surge in research and practical applications. However, it has also resulted in significant terminological fragmentation across academic discourse, educational practices, and urban policy frameworks, posing challenges to achieving the educational and urban development targets outlined in the United Nations’ SDG 4: Quality Education and SDG 11: Sustainable Cities and Communities. To address this gap, the present study aims to develop and validate the Smart CIties LIterature Corpus (SCILIC) and to generate a word list from it through systematic corpus linguistic analysis. The corpus comprises 3.6 million tokens sourced from two primary domains: peer-reviewed articles indexed in Scopus and Web of Science (2015–2025) and technical reports from the UN-Habitat digital repository (2010–2025). Using #LancsBox and complementary analytical tools, the study compiled a balanced and representative corpus and generated the Smart Cities Academic Word List (SCAWL), comprising 550 word families and 667 individual words. Quantitative analysis indicates that SCAWL accounts for 7.8% of the total corpus tokens. The findings underline the multidisciplinary nature of smart city vocabulary and highlight the importance of integrating both academic and policy-oriented sources. By supporting the development of targeted educational resources and promoting clearer conceptual understanding, this research contributes directly to the advancement of SDG 4 and SDG 11, fostering both educational quality and sustainable urban development.
Applied Corpus Linguistics, Volume 6, Issue 1, Article 100191
Citations: 0
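The 7.8% figure in the SCILIC/SCAWL entry is a word-list coverage statistic: the share of running tokens in the corpus that belong to the list. The sketch below shows that arithmetic under simplifying assumptions: a hypothetical regex tokenizer stands in for #LancsBox's own tokenization, and tokens are matched by exact word form, whereas a family-level count like SCAWL's would first map inflected and derived forms to their headwords. The word list and sentence are invented for illustration.

```python
import re

def tokenize(text: str) -> list[str]:
    # Lowercase, letters-only tokenizer (hyphenated compounds kept whole);
    # corpus tools such as #LancsBox apply their own, richer tokenization rules.
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())

def word_list_coverage(tokens: list[str], word_list: set[str]) -> float:
    """Percentage of running tokens whose exact form appears in word_list."""
    if not tokens:
        return 0.0
    covered = sum(1 for tok in tokens if tok in word_list)
    return 100.0 * covered / len(tokens)

# Hypothetical mini word list and text, for illustration only.
toy_list = {"urban", "data", "platforms", "integrate"}
toy_tokens = tokenize("Smart city platforms integrate urban data and urban services")
print(f"{word_list_coverage(toy_tokens, toy_list):.1f}%")  # 55.6%
```

Repeated tokens count each time they occur ("urban" contributes twice here), which is what makes this a token-coverage figure rather than a type count.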
Linguistic stratification in academic publishing: A corpus-based analysis of lexicogrammatical variation across journal tiers
IF 2.1 Pub Date : 2026-01-12 DOI: 10.1016/j.acorp.2026.100190
Ezra Alexander
This study examines linguistic stratification in academic publishing through corpus analysis of lexicogrammatical variation between high-tier and low-tier scientific journals. Using a specialized corpus of 2.3 million words from biochemistry, cell biology, and genetics publications, the research employs contrastive intralingual analysis to investigate how journal prestige influences language choices. Through key bundle analysis and examination of multiword units, the study reveals systematic differences in passive voice usage, tense selection, modal constructions, and lexical choices between journal tiers. High-tier journals demonstrate greater use of present tense constructions, specific vocabulary, and confident assertions, while low-tier journals show preference for past tense passives, generic verbs, and tentative modal expressions. The findings indicate that journal tier creates distinct linguistic expectations that reflect confidence versus tentativeness in academic writing. These patterns suggest that publication contexts systematically influence lexicogrammatical choices, with implications for how journal prestige shapes acceptable academic discourse and may create differential barriers for scholars navigating research publication in English.
Applied Corpus Linguistics, Volume 6, Issue 1, Article 100190
Citations: 0
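The journal-tier entry relies on key bundle analysis: multiword sequences are counted in two sub-corpora and ranked by a keyness statistic. The abstract does not name the statistic, so the following is only a sketch that assumes log-likelihood (G2) with the usual expected-frequency formula, contiguous 4-word bundles, and token counts as corpus sizes. The function names and the `min_freq` cut-off are illustrative choices, not the study's settings.

```python
import math
from collections import Counter

def ngrams(tokens: list[str], n: int = 4) -> list[tuple[str, ...]]:
    """Contiguous n-word sequences (candidate lexical bundles)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """G2 keyness: item seen a times in corpus 1 (c tokens), b times in corpus 2 (d tokens)."""
    e1 = c * (a + b) / (c + d)  # expected frequency in corpus 1
    e2 = d * (a + b) / (c + d)  # expected frequency in corpus 2
    ll = 0.0
    if a > 0:  # 0 * log(0) is taken as 0
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll

def key_bundles(tokens_1: list[str], tokens_2: list[str], n: int = 4, min_freq: int = 2):
    """Rank bundles by G2; '+' marks relative overuse in corpus 1, '-' in corpus 2."""
    f1, f2 = Counter(ngrams(tokens_1, n)), Counter(ngrams(tokens_2, n))
    c, d = len(tokens_1), len(tokens_2)
    rows = []
    for bundle in set(f1) | set(f2):
        a, b = f1[bundle], f2[bundle]
        if a + b < min_freq:
            continue
        direction = "+" if a / c > b / d else "-" if a / c < b / d else "="
        rows.append((bundle, a, b, log_likelihood(a, b, c, d), direction))
    return sorted(rows, key=lambda row: row[3], reverse=True)
```

G2 is zero when an item's relative frequency is identical in both corpora and grows as the two rates diverge; the direction flag is needed because G2 itself is unsigned.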
Vocabulary in academic texts across disciplinary fields
IF 2.1 Pub Date : 2026-01-12 DOI: 10.1016/j.acorp.2026.100189
Xina Jin, Rachael Ruegg, Stephen Skalicky, Averil Coxhead
This article reports on a corpus-based study examining the vocabulary of texts used as reading materials in two academic fields at a higher education institution in New Zealand: Computer Science (CS) and Media Design (MD). It first presents the vocabulary profiles of academic texts from both fields using Nation’s (2020) British National Corpus/Corpus of Contemporary American English (BNC/COCA) word frequency lists. It then outlines the vocabulary load required to comprehend different types of academic texts within the two corpora. The results indicate that while CS texts contain a wide range of mathematical, statistical, and programming-related lexical items, MD texts include a considerable number of proper nouns, such as the names of brands, companies, designers, and locations, as well as many non-English words. In terms of vocabulary demand, reading CS texts requires knowledge of 4,000 to 6,000 word families to reach 95% to 98% lexical coverage. In contrast, MD texts require knowledge of up to 8,000 word families for optimal comprehension across various text types. Interestingly, journal articles in both corpora show lower lexical demands than other types of texts, such as book chapters, textbooks, and materials sourced from online platforms (e.g., magazines, newspapers). The findings suggest that lexical demands vary when handling reading materials across different disciplinary areas in higher education, and provide insights into the extent of vocabulary knowledge needed to understand and learn different subject content through texts.
Applied Corpus Linguistics, Volume 6, Issue 1, Article 100189
Citations: 0
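Vocabulary-load figures such as "4,000 to 6,000 word families for 95% to 98% coverage" come from a cumulative coverage profile: each token is assigned to a 1,000-word-family frequency band, coverage is accumulated band by band, and the load is the first band at which a threshold is reached. The sketch below assumes the band assignment has already been done (for example with Nation's BNC/COCA lists), and treats off-list tokens as uncovered; studies differ in how they handle proper nouns and other off-list items, so this is an illustration rather than the authors' procedure.

```python
from collections import Counter
from typing import Optional

def cumulative_coverage(token_bands: list[Optional[int]]) -> dict[int, float]:
    """Map band k to the % of running tokens covered by bands 1..k.

    Each element is the 1,000-family band of one token (1 = most frequent
    1,000 families); None marks an off-list token, counted as not covered.
    """
    total = len(token_bands)
    counts = Counter(band for band in token_bands if band is not None)
    coverage: dict[int, float] = {}
    running = 0
    for band in range(1, max(counts, default=0) + 1):
        running += counts[band]  # Counter returns 0 for empty bands
        coverage[band] = 100.0 * running / total
    return coverage

def bands_needed(token_bands: list[Optional[int]], threshold: float) -> Optional[int]:
    """Smallest k such that bands 1..k reach `threshold` % coverage, else None."""
    for band, cov in cumulative_coverage(token_bands).items():
        if cov >= threshold:
            return band
    return None
```

For a hypothetical 100-token text with 80 tokens in band 1, 10 in band 2, 5 in band 3, 3 in band 4, and 2 off-list, 95% coverage is reached at the 3,000-family level and 98% at the 4,000-family level.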
Exploring engineering discourse through phrase-frames: Pedagogical recommendations
IF 2.1 Pub Date : 2026-01-11 DOI: 10.1016/j.acorp.2026.100188
Tatiana Nekrasova-Beker
This paper discusses how the phraseology of engineering discourse can be explored through the analysis of multi-word sequences that include a variable component: 5-word phrase-frames (or p-frames, e.g., at the top * the, based on the * of). Target p-frames identified in pedagogical materials used in undergraduate engineering courses were further subjected to a series of analyses focusing on frequency, range, predictability of filler distribution, discourse functions, and an evaluation of the most frequent fillers occupying the variable slot, in order to identify sequences that can be prioritized during language-focused instruction. The results indicated that many of the frequent p-frames identified across engineering sub-corpora were typically unpredictable and multifunctional, and captured content ranging from general academic language to language specific to engineering sub-domains. Based on these findings, pedagogical recommendations are discussed for language practitioners who target discipline-specific language patterns in their English for Academic Purposes (EAP) or English for Specific Purposes (ESP) classes.
Applied Corpus Linguistics, Volume 6, Issue 1, Article 100188
Citations: 0
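The phrase-frame entry groups 5-word sequences that differ in exactly one position, as in "at the top * the" and "based on the * of", where the wildcard sits in the fourth slot. A minimal sketch of that grouping, plus two ways of quantifying filler predictability, might look like this. The abstract does not say which predictability measure was used; the variant/p-frame ratio and normalized Shannon entropy computed here are assumptions, as are the function names and the fixed slot position.

```python
import math
from collections import Counter, defaultdict

def p_frames(tokens: list[str], n: int = 5, slot: int = 3) -> dict[str, Counter]:
    """Group n-grams that are identical except at position `slot` (0-based)."""
    frames: dict[str, Counter] = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        gram = tokens[i:i + n]
        frame = " ".join(gram[:slot] + ["*"] + gram[slot + 1:])
        frames[frame][gram[slot]] += 1  # record the filler seen in the slot
    return frames

def frame_stats(fillers: Counter) -> dict[str, float]:
    """Frequency, variant count and two predictability measures for one p-frame."""
    freq = sum(fillers.values())
    variants = len(fillers)
    probs = [count / freq for count in fillers.values()]
    entropy = -sum(p * math.log2(p) for p in probs)
    return {
        "freq": freq,
        "variants": variants,
        "vpr": variants / freq,  # variant/p-frame ratio: lower = more predictable
        # Entropy scaled to [0, 1]; 0 = one filler always used (fully predictable)
        "norm_entropy": entropy / math.log2(variants) if variants > 1 else 0.0,
        "top_filler_share": max(fillers.values()) / freq,
    }
```

In practice, p-frame studies also apply minimum frequency, range, and variant-count thresholds before analysing frames; those cut-offs are left out here to keep the sketch short.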
Speech and Language Disorders: A systematic review of corpora and future directions
IF 2.1 Pub Date : 2026-01-08 DOI: 10.1016/j.acorp.2025.100186
Abeer Z. Al-Marridi , Samawiyah M. Ulde , Ahmed Bensaid , Tariq A. Khwaileh
Speech and Language Disorders (SLDs) significantly impact social interaction, communication, and educational outcomes, making them a global health priority. According to data published by Komodo Health, speech disorder diagnoses among children aged 0–12 increased by 110% in 2022, reaching 1.2 million cases compared to the pre-pandemic average of 570,000. Addressing this growing challenge requires equipping the research community with diverse and comprehensive corpora to drive investigations and develop innovative tools. This paper systematically reviews existing SLD corpora, evaluating their relevance to research and technological innovation. The corpora are categorized by target population, language, data modality, and task domain. Thirteen SLDs are covered, including neurological language breakdown, motor speech disorders, child language impairments, and communication challenges in autism spectrum disorder. The review identifies key research directions in the field of SLD and, drawing on statistics from the analyzed search results, highlights critical gaps and challenges. Emerging trends such as multimodal data integration and the application of artificial intelligence to advanced data analysis are emphasized. The review concludes with recommendations for enhancing the utility and accessibility of SLD corpora, underscoring the importance of interdisciplinary collaboration and community engagement in addressing existing limitations. This review serves as a valuable resource for clinicians and researchers, guiding them in selecting the most suitable databases and corpora for their clinical and investigative needs while advancing SLD research and innovation.
Applied Corpus Linguistics, Volume 6, Issue 1, Article 100186
Citations: 0
Factors influencing number marking errors in spoken English by L1 Chinese learners: A learner corpus study
IF 2.1 Pub Date : 2025-12-28 DOI: 10.1016/j.acorp.2025.100187
Dongchen Yao
Number marking represents a fundamental aspect of English, whereas Chinese lacks inflection for number and does not morpho-syntactically encode a number distinction. As a result, Chinese learners of English have been shown to exhibit difficulties in fully acquiring number marking. The present study aims to add to our understanding of the difficulties faced by Chinese learners of English by examining what factors correlate with number-marking errors in English nouns. The analysis draws on the Chinese subcomponent of monologues in the International Corpus Network of Asian Learners of English (ICNALE), focusing on L2 English learners whose first language is Mandarin Chinese (henceforth Chinese). A mixed-effects logistic regression analysis was conducted to examine both linguistic factors (i.e., count-mass distinction, concreteness, atomicity, determiner, and L1 transfer) and sociolinguistic factors (i.e., sex, age, and proficiency). The statistical results reveal that count-mass distinction, concreteness, and the use of determiners are the most important predictors of number-marking errors in English nouns. Mass nouns, concrete nouns, and the presence of determiners are associated with a lower likelihood of number-marking errors compared to their counterparts. Abstract count nouns pose the greatest challenge for Chinese learners of English, with most errors occurring in the singular form of count nouns.
Applied Corpus Linguistics, Volume 6, Issue 1, Article 100187
Citations: 0
Each of them is one of a kind: A corpus-based study on two type-noun morphemes in spoken Mandarin
IF 2.1 Pub Date : 2025-12-24 DOI: 10.1016/j.acorp.2025.100185
Chen-Yu Chester Hsieh
This article presents a corpus-based analysis of how two near-synonymous type nouns (TNs) in Mandarin, zhǒng and lèi, diverge in their distributional patterns and interactional functions in spoken discourse. Using quantitative collocational profiling and qualitative analysis informed by Interactional Linguistics, the study examines 968 instances of zhǒng and 179 instances of lèi in the NCCU Corpus of Spoken Mandarin. The findings show that zhǒng forms a broad set of prefabricated expressions, each favoring particular lexico-grammatical constructions and serving evaluative, referential, and turn-projecting functions. In contrast, lèi, most prominently in the form zhīlèi, is more restricted in distribution, occurs predominantly in utterance-final position, and indexes uncertainty and turn completion. These results demonstrate that even near-synonymous TN morphemes differentiate in systematic ways shaped by linguistic form, sequential context, and interactional needs. The study contributes to research on TNs, classifier systems, and pragmatic markers in Mandarin, while offering implications for cross-linguistic comparison and Chinese language pedagogy.
Applied Corpus Linguistics, Volume 6, Issue 1, Article 100185
Citations: 0
Constructions of ‘sound’ in scientific discourses about cochlear implants
IF 2.1 Pub Date : 2025-12-15 DOI: 10.1016/j.acorp.2025.100183
Emily Kecman , Stephanie Lloyd , Isabelle Boisvert
The linguistic resources employed to discuss sensory experiences and phenomena can vary considerably between different cultural, disciplinary and socio-political contexts. Whilst questions about the discourses of sound have long been explored in some fields, within the field of cochlear implant research such questions have received limited attention. This article draws together literature from diverse fields, highlighting the various complexities inherent in talking about “sound” in different contexts. The results of a collocation analysis of “sound” within the CIRCorpus (a purpose-built 3-million-word corpus comprising scientific research articles about cochlear implants published between 1960 and 2024) are then reported. The collocation analysis highlights a discursive environment in which sound is predominantly framed within a language of testing and ability, suggesting that discussions of sound within CI research have become distinctly psychologized and, over time, increasingly technicalized and homogenized. The implications of these patterns for informing future CI research agendas are discussed.
Applied Corpus Linguistics, Volume 6, Issue 1, Article 100183
Citations: 0
A corpus study of lexical richness in three groups of learners: are heritage learners in between monolingual native and L2 learners?
IF 2.1 Pub Date : 2025-12-15 DOI: 10.1016/j.acorp.2025.100182
Irene Checa-García
This study investigated lexical richness across three learner groups to determine the position of heritage language learners (HLLs) relative to monolingually raised native speakers (MNS) and second language learners (L2Ls). Using corpus analysis of written compositions from 94 university students (36 HLLs, 30 L2Ls, 28 MNS), it examined lexical diversity, density, and sophistication alongside self-perceived vocabulary abilities and difficulties.
Results revealed distinct patterns across indices and acquisition groups. Lexical diversity was primarily influenced by self-reported difficulty in finding words and in using sophisticated vocabulary, with minimal group differences, except that MNS showed a smaller reduction in diversity when experiencing word-finding difficulties. Lexical density showed the strongest group effect: L2Ls exhibited significantly higher density than HLLs (an increase of 0.39 units), likely because, through transfer from English, they omitted function words that are mandatory in Spanish; this effect was absent in HLLs. For lexical sophistication, MNS scored significantly higher than both HLLs and L2Ls, though with modest effect sizes. Self-perceived vocabulary abilities were generally associated with diversity measures in all groups, whereas sophistication, identified in previous research as the most reliable indicator of writing quality, showed no relationship to learners' perceived difficulties or abilities.
These findings suggest that HLLs align more closely with MNS on most lexical richness measures, except sophistication, on which they pattern with L2Ls. The results also suggest that, because of syntactic interference, lexical density does not adequately measure informational content at intermediate L2 levels, while sophistication may be the lexical dimension most in need of instruction for groups that likely receive less input containing sophisticated words, such as L2Ls and HLLs.
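The abstract names three lexical richness dimensions but not the specific indices or frequency lists used. The Python sketch below computes one simple index for each (type-token ratio for diversity, the proportion of content words for density, and the proportion of content words outside a high-frequency list for sophistication) and illustrates the transfer effect on density: omitting a function word that Spanish requires, such as the personal "a" in "veo a mi madre" ('I see my mother'), raises density. The tiny `FUNCTION_WORDS` and `FREQUENT_WORDS` sets and the function name `lexical_richness` are hypothetical placeholders; a real analysis would rely on a part-of-speech tagger or full function-word inventory and a reference frequency list.

```python
# Hypothetical placeholder lists for illustration; a real study would use a
# full Spanish function-word inventory and a reference frequency band.
FUNCTION_WORDS = {"a", "mi", "el", "la", "de", "que", "en", "y", "un", "una"}
FREQUENT_WORDS = {"veo", "madre", "casa", "tiempo", "hacer"}


def lexical_richness(tokens, function_words=FUNCTION_WORDS, frequent_words=FREQUENT_WORDS):
    """Return simple diversity, density and sophistication indices.

    diversity:      type-token ratio (distinct tokens / all tokens)
    density:        content (non-function) tokens / all tokens
    sophistication: content tokens outside the frequent list / content tokens
    """
    tokens = [t.lower() for t in tokens]
    if not tokens:
        raise ValueError("cannot score an empty text")
    content = [t for t in tokens if t not in function_words]
    rare = [t for t in content if t not in frequent_words]
    return {
        "diversity": len(set(tokens)) / len(tokens),
        "density": len(content) / len(tokens),
        "sophistication": len(rare) / len(content) if content else 0.0,
    }


if __name__ == "__main__":
    native = lexical_richness("veo a mi madre".split())  # personal "a" present
    learner = lexical_richness("veo mi madre".split())   # "a" omitted (English transfer)
    print(native["density"], learner["density"])
```

Dropping the single function word raises density from 2 of 4 tokens to 2 of 3. Note that the raw type-token ratio falls as texts get longer, so studies comparing compositions of different lengths usually prefer length-corrected diversity measures such as MTLD; the plain ratio is used here only to keep the sketch short.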
Applied Corpus Linguistics, 6(1), Article 100182.
Citations: 0
Semantic prosody as a part of attitudinal meaning: Its effect on the processing of synonymous words
IF 2.1 Pub Date : 2025-12-15 DOI: 10.1016/j.acorp.2025.100184
Leyla Çimen, Günçe Günduğdu, Nermin Yazıcı
Speakers may use grammatical, lexical, or syntactic resources at the discourse level to encode attitudes. From the perspective of Appraisal Theory, such attitudes include emotional reactions, judgments, and evaluations. This study investigates how semantic prosody, a linguistic resource that generates attitudinal meaning, influences the processing of synonyms. To this end, 39 participants completed lexical priming and lexical decision tasks in a within-subject design. In the lexical priming task, semantic prosody had a priming effect on the recognition of synonyms compatible with that semantic prosody. In the lexical decision task, semantic prosody shortened reaction times in the recognition of word units consistent with their own semantic prosody. The findings indicate that, in addition to emotion-laden words, which signal attitudes explicitly, metaphorically, or attitudinally, semantic prosody also acquires attitudinal meaning that contributes to processing. This attitudinal function suggests that, through associative learning, semantic prosody comes to attract selective attention via frequent and consistent usage, thereby generating an automatized response. The article discusses how and why the attitudinal relationship that semantic prosody establishes with collocations in a text affects word processing, and describes the underlying acquisition processes. As the first study to investigate the processing of Turkish synonyms with different semantic prosodies, this research is expected to provide a basis for further work.
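The abstract reports priming and shorter reaction times (RTs) for targets congruent in semantic prosody, measured in a within-subject design, but does not say how the effect was computed. The Python sketch below shows one simple way to summarise such a within-subject effect: each participant's mean RT on incongruent trials minus their mean RT on congruent trials, followed by a paired t statistic across participants. The trial format, the condition labels, and the function name `priming_effect` are hypothetical, and RT studies frequently use mixed-effects models rather than this paired comparison.

```python
from math import sqrt
from statistics import mean, stdev


def priming_effect(trials):
    """Mean within-subject priming effect (ms) and its paired t statistic.

    trials: iterable of (participant_id, condition, rt_ms), where condition
    is "congruent" or "incongruent". Each participant must contribute at
    least one trial per condition, and at least two participants are needed.
    A positive effect means faster responses on congruent trials.
    """
    by_participant = {}
    for pid, condition, rt in trials:
        conds = by_participant.setdefault(pid, {"congruent": [], "incongruent": []})
        conds[condition].append(rt)

    # One difference score per participant: this is what makes the test paired.
    effects = [mean(c["incongruent"]) - mean(c["congruent"])
               for c in by_participant.values()]
    d = mean(effects)
    t = d / (stdev(effects) / sqrt(len(effects)))
    return d, t


if __name__ == "__main__":
    # Invented reaction times, for illustration only.
    demo = [
        ("p1", "congruent", 500), ("p1", "congruent", 520),
        ("p1", "incongruent", 540), ("p1", "incongruent", 560),
        ("p2", "congruent", 600), ("p2", "incongruent", 630),
        ("p3", "congruent", 450), ("p3", "incongruent", 500),
    ]
    print(priming_effect(demo))
```

With the invented data, participants respond on average 40 ms faster on congruent trials. Averaging within each participant first keeps participants with more trials from dominating the group estimate.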
Applied Corpus Linguistics, 6(1), Article 100184.
Citations: 0
Book学术 provides a free academic resource search service that helps scholars in China and abroad search Chinese- and English-language literature, and aims to offer a convenient, high-quality service experience.
Copyright © 2023 Book学术 All rights reserved.