The unit of analysis in learner corpus research on formulaic language
Joe Geluso, Hui-Hsien Feng, Randy Appel
Applied Corpus Linguistics, vol. 5, no. 1, Article 100123. Published 2025-02-18.
DOI: 10.1016/j.acorp.2025.100123
https://www.sciencedirect.com/science/article/pii/S2666799125000061
Citation count: 0
Abstract
This study uses two case studies to investigate how the choice of unit of analysis in learner corpus research (LCR) on formulaic language (FL; e.g., lexical bundles and phrase frames) can lead researchers to disparate inferences even when they analyze the same corpora. LCR studies on written FL commonly take the corpus as the unit of analysis, a per-corpus approach, for inter-group comparisons. This approach combines essays from different individuals into a single long essay that represents the entire group. Less frequently, LCR studies on FL take the individual texts that make up a corpus as the unit of analysis, a per-text approach. A per-text approach allows the researcher to generate group means and standard deviations, or ranked frequencies, at the text level. Findings suggest that the two research designs can yield different results and hence conflicting inferences from the same data set. Specifically, a per-text approach appears less prone to identifying significant differences between groups than a per-corpus approach, and it better reflects similarities between groups, such as the absence of linguistic features. We conclude with instructions on how to generate per-text counts using a popular and free corpus analysis tool.
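The contrast between the per-corpus and per-text approaches can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the three toy essays, the target bundle, the whitespace tokenizer, and the function names are hypothetical and are not the authors' data, procedure, or tool. The sketch computes one pooled rate for the group (per-corpus) and one rate per essay (per-text), from which a group mean and standard deviation follow.

```python
from statistics import mean, stdev

def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption; real studies
    # would use a proper tokenizer).
    return text.lower().split()

def count_bundle(tokens, bundle):
    # Count contiguous occurrences of the bundle in one token list.
    n = len(bundle)
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == bundle)

def per_corpus_rate(texts, bundle, per=100):
    # Per-corpus: pool all hits and all tokens, then normalize once.
    hits = sum(count_bundle(tokenize(t), bundle) for t in texts)
    size = sum(len(tokenize(t)) for t in texts)
    return per * hits / size

def per_text_rates(texts, bundle, per=100):
    # Per-text: normalize within each text, giving one observation per essay.
    rates = []
    for t in texts:
        tokens = tokenize(t)
        rates.append(per * count_bundle(tokens, bundle) / len(tokens))
    return rates

# Hypothetical toy group: only the first essay uses the bundle at all.
essays = [
    "on the other hand it is good on the other hand it is bad",
    "i think that school is very important for all of us",
    "many people say that travel is fun and cheap",
]
BUNDLE = ["on", "the", "other", "hand"]

pooled = per_corpus_rate(essays, BUNDLE)
rates = per_text_rates(essays, BUNDLE)

print(f"per-corpus rate per 100 tokens: {pooled:.2f}")
print(f"per-text rates per 100 tokens: {[round(r, 2) for r in rates]}")
print(f"per-text mean: {mean(rates):.2f}, SD: {stdev(rates):.2f}")
```

In this toy group the pooled rate hides the fact that two of the three essays never use the bundle, whereas the per-text rates make that dispersion visible through the standard deviation.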