{"title":"Structural Factor Analysis of Lexical Complexity Constructs and Measures: A Quantitative Measure-Testing Process on Specialised Academic Texts","authors":"Maryam Nasseri, Philip McCarthy","doi":"10.1080/09296174.2023.2258782","DOIUrl":null,"url":null,"abstract":"ABSTRACTThis study evaluates 22 lexical complexity measures that represent the three constructs of density, diversity and sophistication. The selection of these measures stems from an extensive review of the SLA linguistics literature. All measures were subjected to qualitative screening for indicators/predictors of lexical proficiency/development and criterion validity based on the body of scholarship. This study’s measure-testing process begins by dividing the selected measures into two groups, similarly calculated and dissimilarly calculated, based on their quantification methods and the results of correlation tests. Using a specialized corpus of postgraduate academic texts, a Structural Factor Analysis (SFA) comprising a Confirmatory Factor Analysis (CFA) and Exploratory Factor Analysis (EFA) is then conducted. The purpose of SFA is to 1) verify and examine the lexical classifications proposed in the literature, 2) evaluate the relationship between various lexical constructs and their representative measures, 3) identify the indices that best represent each construct and 4) detect possible new structures/dimensions. Based on the analysis of the corpus, the study discusses the construct-distinctiveness of lexical complexity constructs, as well as strong indicators of each conceptual/mathematical group among the measures. Finally, a unique and smaller set of measures representative of each construct is suggested for future studies that require measure selection. AcknowledgmentsWe would like to thank the two anonymous reviewers for their valuable suggestions and comments.Disclosure statementNo potential conflict of interest was reported by the author(s).Credit authorship contribution statementMaryam Nasseri: Conceptualization, Data curation, Methodology, Data analysis and evaluation of findings, Project administration, Visualization, Writing: original draft, Writing: critical review & editing, Funding acquisition.Philip McCarthy: Measure-selection, Writing: critical review & editing, Funding acquisition.Notes1. The lexical sophistication measures in LCA-AW are filtered through the BAWE (British Academic Written English) corpus and its most-frequently-used academic writing words used in linguistics and language studies as well as the general English frequency word lists based on the BNC (the British National Corpus) or ANC (American National Corpus).2. LCA-AW and TAALED calculate the indices based on lemma forms while Coh-Metrix calculates the vocd-D index based on word forms. In the latter case, lemmatized files can be used as the input to Coh-Metrix.3. The R packages used in this study include psych (version 1.8.12, Revelle, Citation2018), lavaan (version 0.5–18, Rosseel, Citation2012) and corrplot (version 0.84, Wei & Simko, Citation2017).Additional informationFundingThis study is part of the “Lexical Proficiency Grading for Academic Writing (FRG23-C-S66)” comprehensive research granted by the American University of Sharjah (AUS).Notes on contributorsMaryam NasseriMaryam Nasseri received her doctoral degree from the University of Birmingham (UK), where she worked on the application of statistical modelling, NLP, and corpus linguistics methods on lexical and syntactic complexity. She has received multiple awards and grants, including the ISLE 2020 grant for syntactic complexification in academic texts and the AUS 2023-26 research grant for statistical modelling and designing software for lexical proficiency grading of academic writing. She has published in journals such as System, Journal of English for Academic Purposes (JEAP), and Assessing Writing and reviewed multiple articles and books for Taylor & Francis, Assessing Writing, and Journal of Language and Education (JLE).Philip McCarthyPhilip McCarthy is an Associate Professor and discourse scientist specializing in software design and corpus analysis. His major interest is analyzing the English writings of students. His articles have been published in journals such as Discourse Processes, The Modern Language Journal, Written Communication, and Applied Psycholinguistics. McCarthy has been a teacher for 30 years, working in locations such as Turkiye, Japan, Britain, the US and the UAE. He is currently the principal investigator of a project on lexical proficiency grading of academic writing funded by the American University of Sharjah (AUS).","PeriodicalId":45514,"journal":{"name":"Journal of Quantitative Linguistics","volume":null,"pages":null},"PeriodicalIF":0.7000,"publicationDate":"2023-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Quantitative Linguistics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1080/09296174.2023.2258782","RegionNum":2,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"0","JCRName":"LANGUAGE & LINGUISTICS","Score":null,"Total":0}
引用次数: 0
Abstract
ABSTRACTThis study evaluates 22 lexical complexity measures that represent the three constructs of density, diversity and sophistication. The selection of these measures stems from an extensive review of the SLA linguistics literature. All measures were subjected to qualitative screening for indicators/predictors of lexical proficiency/development and criterion validity based on the body of scholarship. This study’s measure-testing process begins by dividing the selected measures into two groups, similarly calculated and dissimilarly calculated, based on their quantification methods and the results of correlation tests. Using a specialized corpus of postgraduate academic texts, a Structural Factor Analysis (SFA) comprising a Confirmatory Factor Analysis (CFA) and Exploratory Factor Analysis (EFA) is then conducted. The purpose of SFA is to 1) verify and examine the lexical classifications proposed in the literature, 2) evaluate the relationship between various lexical constructs and their representative measures, 3) identify the indices that best represent each construct and 4) detect possible new structures/dimensions. Based on the analysis of the corpus, the study discusses the construct-distinctiveness of lexical complexity constructs, as well as strong indicators of each conceptual/mathematical group among the measures. Finally, a unique and smaller set of measures representative of each construct is suggested for future studies that require measure selection. AcknowledgmentsWe would like to thank the two anonymous reviewers for their valuable suggestions and comments.Disclosure statementNo potential conflict of interest was reported by the author(s).Credit authorship contribution statementMaryam Nasseri: Conceptualization, Data curation, Methodology, Data analysis and evaluation of findings, Project administration, Visualization, Writing: original draft, Writing: critical review & editing, Funding acquisition.Philip McCarthy: Measure-selection, Writing: critical review & editing, Funding acquisition.Notes1. The lexical sophistication measures in LCA-AW are filtered through the BAWE (British Academic Written English) corpus and its most-frequently-used academic writing words used in linguistics and language studies as well as the general English frequency word lists based on the BNC (the British National Corpus) or ANC (American National Corpus).2. LCA-AW and TAALED calculate the indices based on lemma forms while Coh-Metrix calculates the vocd-D index based on word forms. In the latter case, lemmatized files can be used as the input to Coh-Metrix.3. The R packages used in this study include psych (version 1.8.12, Revelle, Citation2018), lavaan (version 0.5–18, Rosseel, Citation2012) and corrplot (version 0.84, Wei & Simko, Citation2017).Additional informationFundingThis study is part of the “Lexical Proficiency Grading for Academic Writing (FRG23-C-S66)” comprehensive research granted by the American University of Sharjah (AUS).Notes on contributorsMaryam NasseriMaryam Nasseri received her doctoral degree from the University of Birmingham (UK), where she worked on the application of statistical modelling, NLP, and corpus linguistics methods on lexical and syntactic complexity. She has received multiple awards and grants, including the ISLE 2020 grant for syntactic complexification in academic texts and the AUS 2023-26 research grant for statistical modelling and designing software for lexical proficiency grading of academic writing. She has published in journals such as System, Journal of English for Academic Purposes (JEAP), and Assessing Writing and reviewed multiple articles and books for Taylor & Francis, Assessing Writing, and Journal of Language and Education (JLE).Philip McCarthyPhilip McCarthy is an Associate Professor and discourse scientist specializing in software design and corpus analysis. His major interest is analyzing the English writings of students. His articles have been published in journals such as Discourse Processes, The Modern Language Journal, Written Communication, and Applied Psycholinguistics. McCarthy has been a teacher for 30 years, working in locations such as Turkiye, Japan, Britain, the US and the UAE. He is currently the principal investigator of a project on lexical proficiency grading of academic writing funded by the American University of Sharjah (AUS).
期刊介绍:
The Journal of Quantitative Linguistics is an international forum for the publication and discussion of research on the quantitative characteristics of language and text in an exact mathematical form. This approach, which is of growing interest, opens up important and exciting theoretical perspectives, as well as solutions for a wide range of practical problems such as machine learning or statistical parsing, by introducing into linguistics the methods and models of advanced scientific disciplines such as the natural sciences, economics, and psychology.