This study aims to bridge gaps in current research by analyzing a longitudinal spoken learner corpus of low-proficiency English learners. We investigated chronological variation in lexical measures in second language (L2) speaking production, drawing on data from 104 low-proficiency learners elicited eight times over 23 months. Our findings show that measures such as the number of different words and the type-token ratio are effective indicators of L2 speaking development, whereas the use of sophisticated vocabulary was not significantly correlated with learning duration. These results suggest that, in the early stages of L2 acquisition, speaking skills are influenced primarily by lexical variation, which underscores its importance as a key factor in novice-level L2 speaking proficiency.
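To make the two measures that tracked development more concrete, the following Python sketch computes the number of different words (NDW) and the type-token ratio (TTR) for a single transcript. It is a minimal illustration, not the study's procedure: the regular-expression tokenizer and the function names `tokenize`, `ndw` and `ttr` are hypothetical, and real learner transcripts would likely need decisions about fillers, false starts and L1 words before counting.

```python
import re


def tokenize(transcript: str) -> list[str]:
    # Hypothetical tokenizer: lowercase the text and keep alphabetic word forms (apostrophes allowed)
    return re.findall(r"[a-z']+", transcript.lower())


def ndw(tokens: list[str]) -> int:
    # Number of different words: the count of distinct word forms (types)
    return len(set(tokens))


def ttr(tokens: list[str]) -> float:
    # Type-token ratio: distinct word forms divided by total running words
    return len(set(tokens)) / len(tokens) if tokens else 0.0


tokens = tokenize("I like dogs and I like cats")
print(ndw(tokens), round(ttr(tokens), 3))
```

Because raw TTR tends to fall as a text gets longer, comparisons across sessions of different lengths often rely on a length-standardized variant; the abstract does not state which operationalization was used.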
Qualitative free-text responses (e.g. from questionnaires and surveys) pose a challenge to many companies and institutions that lack the expertise to analyse such data with ease. While a range of sophisticated text-analysis tools do exist, these are often expensive, difficult to use and/or inaccessible to non-expert users. These tools also lack support for analysing text in both English and Welsh, which is a particular challenge in the bilingual context of Wales. This paper details the key functionalities of the first corpus-based ‘FreeTxt’ toolkit, which was designed to support the systematic analysis and visualisation of free-text data as a direct response to these two key needs: accessibility for non-expert users and support for Welsh as well as English. It also demonstrates how, by working in partnership, software engineers, natural language processing (NLP) experts and corpus linguists can collaborate with end-users and beneficiaries to provide effective solutions to real-world problems. Through the development of FreeTxt (www.freetxt.app), we aimed to empower end-users to direct and lead their own analyses of both small-scale and more extensive datasets, maximising the reach and potential impact of those analyses. The approaches reported here, and the bilingual toolkit developed, can be replicated and extended for use in other language contexts and across a range of public and professional sectors. FreeTxt is now available for the analysis of Welsh and/or English text by anyone, in any sector, in Wales and beyond.
Previous research has examined L2 interaction by describing salient features exhibited in different patterns of peer interaction. These studies mostly used qualitative methods and focused on the collaborative aspect of this construct (Galaczi, 2008). The present study adopts a quantitative approach to explore and describe L2 interaction, using data from the Corpus of Collaborative Oral Tasks (CCOT). Specifically, it measures pairs’ interaction through a composite interactivity score in order to examine how the dyads’ degree of interactivity relates to their use of lexico-grammatical features and to their L2 fluency. Pearson's correlation tests showed weak to moderate positive relationships between interactivity and discourse particles, response forms, wh-questions, and second-person pronouns, and weak negative relationships between interactivity and both nominal forms and hesitations. The tests also revealed moderate relationships between interactivity and fluency: learners with higher interactivity levels tended to produce fewer silent pauses and faster speech rates. The study offers several insights for scholars interested in L2 interaction. First, it suggests that some linguistic features are associated not only with collaborative behaviors (as reported in the literature) but also with interactivity as a broader construct. Second, it describes how turn taking might support the fluency of more interactive students, which warrants further investigation of turn frequency among L2 test takers, since raters may be influenced by candidates’ fluency. Finally, it reports that the relationship between L2 interactivity and linguistic features resembles patterns reported in studies of native speakers of English.
Register is among the most important predictors of linguistic variation. In a register such as instructor feedback, linguistic features have particularly high stakes, as they can make feedback clearer, more detailed, and/or more (de)motivating. Mitigation strategies (i.e., the use of hedges and other softeners) are frequent in instructor feedback and particularly influential in determining the feedback's effectiveness. This study compares the patterns of mitigation strategies used in written and spoken feedback to gain insights into register variation. Written comments (provided electronically) and spoken comments (provided through screencast feedback, in which instructors share verbal feedback along with a screenshare of the student's essay) from the Writing Feedback Corpus (WFC) were analyzed. A total of 1,568 comments across these registers were manually coded for mitigation within head acts (core speech acts) and for external modification in the surrounding discourse. Strategies were compared quantitatively using key feature analysis (Egbert & Biber, 2023). The findings indicate that the two feedback registers promote different mitigation and external modification strategies: written feedback favored interrogative syntax and unmitigated forms, whereas spoken feedback favored personal attribution, hedges, and the nursery we, as well as the external modifiers minimizer, positive comment, and reason. Implications for providing feedback on student writing are discussed.
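Key feature analysis is referenced here only by citation. Assuming it expresses keyness as a standardized effect size (Cohen's d) comparing per-comment rates of a feature in a target and a reference register, the Python sketch below shows that computation for one coded strategy. The function name `cohens_d` and the per-comment counts are hypothetical and invented for illustration; they are not WFC data.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(target: list[float], reference: list[float]) -> float:
    # Standardized mean difference, using the pooled sample variance of both groups
    n1, n2 = len(target), len(reference)
    pooled_var = ((n1 - 1) * variance(target) + (n2 - 1) * variance(reference)) / (n1 + n2 - 2)
    return (mean(target) - mean(reference)) / sqrt(pooled_var)


# Hypothetical per-comment counts of one coded strategy (e.g. hedges)
spoken_hedges = [2, 3, 1, 4, 2, 3]
written_hedges = [0, 1, 0, 1, 2, 0]

d = cohens_d(spoken_hedges, written_hedges)
print(f"d = {d:.2f}")
```

With spoken feedback as the target, a positive d indicates a strategy favored in spoken comments and a negative d one favored in written comments.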
Corpus findings are only useful if the corpus adequately represents the content and language of the target domain; yet few studies evaluate or report representativeness. This paper argues that corpus linguists should focus explicitly on the validation process. It introduces the innovative concept of a Representativeness Argument: an explicit statement of reliability and validity that enables defensible applications of a corpus for a specifically defined purpose and audience. Adapted from Toulmin's (1958/2003) argument model, its originality lies in its attention to both target-domain and linguistic representativeness, and in the critical role played by expert judgements. To illustrate this approach, I present a representativeness argument for the 1.98-million-word ‘DSVC-IL’ corpus, which was compiled to investigate the discipline-specific vocabulary required for reading postgraduate International Law texts. The corpus is shown to adequately represent target-domain content, as established by analysing modules and reading lists and confirmed by experts. Its language is shown to adequately reflect the domain through analysis of a 1026-flemma Single Word List, extracted using measures of frequency, keyness, range and evenness of distribution. List items are evenly distributed across randomly split corpus halves (rs = .98, p < .001). The list provides similar coverage of the DSVC-IL (26.37%) and of other texts from the domain (23.87%). Moreover, Law experts confirmed that the majority of list items were Law words. Together, this evidence supports the usefulness of the corpus and list for their explicitly defined purpose.
With the proliferation of large corpora and the availability of sophisticated corpus-analysis tools, the number of corpus-based word lists targeting different types of vocabulary has increased rapidly over the last 20 years. This wide variety of lists has created problems for practitioners, who often find it difficult to decide which list is most useful for their purpose and context. Given the paucity of systematic guidance on how to evaluate word lists, this study aimed to construct an evaluation tool that is based on Nation's (2016) framework for critiquing word lists but reformulated for a different purpose and different target users, in order to increase the applicability of information derived from corpus analysis (i.e., word lists). Built on a thorough literature review and informed by practitioners’ views and uses of word lists, as well as consultations with ELT practitioners and word list experts, the tool targets ELT practitioners involved in directing vocabulary acquisition, such as teachers, curriculum and assessment coordinators, and materials developers. The tool caters to practitioners with different levels of expertise and knowledge, especially those who are unfamiliar with the intricacies of developing corpus-based word lists. This paper documents the development of the initial version of the evaluation tool, as well as its first iteration, drawing on the insights of both word list experts and practitioners in ELT.