This study aims to bridge gaps in current research by analyzing a longitudinal spoken learner corpus of low-proficiency English learners. We investigated the chronological variation in lexical measurements in second language (L2) speaking production, focusing on data from 104 low-proficiency learners elicited eight times over 23 months. Our findings show that measures such as the number of different words and type-token ratio are effective indicators of L2 speaking development, whereas the use of sophisticated vocabulary was not significantly correlated with learning duration. These results suggest that in the early stages of L2 acquisition, speaking skills are influenced primarily by lexical variation. This finding underscores the importance of lexical variation as a key factor in novice-level L2 speaking proficiency.
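As a rough illustration of the two lexical variation measures named above, the following sketch computes the number of different words (types) and the type-token ratio for a single transcript. The regex tokenizer, the lowercasing step, and the function name `lexical_measures` are assumptions made for this example; the abstract does not specify how the study tokenized or normalized the learner data.

```python
import re


def lexical_measures(transcript: str) -> dict:
    """Return token count, number of different words, and type-token ratio.

    Assumption: tokens are lowercase runs of letters and apostrophes.
    A real learner-corpus pipeline would likely handle fillers,
    repetitions, and lemmatization differently.
    """
    tokens = re.findall(r"[a-z']+", transcript.lower())
    types = set(tokens)
    n_tokens = len(tokens)
    ttr = len(types) / n_tokens if n_tokens else 0.0
    return {"tokens": n_tokens, "types": len(types), "ttr": ttr}


# Made-up utterance for illustration only, not taken from the corpus.
sample = "I like dogs and I like cats"
measures = lexical_measures(sample)
print(measures)
```

Because the raw type-token ratio falls as texts get longer, comparisons across sessions are usually made on samples of similar length or with a length-corrected variant.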
Qualitative free-text responses (e.g. from questionnaires and surveys) pose a challenge to many companies and institutions which lack the expertise to analyse such data with ease. While a range of sophisticated tools for the analysis of text do exist, these are often expensive, difficult to use and/or inaccessible to non-expert users. These tools also lack support for the analysis of English and Welsh text, which can be a particular challenge in the bilingual context of Wales. This paper details the key functionalities of the first corpus-based ‘FreeTxt’ toolkit which has been designed to support the systematic analysis and visualisation of free-text data, as a direct response to these two key needs. This paper demonstrates how software engineers, natural language processing (NLP) experts and corpus linguists can work in partnership with end-users and beneficiaries to provide effective solutions to real-world problems. Through the development of FreeTxt (www.freetxt.app), we aimed to empower end-users to direct and lead their own analyses of both small-scale and more extensive datasets to maximise the reach and potential impact generated. The approaches reported here, and the bilingual toolkit developed, can be replicated and extended for use in other language contexts and across a range of public and professional sectors. FreeTxt is now available for the analysis of Welsh and/or English, for use by anyone in any sector in Wales and beyond.
Previous research examined L2 interaction by describing salient features exhibited in different patterns of peer interaction. These studies mostly used qualitative methods and focused on the collaborative aspect of this construct (Galaczi, 2008). The present study adopts a quantitative approach to explore and describe L2 interaction, utilizing data from the Corpus of Collaborative Oral Tasks (CCOT). Specifically, it measures pairs' interaction by creating a composite score of interactivity to understand the relationship between the dyads' degree of interactivity and their use of lexico-grammatical features as well as their L2 fluency. Pearson's correlation tests showed weak to moderate positive relationships between interactivity and discourse particles, response forms, wh-questions, and second-person pronouns. Additionally, the tests revealed weak negative relationships between interactivity and both nominal forms and hesitations. The tests also showed moderate relationships between interactivity and more fluent L2 speech: learners with higher interactivity levels tended to produce fewer silent pauses and to speak at faster rates. The study provides insights for scholars interested in L2 interaction. It suggests that some linguistic features were associated not only with collaborative behaviors (as reported in the literature) but also with interactivity as a broad aspect. Furthermore, it describes how the act of turn taking might serve the fluency of higher-interactivity students, warranting further investigation of turn frequency among L2 test takers, as test raters might be influenced by the test candidates' fluency. Finally, it reports that L2 interactivity exhibited a pattern of relationships with linguistic features that resembles patterns reported in the literature on native speakers of English.
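The kind of test reported above can be sketched with a plain implementation of Pearson's correlation coefficient. This is a minimal sketch, not the study's procedure: the abstract does not say how the composite interactivity score was built, so the variable names and the toy values below are purely hypothetical and are not CCOT data.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / sqrt(sxx * syy)


# Hypothetical toy values: one interactivity score and one silent-pause
# rate per dyad, chosen to be perfectly linear so the result is obvious.
interactivity = [1.0, 2.0, 3.0, 4.0]
silent_pauses = [8.0, 6.0, 4.0, 2.0]
r = pearson_r(interactivity, silent_pauses)
print(round(r, 3))
```

A negative coefficient here corresponds to the direction described in the abstract, where higher interactivity goes with fewer silent pauses; real data would give values well short of a perfect correlation.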
Register is among the most important predictors of linguistic variation. In a register such as instructor feedback, linguistic features have particularly high stakes, as they can make feedback clearer, more detailed, and/or more (de)motivating. Mitigation strategies (i.e., the use of hedges and other softeners) are frequently found in instructor feedback and are particularly influential in terms of the feedback's effectiveness. This study compares the patterns of mitigation strategies used in written and spoken feedback to gain insights into register variation. Written comments (provided electronically) and spoken comments (provided through screencast feedback, in which instructors share verbal feedback along with a screenshare of the student's essay) in the Writing Feedback Corpus (WFC) were analyzed. A total of 1,568 comments across these registers were manually coded for mitigation within head acts (core speech acts) and external modification in the surrounding discourse. Strategies were compared quantitatively using key feature analysis (Egbert & Biber, 2023). The findings indicate that feedback registers promote the use of different mitigation strategies and external modification strategies, with written feedback favoring interrogative syntax and unmitigated forms, and spoken feedback favoring personal attribution, hedges, and the nursery "we", as well as the external modifiers minimizer, positive comment, and reason. Implications for providing feedback on student writing are highlighted.

