Practical considerations when building concordances between English tests
Ramsey L. Cardwell, Steven W. Nydick, J.R. Lockwood, Alina A. von Davier
Pub Date: 2023-09-23 | DOI: 10.1177/02655322231195027
Applicants must often demonstrate adequate English proficiency when applying to postsecondary institutions by taking an English language proficiency test, such as the TOEFL iBT, IELTS Academic, or Duolingo English Test (DET). Concordance tables aim to provide equivalent scores across multiple assessments, helping admissions officers to make fair decisions regardless of the test that an applicant took. We present our approaches to addressing practical (i.e., data collection and analysis) challenges in the context of building concordance tables between overall scores from the DET and those from the TOEFL iBT and IELTS Academic tests. We summarize a novel method for combining self-reported and official scores to meet recommended minimum sample sizes for concordance studies. We also evaluate sensitivity of estimated concordances to choices about how to (a) weight the observed data to the target population; (b) define outliers; (c) select appropriate pairs of test scores for repeat test takers; and (d) compute equating functions between pairs of scores. We find that estimated concordance functions are largely robust to different combinations of these choices in the regions of the proficiency distribution most relevant to admissions decisions. We discuss implications of our results for both test users and language testers.
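The abstract mentions computing equating functions between pairs of scores. As a rough illustration of one common approach, equipercentile linking, the sketch below maps scores from one test's scale onto another's by matching percentile ranks; the score ranges and simulated samples are hypothetical and are not taken from the study.

```python
# Minimal sketch of equipercentile linking between two score scales.
# All data and score ranges below are hypothetical placeholders.
import numpy as np

def equipercentile_link(x_scores, y_scores, x_new):
    """Map scores on test X to the scale of test Y by matching percentile ranks."""
    x_scores = np.sort(np.asarray(x_scores, dtype=float))
    y_scores = np.sort(np.asarray(y_scores, dtype=float))
    # Percentile rank of each new X score within the X sample
    ranks = np.searchsorted(x_scores, x_new, side="right") / len(x_scores)
    # Invert the Y score distribution at those ranks
    return np.quantile(y_scores, np.clip(ranks, 0, 1))

# Example: link simulated DET-like scores (10-160) to a TOEFL-like scale (0-120)
rng = np.random.default_rng(0)
det = rng.normal(110, 20, 2000).clip(10, 160)
toefl = rng.normal(85, 15, 2000).clip(0, 120)
print(equipercentile_link(det, toefl, [100, 120, 140]))
```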
{"title":"Practical considerations when building concordances between English tests","authors":"Ramsey L. Cardwell, Steven W. Nydick, J.R. Lockwood, Alina A. von Davier","doi":"10.1177/02655322231195027","DOIUrl":"https://doi.org/10.1177/02655322231195027","url":null,"abstract":"Applicants must often demonstrate adequate English proficiency when applying to postsecondary institutions by taking an English language proficiency test, such as the TOEFL iBT, IELTS Academic, or Duolingo English Test (DET). Concordance tables aim to provide equivalent scores across multiple assessments, helping admissions officers to make fair decisions regardless of the test that an applicant took. We present our approaches to addressing practical (i.e., data collection and analysis) challenges in the context of building concordance tables between overall scores from the DET and those from the TOEFL iBT and IELTS Academic tests. We summarize a novel method for combining self-reported and official scores to meet recommended minimum sample sizes for concordance studies. We also evaluate sensitivity of estimated concordances to choices about how to (a) weight the observed data to the target population; (b) define outliers; (c) select appropriate pairs of test scores for repeat test takers; and (d) compute equating functions between pairs of scores. We find that estimated concordance functions are largely robust to different combinations of these choices in the regions of the proficiency distribution most relevant to admissions decisions. We discuss implications of our results for both test users and language testers.","PeriodicalId":17928,"journal":{"name":"Language Testing","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135967326","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Book review: C. A. Chapelle and E. Voss (Eds.), Validity Argument in Language Testing: Case Studies of Validation Research
Yasuyo Sawaki
Pub Date: 2023-08-17 | DOI: 10.1177/02655322231193705
{"title":"Book review: C. A. Chapelle and E. Voss (Eds.), Validity Argument in Language Testing: Case Studies of Validation Research","authors":"Yasuyo Sawaki","doi":"10.1177/02655322231193705","DOIUrl":"https://doi.org/10.1177/02655322231193705","url":null,"abstract":"","PeriodicalId":17928,"journal":{"name":"Language Testing","volume":" ","pages":""},"PeriodicalIF":4.1,"publicationDate":"2023-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47509742","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Language testers and their place in the policy web
Laura Schildt, B. Deygers, A. Weideman
Pub Date: 2023-08-17 | DOI: 10.1177/02655322231191133
In the context of policy-driven language testing for citizenship, a growing body of research examines the political justifications and ethical implications of language requirements and test use. However, virtually no studies have looked at the role that language testers play in the evolution of language requirements. Critical gaps remain in our understanding of language testers’ first-hand experiences interacting with policymakers and how they perceive the use of tests in public policy. We examined these questions using an exploratory design and semi-structured interviews with 28 test executives representing 25 exam boards in 20 European countries. The interviews were transcribed and double coded in NVivo (weighted kappa = .83) using a priori and inductive coding. We used a horizontal analysis to evaluate responses by participant and a vertical analysis to identify between-case themes. Findings indicate that language testers may benefit from policy literacy to form part of policy webs wherein they can influence instrumental decisions concerning language in migration policy.
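For readers unfamiliar with the agreement statistic reported above (weighted kappa = .83), the snippet below shows how a weighted Cohen's kappa can be computed in Python with scikit-learn; the coder labels are invented placeholders, not the study's coding data.

```python
# Illustrative computation of inter-coder agreement as a weighted Cohen's kappa.
# The codes below are fabricated; the study's actual categories are not shown here.
from sklearn.metrics import cohen_kappa_score

# Ordinal codes assigned by two coders to the same interview segments (hypothetical)
coder_1 = [1, 2, 2, 3, 1, 3, 2, 1, 3, 2]
coder_2 = [1, 2, 3, 3, 1, 3, 2, 2, 3, 2]

# Quadratic weighting penalises larger disagreements more heavily
kappa = cohen_kappa_score(coder_1, coder_2, weights="quadratic")
print(f"Weighted kappa: {kappa:.2f}")
```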
{"title":"Language testers and their place in the policy web","authors":"Laura Schildt, B. Deygers, A. Weideman","doi":"10.1177/02655322231191133","DOIUrl":"https://doi.org/10.1177/02655322231191133","url":null,"abstract":"In the context of policy-driven language testing for citizenship, a growing body of research examines the political justifications and ethical implications of language requirements and test use. However, virtually no studies have looked at the role that language testers play in the evolution of language requirements. Critical gaps remain in our understanding of language testers’ first-hand experiences interacting with policymakers and how they perceive the use of tests in public policy. We examined these questions using an exploratory design and semi-structured interviews with 28 test executives representing 25 exam boards in 20 European countries. The interviews were transcribed and double coded in NVivo (weighted kappa = .83) using a priori and inductive coding. We used a horizontal analysis to evaluate responses by participant and a vertical analysis to identify between-case themes. Findings indicate that language testers may benefit from policy literacy to form part of policy webs wherein they can influence instrumental decisions concerning language in migration policy.","PeriodicalId":17928,"journal":{"name":"Language Testing","volume":" ","pages":""},"PeriodicalIF":4.1,"publicationDate":"2023-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46533450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Assessing the content quality of essays in content and language integrated learning: Exploring the construct from subject specialists’ perspectives
Takanori Sato
Pub Date: 2023-08-08 | DOI: 10.1177/02655322231190058
Assessing the content of learners’ compositions is a common practice in second language (L2) writing assessment. However, the construct definition of content in L2 writing assessment potentially underrepresents the target competence in content and language integrated learning (CLIL), which aims to foster not only L2 proficiency but also critical thinking skills and subject knowledge. This study aims to conceptualize the construct of content in CLIL by exploring subject specialists’ perspectives on essays’ content quality in a CLIL context. Eleven researchers of English as a lingua franca (ELF) rated the content quality of research-based argumentative essays on ELF submitted in a CLIL course and produced think-aloud protocols. This study explored some essay features that have not been considered relevant in language assessment but are essential in the CLIL context, including the accuracy of the content, presence and quality of research, and presence of elements required in academic essays. Furthermore, the findings of this study confirmed that the components of content often addressed in language assessment (e.g., elaboration and logicality) are pertinent to writing assessment in CLIL. The manner in which subject specialists construe the content quality of essays on their specialized discipline can deepen the current understanding of content in CLIL.
{"title":"Assessing the content quality of essays in content and language integrated learning: Exploring the construct from subject specialists’ perspectives","authors":"Takanori Sato","doi":"10.1177/02655322231190058","DOIUrl":"https://doi.org/10.1177/02655322231190058","url":null,"abstract":"Assessing the content of learners’ compositions is a common practice in second language (L2) writing assessment. However, the construct definition of content in L2 writing assessment potentially underrepresents the target competence in content and language integrated learning (CLIL), which aims to foster not only L2 proficiency but also critical thinking skills and subject knowledge. This study aims to conceptualize the construct of content in CLIL by exploring subject specialists’ perspectives on essays’ content quality in a CLIL context. Eleven researchers of English as a lingua franca (ELF) rated the content quality of research-based argumentative essays on ELF submitted in a CLIL course and produced think-aloud protocols. This study explored some essay features that have not been considered relevant in language assessment but are essential in the CLIL context, including the accuracy of the content, presence and quality of research, and presence of elements required in academic essays. Furthermore, the findings of this study confirmed that the components of content often addressed in language assessment (e.g., elaboration and logicality) are pertinent to writing assessment in CLIL. The manner in which subject specialists construe the content quality of essays on their specialized discipline can deepen the current understanding of content in CLIL.","PeriodicalId":17928,"journal":{"name":"Language Testing","volume":" ","pages":""},"PeriodicalIF":4.1,"publicationDate":"2023-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48912792","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Test review: High-stakes English language proficiency tests—Enquiry, resit, and retake policies
William S. Pearson
Pub Date: 2023-07-25 | DOI: 10.1177/02655322231186706
Many candidates undertaking high-stakes English language proficiency tests for academic enrolment do not achieve the results they need for reasons including linguistic unreadiness, test unpreparedness, illness, an unfavourable configuration of tasks, or administrative and marking errors. Owing to the importance of meeting goals or out of a belief that original test performance was satisfactory, some individuals query their results, while others go on to retake the test, perhaps on multiple occasions. This article critically reviews the policies of eight well-known, on-demand gatekeeping English language tests, describing the systems adopted by language assessment organisations to regulate results enquiries, candidates resitting (components of) a test where performance fell short of requirements, and repeat test-taking. It was found that all providers institute clear mechanisms through which candidates can query their results, with notable variations exhibited in procedures, costs, restrictions, outcomes, and how policies are communicated to test-takers. Test resit options are scarce, while organisations enact few restrictions on test retakes in the form of mandatory waiting times and cautionary advice. The implications for language assessment organisations are discussed.
{"title":"Test review: High-stakes English language proficiency tests—Enquiry, resit, and retake policies","authors":"William S. Pearson","doi":"10.1177/02655322231186706","DOIUrl":"https://doi.org/10.1177/02655322231186706","url":null,"abstract":"Many candidates undertaking high-stakes English language proficiency tests for academic enrolment do not achieve the results they need for reasons including linguistic unreadiness, test unpreparedness, illness, an unfavourable configuration of tasks, or administrative and marking errors. Owing to the importance of meeting goals or out of a belief that original test performance was satisfactory, some individuals query their results, while others go on to retake the test, perhaps on multiple occasions. This article critically reviews the policies of eight well-known, on-demand gatekeeping English language tests, describing the systems adopted by language assessment organisations to regulate results enquiries, candidates resitting (components of) a test where performance fell short of requirements, and repeat test-taking. It was found that all providers institute clear mechanisms through which candidates can query their results, with notable variations exhibited in procedures, costs, restrictions, outcomes, and how policies are communicated to test-takers. Test resit options are scarce, while organisations enact few restrictions on test retakes in the form of mandatory waiting times and cautionary advice. The implications for language assessment organisations are discussed.","PeriodicalId":17928,"journal":{"name":"Language Testing","volume":" ","pages":""},"PeriodicalIF":4.1,"publicationDate":"2023-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41921024","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Book review: K. Sadeghi (Ed.), Technology-Assisted Language Assessment in Diverse Contexts: Lessons from the Transition to Online Testing During Covid-19
Tomohito Hiromori, H. Mohebbi
Pub Date: 2023-07-20 | DOI: 10.1177/02655322231186707
{"title":"Book review: K. Sadeghi (Ed.), Technology-Assisted Language Assessment in Diverse Contexts: Lessons from the Transition to Online Testing During Covid-19","authors":"Tomohito Hiromori, H. Mohebbi","doi":"10.1177/02655322231186707","DOIUrl":"https://doi.org/10.1177/02655322231186707","url":null,"abstract":"Agresti, A. (2013). Categorical data analysis (3rd ed.). Wiley. Dixon, P. (2008). Models of accuracy in repeated-measures designs. Journal of Memory and Language, 59(4), 447–456. https://doi.org/10.1016/j.jml.2007.11.004 Doran, H., Bates, D., Bliese, P., & Dowling, M. (2007). Estimating the multilevel Rasch model: With the lme4 package. Journal of Statistical Software, 20(2), 1–18. https://doi.org/10.18637/ jss.v020.i02 Embretson, S., & Gorin, J. (2001). Improving construct validity with cognitive psychology principles. Journal of Educational Measurement, 38(4), 343–368. https://doi.org/10. 1111/j.1745-3984.2001.tb01131.x Jiang, Z. (2018). Using the linear mixed-effect model framework to estimate generalizability variance components in R. Methodology, 14(3), 133–142. https://doi.org/10.1027/1614-2241/ a000149 Kidd, E., Donnelly, S., & Christiansen, M. H. (2018). Individual differences in language acquisition and processing. Trends in Cognitive Sciences, 22(2), 154–169. https://doi.org/10.1016/j. tics.2017.11.006","PeriodicalId":17928,"journal":{"name":"Language Testing","volume":" ","pages":""},"PeriodicalIF":4.1,"publicationDate":"2023-07-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46160101","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Assessing speaking through multimodal oral presentations: The case of construct underrepresentation in EAP contexts
Louise Palmour
Pub Date: 2023-07-07 | DOI: 10.1177/02655322231183077
This article explores the nature of the construct underlying classroom-based English for academic purposes (EAP) oral presentation assessments, which are used, in part, to determine admission to programmes of study at UK universities. Through analysis of qualitative data (from questionnaires, interviews, rating discussions, and fieldnotes), the article highlights how, in EAP settings, the rating criteria and EAP teacher assessors sometimes focus too narrowly on particular spoken linguistic aspects of oral presentations. This is in spite of student assessees drawing on, and teacher assessors valuing, the multimodal communicative affordances available in oral presentation performances. To better avoid such construct underrepresentation, oral presentation tasks should be acknowledged and represented in rating scales, teacher assessor decision-making, and training in EAP contexts.
{"title":"Assessing speaking through multimodal oral presentations: The case of construct underrepresentation in EAP contexts","authors":"Louise Palmour","doi":"10.1177/02655322231183077","DOIUrl":"https://doi.org/10.1177/02655322231183077","url":null,"abstract":"This article explores the nature of the construct underlying classroom-based English for academic purpose (EAP) oral presentation assessments, which are used, in part, to determine admission to programmes of study at UK universities. Through analysis of qualitative data (from questionnaires, interviews, rating discussions, and fieldnotes), the article highlights how, in EAP settings, there is a tendency for the rating criteria and EAP teacher assessors to sometimes focus too narrowly on particular spoken linguistic aspects of oral presentations. This is in spite of student assessees drawing on, and teacher assessors valuing, the multimodal communicative affordances available in oral presentation performances. To better avoid such construct underrepresentation, oral presentation tasks should be acknowledged and represented in rating scales, teacher assessor decision-making, and training in EAP contexts.","PeriodicalId":17928,"journal":{"name":"Language Testing","volume":" ","pages":""},"PeriodicalIF":4.1,"publicationDate":"2023-07-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49589857","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Development of the American Sign Language Fingerspelling and Numbers Comprehension Test (ASL FaN-CT)
C. Occhino, Ryan Lidster, Leah Geer, Jason D. Listman, P. Hauser
Pub Date: 2023-07-03 | DOI: 10.1177/02655322231179494
We describe the development and initial validation of the “ASL Fingerspelling and Number Comprehension Test” (ASL FaN-CT), a test of recognition proficiency for fingerspelled words in American Sign Language (ASL). Despite the relative frequency of fingerspelling in ASL discourse, learners commonly struggle to produce and perceive fingerspelling more than they do other facets of ASL. However, assessments of fingerspelling knowledge are highly underrepresented in the testing literature for signed languages. After first describing the construct, we describe test development, piloting, and revisions, and we evaluate the strength of the test’s validity argument vis-à-vis its intended interpretation and use as a screening instrument for current and future employees. The results of a pilot with 79 ASL learners provide strong evidence that the revised test is performing as intended and can be used to make accurate decisions about ASL learners’ proficiency in fingerspelling recognition. We conclude by describing the item properties observed in our current test, and our plans for continued validation and analysis with respect to a battery of tests of ASL proficiency currently in development.
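Where the abstract refers to item properties, the sketch below illustrates the kind of classical item statistics (difficulty and discrimination) such an analysis typically reports; the response matrix is simulated, and only the pilot sample size (79) is taken from the abstract.

```python
# Minimal classical item analysis: proportion-correct difficulty and
# point-biserial discrimination. The response data are fabricated, not the
# ASL FaN-CT pilot data.
import numpy as np

def item_analysis(responses):
    """responses: 2D array (test takers x items) of 0/1 scores."""
    responses = np.asarray(responses, dtype=float)
    total = responses.sum(axis=1)
    difficulty = responses.mean(axis=0)  # proportion correct per item
    # Correlation of each item with the rest-score (total minus that item)
    discrimination = np.array([
        np.corrcoef(responses[:, j], total - responses[:, j])[0, 1]
        for j in range(responses.shape[1])
    ])
    return difficulty, discrimination

rng = np.random.default_rng(1)
data = (rng.random((79, 20)) < 0.7).astype(int)  # 79 pilot takers, 20 hypothetical items
p, r = item_analysis(data)
print(np.round(p, 2), np.round(r, 2))
```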
{"title":"Development of the American Sign Language Fingerspelling and Numbers Comprehension Test (ASL FaN-CT)","authors":"C. Occhino, Ryan Lidster, Leah Geer, Jason D. Listman, P. Hauser","doi":"10.1177/02655322231179494","DOIUrl":"https://doi.org/10.1177/02655322231179494","url":null,"abstract":"We describe the development and initial validation of the “ASL Fingerspelling and Number Comprehension Test” (ASL FaN-CT), a test of recognition proficiency for fingerspelled words in American Sign Language (ASL). Despite the relative frequency of fingerspelling in ASL discourse, learners commonly struggle to produce and perceive fingerspelling more than they do other facets of ASL. However, assessments of fingerspelling knowledge are highly underrepresented in the testing literature for signed languages. After first describing the construct, we describe test development, piloting, revisions, and evaluate the strength of the test’s validity argument vis-à-vis its intended interpretation and use as a screening instrument for current and future employees. The results of a pilot on 79 ASL learners provide strong evidence that the revised test is performing as intended and can be used to make accurate decisions about ASL learners’ proficiency in fingerspelling recognition. We conclude by describing the item properties observed in our current test, and our plans for continued validation and analysis with respect to a battery of tests of ASL proficiency currently in development.","PeriodicalId":17928,"journal":{"name":"Language Testing","volume":" ","pages":""},"PeriodicalIF":4.1,"publicationDate":"2023-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49114525","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fairness of using different English accents: The effect of shared L1s in listening tasks of the Duolingo English test
Okim Kang, Xun Yan, M. Kostromitina, Ron I. Thomson, T. Isaacs
Pub Date: 2023-07-03 | DOI: 10.1177/02655322231179134
This study aimed to answer an ongoing validity question related to the use of nonstandard English accents in international tests of English proficiency and associated issues of test fairness. More specifically, we examined (1) the extent to which different or shared English accents had an impact on listeners’ performances on the Duolingo listening tests and (2) the extent to which different English accents affected listeners’ performances on two different task types. Speakers from four interlanguage English accent varieties (Chinese, Spanish, Indian English [Hindi], and Korean) produced speech samples for “yes/no” vocabulary and dictation Duolingo listening tasks. Listeners who spoke with these same four English accents were then recruited to take the Duolingo listening test items. Results indicated that there is a shared first language (L1) benefit effect overall, with comparable test scores between shared-L1 and inner-circle L1 accents, and no significant differences in listeners’ listening performance scores across highly intelligible accent varieties. No task type effect was found. The findings provide guidance to better understand fairness, equality, and practicality of designing and administering high-stakes English tests targeting a diversity of accents.
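The abstract does not spell out the statistical analysis, but one way a shared-L1 benefit of this kind could be examined is with a mixed-effects regression of scores on whether listener and speaker share an L1, with listeners as a grouping factor. Everything in the sketch below (variable names, simulated data, model form) is an assumption for illustration, not the study's actual method.

```python
# Hypothetical mixed-effects sketch: does sharing an L1 with the speaker
# predict listening scores, controlling for speaker accent?
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 400
df = pd.DataFrame({
    "listener": rng.integers(0, 40, n),
    "accent": rng.choice(["Chinese", "Spanish", "Hindi", "Korean"], n),
})
df["shared_l1"] = (rng.random(n) < 0.25).astype(int)
df["score"] = 0.6 + 0.05 * df["shared_l1"] + rng.normal(0, 0.1, n)

# Random intercepts for listeners; shared-L1 status and accent as fixed effects
model = smf.mixedlm("score ~ shared_l1 + accent", df, groups=df["listener"]).fit()
print(model.summary())
```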
{"title":"Fairness of using different English accents: The effect of shared L1s in listening tasks of the Duolingo English test","authors":"Okim Kang, Xun Yan, M. Kostromitina, Ron I. Thomson, T. Isaacs","doi":"10.1177/02655322231179134","DOIUrl":"https://doi.org/10.1177/02655322231179134","url":null,"abstract":"This study aimed to answer an ongoing validity question related to the use of nonstandard English accents in international tests of English proficiency and associated issues of test fairness. More specifically, we examined (1) the extent to which different or shared English accents had an impact on listeners’ performances on the Duolingo listening tests and (2) the extent to which different English accents affected listeners’ performances on two different task types. Speakers from four interlanguage English accent varieties (Chinese, Spanish, Indian English [Hindi], and Korean) produced speech samples for “yes/no” vocabulary and dictation Duolingo listening tasks. Listeners who spoke with these same four English accents were then recruited to take the Duolingo listening test items. Results indicated that there is a shared first language (L1) benefit effect overall, with comparable test scores between shared-L1 and inner-circle L1 accents, and no significant differences in listeners’ listening performance scores across highly intelligible accent varieties. No task type effect was found. The findings provide guidance to better understand fairness, equality, and practicality of designing and administering high-stakes English tests targeting a diversity of accents.","PeriodicalId":17928,"journal":{"name":"Language Testing","volume":" ","pages":""},"PeriodicalIF":4.1,"publicationDate":"2023-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45816789","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
English learners who are blind or visually impaired: A participatory design approach to enhancing fairness and validity for language testing accommodations
Danielle Guzman-Orth, Jonathan Steinberg, Traci Albee
Pub Date: 2023-06-27 | DOI: 10.1177/02655322231159143
Standardizing accessible test design and development to meet students’ individual access needs is a complex task. The following study provides one approach to accessible test design and development using participatory design methods with school community members. Participatory research provides opportunities to empower collaborators by co-creating knowledge that is useful for assessment development. In this study, teachers of students who are visually impaired, students who are blind or visually impaired, English language teachers, and test administrators provided feedback at critical stages of the development process to explore the construct validity of English language proficiency (ELP) assessments. Students who are blind or visually impaired need to be able to show what they know and can do without impact from construct-irrelevant variance like language acquisition or disability characteristics. Building on our iterative accessible test design, development, and delivery practices, and as part of a large project on English-learner proficiency test accessibility and usability, we collected rich observation and interview data from 17 students who were blind or visually impaired and were enrolled in kindergarten through Grade 12. We examined the ratings and item metadata, including assistive technology preferences and interactions, while we used grounded theory approaches to examine qualitative thematic findings. Implications for research and practice are discussed.
{"title":"English learners who are blind or visually impaired: A participatory design approach to enhancing fairness and validity for language testing accommodations","authors":"Danielle Guzman-Orth, Jonathan Steinberg, Traci Albee","doi":"10.1177/02655322231159143","DOIUrl":"https://doi.org/10.1177/02655322231159143","url":null,"abstract":"Standardizing accessible test design and development to meet students’ individual access needs is a complex task. The following study provides one approach to accessible test design and development using participatory design methods with school community members. Participatory research provides opportunities to empower collaborators by co-creating knowledge that is useful for assessment development. In this study, teachers of students who are visually impaired, students who are blind or are visually impaired, English language teachers, and test administrators provided feedback at critical stages of the development process to explore the construct validity of English language proficiency (ELP) assessments. Students who are blind or visually impared need to be able to show what they know and can do without impact from construct-irrelevant variance like language acquisition or disability characteristics. Building on our iterative accessible test design, development, and delivery practices, and as part of a large project on English-learner proficiency test accessibility and usability, we collected rich observation and interview data from 17 students who were blind or visually impaired and were enrolled in grades kindergarten through Grade 12. We examined the ratings and item metadata, including assistive technology preferences and interactions, while we used grounded theory approaches to examine qualitative thematic findings. Implications for research and practice are discussed.","PeriodicalId":17928,"journal":{"name":"Language Testing","volume":"40 1","pages":"933 - 959"},"PeriodicalIF":4.1,"publicationDate":"2023-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43968301","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"文学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}