Using dictation to measure language proficiency: A Rasch analysis
Paul Leeming & Aeric Wong. Studies in Language Assessment, 2016. https://doi.org/10.58379/mbsw8958

Groups are used widely in the language classroom, and, particularly in classes with a wide range of English proficiency, teachers may want to construct balanced groups based on the proficiency of individual students. To construct such groups, teachers need a reliable measure that effectively differentiates between levels of proficiency, yet in some contexts information about student proficiency is not available. This paper reports on the use of an in-house dictation test to measure the English proficiency of students at a Japanese university. Rasch analysis was used to determine the degree to which the dictation differentiated across the range of proficiencies in the classes and to assess the reliability of the test. Correlation with scores from the TOEIC and SLEP tests was used to confirm that the dictation measures English proficiency. Results show that dictation is a simple, cheap, and effective means of assessing the relative proficiency of students in this context and can be used to construct balanced groups.
An evaluation of an online rater training program for the speaking and writing sub-tests of the Aptis test
U. Knoch, J. Fairbairn & A. Huisman. Studies in Language Assessment, 2016. https://doi.org/10.58379/xdyp1068

Many large-scale proficiency assessments that use human raters as part of their scoring procedures struggle to offer regular face-to-face rater training workshops for new raters in different locations around the world. A number of testing agencies have therefore introduced online rater training systems in order to reach raters in a larger number of locations and from different contexts; potential raters also have more flexibility to complete the training in their own time and at their own pace. This paper describes the collaborative evaluation of a new online rater training module developed for a large-scale international language assessment. The longitudinal evaluation focussed on two key points in the development process of the new program. The first, involving scrutiny of the online program, took place when the site was close to completion; the second, an empirical evaluation, followed the training of the first trial cohort of raters. The main purpose of this paper is to detail some of the complexities of completing such an evaluation within the operational demands of rolling out a new system and to comment on the advantages of the collaborative nature of such a project.
Maintaining the connection between test and context: A language test for university admission
J. Pill. Studies in Language Assessment, 2016. https://doi.org/10.58379/upin8160

This paper reflects on a review of an existing English language examination used for admission to an English-medium university in a non-English-dominant context. Studying how well an established test sits in its present context may highlight environmental changes causing gaps and points of friction; such an evaluation therefore provides a baseline understanding from which to move forward. From the 1960s to the 1980s, experts developed an examination for applicants to the American University of Beirut (AUB) that was similar to the Test of English as a Foreign Language (TOEFL) of that time. The AUB English Entrance Examination has remained relatively unchanged since then. Concern about its effectiveness prompted a recent review, providing an opportunity to study the consequences of employing a test not fully adapted to its current use. The review found differences in what is, and was, viewed as appropriate test format and content, and in definitions of language proficiency. It also noted unwarranted assumptions about the comparability of results from different tests. Current language practices at the university, in the region, and in the globalized workplace where graduates subsequently seek employment differ from those assumed when the test was first developed. This indicates the need for test revision and, for example, the potential benefit of developing an institutional language policy.
Using evaluation to promote change in language teacher practice
R. Erlam. Studies in Language Assessment, 2016. https://doi.org/10.58379/wxpg2438

Recent literature in teacher education has argued for a shift away from the development of teacher cognitions as a goal of teacher education towards the development of core practices that would make a difference to students' lives in the classroom (Ball & Forzani, 2009; Kubanyiova & Feryok, 2015; Zeichner, 2012). Hiebert and Morris (2012) propose that these key practices be embedded into instructional contexts and preserved as lesson plans and common assessments. This paper focuses on the evaluation tools developed for an in-service professional development programme for language teachers, the Teacher Professional Development Languages (TPDL) programme (http://www.tpdl.ac.nz/). TPDL is a year-long programme for teachers of foreign languages in New Zealand schools. Programme participants are visited by TPDL In-School support facilitators four times during the year. The facilitators observe their teaching practice and then use two key documents, the 'Evidence of Principles and Strategies (EPS) portfolio' and the 'Progress Standards', to assist teachers to evaluate their practice against key criteria. As the year progresses, teachers are increasingly encouraged to take ownership and control of the use of these tools, so that by Visit 4 the evaluation is conducted as a self-assessment. This paper evaluates these tools and considers evidence for their validity. Data are presented from the case study of one teacher to further demonstrate how the tools are used and to document evidence of any change in teaching practice.
Interaction in a paired oral assessment: Revisiting the effect of proficiency
Young-A Son. Studies in Language Assessment, 2016. https://doi.org/10.58379/lzzz5040

Paired oral assessments have gained increasing popularity as a method of assessing speaking skills (East, 2015; Galaczi, 2014). Several advantages have been associated with this method, including practicality and authenticity (Taylor, 2003). Nevertheless, concerns have also been raised about the interlocutor effect in paired speaking tests, particularly with regard to the interlocutor's oral proficiency (e.g., Norton, 2005). The present study reports on an approximate replication of Davis (2009), who examined the effect of interlocutor proficiency on paired speaking assessments. The current study compared the oral performance of 24 university students in two pairing conditions: once paired with a partner of the same proficiency level and once with a partner of a different proficiency level. The results partially confirmed Davis's (2009) findings. There were only minimal differences in test-takers' scores between the two conditions, and a multi-facet Rasch analysis confirmed that the pairing conditions were equivalent in difficulty. There were, however, observable differences in the quantity of talk depending on the interlocutor's proficiency. Unlike Davis (2009), this study found that low-proficiency test-takers produced fewer words when paired with high-proficiency partners. Even though the number of words test-takers produced was affected by their partner's proficiency, their performance scores remained constant.
{"title":"S. Gollin-Kies, D. R. Hall and S. H. Moore. Language for Specific Purposes","authors":"A. Koschade","doi":"10.58379/erbg3448","DOIUrl":"https://doi.org/10.58379/erbg3448","url":null,"abstract":"<jats:p>n/a</jats:p>","PeriodicalId":29650,"journal":{"name":"Studies in Language Assessment","volume":"23 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2016-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82764723","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Authentic interaction and examiner accommodation in the IELTS speaking test: A discussion
A. Filipi. Studies in Language Assessment, 2015. https://doi.org/10.58379/kdbk9824

Speakers naturally adjust their speech in interaction with others and will use accommodative strategies if their co-speaker is having difficulty understanding. The same adjustments have also been found in examiner accommodation in second language speaking tests (Cafarella, 1997; Ross, 1992). In the training of examiners for the IELTS speaking test, there is an attempt to control the degree of examiner accommodation in the interests of consistency: examiners are explicitly instructed to avoid the use of response tokens and, in the face of repair, to repeat a question only once without rephrasing it (Seedhouse & Egbert, 2006). This deliberate attempt to remove aspects of what is deemed to be authentic interactional behaviour runs counter to what speakers do 'in the wild', as the growing body of research in conversation analysis shows (see, for example, Hutchby & Wooffitt, 2008). We believe that it is timely to discuss the issue of examiner accommodation within a language-testing context against a backdrop of what is now known about naturally occurring interaction. We initiate such a discussion by reviewing the scholarly literature on interaction, on the IELTS speaking test, and on examiner accommodation.
DIF investigations across groups of gender and academic background in a large-scale high-stakes language test
Xia-li Song, Liying Cheng & D. Klinger. Studies in Language Assessment, 2015. https://doi.org/10.58379/rshg8366

High-stakes pre-entry language testing is the predominant tool used to measure test takers' proficiency for admission purposes in higher education in China. Given the important role of these tests, there is heated discussion about how to ensure test fairness for different groups of test takers. This study examined the fairness of the Graduate School Entrance English Examination (GSEEE), which is used to decide whether over one million test takers can enter master's programs in China. Using SIBTEST and content analysis, the study investigated differential item functioning (DIF) and the presence of potential bias on the GSEEE with respect to gender and academic background groups. The results showed that a large percentage of GSEEE items did not provide reliable results to distinguish good and poor performers. A number of items and item bundles displayed DIF and differential bundle functioning (DBF), and three test reviewers identified factors such as motivation and learning styles that potentially contributed to group performance differences. However, consistent evidence was not found to suggest that these flagged items and texts exhibited bias. While systematic bias may not have been detected, the results revealed poor test reliability, and the study highlights an urgent need to improve test quality and clarify the purpose of the test. DIF issues may be revisited once test quality has been improved.
{"title":"Ducasse, A. M. Interaction in paired oral proficiency assessment in Spanish. Language Testing and Evaluation Series","authors":"C. Inoue","doi":"10.58379/edcu5184","DOIUrl":"https://doi.org/10.58379/edcu5184","url":null,"abstract":"<jats:p>n/a</jats:p>","PeriodicalId":29650,"journal":{"name":"Studies in Language Assessment","volume":"30 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2015-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82659753","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Accuplacer Companion in a foreign language context: An argument-based validation of both test score meaning and impact
Robert C. Johnson & A. Riazi. Studies in Language Assessment, 2015. https://doi.org/10.58379/vavb1448

Use of a single, standardised instrument to make high-stakes decisions about test-takers is pervasive in higher education around the world, including in English as a foreign language (EFL) contexts. Contrary to longstanding best practice, however, few test users endeavour to meaningfully validate the instrument(s) they use for their specific context and purposes. This study reports efforts to validate a standardised placement test used in a US-accredited higher education institution in the Pacific to exempt, exclude, or place students within its Developmental English Program. A hybrid of two validation structures – Kane's (1992, 1994) interpretive model and Bachman's (2005) and Bachman and Palmer's (2010) assessment use argument – and a broad range of types and sources of evidence were used to ensure a balanced focus on both test score interpretation and test utilisation. Outcomes establish serious doubt as to the validity of the instrument for the local context. Moreover, the results provide valuable insights regarding the dangers of not evaluating the validity of an assessment for the local context, the relative strengths and weaknesses of standardised tests used for placement, and the value of argument-based validation.