
Practical Assessment, Research and Evaluation: Latest Publications

Getting Lucky: How Guessing Threatens the Validity of Performance Classifications
Q2 Social Sciences Pub Date: 2016-02-01 DOI: 10.7275/1G6P-4Y79
B. P. Foley
There is always a chance that examinees will answer multiple-choice (MC) items correctly by guessing. Design choices in some modern exams have created situations where guessing at random through the full exam—rather than only for a subset of items where the examinee does not know the answer—can be an effective strategy to pass the exam. This paper describes two case studies to illustrate this problem, discusses test development decisions that can help address the situation, and provides recommendations to testing professionals to help identify when guessing at random can be an effective strategy to pass the exam.
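A quick binomial calculation makes the threat concrete. The sketch below is a minimal illustration in R; the exam lengths, four-option items, and cut scores are assumed values for the example, not figures from the paper's two case studies.

```r
# Illustrative numbers only: exam lengths, option counts, and cut scores are assumptions,
# not values from the paper's case studies.

# A long exam with a high cut score: guessing at random on every item is nearly hopeless.
pbinom(59, size = 100, prob = 0.25, lower.tail = FALSE)   # P(60 or more correct out of 100)

# A short section with a cut score near the chance level: blind guessing alone passes a
# non-trivial share of examinees.
pbinom(7, size = 20, prob = 0.25, lower.tail = FALSE)     # P(8 or more correct out of 20)

# Guessing only where knowledge runs out: an examinee who knows 50 of 100 items and needs
# 60 correct must pick up 10 lucky guesses among the remaining 50 items.
pbinom(9, size = 50, prob = 0.25, lower.tail = FALSE)
```

The last case is the one the classification argument turns on: partial knowledge plus random guessing on the remaining items can clear a cut score far more often than the nominal chance level suggests.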
Citations: 10
Tutorial on Using Regression Models with Count Outcomes Using R.
Q2 Social Sciences Pub Date: 2016-02-01 DOI: 10.7275/PJ8C-H254
A. Alexander Beaujean, G. Morgan
Education researchers often study count variables, such as the number of times a student reached a goal, discipline referrals, and absences. Most researchers who study these variables use typical regression methods (i.e., ordinary least-squares) either with or without transforming the count variables. In either case, using typical regression for count data can produce parameter estimates that are biased, thus diminishing any inferences made from such data. As count-variable regression models are seldom taught in training programs, we present a tutorial to help educational researchers use such methods in their own research. We demonstrate analyzing and interpreting count data using Poisson, negative binomial, zero-inflated Poisson, and zero-inflated negative binomial regression models. The count regression methods are introduced through an example using the number of times students skipped class. The data for this example are freely available, and the R syntax used to run the example analyses is included in the Appendix. Count variables such as the number of times a student reached a goal, discipline referrals, and absences are ubiquitous in school settings. After a review of published single-case design studies, Shadish and Sullivan (2011) concluded that nearly all outcome variables were some form of a count. Yet most analyses they reviewed used traditional data analysis methods designed for normally distributed continuous data.
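For readers who want a feel for what such models look like in practice, a minimal sketch on simulated "classes skipped" data follows; the simulated data, variable names, and the MASS and pscl packages are assumptions for illustration, not necessarily what the tutorial's appendix uses.

```r
# A minimal sketch, not the tutorial's appendix code; data and packages are assumptions.
library(MASS)   # glm.nb(): negative binomial regression
library(pscl)   # zeroinfl(): zero-inflated count models

set.seed(1)
n   <- 500
gpa <- runif(n, 2, 4)
mu  <- exp(3 - gpa)                                   # expected skips fall as GPA rises
skipped <- ifelse(runif(n) < 0.25, 0,                 # extra structural zeros
                  rnbinom(n, mu = mu, size = 1.2))    # overdispersed counts
df <- data.frame(skipped, gpa)

m_pois <- glm(skipped ~ gpa, family = poisson, data = df)            # Poisson
m_nb   <- glm.nb(skipped ~ gpa, data = df)                           # negative binomial
m_zip  <- zeroinfl(skipped ~ gpa | 1, dist = "poisson", data = df)   # zero-inflated Poisson
m_zinb <- zeroinfl(skipped ~ gpa | 1, dist = "negbin",  data = df)   # zero-inflated neg. binomial

exp(coef(m_pois))                  # exponentiated coefficients: multiplicative effects on the expected count
AIC(m_pois, m_nb, m_zip, m_zinb)   # with these data the NB and zero-inflated fits should win
```

The part of the zeroinfl() formula after the | models the excess zeros separately from the count process, which is what distinguishes the zero-inflated models from their plain counterparts.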
Citations: 67
Methods for Examining the Psychometric Quality of Subscores: A Review and Application.
Q2 Social Sciences Pub Date: 2015-11-01 DOI: 10.7275/NG3Q-0D19
Jonathan Wedman, Per-Erik Lyrén
When subscores on a test are reported to the test taker, the appropriateness of reporting them depends on whether they provide useful information above what is provided by the total score. Subscore ...
Citations: 11
RMP Evaluations, Course Easiness, and Grades: Are they Related?
Q2 Social Sciences Pub Date: 2015-10-01 DOI: 10.7275/914Z-7K31
S. A. Rizvi
This paper investigates the relationship between the student evaluations of instructors on the RateMyProfessors.com (RMP) website and the average grades awarded by those instructors. As of Spring 2012, the RMP site included evaluations of 538 full- and part-time instructors at the College of Staten Island (CSI). We selected the evaluations of the 419 instructors who taught at CSI for at least two semesters from Fall 2009 to Spring 2011 and had at least ten evaluations. This research indicates that there is a strong correlation between RMP's overall evaluation and easiness scores. However, the perceived easiness of an instructor or course does not always result in higher grades for students. Furthermore, we found that the instructors who received high overall evaluation and easiness scores (4.0 to 5.0) at the RMP site do not necessarily award high grades. This is a very important finding, as it disputes the argument that instructors receive high evaluations because they are easy or award high grades. On the other hand, instructors of courses that are perceived to be difficult (RMP easiness score of 3.0 or less) are likely to be tough graders. However, for instructors who received moderate overall evaluation and easiness scores (between 3.0 and 4.0) at the RMP site, these scores correlated highly with the average grades those instructors awarded. Finally, our research shows that instructors in non-STEM disciplines award higher grades than instructors in STEM disciplines. Non-STEM instructors also received higher overall evaluations than their STEM counterparts, and non-STEM courses were perceived as easier by students than STEM courses.
Citations: 3
Real Cost-Benefit Analysis Is Needed in American Public Education.
Q2 Social Sciences Pub Date: 2015-07-01 DOI: 10.7275/T2BA-A657
Bert D. Stoneberg
Public school critics often point to rising expenditures and relatively flat test scores to justify their school reform agendas. The claims are flawed because their analyses fail to account for the difference in data types between dollars (ratio) and test scores (interval). A cost-benefit analysis using dollars as a common metric for both costs and benefits can provide a good estimate of their relationship. It also acknowledges that costs and benefits are both subject to inflation. The National Center for Education Research administers a methods training program for researchers who want to know more about cost-benefit analyses on education policies and programs.
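Putting both sides of a comparison into constant dollars is the step such analyses need before expenditures and benefits can share a metric. A minimal sketch, using placeholder spending figures and placeholder price-index values rather than real data:

```r
# Placeholder numbers for illustration; substitute actual spending and an official price
# index for the years of interest before drawing any conclusions.
spending <- data.frame(
  year    = c(2000, 2010),
  nominal = c(8000, 11000),   # hypothetical per-pupil spending in then-current dollars
  index   = c(172.2, 218.1)   # placeholder price-index values for those years
)
base <- spending$index[spending$year == 2010]
spending$constant_2010 <- spending$nominal * base / spending$index
spending   # nominal growth overstates the change once both years are in 2010 dollars
```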
Citations: 5
Linking Errors between Two Populations and Tests: A Case Study in International Surveys in Education.
Q2 Social Sciences Pub Date: 2015-06-01 DOI: 10.7275/YK4S-0A49
D. Hastedt, Deana Desa
This simulation study was prompted by the current increased interest in linking national studies to international large-scale assessments (ILSAs) such as IEA's TIMSS, IEA's PIRLS, and OECD's PISA. Linkage in this scenario is achieved by including items from the international assessments in the national assessments, on the premise that the average achievement scores from the latter can be linked to the international metric. In addition to raising issues associated with different testing conditions, administrative procedures, and the like, this approach also poses psychometric challenges. This paper endeavors to shed some light on the effects, in particular the linkage errors, that countries using this practice can expect. The ILSA selected for this simulation study was IEA TIMSS 2011, and the three countries used as the national assessment cases were Botswana, Honduras, and Tunisia, all of which participated in TIMSS 2011. The items treated as common to the simulated national tests and the international test came from the Grade 4 TIMSS 2011 mathematics items that IEA released into the public domain after completion of this assessment. The findings of the current study show that linkage errors seemed to reach acceptable levels when 30 or more items were used for the linkage, although the errors were still significantly higher than the TIMSS cutoffs. Comparison of the estimated country averages based on the simulated national surveys with the averages based on the international TIMSS assessment revealed, across the three countries, only one instance in which the estimates approached parity. Also, the percentages of students in these countries who actually reached the defined benchmarks on the TIMSS achievement scale differed significantly from the results based on TIMSS and the results for the simulated national assessments. In conclusion, we advise against using groups of released items from international assessments in national assessments in order to link the national results to the international metric.
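Outside the paper's full TIMSS-based design, the role of the number of common items can be seen with a much simpler model: if the link constant is estimated as the average difficulty shift over J common items, its standard error falls roughly as one over the square root of J. The sketch below rests on that simplification, and the 0.25-logit item-level spread is an assumed value, not an estimate from the study.

```r
# Simplified sketch, not the paper's simulation: treat the link constant as the mean of J
# item-level difficulty shifts, so its SE shrinks roughly as 1/sqrt(J).
set.seed(2)
link_se <- function(J, item_sd = 0.25, reps = 5000) {
  shifts <- matrix(rnorm(J * reps, mean = 0, sd = item_sd), nrow = reps)
  sd(rowMeans(shifts))   # simulated SE of the estimated link constant
}
sapply(c(10, 20, 30, 40), link_se)   # diminishing returns: most of the gain is reached by about 30 items
```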
Citations: 5
An Introduction to Missing Data in the Context of Differential Item Functioning.
Q2 Social Sciences Pub Date: 2015-04-01 DOI: 10.7275/FPG0-5079
Kathleen P Banks
This article introduces practitioners and researchers to the topic of missing data in the context of differential item functioning (DIF), reviews the current literature on the issue, discusses implications of the review, and offers suggestions for future research. A total of nine studies were reviewed. All of these studies determined what effect particular missing data techniques would have on the results of certain DIF detection procedures under various conditions. The most important finding of this review involved the use of zero imputation as a missing data technique. The review shows that zero imputation can lead to inflated Type I errors, especially in cases where examinees' ability levels have not been taken into consideration.
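The mechanism behind the inflated Type I error is easy to demonstrate: when the focal group omits an item more often than the reference group and every omission is scored 0, the item looks harder for the focal group even though its parameters are identical. The sketch below is a simplified simulation written for this listing (a Mantel-Haenszel test with a rest-score matching criterion); it is not one of the nine designs reviewed in the article.

```r
# Simplified illustration, not a design from the reviewed studies. Both groups have the
# same ability distribution and the same item parameters (no true DIF); only the omit
# rate differs, and omitted responses are scored 0 (zero imputation).
set.seed(3)
one_rep <- function(n_per_group = 500, n_items = 20,
                    p_omit_focal = 0.15, p_omit_ref = 0.02) {
  theta <- rnorm(2 * n_per_group)
  group <- rep(c("ref", "foc"), each = n_per_group)
  rest  <- rbinom(2 * n_per_group, n_items - 1, plogis(theta))  # matching criterion
  item  <- rbinom(2 * n_per_group, 1, plogis(theta))            # studied item, no DIF
  omit  <- runif(2 * n_per_group) < ifelse(group == "foc", p_omit_focal, p_omit_ref)
  item[omit] <- 0                                               # zero imputation
  strata <- cut(rest, breaks = c(-Inf, 6, 9, 12, Inf))          # coarse ability strata
  mantelhaen.test(table(item, group, strata))$p.value
}
p_vals <- replicate(500, one_rep())
mean(p_vals < 0.05)   # empirical Type I error rate, well above the nominal .05
```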
Citations: 11
Interrater Reliability in Large-Scale Assessments--Can Teachers Score National Tests Reliably without External Controls?
Q2 Social Sciences Pub Date: 2015-04-01 DOI: 10.7275/Y2EN-ZM89
Anna Lind Pantzare
In most large-scale assessment systems a set of rather expensive external quality controls are implemented in order to guarantee the quality of interrater reliability. This study empirically examin ...
Citations: 11
What Is Your Teacher Rubric? Extracting Teachers' Assessment Constructs.
Q2 Social Sciences Pub Date: 2015-03-01 DOI: 10.7275/M3SA-P692
Heejeong Jeong
Rubrics not only document the scales and criteria of what is assessed, but can also represent the assessment construct of the developer. Rubrics display the key assessment criteria, and the simplicity or complexity of the rubric can illustrate the meaning associated with the score. For this study, five experienced teachers developed a rubric for an EFL (English as a Foreign Language) descriptive writing task. Results show that even for the same task, teachers developed different formats and styles of rubric with both similar and different criteria. The teacher rubrics were analyzed for assessment criteria, rubric type, and scale type. Findings illustrate that in terms of criteria, all teacher rubrics had five areas in common: comprehension, paragraph structure, sentence structure, vocabulary, and grammar. The criteria that varied were mechanics, length, task completion, and self-correction. Rubric styles and scales also differed among teachers. Teachers who valued global concerns (i.e., comprehension) in writing designed more general holistic rubrics, while teachers who focused more on sentence-level concerns (i.e., grammar) developed analytic rubrics with more details. The teacher's assessment construct was reflected in the rubric through its assessment criteria, rubric style, and scale.
Citations: 7
Defining and Measuring Academic Success.
Q2 Social Sciences Pub Date: 2015-03-01 DOI: 10.7275/HZ5X-TX03
Travis T. York, Charles W. Gibson, Susan Rankin
Despite, and perhaps because of, its amorphous nature, the term 'academic success' is one of the most widely used constructs in educational research and assessment within higher education. This paper conducts an analytic literature review to examine the use and operationalization of the term in multiple academic fields. Dominant definitions of the term are conceptually evaluated using Astin's I-E-O model, resulting in the proposition of a revised definition and a new conceptual model of academic success. Measurements of academic success found throughout the literature are presented in accordance with this model. These measurements are provided with details in a user-friendly table (Appendix B). Results also indicate that grades and GPA are the most commonly used measures of academic success. Finally, recommendations are given for future research and practice to increase effective assessment of academic success.
Citations: 406