
Practical Assessment, Research and Evaluation: Latest Articles

An Application of the Partial Credit IRT Model in Identifying Benchmarks for Polytomous Rating Scale Instruments.
Q2 Social Sciences Pub Date : 2018-05-01 DOI: 10.7275/1cf3-aq56
Enis Dogan
Several large scale assessments include student, teacher, and school background questionnaires. Results from such questionnaires can be reported for each item separately, or as indices based on aggregation of multiple items into a scale. Interpreting scale scores is not always an easy task though. In disseminating results of achievement tests, one solution to this conundrum is to identify cut scores on the reporting scale in order to divide it into achievement levels that correspond to distinct knowledge and skill profiles. This allows for the reporting of the percentage of students at each achievement level in addition to average scale scores. Dividing a scale into meaningful segments can, and perhaps should, be done to enrich interpretability of scales based on questionnaire items as well. This article illustrates an approach based on an application of Item Response Theory (IRT) to accomplish this. The application is demonstrated with a polytomous rating scale instrument designed to measure students’ sense of school belonging.
Citations: 2
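The abstract does not spell out the model, so here is a hedged illustration of it. Under the Partial Credit Model, the probability of responding in category k of an item with step difficulties δ_1, ..., δ_m is proportional to exp(Σ_{j=1}^{k} (θ − δ_j)), and the empty sum for category 0 is taken as 0. The minimal Python sketch below computes these category probabilities. The step values and the function name `pcm_probs` are hypothetical, and this is not necessarily how the author derived benchmarks.

```python
import math

def pcm_probs(theta, deltas):
    """Category probabilities for one item under the Partial Credit Model.

    deltas[k - 1] is the (hypothetical) step difficulty for moving from
    category k - 1 to category k, so len(deltas) steps give len(deltas) + 1
    categories scored 0, 1, ..., len(deltas).
    """
    # Cumulative sums of (theta - delta_j); category 0 has an empty sum of 0.
    numerators = [0.0]
    for d in deltas:
        numerators.append(numerators[-1] + (theta - d))
    # Subtract the largest term before exponentiating for numerical stability.
    top = max(numerators)
    exps = [math.exp(n - top) for n in numerators]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical step difficulties for one 4-point "sense of belonging" item.
steps = [-1.5, 0.0, 1.2]

for theta in (-2.0, 0.0, 2.0):
    probs = pcm_probs(theta, steps)
    print(theta, [round(p, 3) for p in probs])

# Adjacent categories k - 1 and k are equally likely where theta equals delta_k,
# so with ordered steps (as here) the deltas mark where the modal response changes.
```

One possible way to anchor benchmark descriptions on the reporting scale is to read off where each category becomes the most likely response. The abstract does not say whether the article uses exactly this criterion.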
From Simulation to Implementation: Two CAT Case Studies
Q2 Social Sciences Pub Date : 2018-01-01 DOI: 10.7275/BWVG-D091
John J. Barnard
Measurement specialists strive to shorten assessment time without compromising the precision of scores. Computerized Adaptive Testing (CAT) has rapidly gained ground over the past decades as a way to fulfill this goal. However, the parameters for implementing a CAT need to be explored in simulations beforehand so that it can be determined whether expectations can be met. CATs can become costly if trial-and-error strategies are followed, especially if constraints are included in the algorithms; simulations can save time and money. This study found that for both a multiple-choice test and a rating scale questionnaire, simulations not only predicted CAT outcomes very well but also illustrated the efficiency of CATs compared with fixed-length tests.
Citations: 8
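The sketch below is a hedged illustration of the kind of pre-implementation simulation the abstract describes. It simulates a simple CAT under the Rasch model and selects the unused item with maximum information at the current ability estimate. It then updates an expected a posteriori (EAP) estimate on a grid with a standard normal prior. Testing stops once the posterior SD falls below a target or a maximum length is reached. The item bank, stopping values, and function names are assumptions, not the study's settings, and the sketch does not model the algorithmic constraints the abstract mentions.

```python
import math
import random

def p_correct(theta, b):
    """Rasch (1PL) probability of a correct/endorsed response to an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def eap(responses, grid):
    """EAP ability estimate and posterior SD on a grid, with a N(0, 1) prior."""
    weights = []
    for t in grid:
        w = math.exp(-0.5 * t * t)  # unnormalised standard normal prior
        for b, u in responses:
            p = p_correct(t, b)
            w *= p if u else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)

def simulate_cat(true_theta, bank, se_target=0.4, max_items=30, rng=None):
    """Simulate one adaptive administration; returns (theta_hat, se, items_used)."""
    rng = rng or random.Random()
    grid = [-4.0 + 0.05 * i for i in range(161)]  # theta grid from -4 to 4
    available = list(bank)
    responses = []
    theta_hat, se = 0.0, float("inf")
    while available and len(responses) < max_items and se > se_target:
        # Under the Rasch model, information p(1 - p) peaks where b == theta,
        # so the most informative item is the one closest to theta_hat.
        b = min(available, key=lambda item: abs(item - theta_hat))
        available.remove(b)
        u = rng.random() < p_correct(true_theta, b)  # simulated response
        responses.append((b, u))
        theta_hat, se = eap(responses, grid)
    return theta_hat, se, len(responses)

# Hypothetical bank of 200 Rasch item difficulties and 20 simulated examinees.
rng = random.Random(2018)
bank = [rng.uniform(-3.0, 3.0) for _ in range(200)]
lengths = [simulate_cat(rng.gauss(0.0, 1.0), bank, rng=rng)[2] for _ in range(20)]
print("mean CAT length:", sum(lengths) / len(lengths), "items (maximum 30)")
```

One way to check efficiency is to compare the average adaptive length with the length a fixed form needs to reach the same SE. The numbers produced here depend entirely on the assumed bank and stopping rule.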
Fairness Concerns of Discrete Option Multiple Choice Items.
Q2 Social Sciences Pub Date : 2018-01-01 DOI: 10.7275/JBRV-4E93
Carol Eckerly, Russell J. Smith, J. Sowles
The Discrete Option Multiple Choice (DOMC) item format was introduced by Foster and Miller (2009) with the intent of improving the security of test content. However, by changing the amount and order of the content presented, the test taking experience varies by test taker, thereby introducing potential fairness issues. In this paper we investigated fairness concerns by evaluating the impact on test takers of the differing testing experiences when items are administered in the DOMC format. Specifically, we described the impact of the presentation order of the key on item difficulty and discrimination as well as the cumulative impact at the test level. We recommend not including DOMC items in exams until the methodology of scoring test takers on these items is revised to address specific fairness concerns identified in this paper.
Citations: 3
A Comparison of Subject Matter Experts’ Perceptions and Job Analysis Surveys
Q2 Social Sciences Pub Date : 2018-01-01 DOI: 10.7275/7DEY-ZD62
Adam E. Wyse, Ben Babcock
Two common approaches for performing job analysis in credentialing programs are committee-based methods, which rely solely on subject matter experts’ judgments, and task inventory surveys. This study evaluates how well subject matter experts’ perceptions coincide with task inventory survey results for three credentialing programs. Results suggest that subject matter expert ratings differ in systematic ways from task inventory survey results and that task lists generated solely from subject matter experts’ intuitions generally lead to narrower task lists. Results also indicated that there can be key differences between procedures and non-procedures, with subject matter experts’ judgments often exhibiting lower agreement with task inventory survey results for procedures than for non-procedures. We recommend that organizations performing job analyses think very carefully before relying solely on subject matter experts’ judgments as their primary method of job analysis.
Citations: 7
Random Forest as a Predictive Analytics Alternative to Regression in Institutional Research.
Q2 Social Sciences Pub Date : 2018-01-01 DOI: 10.7275/1WPR-M024
Lingjun He, R. Levine, J. Fan, Joshua Beemer, Jeanne Stronach
In institutional research, modern data mining approaches are seldom considered to address predictive analytics problems. The goal of this paper is to highlight the advantages of tree-based machine learning algorithms over classic (logistic) regression methods for data-informed decision making in higher education problems, and stress the success of random forest in circumstances where the regression assumptions are often violated in big data applications. Random forest is a model averaging procedure where each tree is constructed based on a bootstrap sample of the data set. In particular, we emphasize the ease of application, low computational cost, high predictive accuracy, flexibility, and interpretability of random forest machinery. Our overall recommendation is that institutional researchers look beyond classical regression and single decision tree analytics tools, and consider random forest as the predominant method for prediction tasks. The proposed points of view are detailed and illustrated through a simulation experiment and analyses of data from real institutional research projects.
Citations: 36
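The sketch below makes the abstract's description concrete ("each tree is constructed based on a bootstrap sample of the data set"). It is a deliberately tiny, hypothetical random-forest-style ensemble in plain Python. Each tree is a single split (a decision stump) fitted to a bootstrap sample using a randomly chosen subset of features, and predictions are combined by majority vote. It is a teaching sketch, not the authors' implementation. In practice one would use a library such as scikit-learn's `RandomForestClassifier` with full-depth trees.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def fit_stump(X, y, features):
    """Fit a one-split tree, searching only the given feature indices."""
    majority = Counter(y).most_common(1)[0][0]
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                left_label = Counter(left).most_common(1)[0][0] if left else majority
                right_label = Counter(right).most_common(1)[0][0] if right else majority
                best = (score, f, t, left_label, right_label)
    _, feat, thr, left_label, right_label = best
    return lambda row: left_label if row[feat] <= thr else right_label

def fit_forest(X, y, n_trees=25, max_features=1, seed=0):
    """Bagged stumps with a random feature subset per tree; returns a majority-vote predictor."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap: rows sampled with replacement
        feats = rng.sample(range(p), max_features)  # random feature subset for the split
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return lambda row: Counter(tree(row) for tree in trees).most_common(1)[0][0]

# Hypothetical toy data: [first-term GPA, credits earned] -> retained (1) or not (0).
X = [[1.2, 6], [1.8, 9], [2.1, 12], [2.4, 9], [2.9, 15], [3.1, 12], [3.5, 15], [3.8, 18]]
y = [0, 0, 0, 1, 1, 1, 1, 1]
forest = fit_forest(X, y, n_trees=51, max_features=1, seed=1)
print(forest([1.5, 8]), forest([3.6, 16]))
```

Averaging many trees, each fitted to a different bootstrap sample, reduces the variance of any single tree. This is the model-averaging idea the abstract refers to.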
An Evaluation of Normal Versus Lognormal Distribution in Data Description and Empirical Analysis
Q2 Social Sciences Pub Date : 2017-12-21 DOI: 10.7275/0EAT-HB38
R. Diwakar
Many existing methods of statistical inference and analysis rely heavily on the assumption that the data are normally distributed. However, the normality assumption is not fulfilled for data that contain no negative values or are otherwise skewed, a common occurrence in disciplines as diverse as finance, economics, political science, sociology, philology, biology, and physical and industrial processes. In this situation, a lognormal distribution may represent the data better than the normal distribution. In this paper, I revisit the key attributes of the normal and lognormal distributions and demonstrate, through an empirical analysis of the ‘number of political parties’ in India, how a logarithmic transformation can help bring lognormally distributed data closer to normality. The paper also provides further empirical evidence that many variables of interest to political and other social scientists could be better modelled using the lognormal distribution. More generally, the paper emphasises the potential for improved description and empirical analysis of quantitative data through closer attention to their distribution, and complements previous publications in Practical Assessment, Research and Evaluation (PARE) on this subject.
Citations: 6
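The example below shows the point about logarithmic transformation. If X is lognormal, then log X is normal, so taking logs of strictly positive, right-skewed data can remove most of the skew. The sketch uses simulated data with hypothetical parameters, not the paper's data on Indian parties. The log is undefined at zero, so data containing zeros need a different treatment.

```python
import math
import random
import statistics

def skewness(xs):
    """Sample skewness as the moment ratio m3 / m2**1.5 (no small-sample correction)."""
    m = statistics.fmean(xs)
    n = len(xs)
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

rng = random.Random(42)
# Hypothetical lognormal data: strictly positive and right-skewed.
raw = [rng.lognormvariate(1.0, 0.8) for _ in range(5000)]
# The log of a lognormal variable is normally distributed.
logged = [math.log(x) for x in raw]

print("skewness raw:   ", round(skewness(raw), 2))
print("skewness logged:", round(skewness(logged), 2))
```
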
Developing Rubrics to Assess Complex (Generic) Skills in the Classroom: How to Distinguish Skills’ Mastery Levels?
Q2 Social Sciences Pub Date : 2017-12-01 DOI: 10.7275/XFP0-8228
E. Rusman, K. Dirkx
Many schools use analytic rubrics to (formatively) assess complex, generic, or transversal (21st-century) skills, such as collaborating and presenting. Rubrics describe performance indicators for different levels of mastery of a skill (e.g., novice, practiced, advanced, talented). However, the dimensions used to describe the different mastery levels vary within and across rubrics and are in many cases inconsistent, not concise, and often trivial, thereby hampering the quality of rubrics used to learn and assess complex skills. In this study we reviewed 600 rubrics available in three international databases (Rubistar, For All Rubrics, i-rubrics) and analyzed the dimensions found within 12 strictly selected rubrics that are currently used to distinguish mastery levels and describe performance indicators for the skill 'collaboration' in secondary schools. These dimensions were subsequently defined and categorized. This resulted in 13 different dimensions, clustered in 6 categories, that are feasible for defining skills’ mastery levels in rubrics. The identified dimensions can specifically help both teachers and researchers to construct, review, and investigate performance indicators for each mastery level of a complex skill. On a more general level, they can support analysis of the overall quality of analytic rubrics used to (formatively) assess complex skills.
Citations: 10
The Miscalculation of Interrater Reliability: A Case Study Involving the AAC&U VALUE Rubrics.
Q2 Social Sciences Pub Date : 2017-12-01 DOI: 10.7275/Y36W-HG55
R. F. Szafran
Institutional assessment of student learning objectives has become a fact of life in American higher education, and the Association of American Colleges and Universities’ (AAC&U) VALUE Rubrics have become a widely adopted tool for evaluating and scoring student work. As faculty from a variety of disciplines, some less familiar with the psychometric literature, are drawn into assessment roles, it is important to point out two easily made but serious errors in what might appear to be one of the more straightforward assessments of measurement quality: interrater reliability. The first error, which can occur when a third rater is brought in to adjudicate a discrepancy between the scores reported by the initial two raters, has been well documented in the literature but never before illustrated with the AAC&U rubrics. The second error is to cease training before the raters have demonstrated a satisfactory level of interrater reliability. This research note describes an actual case study in which the interrater reliability of the AAC&U rubrics was incorrectly reported and, when correctly reported, was found to be inadequate. The note concludes with recommendations for the correct measurement of interrater reliability.
Citations: 1
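The abstract gives no formulas, so the following is a hedged sketch of two common interrater statistics: exact percent agreement and Cohen's kappa, which corrects agreement for chance. It assumes the adjudication error is the one usually described, namely computing agreement after a third rater's score has replaced a discrepant one. On that assumption, both statistics should be computed on the two raters' initial, independent scores. The scores below are hypothetical.

```python
from collections import Counter

def exact_agreement(a, b):
    """Proportion of cases on which two raters gave identical scores."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Cohen's kappa: agreement corrected for chance (undefined if chance agreement is 1)."""
    n = len(a)
    observed = exact_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    expected = sum(counts_a[k] * counts_b[k] for k in set(counts_a) | set(counts_b)) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical rubric scores from two raters' initial, independent scoring.
# Reliability is computed on these, before any (assumed) third-rater
# adjudication replaces discrepant scores and inflates apparent agreement.
rater_1 = [2, 3, 1, 4, 2, 3, 3, 1, 2, 4]
rater_2 = [2, 2, 1, 3, 2, 4, 3, 2, 2, 4]
print("exact agreement:", exact_agreement(rater_1, rater_2))
print("Cohen's kappa:", round(cohens_kappa(rater_1, rater_2), 3))
```

For ordinal rubric levels, a weighted kappa or an intraclass correlation is often preferred, because it gives partial credit for near-misses.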
Advocating the Broad Use of the Decision Tree Method in Education
Q2 Social Sciences Pub Date : 2017-11-01 DOI: 10.7275/2W3N-0F07
C. Gomes, L. Almeida
Predictive studies have been widely undertaken in the field of education to provide strategic information about the extensive set of processes related to teaching and learning, and about which variables predict certain educational outcomes, such as academic achievement or dropout. As in any other area, there is a set of standard techniques usually used in predictive studies in the field of education. The Decision Tree Method is a well-known, standard approach in Data Mining and Machine Learning and has been broadly used in data science since the 1980s. Even so, it is not among the mainstream techniques used in predictive studies in education. In this paper, we advocate a broad use of the Decision Tree Method in education. Instead of presenting the method through formal algorithms or mathematical axioms, we present it strictly in practical terms, focusing on its rationale, on how to interpret its results, and on the reasons why it should be broadly applied. We first show the modus operandi of the Decision Tree Method through a didactic example; we then apply the method to a classification task in order to analyze specific educational data.
Citations: 22
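The paper presents the method without formal algorithms. As a complementary, hypothetical sketch, the code below builds a small classification tree by repeatedly choosing the split that most reduces Gini impurity. This is a CART-style criterion, and the paper may use a different algorithm. The result is printed as readable if/else rules, the kind of interpretability the abstract emphasizes. The attendance and GPA data are invented for illustration.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return (feature, threshold) that most reduces weighted Gini impurity, or None."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels, depth=0, max_depth=2):
    """Leaves are class labels; internal nodes are (feature, threshold, left, right)."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
            build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth))

def predict(tree, row):
    """Follow the splits from the root down to a leaf."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

def print_rules(tree, names, indent=""):
    """Print the tree as nested if/else rules."""
    if not isinstance(tree, tuple):
        print(indent + "-> " + str(tree))
        return
    f, t, left, right = tree
    print(f"{indent}if {names[f]} <= {t}:")
    print_rules(left, names, indent + "    ")
    print(f"{indent}else:  # {names[f]} > {t}")
    print_rules(right, names, indent + "    ")

# Hypothetical students: [attendance %, GPA] -> outcome.
rows = [[95, 3.4], [88, 2.9], [60, 2.1], [72, 2.8], [55, 3.0], [90, 1.9], [65, 1.8], [97, 3.8]]
labels = ["persist", "persist", "dropout", "persist", "dropout", "persist", "dropout", "persist"]
tree = build_tree(rows, labels)
print_rules(tree, ["attendance_pct", "gpa"])
```
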
Effective Use of Formative Assessment by High School Teachers.
Q2 Social Sciences Pub Date : 2017-10-01 DOI: 10.7275/ZH1K-ZK32
Melanie Brink, D. Bartz
The purpose of this mixed-methods study was to gain insights and understandings of high school teachers’ perceptions and use of formative assessment to enhance their planning, individualization of instruction, and adjustment of course content to improve student learning. The study was conducted over two years in a midwestern high school of approximately 1,000 students. Crucial to the three project teachers’ understanding of formative assessment was developing and using preset curriculum road maps that tightly aligned course goals, learning objectives, activities, instructional methods, and assessment. The in-depth case studies of the sample’s three teachers revealed that, when provided with specific information about formative assessment through staff development, they became more positive toward such assessment, and their implementation skills were greatly improved. The staff development had an especially positive impact on the teachers’ understanding and skill sets for individualizing instructional practices. The personalization of the staff development proved to be the most beneficial when it tailored the content to the varying levels of initial proficiency of the three sample teachers. Support for formative assessment by the administrative team members was essential to creating a cultural shift from summative to formative assessment.
{"title":"Effective Use of Formative Assessment by High School Teachers.","authors":"Melanie Brink, D. Bartz","doi":"10.7275/ZH1K-ZK32","DOIUrl":"https://doi.org/10.7275/ZH1K-ZK32","url":null,"abstract":"The purpose of this mixed-methods study was to gain insights and understandings of high school teachers’ perceptions and use of formative assessment to enhance their planning, individualization of instruction, and adjustment of course content to improve student learning. The study was conducted over two years in a midwestern high school of approximately 1,000 students. Crucial to the three project teachers’ understanding of formative assessment was developing and using preset curriculum road maps that tightly aligned course goals, learning objectives, activities, instructional methods, and assessment. The in-depth case studies of the sample’s three teachers revealed that, when provided with specific information about formative assessment through staff development, they became more positive toward such assessment, and their implementation skills were greatly improved. The staff development had an especially positive impact on the teachers’ understanding and skill sets for individualizing instructional practices. The personalization of the staff development proved to be the most beneficial when it tailored the content to the varying levels of initial proficiency of the three sample teachers. Support for formative assessment by the administrative team members was essential to creating a cultural shift from summative to formative assessment.","PeriodicalId":20361,"journal":{"name":"Practical Assessment, Research and Evaluation","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83284391","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 26
Journal: Practical Assessment, Research and Evaluation