
Latest publications from Practical Assessment, Research and Evaluation

Assessing the Assessment: Rubrics Training for Pre-Service and New In-Service Teachers.
Q2 Social Sciences Pub Date: 2011-10-01 DOI: 10.7275/SJT6-5K13
Michael G. Lovorn, A. Rezaei
{"title":"Assessing the Assessment: Rubrics Training for Pre-Service and New In-Service Teachers.","authors":"Michael G. Lovorn, A. Rezaei","doi":"10.7275/SJT6-5K13","DOIUrl":"https://doi.org/10.7275/SJT6-5K13","url":null,"abstract":"","PeriodicalId":20361,"journal":{"name":"Practical Assessment, Research and Evaluation","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2011-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74739725","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 23
Best Practices in Using Large, Complex Samples: The Importance of Using Appropriate Weights and Design Effect Compensation. 使用大型复杂样本的最佳实践:使用适当权重和设计效果补偿的重要性。
Q2 Social Sciences Pub Date : 2011-09-01 DOI: 10.7275/2KYG-M659
J. Osborne
Large surveys often use probability sampling in order to obtain representative samples, and these data sets are valuable tools for researchers in all areas of science. Yet many researchers are not formally prepared to appropriately utilize these resources. Indeed, users of one popular dataset were generally found not to have modeled the analyses to take account of the complex sample (Johnson & Elliott, 1998) even when publishing in highly-regarded journals. It is well known that failure to appropriately model the complex sample can substantially bias the results of the analysis. Examples presented in this paper highlight the risk of error of inference and mis-estimation of parameters from failure to analyze these data sets appropriately.
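The cost of ignoring weights that this abstract warns about can be illustrated with Kish's approximate design effect, which converts unequal sampling weights into a loss of effective sample size (a minimal sketch; the function names are illustrative, not from the paper):

```python
def design_effect(weights):
    """Kish's approximate design effect for unequal sampling weights:
    DEFF = n * sum(w_i^2) / (sum(w_i))^2. DEFF = 1 when weights are equal."""
    n = len(weights)
    total = sum(weights)
    return n * sum(w * w for w in weights) / (total * total)

def effective_sample_size(weights):
    """n_eff = n / DEFF: the sample size the weighted data actually supports.
    Analyzing as if the full n applied understates standard errors."""
    return len(weights) / design_effect(weights)
```

With weights [1, 3], DEFF is 1.25, so two weighted observations carry the information of only 1.6 unweighted ones; variance estimates that ignore the design are too small by roughly the factor DEFF.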
{"title":"Best Practices in Using Large, Complex Samples: The Importance of Using Appropriate Weights and Design Effect Compensation.","authors":"J. Osborne","doi":"10.7275/2KYG-M659","DOIUrl":"https://doi.org/10.7275/2KYG-M659","url":null,"abstract":"Large surveys often use probability sampling in order to obtain representative samples, and these data sets are valuable tools for researchers in all areas of science. Yet many researchers are not formally prepared to appropriately utilize these resources. Indeed, users of one popular dataset were generally found not to have modeled the analyses to take account of the complex sample (Johnson & Elliott, 1998) even when publishing in highly-regarded journals. It is well known that failure to appropriately model the complex sample can substantially bias the results of the analysis. Examples presented in this paper highlight the risk of error of inference and mis-estimation of parameters from failure to analyze these data sets appropriately.","PeriodicalId":20361,"journal":{"name":"Practical Assessment, Research and Evaluation","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2011-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79354428","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 30
A Graphical Transition Table for Communicating Status and Growth. 沟通状态和成长的图形化过渡表。
Q2 Social Sciences Pub Date : 2011-06-01 DOI: 10.7275/T9R9-D719
Adam E. Wyse, Ji Zeng, Joseph A. Martineau
This paper introduces a simple and intuitive graphical display for transition table based accountability models that can be used to communicate information about students’ status and growth simultaneously. This graphical transition table includes the use of shading to convey year to year transitions and different sized letters for performance categories to depict yearly status. Examples based on Michigan’s transition table used on their Michigan Educational Assessment Program (MEAP) assessments are provided to illustrate the utility of the graphical transition table in practical contexts. Additional potential applications of the graphical transition table are also suggested.
{"title":"A Graphical Transition Table for Communicating Status and Growth.","authors":"Adam E. Wyse, Ji Zeng, Joseph A. Martineau","doi":"10.7275/T9R9-D719","DOIUrl":"https://doi.org/10.7275/T9R9-D719","url":null,"abstract":"This paper introduces a simple and intuitive graphical display for transition table based accountability models that can be used to communicate information about students’ status and growth simultaneously. This graphical transition table includes the use of shading to convey year to year transitions and different sized letters for performance categories to depict yearly status. Examples based on Michigan’s transition table used on their Michigan Educational Assessment Program (MEAP) assessments are provided to illustrate the utility of the graphical transition table in practical contexts. Additional potential applications of the graphical transition table are also suggested.","PeriodicalId":20361,"journal":{"name":"Practical Assessment, Research and Evaluation","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2011-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81305010","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Too Reliable to Be True? Response Bias as a Potential Source of Inflation in Paper-and-Pencil Questionnaire Reliability. 太可靠而不真实?反应偏差是纸笔问卷可靠性膨胀的潜在来源。
Q2 Social Sciences Pub Date : 2011-06-01 DOI: 10.7275/E482-N724
Eyal Péer, Eyal Gamliel
When respondents answer paper-and-pencil (PP) questionnaires, they sometimes modify their responses to correspond to previously answered items. As a result, this response bias might artificially inflate the reliability of PP questionnaires. We compared the internal consistency of PP questionnaires to computerized questionnaires that presented a different number of items on a computer screen simultaneously. Study 1 showed that a PP questionnaire’s internal consistency was higher than that of the same questionnaire presented on a computer screen with one, two or four questions per screen. Study 2 replicated these findings to show that internal consistency was also relatively high when all questions were shown on one screen. This suggests that the differences found in Study 1 were not due to the difference in presentation medium. Thus, this paper suggests that reliability measures of PP questionnaires might be inflated because of a response bias resulting from participants cross-checking their answers against ones given to previous questions.
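Both studies compare internal consistency, conventionally measured by Cronbach's alpha. A dependency-free sketch of the coefficient (illustrative, not the authors' code):

```python
def cronbach_alpha(items):
    """Cronbach's alpha: items is a list of item-score lists, one inner
    list per item, scores aligned by respondent across items."""
    k = len(items)
    n = len(items[0])

    def var(xs):  # unbiased sample variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    # total score per respondent across all k items
    totals = [sum(item[j] for item in items) for j in range(n)]
    return k / (k - 1) * (1 - sum(var(it) for it in items) / var(totals))
```

Cross-checking answers against earlier items raises inter-item correlations, which inflates the total-score variance relative to the item variances and hence alpha; this is the inflation mechanism the paper proposes.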
{"title":"Too Reliable to Be True? Response Bias as a Potential Source of Inflation in Paper-and-Pencil Questionnaire Reliability.","authors":"Eyal Péer, Eyal Gamliel","doi":"10.7275/E482-N724","DOIUrl":"https://doi.org/10.7275/E482-N724","url":null,"abstract":"When respondents answer paper-and-pencil (PP) questionnaires, they sometimes modify their responses to correspond to previously answered items. As a result, this response bias might artificially inflate the reliability of PP questionnaires. We compared the internal consistency of PP questionnaires to computerized questionnaires that presented a different number of items on a computer screen simultaneously. Study 1 showed that a PP questionnaire’s internal consistency was higher than that of the same questionnaire presented on a computer screen with one, two or four questions per screen. Study 2 replicated these findings to show that internal consistency was also relatively high when all questions were shown on one screen. This suggests that the differences found in Study 1 were not due to the difference in presentation medium. Thus, this paper suggests that reliability measures of PP questionnaires might be inflated because of a response bias resulting from participants cross-checking their answers against ones given to previous questions.","PeriodicalId":20361,"journal":{"name":"Practical Assessment, Research and Evaluation","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2011-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85264898","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 63
Is a Picture Is Worth a Thousand Words? Creating Effective Questionnaires with Pictures. 一幅图胜过千言万语吗?用图片制作有效的问卷。
Q2 Social Sciences Pub Date : 2011-05-01 DOI: 10.7275/BGPE-A067
Laura Reynolds-Keefer, Robert Johnson
In developing attitudinal instruments for young children, researchers, program evaluators, and clinicians often use response scales with pictures or images (e.g., smiley faces) as anchors. This article highlights connections between word-based and picture-based Likert scales and the value of translating conventions used in word-based Likert scales to those with pictures or images.
{"title":"Is a Picture Is Worth a Thousand Words? Creating Effective Questionnaires with Pictures.","authors":"Laura Reynolds-Keefer, Robert Johnson","doi":"10.7275/BGPE-A067","DOIUrl":"https://doi.org/10.7275/BGPE-A067","url":null,"abstract":"In developing attitudinal instruments for young children, researchers, program evaluators, and clinicians often use response scales with pictures or images (e.g., smiley faces) as anchors. This article considers highlights connections between word-based and picture based Likert scales and highlights the value in translating conventions used in word-based Likert scales to those with pictures or images.","PeriodicalId":20361,"journal":{"name":"Practical Assessment, Research and Evaluation","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2011-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87150928","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 62
Applying Tests of Equivalence for Multiple Group Comparisons: Demonstration of the Confidence Interval Approach. 多组比较等效检验的应用:置信区间方法的论证。
Q2 Social Sciences Pub Date : 2011-04-01 DOI: 10.7275/D5WF-5P77
Shayna A. Rusticus, C. Lovato
Assessing the comparability of different groups is an issue facing many researchers and evaluators in a variety of settings. Commonly, null hypothesis significance testing (NHST) is incorrectly used to demonstrate comparability when a non-significant result is found. This is problematic because a failure to find a difference between groups is not equivalent to showing that the groups are comparable. This paper provides a comparison of the confidence interval approach to equivalency testing and the more traditional analysis of variance (ANOVA) method using both continuous and rating scale data from three geographically separate medical education teaching sites. Equivalency testing is recommended as a better alternative to demonstrating comparability through its examination of whether mean differences between two groups are small enough that these differences can be considered practically unimportant and thus, the groups can be treated as equivalent.
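The confidence-interval approach the authors recommend declares two groups equivalent when the (1 - 2*alpha) confidence interval for their mean difference lies entirely inside a pre-specified margin of plus or minus delta. A minimal Welch-type sketch (the caller supplies the critical t value so the example stays dependency-free; the names are illustrative):

```python
import math

def equivalence_ci(x, y, delta, t_crit):
    """Return (ci_lo, ci_hi, equivalent): the groups are declared
    equivalent when the CI for the mean difference sits inside
    [-delta, +delta]."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    vx = sum((v - mx) ** 2 for v in x) / (nx - 1)
    vy = sum((v - my) ** 2 for v in y) / (ny - 1)
    se = math.sqrt(vx / nx + vy / ny)  # Welch (unpooled) standard error
    diff = mx - my
    lo, hi = diff - t_crit * se, diff + t_crit * se
    return lo, hi, (-delta <= lo and hi <= delta)
```

Note the asymmetry with NHST: a wide interval that straddles zero fails the equivalence test even though a t-test would be non-significant, which is precisely the inferential error the abstract describes.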
{"title":"Applying Tests of Equivalence for Multiple Group Comparisons: Demonstration of the Confidence Interval Approach.","authors":"Shayna A. Rusticus, C. Lovato","doi":"10.7275/D5WF-5P77","DOIUrl":"https://doi.org/10.7275/D5WF-5P77","url":null,"abstract":"Assessing the comparability of different groups is an issue facing many researchers and evaluators in a variety of settings. Commonly, null hypothesis significance testing (NHST) is incorrectly used to demonstrate comparability when a non-significant result is found. This is problematic because a failure to find a difference between groups is not equivalent to showing that the groups are comparable. This paper provides a comparison of the confidence interval approach to equivalency testing and the more traditional analysis of variance (ANOVA) method using both continuous and rating scale data from three geographically separate medical education teaching sites. Equivalency testing is recommended as a better alternative to demonstrating comparability through its examination of whether mean differences between two groups are small enough that these differences can be considered practically unimportant and thus, the groups can be treated as equivalent.","PeriodicalId":20361,"journal":{"name":"Practical Assessment, Research and Evaluation","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2011-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85520336","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 48
Evaluating the Quantity-Quality Trade-off in the Selection of Anchor Items: a Vertical Scaling Approach 评价锚项目选择中的数量-质量权衡:一种垂直尺度方法
Q2 Social Sciences Pub Date : 2011-04-01 DOI: 10.7275/NNCY-EW26
Florian Pibal, H. Cesnik
When administering tests across grades, vertical scaling is often employed to place scores from different tests on a common overall scale so that test-takers’ progress can be tracked. In order to be able to link the results across grades, however, common items are needed that are included in both test forms. In the literature there seems to be no clear agreement about the ideal number of common items. In line with some scholars, we argue that a greater number of anchor items bear a higher risk of unwanted effects like displacement, item drift, or undesired fit statistics and that having fewer psychometrically well-functioning anchor items can sometimes be more desirable. In order to demonstrate this, a study was conducted that included the administration of a reading-comprehension test to 1,350 test-takers across grades 6 to 8. In employing a step-by-step approach, we found that the paradox of high item drift in test administrations across grades can be mitigated and eventually even be eliminated. At the same time, a positive side effect was an increase in the explanatory power of the empirical data. Moreover, it was found that scaling adjustment can be used to evaluate the effectiveness of a vertical scaling approach and, in certain cases, can lead to more accurate results than the use of calibrated anchor items.
{"title":"Evaluating the Quantity-Quality Trade-off in the Selection of Anchor Items: a Vertical Scaling Approach","authors":"Florian Pibal, H. Cesnik","doi":"10.7275/NNCY-EW26","DOIUrl":"https://doi.org/10.7275/NNCY-EW26","url":null,"abstract":"When administering tests across grades, vertical scaling is often employed to place scores from different tests on a common overall scale so that test-takers’ progress can be tracked. In order to be able to link the results across grades, however, common items are needed that are included in both test forms. In the literature there seems to be no clear agreement about the ideal number of common items. In line with some scholars, we argue that a greater number of anchor items bear a higher risk of unwanted effects like displacement, item drift, or undesired fit statistics and that having fewer psychometrically well-functioning anchor items can sometimes be more desirable. In order to demonstrate this, a study was conducted that included the administration of a reading-comprehension test to 1,350 test-takers across grades 6 to 8. In employing a step-by-step approach, we found that the paradox of high item drift in test administrations across grades can be mitigated and eventually even be eliminated. At the same time, a positive side effect was an increase in the explanatory power of the empirical data. 
Moreover, it was found that scaling adjustment can be used to evaluate the effectiveness of a vertical scaling approach and, in certain cases, can lead to more accurate results than the use of calibrated anchor items.","PeriodicalId":20361,"journal":{"name":"Practical Assessment, Research and Evaluation","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2011-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91005580","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 4
Do more online instructional ratings lead to better prediction of instructor quality 更多的在线教学评分能更好地预测教师的质量吗
Q2 Social Sciences Pub Date : 2011-02-01 DOI: 10.7275/NHNN-1N13
S. Sanders, Bhavneet Walia, Joel Potter, Kenneth W. Linna
Online instructional ratings are taken by many with a grain of salt. This study analyzes the ability of said ratings to estimate the official (university-administered) instructional ratings of the same respective university instructors. Given self-selection among raters, we further test whether more online ratings of instructors lead to better prediction of official ratings in terms of both R-squared value and root mean squared error. We lastly test and correct for heteroskedastic error terms in the regression analysis to allow for the first robust estimations on the topic. Despite having a starkly different distribution of values, online ratings explain much of the variation in official ratings. This conclusion strengthens, and root mean squared error typically falls, as one considers regression subsets over which instructors have a larger number of online ratings. Though (public) online ratings do not mimic the results of (semi-private) official ratings, they provide a reliable source of information for predicting official ratings. There is strong evidence that this reliability increases in online rating usage.
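The paper's predictive comparison, regressing official ratings on online ratings and tracking R-squared and root mean squared error, can be sketched with simple ordinary least squares (illustrative only; the paper's analysis additionally corrects for heteroskedastic errors):

```python
import math

def simple_ols(x, y):
    """Fit y = a + b*x by least squares; return (a, b, r2, rmse)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    a = my - b * mx
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    sse = sum(r * r for r in resid)                 # sum of squared errors
    sst = sum((yi - my) ** 2 for yi in y)           # total sum of squares
    return a, b, 1 - sse / sst, math.sqrt(sse / n)
```

Re-fitting on subsets of instructors with more online ratings and comparing the resulting r2 and rmse values mirrors the paper's question of whether more ratings improve prediction.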
{"title":"Do more online instructional ratings lead to better prediction of instructor quality","authors":"S. Sanders, Bhavneet Walia, Joel Potter, Kenneth W. Linna","doi":"10.7275/NHNN-1N13","DOIUrl":"https://doi.org/10.7275/NHNN-1N13","url":null,"abstract":"Online instructional ratings are taken by many with a grain of salt. This study analyzes the ability of said ratings to estimate the official (university-administered) instructional ratings of the same respective university instructors. Given self-selection among raters, we further test whether more online ratings of instructors lead to better prediction of official ratings in terms of both R-squared value and root mean squared error. We lastly test and correct for heteroskedastic error terms in the regression analysis to allow for the first robust estimations on the topic. Despite having a starkly different distribution of values, online ratings explain much of the variation in official ratings. This conclusion strengthens, and root mean squared error typically falls, as one considers regression subsets over which instructors have a larger number of online ratings. Though (public) online ratings do not mimic the results of (semi-private) official ratings, they provide a reliable source of information for predicting official ratings. There is strong evidence that this reliability increases in online rating usage.","PeriodicalId":20361,"journal":{"name":"Practical Assessment, Research and Evaluation","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2011-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85329773","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 17
Termination Criteria for Computerized Classification Testing. 计算机分类试验终止标准。
Q2 Social Sciences Pub Date : 2011-02-01 DOI: 10.7275/WQ8M-ZK25
Nathan A. Thompson
Computerized classification testing (CCT) is an approach to designing tests with intelligent algorithms, similar to adaptive testing, but specifically designed for the purpose of classifying examinees into categories such as “pass” and “fail.” Like adaptive testing for point estimation of ability, the key component is the termination criterion, namely the algorithm that decides whether to classify the examinee and end the test or to continue and administer another item. This paper applies a newly suggested termination criterion, the generalized likelihood ratio (GLR), to CCT. It also explores the role of the indifference region in the specification of likelihood-ratio based termination criteria, comparing the GLR to the sequential probability ratio test. Results from simulation studies suggest that the GLR is always at least as efficient as existing methods.
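The sequential probability ratio test against which the GLR is benchmarked can be sketched as follows: after each item, the log-likelihood ratio of the response string under the two edges of the indifference region is compared with Wald's stopping thresholds (a simplified constant-probability sketch, not the paper's IRT-based implementation):

```python
import math

def sprt_decision(responses, p0, p1, alpha=0.05, beta=0.05):
    """responses: list of 0/1 item scores. p0 and p1 are the expected
    proportions correct just below and just above the cutscore (the
    endpoints of the indifference region). Returns "pass", "fail",
    or "continue"."""
    llr = sum(math.log(p1 / p0) if r else math.log((1 - p1) / (1 - p0))
              for r in responses)
    upper = math.log((1 - beta) / alpha)   # cross above: classify as pass
    lower = math.log(beta / (1 - alpha))   # cross below: classify as fail
    if llr >= upper:
        return "pass"
    if llr <= lower:
        return "fail"
    return "continue"
```

With p0 = 0.5 and p1 = 0.7, a run of ten correct responses crosses the upper threshold, while a short mixed string keeps the statistic between the thresholds and triggers another item.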
{"title":"Termination Criteria for Computerized Classification Testing.","authors":"Nathan A. Thompson","doi":"10.7275/WQ8M-ZK25","DOIUrl":"https://doi.org/10.7275/WQ8M-ZK25","url":null,"abstract":"Computerized classification testing (CCT) is an approach to designing tests with intelligent algorithms, similar to adaptive testing, but specifically designed for the purpose of classifying examinees into categories such as “pass” and “fail.” Like adaptive testing for point estimation of ability, the key component is the termination criterion, namely the algorithm that decides whether to classify the examinee and end the test or to continue and administer another item. This paper applies a newly suggested termination criterion, the generalized likelihood ratio (GLR), to CCT. It also explores the role of the indifference region in the specification of likelihood-ratio based termination criteria, comparing the GLR to the sequential probability ratio test. Results from simulation studies suggest that the GLR is always at least as efficient as existing methods.","PeriodicalId":20361,"journal":{"name":"Practical Assessment, Research and Evaluation","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2011-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74696461","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 21
FORMATIVE USE OF ASSESSMENT INFORMATION: IT'S A PROCESS, SO LET'S SAY WHAT WE MEAN 评估信息的形成性使用:这是一个过程,所以让我们说一下我们的意思
Q2 Social Sciences Pub Date : 2011-02-01 DOI: 10.7275/3YVY-AT83
Robert Good
The term formative assessment is often used to describe a type of assessment. The purpose of this paper is to challenge the use of this phrase, given that formative assessment as a noun phrase ignores the well-established understanding that it is a process more than an object. A model that combines content, context, and strategies is presented as one way to view the process nature of assessing formatively. The alternate phrase formative use of assessment information is suggested as a more appropriate way to describe how content, context, and strategies can be used together in order to close the gap between where a student is performing currently and the intended learning goal.

Let's start with an elementary grammar review: adjectives modify nouns; adverbs modify verbs, adjectives, and other adverbs. Applied to recent assessment literature, the term formative assessment would therefore contain the adjective formative modifying the noun assessment, creating a noun phrase representing a thing or object. Indeed, formative assessment as a noun phrase is regularly juxtaposed to summative assessment in both purpose and timing. Formative assessment is commonly understood to occur during instruction with the intent to identify relative strengths and weaknesses and guide instruction, while summative assessment occurs after a unit of instruction with the intent of measuring performance levels of the skills and content related to the unit of instruction (Stiggins, Arter, Chappuis, & Chappuis, 2006). Distinguishing formative and summative assessments in this manner may have served an important introductory purpose; however, using formative as a descriptor of a type of assessment has had ramifications that merit critical consideration.

Given that formative assessment has received considerable attention in the literature over the last 20 or so years, this article contends that it is time to move beyond the well-established broad distinctions between formative and summative assessments and consider the subtle yet important distinction between the term formative assessment as an object and its intended meaning. The focus here is to suggest that if we want to realize the true potential of formative practices in our classrooms, then we need to start saying what we mean.
{"title":"FORMATIVE USE OF ASSESSMENT INFORMATION: IT'S A PROCESS, SO LET'S SAY WHAT WE MEAN","authors":"Robert Good","doi":"10.7275/3YVY-AT83","DOIUrl":"https://doi.org/10.7275/3YVY-AT83","url":null,"abstract":"The term formative assessment is often used to describe a type of assessment. The purpose of this paper is to challenge the use of this phrase given that formative assessment as a noun phrase ignores the well-established understanding that it is a process more than an object. A model that combines content, context, and strategies is presented as one way to view the process nature of assessing formatively. The alternate phrase formative use of assessment information is suggested as a more appropriate way to describe how content, context, and strategies can be used together in order to close the gap between where a student is performing currently and the intended learning goal. Let’s start with an elementary grammar review: adjectives modify nouns; adverbs modify verbs, adjectives, and other adverbs. Applied to recent assessment literature, the term formative assessment would therefore contain the adjective formative modifying the noun assessment, creating a noun phrase representing a thing or object. Indeed, formative assessment as a noun phrase is regularly juxtaposed to summative assessment in both purpose and timing. Formative assessment is commonly understood to occur during instruction with the intent to identify relative strengths and weaknesses and guide instruction, while summative assessment occurs after a unit of instruction with the intent of measuring performance levels of the skills and content related to the unit of instruction (Stiggins, Arter, Chappuis, & Chappuis, 2006). Distinguishing formative and summative assessments in this manner may have served an important introductory purpose, however using formative as a descriptor of a type of assessment has had ramifi cations that merit critical consideration. 
Given that formative assessment has received considerable attention in the literature over the last 20 or so years, this article contends that it is time to move beyond the well-established broad distinctions between formative and summative assessments and consider the subtle – yet important – distinction between the term formative assessment as an object and the intended meaning. The focus here is to suggest that if we want to realize the true potential of formative practices in our classrooms, then we need to start saying what we mean.","PeriodicalId":20361,"journal":{"name":"Practical Assessment, Research and Evaluation","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2011-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75781333","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 41