
Latest Articles from the International Journal of Testing

An Examination of Different Methods of Setting Cutoff Values in Person Fit Research
IF 1.7 | Q2 SOCIAL SCIENCES, INTERDISCIPLINARY | Pub Date: 2019-01-02 | DOI: 10.1080/15305058.2018.1464010
A. Mousavi, Ying Cui, Todd Rogers
This simulation study evaluates four methods of setting cutoff values for person fit assessment: (a) using fixed cutoff values, taken either from the theoretical distributions of person fit statistics or from values chosen arbitrarily by researchers in the literature; (b) using a specific percentile rank of the empirical sampling distribution of person fit statistics computed from simulated fitting responses; (c) using the bootstrap to estimate cutoff values from that empirical sampling distribution; and (d) using p-value methods to identify misfitting responses conditional on ability level. Snijders' (2001) statistic was chosen as an index with a known theoretical distribution, and van der Flier's U3 (1982) and Sijtsma's HT coefficient (1986) as indices with unknown theoretical distributions. According to the simulation results, the different methods of setting cutoff values tend to produce different levels of Type I error and detection rate, indicating that it is critical to select an appropriate method for setting cutoff values in person fit research.
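Method (c), the bootstrap approach, can be illustrated with a short sketch. This is not the authors' code: the Rasch model, the lz person-fit statistic, the item difficulties, and the 5th-percentile criterion below are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def rasch_prob(theta, b):
    """Rasch probability of a correct response."""
    return 1.0 / (1.0 + np.exp(-(theta - b)))

def simulate_responses(theta, b):
    """Simulate a fitting (model-consistent) response vector."""
    return (rng.random(b.size) < rasch_prob(theta, b)).astype(int)

def lz_statistic(x, theta, b):
    """Standardized log-likelihood person-fit statistic lz."""
    p = rasch_prob(theta, b)
    loglik = np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))
    mean = np.sum(p * np.log(p) + (1 - p) * np.log(1 - p))
    var = np.sum(p * (1 - p) * np.log(p / (1 - p)) ** 2)
    return (loglik - mean) / np.sqrt(var)

b = np.linspace(-2, 2, 40)   # item difficulties (assumed)
theta = 0.5                  # examinee ability (assumed)

# Build the empirical sampling distribution of lz from simulated fitting
# responses, resample it with the bootstrap, and average the bootstrap
# 5th percentiles to get a cutoff (low lz signals misfit).
stats = np.array([lz_statistic(simulate_responses(theta, b), theta, b)
                  for _ in range(1000)])
boot = rng.choice(stats, size=(500, stats.size), replace=True)
cutoff = np.mean(np.percentile(boot, 5, axis=1))
print(f"bootstrap 5th-percentile cutoff for lz: {cutoff:.2f}")
```

A response vector whose lz falls below this cutoff would be flagged as misfitting.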
Citations: 4
A Comparison of the Relative Performance of Four IRT Models on Equating Passage-Based Tests
IF 1.7 | Q2 SOCIAL SCIENCES, INTERDISCIPLINARY | Pub Date: 2018-12-13 | DOI: 10.1080/15305058.2018.1530239
Kyung Yong Kim, Euijin Lim, Won‐Chan Lee
For passage-based tests, items that belong to a common passage often violate the local independence assumption of unidimensional item response theory (UIRT). In this case, ignoring local item dependence (LID) and estimating item parameters using a UIRT model could be problematic because doing so might result in inaccurate parameter estimates, which, in turn, could impact the results of equating. Under the random groups design, the main purpose of this article was to compare the relative performance of the three-parameter logistic (3PL), graded response (GR), bifactor, and testlet models on equating passage-based tests when various degrees of passage-induced LID were present. Simulation results showed that the testlet model produced the most accurate equating results, followed by the bifactor model. The 3PL model worked as well as the bifactor and testlet models when the degree of LID was low but returned less accurate equating results than the two multidimensional models as the degree of LID increased. Among the four models, the polytomous GR model provided the least accurate equating results.
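Of the four models compared, the 3PL is the simplest. Its item characteristic curve can be sketched as follows; the parameter values are made up for illustration, and the guessing parameter c is what distinguishes the 3PL from Rasch and 2PL curves.

```python
import numpy as np

def p_3pl(theta, a, b, c):
    """Three-parameter logistic (3PL) probability of a correct response:
    a guessing floor c plus a logistic curve with discrimination a and
    difficulty b."""
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))

# Illustrative (assumed) parameters: a = 1.2, b = 0.0, c = 0.2.
theta = np.array([-2.0, 0.0, 2.0])
p = p_3pl(theta, a=1.2, b=0.0, c=0.2)
# At theta = b the curve sits at the midpoint c + (1 - c)/2 = 0.6.
print(np.round(p, 3))
```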
Citations: 3
Test Instructions Do Not Moderate the Indirect Effect of Perceived Test Importance on Test Performance in Low-Stakes Testing Contexts
IF 1.7 | Q2 SOCIAL SCIENCES, INTERDISCIPLINARY | Pub Date: 2018-10-02 | DOI: 10.1080/15305058.2017.1396466
S. Finney, Aaron J. Myers, C. Mathers
Assessment specialists expend a great deal of energy to promote valid inferences from test scores gathered in low-stakes testing contexts. Given the indirect effect of perceived test importance on test performance via examinee effort, assessment practitioners have manipulated test instructions with the goal of increasing perceived test importance. Importantly, no studies have investigated the impact of test instructions on this indirect effect. In the current study, students were randomly assigned to one of three test instruction conditions intended to increase test relevance while keeping the test low-stakes to examinees. Test instructions did not impact average perceived test importance, examinee effort, or test performance. Furthermore, the indirect relationship between importance and performance via effort was not moderated by instructions. Thus, the effect of perceived test importance on test scores via expended effort appears consistent across different messages regarding the personal relevance of the test to examinees. The main implication for testing practice is that the effect of instructions may be negligible when reflective of authentic low-stakes test score use. Future studies should focus on uncovering instructions that increase the value of performance to the examinee yet remain truthful regarding score use.
Citations: 14
Investigating the Reliability of the Sentence Verification Technique
IF 1.7 | Q2 SOCIAL SCIENCES, INTERDISCIPLINARY | Pub Date: 2018-09-20 | DOI: 10.1080/15305058.2018.1497636
Amanda M Marcotte, Francis Rick, C. Wells
Reading comprehension plays an important role in achievement for all academic domains. The purpose of this study is to describe the sentence verification technique (SVT) (Royer, Hastings, & Hook, 1979) as an alternative method of assessing reading comprehension, which can be used with a variety of texts and across diverse populations and educational contexts. Additionally, this study adds a unique contribution to the extant literature on the SVT through an investigation of the precision of the instrument across proficiency levels. Data were gathered from a sample of 464 fourth-grade students from the Northeast region of the United States. Reliability was estimated using one-, two-, three-, and four-passage test forms. Two or three passages provided sufficient reliability. The conditional reliability analyses revealed that the SVT test scores were reliable for readers with average to below average proficiency, but did not provide reliable information for students who were very poor or strong readers.
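The relationship between form length and reliability that motivates the one- to four-passage comparison is often summarized by the Spearman-Brown prophecy formula. A sketch with a hypothetical one-passage reliability (the value 0.55 is invented, not taken from the study):

```python
def spearman_brown(rho1, k):
    """Projected reliability when a test is lengthened k-fold,
    given the reliability rho1 of the base form."""
    return k * rho1 / (1.0 + (k - 1.0) * rho1)

# Hypothetical one-passage reliability; project two- to four-passage forms.
rho1 = 0.55
for k in (2, 3, 4):
    print(k, round(spearman_brown(rho1, k), 3))
```

Lengthening shows diminishing returns, which is consistent with the finding that two or three passages already provide sufficient reliability.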
Citations: 2
Item Parameter Drift in Context Questionnaires from International Large-Scale Assessments
IF 1.7 | Q2 SOCIAL SCIENCES, INTERDISCIPLINARY | Pub Date: 2018-09-14 | DOI: 10.1080/15305058.2018.1481852
HyeSun Lee, K. Geisinger
The purpose of the current study was to examine the impact of item parameter drift (IPD) occurring in context questionnaires from an international large-scale assessment and determine the most appropriate way to address IPD. Focusing on the context of psychometric and educational research where scores from context questionnaires composed of polytomous items were employed for the classification of examinees, the current research investigated the impacts of IPD on the estimation of questionnaire scores and classification accuracy with five manipulated factors: the length of a questionnaire, the proportion of items exhibiting IPD, the direction and magnitude of IPD, and three decisions about IPD. The results indicated that the impact of IPD occurring in a short context questionnaire on the accuracy of score estimation and classification of examinees was substantial. The accuracy in classification considerably decreased especially at the lowest and highest categories of a trait. Unlike the recommendation from literature in educational testing, the current study demonstrated that keeping items exhibiting IPD and removing them only for transformation were appropriate when IPD occurred in relatively short context questionnaires. Using 2011 TIMSS data from Iran, an applied example demonstrated the application of provided guidance in making appropriate decisions about IPD.
Citations: 2
Investigating the Comparability of Examination Difficulty Using Comparative Judgement and Rasch Modelling
IF 1.7 | Q2 SOCIAL SCIENCES, INTERDISCIPLINARY | Pub Date: 2018-09-14 | DOI: 10.1080/15305058.2018.1486316
Stephen D. Holmes, M. Meadows, I. Stockford, Qingping He
The relationship of expected and actual difficulty of items on six mathematics question papers designed for 16-year-olds in England was investigated through paired comparison using experts and testing with students. A variant of the Rasch model was applied to the comparison data to establish a scale of expected difficulty. In testing, the papers were taken by 2933 students using an equivalent-groups design, allowing the actual difficulty of the items to be placed on the same measurement scale. It was found that the expected difficulty derived using the comparative judgement approach and the actual difficulty derived from the test data were reasonably strongly correlated. This suggests that comparative judgement may be an effective way to investigate the comparability of difficulty of examinations. The approach could potentially be used as a proxy for pretesting high-stakes tests in situations where pretesting is not feasible due to reasons of security or other risks.
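The paired-comparison step can be illustrated with a Bradley-Terry model, the simplest Rasch-family model for comparative judgement data: the probability that item i is judged harder than item j is a logistic function of the difficulty difference. The judgement data, true difficulties, and fitting routine below are all fabricated for illustration and are not the authors' variant of the Rasch model.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical expert judgements: wins[i, j] counts how often item i was
# judged harder than item j. True difficulties on a logit scale (assumed).
true_d = np.array([-1.0, -0.3, 0.4, 1.2])
n_items, n_trials = true_d.size, 600
wins = np.zeros((n_items, n_items))
for _ in range(n_trials):
    i, j = rng.choice(n_items, size=2, replace=False)
    p_i = 1.0 / (1.0 + np.exp(-(true_d[i] - true_d[j])))
    if rng.random() < p_i:
        wins[i, j] += 1
    else:
        wins[j, i] += 1

# Fit Bradley-Terry difficulties by gradient ascent on the log-likelihood,
# anchoring the scale to mean zero (a standard identification choice).
d = np.zeros(n_items)
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(d[:, None] - d[None, :])))  # p[i, j]
    grad = (wins - (wins + wins.T) * p).sum(axis=1)
    d += 0.01 * grad
    d -= d.mean()

order = np.argsort(d)
print("estimated difficulty order (easiest to hardest):", order)
```

The recovered ordering defines the scale of expected difficulty against which actual (test-based) difficulty can then be compared.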
Citations: 3
Analyzing Job Analysis Data Using Mixture Rasch Models
IF 1.7 | Q2 SOCIAL SCIENCES, INTERDISCIPLINARY | Pub Date: 2018-09-14 | DOI: 10.1080/15305058.2018.1481853
Adam E. Wyse
An important piece of validity evidence to support the use of credentialing exams comes from performing a job analysis of the profession. One common job analysis method is the task inventory method, where people working in the field are surveyed using rating scales about the tasks thought necessary to safely and competently perform the job. This article describes how mixture Rasch models can be used to analyze these data, and how results from these analyses can help to identify whether different groups of people may be responding to job tasks differently. Three examples from different credentialing programs illustrate scenarios that can be found when applying mixture Rasch models to job analysis data. Discussion of what these results may imply for the development of credentialing exams and other analyses of job analysis data is provided.
Citations: 5
A Polytomous Model of Cognitive Diagnostic Assessment for Graded Data
IF 1.7 | Q2 SOCIAL SCIENCES, INTERDISCIPLINARY | Pub Date: 2018-07-03 | DOI: 10.1080/15305058.2017.1396465
Dongbo Tu, Chanjin Zheng, Yan Cai, Xuliang Gao, Daxun Wang
Pursuing the line of the difference models in IRT (Thissen & Steinberg, 1986), this article proposed a new cognitive diagnostic model for graded/polytomous data based on the deterministic inputs, noisy "and" gate (DINA) model (Haertel, 1989; Junker & Sijtsma, 2001), named the DINA model for graded data (DINA-GD). We investigated the performance of a full Bayesian estimation of the proposed model. In the simulation, the classification accuracy and item recovery for the DINA-GD model were investigated. The results indicated that the proposed model yielded acceptable examinee attribute classification rates and item parameter recovery. In addition, a real-data example was used to illustrate the application of this new model with graded data or polytomously scored items.
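For context, the dichotomous DINA model that the DINA-GD generalizes assigns each examinee one of only two success probabilities per item, depending on whether they have mastered every attribute the item requires. A minimal sketch, with an assumed Q-matrix row and assumed slip and guess parameters (not values from the article):

```python
import numpy as np

def dina_prob(alpha, q, slip, guess):
    """DINA probability of a correct response: an examinee holding all
    attributes required by the item (Q-matrix row q) succeeds unless
    they slip; otherwise they can only guess."""
    eta = int(np.all(alpha >= q))  # mastery indicator for this item
    return (1.0 - slip) if eta else guess

# Illustrative item requiring attributes 0 and 2, with assumed
# slip = 0.1 and guess = 0.2.
q = np.array([1, 0, 1])
master = np.array([1, 1, 1])     # holds every required attribute
nonmaster = np.array([1, 1, 0])  # lacks attribute 2
print(dina_prob(master, q, 0.1, 0.2))     # prints 0.9
print(dina_prob(nonmaster, q, 0.1, 0.2))  # prints 0.2
```

The graded-data extension keeps this all-or-nothing attribute logic but replaces the single correct/incorrect outcome with ordered response categories.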
Citations: 10
Investigating How Test-Takers Change Their Strategies to Handle Difficulty in Taking a Reading Comprehension Test: Implications for Score Validation
IF 1.7 | Q2 SOCIAL SCIENCES, INTERDISCIPLINARY | Pub Date: 2018-07-03 | DOI: 10.1080/15305058.2017.1396464
Amery Wu, Michelle Y. Chen, J. Stone
This article investigates how test-takers change their strategies to handle increased test difficulty. An adult sample reported their test-taking strategies immediately after completing the tasks in a reading test. Data were analyzed using structural equation modeling specifying a measurement-invariant, ability-moderated, latent transition analysis in Mplus (Muthén & Asparouhov, 2011). It was found that almost half of the test-takers (47%) changed their strategies when encountering increased task-difficulty. The changes were characterized by augmenting comprehending-meaning strategies with score-maximizing and test-wiseness strategies. Moreover, test-takers' ability was the driving influence that facilitated and/or buffered the changes. The test outcomes, when reviewed in light of adjusted test-taking strategies, demonstrated a form of process-based validity evidence.
Citations: 9
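The latent transition analysis described in the abstract above classifies each test-taker into a strategy class on the easier and harder portions of the test, then tabulates who moved between classes. The following is a minimal sketch of that tabulation step only, with hypothetical class labels; the study's actual model was estimated in Mplus, and the data here are invented for illustration:

```python
from collections import Counter

# Hypothetical strategy classes assigned to each test-taker on the
# easier and harder portions of the test (the paper's strategies were
# comprehending-meaning, score-maximizing, and test-wiseness).
easy = ["meaning", "meaning", "meaning", "wiseness", "maximize", "meaning"]
hard = ["maximize", "meaning", "wiseness", "wiseness", "maximize", "maximize"]

# Count transitions between strategy classes.
transitions = Counter(zip(easy, hard))

# Proportion of test-takers who changed class when difficulty rose.
changed = sum(n for (a, b), n in transitions.items() if a != b)
change_rate = changed / len(easy)
print(f"{change_rate:.0%} of test-takers switched strategy class")
```

In this toy sample the switches run from the comprehending-meaning class toward score-maximizing and test-wiseness, mirroring the direction of change the abstract reports.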
Incongruence Between Native and Test Administration Languages: Towards Equal Opportunity in International Literacy Assessment 母语与考试管理语言的不一致:迈向国际读写能力评估的机会均等
IF 1.7 Q2 SOCIAL SCIENCES, INTERDISCIPLINARY Pub Date : 2018-07-03 DOI: 10.1080/15305058.2017.1407767
Patriann Smith, P. Frazier, Jaehoon Lee, R. Chang
Previous research has primarily addressed the effects of language on the Program for International Student Assessment (PISA) mathematics and science assessments. More recent research has focused on the effects of language on PISA reading comprehension and literacy assessments on student populations in specific Organization for Economic Cooperation and Development (OECD) and non-OECD countries. Recognizing calls to highlight the impact of language on student PISA reading performance across countries, the purpose of this study was to examine the effect of home languages versus test languages on PISA reading literacy across OECD and non-OECD economies, while considering other factors. The results of Ordinary Least Squares regression showed that about half of the economies demonstrated a positive and significant effect of students' language status on their reading performance. This finding is consistent with observations in the parallel analysis of PISA 2009 data, suggesting that students' performance on reading literacy assessment was higher when they were tested in their home language. Our findings highlight the importance of the role of context, the need for new approaches to test translation, and the potential similarities in language status for youth from OECD and non-OECD countries that have implications for interpreting their PISA reading literacy assessments.
{"title":"Incongruence Between Native and Test Administration Languages: Towards Equal Opportunity in International Literacy Assessment","authors":"Patriann Smith, P. Frazier, Jaehoon Lee, R. Chang","doi":"10.1080/15305058.2017.1407767","DOIUrl":"https://doi.org/10.1080/15305058.2017.1407767","abstract":"Previous research has primarily addressed the effects of language on the Program for International Student Assessment (PISA) mathematics and science assessments. More recent research has focused on the effects of language on PISA reading comprehension and literacy assessments on student populations in specific Organization for Economic Cooperation and Development (OECD) and non-OECD countries. Recognizing calls to highlight the impact of language on student PISA reading performance across countries, the purpose of this study was to examine the effect of home languages versus test languages on PISA reading literacy across OECD and non-OECD economies, while considering other factors. The results of Ordinary Least Squares regression showed that about half of the economies demonstrated a positive and significant effect of students' language status on their reading performance. This finding is consistent with observations in the parallel analysis of PISA 2009 data, suggesting that students' performance on reading literacy assessment was higher when they were tested in their home language. Our findings highlight the importance of the role of context, the need for new approaches to test translation, and the potential similarities in language status for youth from OECD and non-OECD countries that have implications for interpreting their PISA reading literacy assessments.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":"18 1","pages":"276 - 296"},"PeriodicalIF":1.7,"publicationDate":"2018-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/15305058.2017.1407767","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41346188","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
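The study above regresses reading performance on students' language status via Ordinary Least Squares. With a single binary predictor (home language matches the test language or not), the OLS slope reduces to the difference between the two groups' mean scores. The following is a toy illustration of that computation with made-up scores; the study's actual models controlled for additional factors not shown here:

```python
# Hypothetical reading scores; 1 = tested in home language, 0 = not.
match = [1, 1, 1, 0, 0, 0]
score = [520.0, 540.0, 530.0, 480.0, 500.0, 490.0]

# Simple OLS slope: cov(x, y) / var(x); with a 0/1 predictor this
# equals the difference between the two group means.
n = len(match)
mx = sum(match) / n
my = sum(score) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(match, score))
         / sum((x - mx) ** 2 for x in match))
intercept = my - slope * mx
print(f"estimated advantage of home-language testing: {slope:.1f} points")
```

Here the slope equals the home-language group mean minus the other group mean, so a positive coefficient corresponds to the abstract's finding that students scored higher when tested in their home language.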