Not-Reached Items: An Issue of Time and of Test-Taking Disengagement? The Case of PISA 2015 Reading Data
Pub Date: 2022-07-03 | DOI: 10.1080/08957347.2022.2103136
Elodie Pools
ABSTRACT Many low-stakes assessments, such as international large-scale surveys, are administered during time-limited testing sessions, and some test-takers are unable to respond to the last items of the test, resulting in not-reached (NR) items. However, because the test has no consequences for the respondents, these NR items can also stem from quitting the test. This article uses mixture modeling to investigate heterogeneity in the onset of NR items in the PISA 2015 reading assessment. Test-taking behavior, assessed by the response times on the first items of the test, and the risk of NR-item onset are modeled simultaneously in a 3-class model that distinguishes rapid, slow, and typical respondents. Results suggest that NR items can arise from a lack of time or from disengaged behavior, and that the relationship between the number of NR items and the ability estimate can be affected by these non-effortful NR responses.
{"title":"Not-reached Items: An Issue of Time and of test-taking Disengagement? the Case of PISA 2015 Reading Data","authors":"Elodie Pools","doi":"10.1080/08957347.2022.2103136","DOIUrl":"https://doi.org/10.1080/08957347.2022.2103136","url":null,"abstract":"ABSTRACT Many low-stakes assessments, such as international large-scale surveys, are administered during time-limited testing sessions and some test-takers are not able to endorse the last items of the test, resulting in not-reached (NR) items. However, because the test has no consequence for the respondents, these NR items can also stem from quitting the test. This article, by means of mixture modeling, investigates heterogeneity in the onset of NR items in reading in PISA 2015. Test-taking behavior, assessed by the response times on the first items of the test, and the risk of NR item onset are modeled simultaneously in a 3-class model that distinguishes rapid, slow and typical respondents. Results suggest that NR items can come from a lack of time or from disengaged behaviors and that the relationship between the number of NR items and ability estimate can be affected by these non-effortful NR responses.","PeriodicalId":51609,"journal":{"name":"Applied Measurement in Education","volume":"35 1","pages":"197 - 221"},"PeriodicalIF":1.5,"publicationDate":"2022-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45573999","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Response Demands of Reading Comprehension Test Items: A Review of Item Difficulty Modeling Studies
Pub Date: 2022-07-03 | DOI: 10.1080/08957347.2022.2103135
Steve Ferrara, J. Steedle, R. Frantz
ABSTRACT Item difficulty modeling studies involve (a) hypothesizing item features, or item response demands, that are likely to predict item difficulty with some degree of accuracy; and (b) entering the features as independent variables into a regression equation or other statistical model to predict difficulty. In this review, we report findings from 13 empirical item difficulty modeling studies of reading comprehension tests. We define reading comprehension item response demands as reading passage variables (e.g., length, complexity), passage-by-item variables (e.g., degree of correspondence between item and text, type of information requested), and item stem and response option variables. We report on response demand variables that are related to item difficulty and illustrate how they can be used to manage item difficulty in construct-relevant ways so that empirical item difficulties are within a targeted range (e.g., located within the Proficient or other proficiency level range on a test’s IRT scale, where intended).
{"title":"Response Demands of Reading Comprehension Test Items: A Review of Item Difficulty Modeling Studies","authors":"Steve Ferrara, J. Steedle, R. Frantz","doi":"10.1080/08957347.2022.2103135","DOIUrl":"https://doi.org/10.1080/08957347.2022.2103135","url":null,"abstract":"ABSTRACT Item difficulty modeling studies involve (a) hypothesizing item features, or item response demands, that are likely to predict item difficulty with some degree of accuracy; and (b) entering the features as independent variables into a regression equation or other statistical model to predict difficulty. In this review, we report findings from 13 empirical item difficulty modeling studies of reading comprehension tests. We define reading comprehension item response demands as reading passage variables (e.g., length, complexity), passage-by-item variables (e.g., degree of correspondence between item and text, type of information requested), and item stem and response option variables. We report on response demand variables that are related to item difficulty and illustrate how they can be used to manage item difficulty in construct-relevant ways so that empirical item difficulties are within a targeted range (e.g., located within the Proficient or other proficiency level range on a test’s IRT scale, where intended).","PeriodicalId":51609,"journal":{"name":"Applied Measurement in Education","volume":"35 1","pages":"237 - 253"},"PeriodicalIF":1.5,"publicationDate":"2022-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49021008","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Using Bayesian Networks to Characterize Student Performance Across Multiple Assessments of Individual Standards
Pub Date: 2022-07-03 | DOI: 10.1080/08957347.2022.2103134
Jiajun Xu, Nathan Dadey
ABSTRACT This paper explores how Bayesian networks can be used to summarize student performance across the full set of multiple modular assessments of individual standards, which we refer to as mini-assessments, from a large-scale, operational interim assessment program. We follow both a completely data-driven approach, in which no constraints are imposed so as to best reflect the empirical relationships between these assessments, and a learning-trajectory approach, in which constraints are imposed to mirror the stages of a mathematics learning trajectory and provide insight into student learning. Under both approaches, we aim to draw a holistic picture of performance across all of the mini-assessments that provides additional information for students, educators, and administrators. In particular, the graphical structure of the network and the conditional probabilities of mastery provide information above and beyond an overall score on a single mini-assessment. Uses and implications of our work are discussed.
{"title":"Using Bayesian Networks to Characterize Student Performance across Multiple Assessments of Individual Standards","authors":"Jiajun Xu, Nathan Dadey","doi":"10.1080/08957347.2022.2103134","DOIUrl":"https://doi.org/10.1080/08957347.2022.2103134","url":null,"abstract":"ABSTRACT This paper explores how student performance across the full set of multiple modular assessments of individual standards, which we refer to as mini-assessments, from a large scale, operational program of interim assessment can be summarized using Bayesian networks. We follow a completely data-driven approach in which no constraints are imposed to best reflect the empirical relationships between these assessments, and a learning trajectory approach in which constraints are imposed to mirror the stages of a mathematic learning trajectory to provide insight into student learning. Under both approaches, we aim to draw a holistic picture of performance across all of the mini-assessments that provides additional information for students, educators, and administrators. In particular, the graphical structure of the network and the conditional probabilities of mastery provide information above and beyond an overall score on a single mini-assessment. Uses and implications of our work are discussed.","PeriodicalId":51609,"journal":{"name":"Applied Measurement in Education","volume":"35 1","pages":"179 - 196"},"PeriodicalIF":1.5,"publicationDate":"2022-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43312965","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Guiding Educators’ Evaluation of the Measurement Quality of Social and Emotional Learning (SEL) Assessments
Pub Date: 2022-04-03 | DOI: 10.1080/08957347.2022.2067541
Jessica L. Jonson
ABSTRACT This article describes a grant project that generated a technical guide for PK-12 educators who are utilizing social and emotional learning (SEL) assessments for educational improvement purposes. The guide was developed over a two-year period with funding from the Spencer Foundation. The result was the collective contribution of a widely representative group of scholars and practitioners whose background and expertise provided a multifaceted view of important considerations when evaluating the measurement quality of an SEL assessment. The intent of the guide is to enable PK-12 educators to make more informed decisions when identifying, evaluating, and using valid, reliable, and fair SEL assessments for the purposes of curricular and program improvements. The efforts can also serve as an example of how to contextualize professional standards for testing practice that support the selection and use of tests by non-measurement audiences.
{"title":"Guiding Educators’ Evaluation of the Measurement Quality of Social and Emotional Learning (SEL) Assessments","authors":"Jessica L. Jonson","doi":"10.1080/08957347.2022.2067541","DOIUrl":"https://doi.org/10.1080/08957347.2022.2067541","url":null,"abstract":"ABSTRACT This article describes a grant project that generated a technical guide for PK-12 educators who are utilizing social and emotional learning (SEL) assessments for educational improvement purposes. The guide was developed over a two-year period with funding from the Spencer Foundation. The result was the collective contribution of a widely representative group of scholars and practitioners whose background and expertise provided a multifaceted view of important considerations when evaluating the measurement quality of an SEL assessment. The intent of the guide is to enable PK-12 educators to make more informed decisions when identifying, evaluating, and using valid, reliable, and fair SEL assessments for the purposes of curricular and program improvements. The efforts can also serve as an example of how to contextualize professional standards for testing practice that support the selection and use of tests by non-measurement audiences.","PeriodicalId":51609,"journal":{"name":"Applied Measurement in Education","volume":"35 1","pages":"153 - 177"},"PeriodicalIF":1.5,"publicationDate":"2022-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49260119","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Comparing the Robustness of Three Nonparametric DIF Procedures to Differential Rapid Guessing
Pub Date: 2022-04-03 | DOI: 10.1080/08957347.2022.2067542
Mohammed A. A. Abulela, Joseph A. Rios
ABSTRACT When test performance has no personal consequences for examinees, rapid guessing (RG) is a concern and can differ between subgroups. To date, the impact of differential RG on item-level measurement invariance has received minimal attention. To address this gap, a simulation study was conducted to examine the Type I error robustness of the Mantel-Haenszel (MH), standardization index (STD), and logistic regression (LR) differential item functioning (DIF) procedures in the presence of differential RG. Sample size, test difficulty, group impact, and differential RG rates were manipulated. Findings revealed that the LR procedure was completely robust to Type I error inflation, while slightly elevated false-positive rates (< 1%) were observed for the MH and STD procedures. An applied analysis of data from the Programme for International Student Assessment showed minimal differences in DIF classifications between unfiltered and RG-filtered responses. These results suggest that even large differences in RG rates between subgroups are not associated with false-positive DIF classifications.
{"title":"Comparing the Robustness of Three Nonparametric DIF Procedures to Differential Rapid Guessing","authors":"Mohammed A. A. Abulela, Joseph A. Rios","doi":"10.1080/08957347.2022.2067542","DOIUrl":"https://doi.org/10.1080/08957347.2022.2067542","url":null,"abstract":"ABSTRACT When there are no personal consequences associated with test performance for examinees, rapid guessing (RG) is a concern and can differ between subgroups. To date, the impact of differential RG on item-level measurement invariance has received minimal attention. To that end, a simulation study was conducted to examine the robustness of the Mantel-Haenszel (MH), standardization index (STD), and logistic regression (LR) differential item functioning (DIF) procedures to type I error in the presence of differential RG. Sample size, test difficulty, group impact, and differential RG rates were manipulated. Findings revealed that the LR procedure was completely robust to type I errors, while slightly elevated false positive rates (< 1%) were observed for the MH and STD procedures. An applied analysis examining data from the Programme for International Student Assessment showed minimal differences in DIF classifications when comparing data in which RG responses were unfiltered and filtered. These results suggest that large rates of differences in RG rates between subgroups are unassociated with false positive classifications of DIF.","PeriodicalId":51609,"journal":{"name":"Applied Measurement in Education","volume":"35 1","pages":"81 - 94"},"PeriodicalIF":1.5,"publicationDate":"2022-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45517411","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Does the Response Options Placement Provide Clues to the Correct Answers in Multiple-Choice Tests? A Systematic Review
Pub Date: 2022-04-03 | DOI: 10.1080/08957347.2022.2067539
Séverin Lions, Carlos Monsalve, P. Dartnell, María Paz Blanco, Gabriel Ortega, Julie Lemarié
ABSTRACT Multiple-choice tests are widely used in education, often for high-stakes assessment purposes. Consequently, these tests should be constructed following the highest standards. Many efforts have been undertaken to advance item-writing guidelines intended to improve tests. One important issue is the unwanted effect of the options’ position on test outcomes. Any such effect should be controlled through an adequate response-options placement strategy. However, the literature is not straightforward about how test developers actually arrange options. Therefore, this research synthesis systematically reviewed studies examining adherence to options-placement guidelines. Relevant item features, such as the item source (standardized or teacher-made tests) and the number of options, were considered. Results show that the distribution of answer keys across tests is often biased, which might provide examinees with clues for selecting correct options. Findings also show that options are not always arranged in a “logical” fashion (numerically, alphabetically, …) even when they lend themselves to such an ordering. The reasons underlying non-adherence to options-placement guidelines are discussed, as is the appropriateness of the observed response-options placement strategies. Suggestions are provided to help developers better arrange item options.
{"title":"Does the Response Options Placement Provide Clues to the Correct Answers in Multiple-choice Tests? A Systematic Review","authors":"Séverin Lions, Carlos Monsalve, P. Dartnell, María Paz Blanco, Gabriel Ortega, Julie Lemarié","doi":"10.1080/08957347.2022.2067539","DOIUrl":"https://doi.org/10.1080/08957347.2022.2067539","url":null,"abstract":"ABSTRACT Multiple-choice tests are widely used in education, often for high-stakes assessment purposes. Consequently, these tests should be constructed following the highest standards. Many efforts have been undertaken to advance item-writing guidelines intended to improve tests. One important issue is the unwanted effects of the options’ position on test outcomes. Any possible effects should be controlled through an adequate response options placement strategy. However, literature is not straightforward about how test developers arrange options. Therefore, this research synthesis systematically reviewed studies examining adherence to options placement guidelines. Relevant item features, such as the item source (standardized or teacher-made tests) and the number of options were considered. Results show that answer keys’ distribution across tests is often biased, which might provide examinees with clues to select correct options. Findings also show that options are not always arranged in a “logical” fashion (numerically, alphabetically…) despite being suited to be so arranged. The reasons underlying non-adherence to options placement guidelines are discussed, as is the appropriateness of observed response options placement strategies. Suggestions are provided to help developers better arrange items options.","PeriodicalId":51609,"journal":{"name":"Applied Measurement in Education","volume":"35 1","pages":"133 - 152"},"PeriodicalIF":1.5,"publicationDate":"2022-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48343747","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Effects of Using Double Ratings as Item Scores on IRT Proficiency Estimation
Pub Date: 2022-04-03 | DOI: 10.1080/08957347.2022.2067543
Yoon Ah Song, Won‐Chan Lee
ABSTRACT This article examines the performance of item response theory (IRT) models when double ratings, rather than single ratings, are used as item scores in the presence of rater effects. Study 1 examined the influence of the number of ratings on the accuracy of proficiency estimation in the generalized partial credit model (GPCM). Study 2 compared the accuracy of proficiency estimation of two IRT models (the GPCM versus the hierarchical rater model, HRM) for double ratings. The main findings were as follows: (a) rater effects substantially reduced the accuracy of IRT proficiency estimation; (b) double ratings lessened the negative impact of rater effects on proficiency estimation and improved accuracy relative to single ratings; (c) the IRT estimators showed different patterns of conditional accuracy; (d) as more items and a larger number of score categories were used, the accuracy of proficiency estimation improved; and (e) the HRM consistently outperformed the GPCM.
{"title":"Effects of Using Double Ratings as Item Scores on IRT Proficiency Estimation","authors":"Yoon Ah Song, Won‐Chan Lee","doi":"10.1080/08957347.2022.2067543","DOIUrl":"https://doi.org/10.1080/08957347.2022.2067543","url":null,"abstract":"ABSTRACT This article presents the performance of item response theory (IRT) models when double ratings are used as item scores over single ratings when rater effects are present. Study 1 examined the influence of the number of ratings on the accuracy of proficiency estimation in the generalized partial credit model (GPCM). Study 2 compared the accuracy of proficiency estimation of two IRT models (GPCM versus the hierarchical rater model, HRM) for double ratings. The main findings were as follows: (a) rater effects substantially reduced the accuracy of IRT proficiency estimation; (b) double ratings relieved the negative impact of rater effects on proficiency estimation and improved the accuracy relative to single ratings; (c) IRT estimators showed different patterns in the conditional accuracy; (d) as more items and a larger number of score categories were used, the accuracy of proficiency estimation improved; and (e) the HRM consistently showed better performance than the GPCM.","PeriodicalId":51609,"journal":{"name":"Applied Measurement in Education","volume":"35 1","pages":"95 - 115"},"PeriodicalIF":1.5,"publicationDate":"2022-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42878209","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Performance of Infit and Outfit Confidence Intervals Calculated via Parametric Bootstrapping
Pub Date: 2022-04-03 | DOI: 10.1080/08957347.2022.2067540
John Alexander Silva Diaz, Carmen Köhler, J. Hartig
ABSTRACT Testing item fit is central to item response theory (IRT) modeling, since good fit is necessary to draw valid inferences from estimated model parameters. Infit and outfit statistics, widely used indices for detecting deviations from the Rasch model, are affected by data factors such as sample size. Consequently, the traditional use of fixed infit and outfit cutoff points is an ineffective practice. This article evaluates whether confidence intervals estimated via parametric bootstrapping provide more suitable cutoff points than the conventionally applied range of 0.8–1.2 and outfit critical ranges adjusted for sample size. Performance is evaluated under different sizes of misfit, sample sizes, and numbers of items. Results show that the confidence intervals performed better in terms of power but had inflated Type I error rates, which resulted from mean-square values being pushed below unity in the large-misfit conditions. However, when a one-sided test was performed with the upper bound of the confidence intervals, the aforementioned inflation was eliminated.
{"title":"Performance of Infit and Outfit Confidence Intervals Calculated via Parametric Bootstrapping","authors":"John Alexander Silva Diaz, Carmen Köhler, J. Hartig","doi":"10.1080/08957347.2022.2067540","DOIUrl":"https://doi.org/10.1080/08957347.2022.2067540","url":null,"abstract":"ABSTRACT Testing item fit is central in item response theory (IRT) modeling, since a good fit is necessary to draw valid inferences from estimated model parameters. Infit and outfit fit statistics, widespread indices for detecting deviations from the Rasch model, are affected by data factors, such as sample size. Consequently, the traditional use of fixed infit and outfit cutoff points is an ineffective practice. This article evaluates if confidence intervals estimated via parametric bootstrapping provide more suitable cutoff points than the conventionally applied range of 0.8–1.2, and outfit critical ranges adjusted by sample size. The performance is evaluated under different sizes of misfit, sample sizes, and number of items. Results show that the confidence intervals performed better in terms of power, but had inflated type-I error rates, which resulted from mean square values pushed below unity in the large size of misfit conditions. However, when performing a one-side test with the upper range of the confidence intervals, the forementioned inflation was fixed.","PeriodicalId":51609,"journal":{"name":"Applied Measurement in Education","volume":"35 1","pages":"116 - 132"},"PeriodicalIF":1.5,"publicationDate":"2022-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48961483","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Teacher Assessment Literacy: Implications for Diagnostic Assessment Systems
Pub Date: 2022-01-02 | DOI: 10.1080/08957347.2022.2034823
Amy K. Clark, Brooke L. Nash, Meagan Karvonen
ABSTRACT Assessments scored with diagnostic models are increasingly popular because they provide fine-grained information about student achievement. Because of differences in how diagnostic assessments are scored and how results are used, the information teachers must know to interpret and use results may differ from concepts traditionally included in assessment literacy trainings for assessments that produce a raw or scale score. In this study, we connect the assessment literacy and score reporting literature to understand teachers’ assessment literacy in a diagnostic assessment context, as demonstrated by responses to focus groups and surveys. Results summarize teachers’ descriptions of fundamental diagnostic assessment concepts, their understanding of the diagnostic assessment and the results produced, and how diagnostic assessment results influence their instructional decision-making. Teachers understood how to use results and were comfortable using the term mastery when interpreting score report contents and planning subsequent instruction. However, teachers were unsure how mastery was calculated, and some misinterpreted mastery as representing a percent correct rather than a probability value. We share implications for others implementing large-scale diagnostic assessments or designing score reports for these systems.
{"title":"Teacher Assessment Literacy: Implications for Diagnostic Assessment Systems","authors":"Amy K. Clark, Brooke L. Nash, Meagan Karvonen","doi":"10.1080/08957347.2022.2034823","DOIUrl":"https://doi.org/10.1080/08957347.2022.2034823","url":null,"abstract":"ABSTRACT Assessments scored with diagnostic models are increasingly popular because they provide fine-grained information about student achievement. Because of differences in how diagnostic assessments are scored and how results are used, the information teachers must know to interpret and use results may differ from concepts traditionally included in assessment literacy trainings for assessments that produce a raw or scale score. In this study, we connect assessment literacy and score reporting literature to understand teachers’ assessment literacy in a diagnostic assessment context as demonstrated by responses to focus groups and surveys. Results summarize teachers’ descriptions of fundamental diagnostic assessment concepts, understanding of the diagnostic assessment and results produced, and how diagnostic assessment results influence their instructional decision-making. Teachers understood how to use results and were comfortable using the term mastery when interpreting score report contents and planning next instruction. However, teachers were unsure how mastery was calculated and some misinterpreted mastery as representing a percent correct rather than a probability value. We share implications for others implementing large-scale diagnostic assessments or designing score reports for these systems.","PeriodicalId":51609,"journal":{"name":"Applied Measurement in Education","volume":"35 1","pages":"17 - 32"},"PeriodicalIF":1.5,"publicationDate":"2022-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43909414","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Analyzing Student Response Processes to Evaluate Success on a Technology-Based Problem-Solving Task
Pub Date: 2022-01-02 | DOI: 10.1080/08957347.2022.2034821
Yuting Han, M. Wilson
ABSTRACT A technology-based problem-solving test can automatically capture all the actions students take as they complete tasks and save them as process data. Response sequences are the external manifestations of students’ latent intellectual activities, and they contain rich information about students’ abilities and their different problem-solving strategies. This study adopted mixture Rasch measurement models (MRMs) to analyze success on technology-based tasks while automatically classifying the different response patterns based on characteristics of the response process. The Olive Oil task from the Assessment and Teaching of 21st Century Skills project (ATC21S) is used as an example to illustrate the use of MRMs and the interpretation of the process data.
{"title":"Analyzing Student Response Processes to Evaluate Success on a Technology-Based Problem-Solving Task","authors":"Yuting Han, M. Wilson","doi":"10.1080/08957347.2022.2034821","DOIUrl":"https://doi.org/10.1080/08957347.2022.2034821","url":null,"abstract":"ABSTRACT A technology-based problem-solving test can automatically capture all the actions of students when they complete tasks and save them as process data. Response sequences are the external manifestations of the latent intellectual activities of the students, and it contains rich information about students’ abilities and different problem-solving strategies. This study adopted the mixture Rasch measurement models (MRMs) in analyzing the success of technology-based tasks while automatically classifying the different response patterns based on the characteristics of the response process. The Olive Oil task from the Assessment and Teaching of 21st Century Skills project (ATC21S) is taken as an example to illustrate the use of MRMs and the interpretation of the process data.","PeriodicalId":51609,"journal":{"name":"Applied Measurement in Education","volume":"35 1","pages":"33 - 45"},"PeriodicalIF":1.5,"publicationDate":"2022-01-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46282968","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"教育学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}