
Journal of Educational Measurement: Latest Publications

An Exploratory Study Using Innovative Graphical Network Analysis to Model Eye Movements in Spatial Reasoning Problem Solving
IF 1.4 CAS Tier 4 (Psychology) Q3 PSYCHOLOGY, APPLIED Pub Date: 2024-12-20 DOI: 10.1111/jedm.12421
Kaiwen Man, Joni M. Lakin

Eye-tracking procedures generate copious process data that could be valuable in establishing the response processes component of modern validity theory. However, there is a lack of tools for assessing and visualizing response processes using process data such as eye-tracking fixation sequences, especially those suitable for young children. This study, which explored student responses to a spatial reasoning task, employed eye tracking and social network analysis to model, examine, and visualize students' visual transition patterns while solving spatial problems to begin to elucidate these processes. Fifty students in Grades 2–8 completed a spatial reasoning task as eye movements were recorded. Areas of interest (AoIs) were defined within the task for each spatial reasoning question. Transition networks between AoIs were constructed and analyzed using selected network measures. Results revealed shared transition sequences across students as well as strategic differences between high and low performers. High performers demonstrated more integrated transitions between AoIs, while low performers considered information more in isolation. Additionally, age and the interaction of age and performance did not significantly impact these measures. The study demonstrates a novel modeling approach for investigating visual processing and provides initial evidence that high-performing students more deeply engage with visual information in solving these types of questions.
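The transition-network idea in this abstract can be sketched in a few lines: collapse a fixation sequence into AoI-to-AoI moves, count directed edges, and compute a simple network measure such as density. The function names and the choice of density as the measure are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter

def transition_network(fixations):
    """Build a weighted AoI-to-AoI transition network from a fixation sequence.

    `fixations` is a list of AoI labels in viewing order; consecutive repeats
    (re-fixations within the same AoI) are collapsed so edges capture moves
    *between* areas of interest.
    """
    # Collapse runs of identical AoIs: A A B -> A B
    collapsed = [fixations[0]] if fixations else []
    for aoi in fixations[1:]:
        if aoi != collapsed[-1]:
            collapsed.append(aoi)
    # Count directed transitions between successive AoIs
    return Counter(zip(collapsed, collapsed[1:]))

def density(edges, n_aois):
    """Directed network density: distinct observed edges / possible edges
    (self-loops excluded, since repeats were collapsed)."""
    possible = n_aois * (n_aois - 1)
    return len(edges) / possible if possible else 0.0
```

A student who moves stem, option A, stem, option B, option B, option A yields four distinct transitions over three AoIs, giving a density of 4/6; more integrated scanning (as reported for high performers) would show up as higher density and heavier cross-AoI edge weights.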

Citations: 0
Differences in Time Usage as a Competing Hypothesis for Observed Group Differences in Accuracy with an Application to Observed Gender Differences in PISA Data
IF 1.4 CAS Tier 4 (Psychology) Q3 PSYCHOLOGY, APPLIED Pub Date: 2024-11-01 DOI: 10.1111/jedm.12419
Radhika Kapoor, Erin Fahle, Klint Kanopka, David Klinowski, Ana Trindade Ribeiro, Benjamin W. Domingue

Group differences in test scores are a key metric in education policy. Response time offers novel opportunities for understanding these differences, especially in low-stakes settings. Here, we describe how observed group differences in test accuracy can be attributed to group differences in latent response speed or group differences in latent capacity, where capacity is defined as expected accuracy for a given response speed. This article introduces a method for decomposing observed group differences in accuracy into these differences in speed versus differences in capacity. We first illustrate in simulation studies that this approach can reliably distinguish between group speed and capacity differences. We then use this approach to probe gender differences in science and reading fluency in PISA 2018 for 71 countries. In science, score differentials largely increase when males, who respond more rapidly, are the higher performing group and decrease when females, who respond more slowly, are the higher performing group. In reading fluency, score differentials decrease where females, who respond more rapidly, are the higher performing group. This method can be used to analyze group differences especially in low-stakes assessments where there are potential group differences in speed.
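The decomposition described above can be illustrated with a deliberately simplified sketch: fit a "capacity curve" (accuracy as a function of speed) for each group, then split the observed accuracy gap Oaxaca-style into a capacity part and a speed part. The linear curve and all function names are assumptions for illustration; the article works with latent speed and latent capacity, not this two-step regression.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x, a crude stand-in for a capacity curve
    (expected accuracy as a function of response speed)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

def decompose_gap(speed_a, acc_a, speed_b, acc_b):
    """Split the observed accuracy gap (group A minus group B) into a capacity
    part (the curves differ at B's speeds) and a speed part (the remainder)."""
    a0, a1 = fit_line(speed_a, acc_a)
    b0, b1 = fit_line(speed_b, acc_b)
    observed = sum(acc_a) / len(acc_a) - sum(acc_b) / len(acc_b)
    # Counterfactual: evaluate both capacity curves at group B's observed speeds.
    capacity = sum((a0 + a1 * s) - (b0 + b1 * s) for s in speed_b) / len(speed_b)
    return {"observed": observed, "capacity": capacity, "speed": observed - capacity}
```

When both groups share the same curve but one responds faster, the whole observed gap lands in the "speed" component, which is exactly the competing hypothesis the article's title refers to.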

Citations: 0
Correction to “Expanding the Lognormal Response Time Model Using Profile Similarity Metrics to Improve the Detection of Anomalous Testing Behavior”
IF 1.4 CAS Tier 4 (Psychology) Q3 PSYCHOLOGY, APPLIED Pub Date: 2024-10-23 DOI: 10.1111/jedm.12418

Hurtz, G.M., & Mucino, R. (2024). Expanding the lognormal response time model using profile similarity metrics to improve the detection of anomalous testing behavior. Journal of Educational Measurement, 61, 458–485. https://doi.org/10.1111/jedm.12395

We apologize for this error.

Citations: 0
Subscores: A Practical Guide to Their Production and Consumption. Shelby Haberman, Sandip Sinharay, Richard Feinberg, and Howard Wainer. Cambridge: Cambridge University Press, 2024, 176 pp. (paperback)
IF 1.4 CAS Tier 4 (Psychology) Q3 PSYCHOLOGY, APPLIED Pub Date: 2024-10-18 DOI: 10.1111/jedm.12417
Gautam Puhan
Citations: 0
Using Keystroke Behavior Patterns to Detect Nonauthentic Texts in Writing Assessments: Evaluating the Fairness of Predictive Models
IF 1.4 CAS Tier 4 (Psychology) Q3 PSYCHOLOGY, APPLIED Pub Date: 2024-10-18 DOI: 10.1111/jedm.12416
Yang Jiang, Mo Zhang, Jiangang Hao, Paul Deane, Chen Li

The emergence of sophisticated AI tools such as ChatGPT, coupled with the transition to remote delivery of educational assessments in the COVID-19 era, has led to increasing concerns about academic integrity and test security. Using AI tools, test takers can produce high-quality texts effortlessly and use them to game assessments. It is thus critical to detect these nonauthentic texts to ensure test integrity. In this study, we leveraged keystroke logs—recordings of every keypress—to build machine learning (ML) detectors of nonauthentic texts in a large-scale writing assessment. We focused on investigating the fairness of the detectors across demographic subgroups to ensure that nongenuine writing can be predicted equally well across subgroups. Results indicated that keystroke dynamics were effective in identifying nonauthentic texts. While the ML models were slightly more likely to misclassify the original responses submitted by male test takers as consisting of nonauthentic texts than those submitted by females, the effect sizes were negligible. Furthermore, balancing demographic distributions and class labels did not consistently mitigate detector bias across predictive models. Findings of this study not only provide implications for using behavioral data to address test security issues, but also highlight the importance of evaluating the fairness of predictive models in educational contexts.
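A fairness check of the kind described (comparing misclassification of genuine responses across subgroups) can be sketched as follows. The helper computes per-group false-positive rates for any score-producing detector; the names and the 0.5 threshold are illustrative assumptions, not the study's models or metrics.

```python
def false_positive_rates(scores, labels, groups, threshold=0.5):
    """Per-group false-positive rate of a nonauthentic-text detector: the share
    of genuine responses (label 0) flagged as nonauthentic (score >= threshold).

    `scores` are detector outputs in [0, 1], `labels` the true classes
    (0 = genuine, 1 = nonauthentic), `groups` the demographic subgroup labels.
    """
    rates = {}
    for g in set(groups):
        genuine = [s for s, y, gg in zip(scores, labels, groups) if gg == g and y == 0]
        flagged = sum(s >= threshold for s in genuine)
        rates[g] = flagged / len(genuine) if genuine else float("nan")
    return rates
```

Comparing these rates (or their differences as effect sizes) across subgroups is the kind of audit the abstract reports: the detectors were effective overall, with only negligible subgroup differences in misclassifying genuine writing.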

Citations: 0
Detecting Differential Item Functioning among Multiple Groups Using IRT Residual DIF Framework
IF 1.4 CAS Tier 4 (Psychology) Q3 PSYCHOLOGY, APPLIED Pub Date: 2024-10-17 DOI: 10.1111/jedm.12415
Hwanggyu Lim, Danqi Zhu, Edison M. Choe, Kyung T. Han, Chris
This study presents a generalized version of the residual differential item functioning (RDIF) detection framework in item response theory, named GRDIF, to analyze differential item functioning (DIF) in multiple groups. The GRDIF framework retains the advantages of the original RDIF framework, such as computational efficiency and ease of implementation. The performance of GRDIF was assessed through a simulation study and compared with existing DIF detection methods, including the generalized Mantel-Haenszel, Lasso-DIF, and alignment methods. Results showed that the GRDIF framework demonstrated well-controlled Type I error rates close to the nominal level of .05 and satisfactory power in detecting uniform, nonuniform, and mixed DIF across different simulated conditions. Each of the three GRDIF statistics, GRDIF_R, GRDIF_S, and GRDIF_RS, effectively detected the specific type of DIF for which it was designed, with GRDIF_RS exhibiting the most robust performance across all types of DIF. The GRDIF framework outperformed other
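A toy version of a residual-based multi-group DIF statistic can illustrate the general idea: compute raw residuals on the studied item under a 2PL model, average them within each group, and combine the standardized group means chi-square style. This is only a sketch of the residual approach; the paper's GRDIF statistics have their own derivations and variance terms.

```python
import math

def p_2pl(theta, a, b):
    """2PL item response function: probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def residual_dif_stat(responses, thetas, groups, a, b):
    """Toy multi-group raw-residual DIF statistic for one studied item.

    For each group, average the raw residuals u - P(theta), standardize by a
    binomial-style variance of the mean under no DIF, and sum the squared
    standardized means across groups. Large values suggest group-specific
    misfit (DIF) on the item. Illustrative only, not the paper's GRDIF_R.
    """
    stat = 0.0
    for g in set(groups):
        idx = [i for i, gg in enumerate(groups) if gg == g]
        probs = [p_2pl(thetas[i], a, b) for i in idx]
        resid = [responses[i] - p for i, p in zip(idx, probs)]
        n = len(resid)
        mean = sum(resid) / n
        var = sum(p * (1 - p) for p in probs) / n ** 2
        stat += mean ** 2 / var if var > 0 else 0.0
    return stat
```

Under no DIF the statistic stays small because each group's residuals hover around zero; a group whose members systematically out- or under-perform the item model inflates its term, which is the signal a residual framework exploits.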
Citations: 0
An Item Response Tree Model for Items with Multiple-Choice and Constructed-Response Parts
IF 1.4 CAS Tier 4 (Psychology) Q3 PSYCHOLOGY, APPLIED Pub Date: 2024-10-07 DOI: 10.1111/jedm.12414
Junhuan Wei, Qin Wang, Buyun Dai, Yan Cai, Dongbo Tu

Traditional IRT and IRTree models are not appropriate for analyzing items that combine a multiple-choice (MC) task and a constructed-response (CR) task within a single item. To address this issue, this study proposed an item response tree model (IRTree-MR) that accommodates different response types at different steps and multiple cognitive processes behind each score, in order to investigate those cognitive processes and evaluate examinees more accurately. The proposed model employs an appropriate processing function for each task and allows multiple paths to an observed outcome. Simulation studies evaluated the performance of the proposed IRTree-MR, and results show that it outperforms the traditional IRT model in terms of parameter recovery and model fit. Moreover, an empirical study was carried out to verify the advantages of the proposed model.
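The tree structure for a two-part item can be illustrated with a minimal sketch: a 2PL node for the MC step, another for the CR step, and category probabilities formed as products along tree paths. This single-path mapping is an assumption for illustration; IRTree-MR additionally allows multiple paths to the same observed outcome and task-specific processing functions.

```python
import math

def node_prob(theta, a, b):
    """2PL success probability at one pseudo-item node of the tree."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def score_probs(theta, mc, cr):
    """Category probabilities for a 3-category score on a two-part item:
    0 = MC wrong, 1 = MC right but CR wrong, 2 = MC right and CR right.

    `mc` and `cr` are (a, b) parameter pairs for the MC and CR pseudo-items;
    each category probability is the product of node probabilities along its
    branch of the response tree.
    """
    p_mc = node_prob(theta, *mc)
    p_cr = node_prob(theta, *cr)
    return {0: 1 - p_mc, 1: p_mc * (1 - p_cr), 2: p_mc * p_cr}
```

Because the categories partition the tree's leaves, the probabilities sum to one at any theta, which is what lets the tree be estimated with standard IRT machinery applied to the pseudo-items.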

Citations: 0
Sequential Reservoir Computing for Log File-Based Behavior Process Data Analyses
IF 1.3 CAS Tier 4 (Psychology) Q3 PSYCHOLOGY, APPLIED Pub Date: 2024-09-14 DOI: 10.1111/jedm.12413
Jiawei Xiong, Shiyu Wang, Cheng Tang, Qidi Liu, Rufei Sheng, Bowen Wang, Huan Kuang, Allan S. Cohen, Xinhui Xiong
The use of process data in assessment has gained attention in recent years as more assessments are administered by computers. Process data, recorded in computer log files, capture the sequence of examinees' response activities during the assessment, for example, timestamped keystrokes. Traditional measurement methods are often inadequate for handling this type of data. In this paper, we propose a sequential reservoir method (SRM) built on a reservoir computing model with an echo state network, optimized via particle swarm optimization and singular value decomposition. Designed to regularize features from process data through a computational self-learning algorithm, the method was evaluated with both simulated and empirical data. Simulation results suggest that, on one hand, the model effectively transforms action sequences into standardized and meaningful features, and on the other hand, these features are instrumental in categorizing latent behavioral groups and predicting latent information. Empirical results further indicate that SRM can predict assessment efficiency, and correlation analysis verified that the extracted features are related to action-sequence lengths. The proposed method enhances the extraction and accessibility of meaningful information from process data, presenting an alternative to existing process data technologies.
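A bare-bones echo state network shows how a variable-length action sequence can be compressed into fixed-length features: one-hot inputs drive a small, fixed random recurrent network, and the final state serves as the feature vector. The sizes, weight scales, and leaky update below are illustrative assumptions; the proposed SRM additionally optimizes the reservoir with particle swarm optimization and singular value decomposition.

```python
import math
import random

def esn_features(actions, vocab, size=16, seed=0, leak=0.5):
    """Encode an action sequence from a log file as a fixed-length vector.

    A minimal echo-state sketch: fixed random input and recurrent weights,
    leaky tanh state update, final state returned as features. Deterministic
    for a given seed, so the same sequence always maps to the same features.
    """
    rng = random.Random(seed)
    w_in = [[rng.uniform(-0.5, 0.5) for _ in vocab] for _ in range(size)]
    w = [[rng.uniform(-0.5, 0.5) / size for _ in range(size)] for _ in range(size)]
    state = [0.0] * size
    for act in actions:
        u = [1.0 if v == act else 0.0 for v in vocab]  # one-hot input
        new_state = []
        for i in range(size):
            drive = sum(w_in[i][j] * u[j] for j in range(len(vocab)))
            drive += sum(w[i][j] * state[j] for j in range(size))
            new_state.append((1 - leak) * state[i] + leak * math.tanh(drive))
        state = new_state
    return state
```

Because the recurrent state retains a decaying trace of earlier actions, sequences that differ in order produce different feature vectors, which is what makes the final state usable downstream for clustering latent behavioral groups or prediction.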
Citations: 0
Exploring Latent Constructs through Multimodal Data Analysis
IF 1.3 CAS Tier 4 (Psychology) Q3 PSYCHOLOGY, APPLIED Pub Date: 2024-08-14 DOI: 10.1111/jedm.12412
Shiyu Wang, Shushan Wu, Yinghan Chen, Luyang Fang, Liang Xiao, Feiming Li
This study presents a comprehensive analysis of three types of multimodal data (response accuracy, response times, and eye-tracking data) derived from a computer-based spatial rotation test. To tackle the challenges of high-dimensional data analysis, we developed a methodological framework incorporating various statistical and machine learning methods. The results reveal that hidden-state transition probabilities based on eye-tracking features may be contingent on skill mastery estimated from the fluency CDM model. The hidden-state trajectory offers additional diagnostic insights into spatial rotation problem solving, surpassing the information provided by the fluency CDM alone. Furthermore, the distribution of participants across hidden states reflects the intricate nature of visualizing the objects in each item, adding a nuanced dimension to the characterization of item features. This complements the information obtained from item parameters in the fluency CDM model, which relies on response accuracy and response time. Our findings have the potential to pave the way for new psychometric and statistical models capable of seamlessly integrating various types of multimodal data. This integrated approach promises more meaningful and interpretable results, with implications for advancing the understanding of the cognitive processes involved in spatial rotation tests.
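The hidden-state modeling mentioned above can be grounded with a minimal HMM forward pass, the standard recursion for scoring an observation sequence under hidden-state dynamics. The tiny example parameters in the test are assumptions for illustration; the study's actual model ties transition probabilities to skill mastery estimated from the fluency CDM.

```python
def forward(obs, pi, trans, emit):
    """HMM forward algorithm: total probability of an observation sequence.

    `obs` is a list of observation symbol indices, `pi` the initial state
    distribution, `trans[r][s]` the r-to-s transition probability, and
    `emit[s][o]` the probability of emitting symbol o from state s. This is
    the building block for inferring hidden problem-solving states from
    discretized gaze features.
    """
    n_states = len(pi)
    alpha = [pi[s] * emit[s][obs[0]] for s in range(n_states)]
    for o in obs[1:]:
        alpha = [emit[s][o] * sum(alpha[r] * trans[r][s] for r in range(n_states))
                 for s in range(n_states)]
    return sum(alpha)
```

Normalizing the per-step alphas (rather than summing at the end) yields the posterior over hidden states at each time point, which is how a hidden-state trajectory like the one described in the abstract is read off.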
引用次数: 0
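The abstract above centers on transition probabilities between hidden states inferred from eye-tracking features. As an illustration only (not the authors' actual pipeline), a first-order transition probability matrix can be estimated directly from a discrete fixation sequence; the state labels and sequence below are hypothetical:

```python
import numpy as np

def transition_matrix(states, n_states):
    """Estimate a row-stochastic first-order transition matrix
    from a discrete state sequence (e.g., AoI fixation labels)."""
    counts = np.zeros((n_states, n_states))
    for a, b in zip(states[:-1], states[1:]):
        counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0  # keep never-visited rows as zeros
    return counts / row_sums

# Hypothetical fixation sequence over three areas of interest labeled 0, 1, 2
seq = [0, 1, 0, 1, 2, 2, 0, 1]
P = transition_matrix(seq, 3)
# Each visited row of P sums to 1, giving the estimated probability of
# moving from one area of interest to another on the next fixation.
```

In a hidden-state analysis like the one described, the sequence would come from a fitted latent-state model rather than raw AoI labels, but the transition-matrix estimation step is the same.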
Robustness of Item Response Theory Models under the PISA Multistage Adaptive Testing Designs
IF 1.3 4区 心理学 Q3 PSYCHOLOGY, APPLIED Pub Date : 2024-08-01 DOI: 10.1111/jedm.12409
Hyo Jeong Shin, Christoph König, Frederic Robin, Andreas Frey, Kentaro Yamamoto
Many international large-scale assessments (ILSAs) have switched to multistage adaptive testing (MST) designs to improve efficiency in measuring the skills of heterogeneous populations around the world. In this context, previous literature has reported acceptable model parameter recovery under MST designs when current item response theory (IRT)-based scaling models are used. However, previous studies have not considered the influence of realistic phenomena commonly observed in ILSA data, such as item-by-country interactions, repeated use of MST designs in subsequent cycles, and nonresponse, including omitted and not-reached items. The purpose of this study is to examine the robustness of current IRT-based scaling models to these three factors under MST designs, using the Programme for International Student Assessment (PISA) designs as an example. A series of simulation studies shows that the IRT scaling models used in PISA are robust to repeated use of the MST design in a subsequent cycle with fewer items and smaller sample sizes; item-by-country interactions and not-reached items have negligible to modest effects on model parameter estimation, while omitted responses have the largest effect. The discussion section provides recommendations and implications for future MST designs and scaling models for ILSAs.
Citations: 0
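The PISA abstract reports simulation studies in which omitted responses had the largest effect on IRT parameter estimation. A minimal sketch of the data-generating step of such a simulation, assuming a 2PL model and a missing-completely-at-random omission mechanism (the item parameters and omission rate below are illustrative, not PISA's operational values):

```python
import numpy as np

def sim_2pl(theta, a, b, omit_rate, rng):
    """Simulate dichotomous 2PL responses, P(X=1) = 1 / (1 + exp(-a*(theta - b))),
    then blank out a random fraction of cells as omitted (coded -1)."""
    logits = a * (theta[:, None] - b)          # persons x items via broadcasting
    p = 1.0 / (1.0 + np.exp(-logits))
    x = (rng.random(p.shape) < p).astype(int)
    omitted = rng.random(p.shape) < omit_rate  # MCAR omission mechanism
    return np.where(omitted, -1, x)

rng = np.random.default_rng(12345)
theta = rng.normal(size=1000)                  # person abilities
a = rng.uniform(0.8, 2.0, size=20)             # item discriminations
b = rng.normal(size=20)                        # item difficulties
data = sim_2pl(theta, a, b, omit_rate=0.05, rng=rng)
```

How the -1 codes are then treated at estimation time (ignored as missing, scored as incorrect, or modeled explicitly) is exactly the kind of design choice whose impact such robustness studies quantify.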