
Educational Measurement: Issues and Practice (Latest Articles)

Bilevel Topic Model-Based Multitask Learning for Constructed-Responses Multidimensional Automated Scoring and Interpretation
IF 2 | Zone 4 (Education) | Q1 EDUCATION & EDUCATIONAL RESEARCH | Pub Date: 2023-03-15 | DOI: 10.1111/emip.12550
Jiawei Xiong, Feiming Li

Multidimensional scoring evaluates each constructed-response answer on more than one rating dimension or trait, such as lexicon, organization, and supporting ideas, rather than assigning a single holistic score. This helps students distinguish among the various dimensions of writing quality. In this work, we present a bilevel learning model that combines two objectives: multidimensional automated scoring, and the analysis and interpretation of students' writing structure. Both objectives are served by a supervised model called Latent Dirichlet Allocation Multitask Learning (LDAMTL). LDAMTL integrates a topic model and a multitask learning model with an attention mechanism. Two empirical data sets were used to evaluate LDAMTL's performance. On the one hand, results suggested that LDAMTL achieves better scoring performance and higher quadratic weighted kappa (QW-κ) values than two competitor models at the 5% significance level. The competitors were supervised latent Dirichlet allocation and Bidirectional Encoder Representations from Transformers (BERT). On the other hand, the extracted topic structures revealed that students with higher language scores tended to use more compelling words to support the arguments in their answers. This study suggests that LDAMTL performs well by combining the shared underlying representation of each topic with the representation learned by the neural networks. It also helps in understanding students' writing.

Vol. 42, No. 2, pp. 42-61
Citations: 2
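QW-κ (quadratic weighted kappa) is the human-machine agreement statistic used to compare LDAMTL with its competitors. Below is a minimal, dependency-free sketch of the standard definition. It assumes integer scores on a known range. The function name `quadratic_weighted_kappa` is our own illustration and is not taken from the paper.

```python
def quadratic_weighted_kappa(human, machine, min_score, max_score):
    """Quadratic weighted kappa (QW-kappa) between two lists of integer scores."""
    if not human or len(human) != len(machine):
        raise ValueError("score lists must be non-empty and of equal length")
    k = max_score - min_score + 1  # number of score categories
    if k < 2:
        raise ValueError("need at least two score categories")
    n = len(human)

    # observed[i][j] counts responses scored i by the human rater and j by the machine.
    observed = [[0] * k for _ in range(k)]
    for h, m in zip(human, machine):
        observed[h - min_score][m - min_score] += 1

    # Marginal score distributions of each rater.
    human_hist = [sum(row) for row in observed]
    machine_hist = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2            # quadratic disagreement penalty
            expected = human_hist[i] * machine_hist[j] / n  # count expected by chance
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected
    return 1.0 - weighted_observed / weighted_expected
```

For example, take human scores [0, 1, 2, 2] and machine scores [0, 1, 1, 2] on a 0-2 scale. These give κ = 0.8, and perfect agreement gives 1.0. The quadratic weights penalize a two-point disagreement four times as heavily as a one-point disagreement. This is why QW-κ is the usual choice for ordinal rubric scores.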
A Machine Learning Approach for the Simultaneous Detection of Preknowledge in Examinees and Items When Both Are Unknown
IF 2 | Zone 4 (Education) | Q1 EDUCATION & EDUCATIONAL RESEARCH | Pub Date: 2023-02-24 | DOI: 10.1111/emip.12543
Yiqin Pan, James A. Wollack

Pan and Wollack (PW) proposed a machine learning method to detect compromised items. We extend PW's work into an approach that detects compromised items and examinees with item preknowledge simultaneously. The approach draws on ideas from ensemble learning to relax several limitations of PW's method. It also provides an autoencoder-based confidence score that expresses how confident we are that a detection result truly reflects item preknowledge. Simulation studies indicate that the proposed approach performs well in detecting item preknowledge. They also show that the confidence score can provide helpful information to users.

Vol. 42, No. 1, pp. 76-98
Citations: 1
Cheating Detection of Test Collusion: A Study on Machine Learning Techniques and Feature Representation
IF 2 | Zone 4 (Education) | Q1 EDUCATION & EDUCATIONAL RESEARCH | Pub Date: 2023-02-19 | DOI: 10.1111/emip.12538
Shun-Chuan Chang, Keng Lun Chang

Machine learning has evolved and expanded as an interdisciplinary research method in the educational sciences. However, using machine learning techniques to detect test collusion remains relatively unexplored. This applies both to collusion among multiple examinees and among sets of examinees with unusual answer patterns. This study investigates collusion on multiple-choice tests by introducing feature representation methods and machine learning algorithms that, used jointly, offer a promising detection approach. They can be used to detect individual examinees involved in collusion. They can also evaluate test collusion whether or not groups of potentially dishonest examinees have been identified a priori. Furthermore, small-sample examples are used to articulate the study's visual detection procedures. These procedures help identify questionable groups of item responses while also focusing on the specific individuals who provided anomalous answers.

Vol. 42, No. 2, pp. 62-73
Citations: 1
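The abstract does not list the specific features the study used. One representation common in collusion research counts, for each pair of examinees, how many responses they share. It pays particular attention to shared incorrect options. The sketch below builds such pairwise features, which a classifier or clustering step could then consume. It is a generic illustration and not the authors' feature set. The function name and data layout (one option letter per item) are hypothetical.

```python
from itertools import combinations


def pairwise_match_features(responses, key):
    """Build simple similarity features for every pair of examinees.

    responses: dict mapping examinee ID -> string of selected options, one per item.
    key: string of correct options, aligned position by position with the responses.
    Returns {(id_a, id_b): (identical_answers, identical_incorrect_answers)}.
    """
    features = {}
    for a, b in combinations(sorted(responses), 2):
        ra, rb = responses[a], responses[b]
        # Any shared choice, right or wrong.
        identical = sum(1 for x, y in zip(ra, rb) if x == y)
        # Shared wrong choices are commonly treated as stronger evidence of copying.
        identical_wrong = sum(1 for x, y, c in zip(ra, rb, key) if x == y and x != c)
        features[(a, b)] = (identical, identical_wrong)
    return features


example = pairwise_match_features({"e1": "ABCA", "e2": "ABCA", "e3": "DCBA"}, "ABCD")
print(example)
```

Here e1 and e2 agree on all four items, including the same wrong option on the last item. That pair stands out against the (1, 1) pairs involving e3.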
To Score or Not to Score: Factors Influencing Performance and Feasibility of Automatic Content Scoring of Text Responses
IF 2 | Zone 4 (Education) | Q1 EDUCATION & EDUCATIONAL RESEARCH | Pub Date: 2023-02-14 | DOI: 10.1111/emip.12544
Torsten Zesch, Andrea Horbach, Fabian Zehner

In this article, we systematize the factors influencing the performance and feasibility of automatic content scoring methods for short text responses. Performance means how well an automatic system agrees with human judgments. We argue that it depends mainly on the linguistic variance seen in the responses, and that this variance is indirectly influenced by other factors such as the target population or input modality. Extending previous work, we distinguish conceptual, realization, and nonconformity variance, which are affected differently by the various factors. Conceptual variance relates to the different concepts embedded in the text responses. Realization variance refers to the diverse ways those concepts are expressed in natural language. Nonconformity variance is added by aberrant response behavior. Beyond performance, the feasibility of using an automatic scoring system depends on external factors, such as ethical or computational constraints. These factors influence whether stakeholders accept a system with a given level of performance. Our work provides two contributions. The first is (i) a framework that helps assessment practitioners decide a priori whether automatic content scoring can be applied successfully in a given setup. The second is (ii) new empirical findings, integrated with findings from the literature, on the factors that influence the performance of automatic systems.

Vol. 42, No. 1, pp. 44-58 | Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/emip.12544
Citations: 0
Call for Papers: Leveraging Measurement for Better Decisions
IF 2 | Zone 4 (Education) | Q1 EDUCATION & EDUCATIONAL RESEARCH | Pub Date: 2023-02-14 | DOI: 10.1111/emip.12546
Vol. 42, No. 1, p. 7
Citations: 0
Introduction to the Special Section “Issues and Practice in Applying Machine Learning in Educational Measurement”
IF 2 | Zone 4 (Education) | Q1 EDUCATION & EDUCATIONAL RESEARCH | Pub Date: 2023-02-13 | DOI: 10.1111/emip.12547
Zhongmin Cui
Vol. 42, No. 1, p. 8 | Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/emip.12547
Citations: 0
Machine Learning Literacy for Measurement Professionals: A Practical Tutorial
IF 2 | Zone 4 (Education) | Q1 EDUCATION & EDUCATIONAL RESEARCH | Pub Date: 2023-02-06 | DOI: 10.1111/emip.12539
Rui Nie, Qi Guo, Maxim Morin

The COVID-19 pandemic has accelerated the digitalization of assessment, creating new challenges for measurement professionals. These include big data management, test security, and the analysis of new validity evidence. In response to these challenges, Machine Learning (ML) has emerged as an increasingly important skill in the toolbox of measurement professionals. However, most ML tutorials are technical and conceptually focused. This tutorial therefore aims to provide a practical introduction to ML in the context of educational measurement. We supplement the tutorial with several examples of supervised and unsupervised ML techniques applied to marking a short-answer question. Python code is available on GitHub. Finally, common misconceptions about ML are discussed.

© 2023 National Council on Measurement in Education.
Vol. 42, No. 1, pp. 9-23
Citations: 1
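The tutorial's own Python code is on GitHub and is not reproduced here. As an independent illustration of supervised marking of a short-answer question, here is a minimal nearest-neighbour sketch. A new response receives the human score of the most similar already-scored response. Similarity is measured by cosine similarity over bag-of-words counts. The function names and example responses are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lower-cased word counts; a deliberately crude text representation."""
    return Counter(text.lower().split())


def cosine_similarity(u, v):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * v[word] for word, count in u.items())  # Counter returns 0 for missing words
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def predict_score(response, scored_examples):
    """1-nearest-neighbour marking: return the human score of the most similar scored response."""
    vector = bag_of_words(response)
    _, best_score = max(scored_examples, key=lambda ex: cosine_similarity(vector, bag_of_words(ex[0])))
    return best_score


# Hypothetical human-scored training responses (text, score).
training = [
    ("plants need sunlight and water to make food", 2),
    ("plants need water", 1),
    ("i do not know", 0),
]
print(predict_score("plants use sunlight and water to make their food", training))
```

Real systems would use richer features and more than one neighbour. The point here is only the supervised pattern: human-scored examples in, predicted score out.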
Causal Inference and COVID: Contrasting Methods for Evaluating Pandemic Impacts Using State Assessments
IF 2 | Zone 4 (Education) | Q1 EDUCATION & EDUCATIONAL RESEARCH | Pub Date: 2023-02-03 | DOI: 10.1111/emip.12540
Benjamin R. Shear

In the spring of 2021, just 1 year after schools were forced to close because of COVID-19, state assessments were administered at great expense. The goals were to provide data about the pandemic's impacts on student learning and to help target resources where they were most needed. Using state assessment data from Colorado, this article describes the biggest threats to valid inferences when state assessment data are used to study the pandemic's impact on student learning. These threats are measurement artifacts affecting the comparability of scores, secular trends, and changes in the tested population. The article compares three statistical approaches: the Fair Trend, baseline student growth percentiles, and multiple regression with demographic covariates. Each can support more valid inferences about student learning during the pandemic and in other scenarios where the tested population changes over time. All three approaches lead to similar inferences about statewide student performance but can lead to very different inferences about student subgroups. Results show that statistically controlling for prepandemic demographic differences can reverse conclusions about which groups were most affected by the pandemic. It can therefore also reverse decisions about how to prioritize resources.

© 2023 National Council on Measurement in Education.
Vol. 42, No. 1, pp. 99-109
Citations: 0
Machine Learning–Based Profiling in Test Cheating Detection
IF 2 | Zone 4 (Education) | Q1 EDUCATION & EDUCATIONAL RESEARCH | Pub Date: 2023-01-31 | DOI: 10.1111/emip.12541
Huijuan Meng, Ye Ma

In recent years, machine learning (ML) techniques have received increasing attention for detecting aberrant test-taking behavior because of their advantages over traditional data forensics methods. However, defining “True Test Cheaters” is challenging. Other fraud detection tasks, such as flagging forged bank checks or credit card fraud, have physical evidence to work from. Testing organizations often lack the physical evidence needed to identify “True Test Cheaters” for training ML models. This study proposed a statistically defensible method for labeling “True Test Cheaters” in the data. It also demonstrated the effectiveness of ML approaches for identifying irregular statistical patterns in exam data. Finally, it established an analytical framework for evaluating and conducting real-time ML-based test data forensics. Classification accuracy and false-negative and false-positive results are evaluated across different supervised ML techniques. The reliability and feasibility of using this approach operationally for an IT certification exam are evaluated with real data.

Vol. 42, No. 1, pp. 59-75
Citations: 0
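The study compares supervised ML techniques by classification accuracy and by false-negative and false-positive results. The sketch below shows how those quantities are conventionally computed from a set of labels and predictions. It assumes binary labels where 1 means an examinee labeled (or flagged) as a “True Test Cheater”. It is a generic illustration, and the function name is hypothetical.

```python
def classification_summary(y_true, y_pred):
    """Accuracy, false-positive rate and false-negative rate for binary flags.

    Labels: 1 = labeled/flagged as a cheater, 0 = not.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # labeled cheater correctly flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # honest examinee correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # honest examinee wrongly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # labeled cheater missed
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
        "false_negative_rate": fn / (fn + tp) if fn + tp else 0.0,
    }


print(classification_summary([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 1, 0]))
```

Reporting the two error rates separately matters in this setting. A false positive accuses an honest examinee, while a false negative lets a cheater pass, and organizations typically weigh these costs differently.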
Psychometric Evaluation of the Preschool Early Numeracy Skills Test–Brief Version Within the Item Response Theory Framework
IF 2 | Zone 4 (Education) | Q1 EDUCATION & EDUCATIONAL RESEARCH | Pub Date: 2023-01-11 | DOI: 10.1111/emip.12536
Nikolaos Tsigilis, Katerina Krousorati, Athanasios Gregoriadis, Vasilis Grammatikopoulos

The Preschool Early Numeracy Skills Test–Brief Version (PENS-B) is a measure of early numeracy skills that was developed, and is mainly used, in the United States. The purpose of this study was to examine the factorial validity of PENS-B and its measurement invariance across gender in the Greek educational context. PENS-B was administered to 906 preschool children (473 boys, 433 girls) randomly selected from 84 kindergarten classrooms. The data were analyzed with unidimensional and multidimensional two-parameter logistic (2PL) item response theory models, using cross-validation procedures. Results showed that responses to the 20 items can be adequately explained by a two-dimensional model (Numbering Relations and Arithmetic Operations). Differential item functioning procedures did not detect any gender bias. The Numbering Relations dimension comprises 16 items, which assess low levels of this latent trait. The remaining four items capture average levels of Arithmetic Operations. Total information curves revealed that both dimensions measure precisely only a narrow region of their underlying latent traits.

Vol. 42, No. 2, pp. 32-41 | Open-access PDF: https://onlinelibrary.wiley.com/doi/epdf/10.1111/emip.12536
Citations: 0
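The PENS-B findings are stated in terms of the 2PL model and total information curves. Under the 2PL model, the probability of a correct response is P(θ) = 1 / (1 + exp(-a(θ - b))), with discrimination a and difficulty b. The item information is I(θ) = a²·P(θ)·(1 - P(θ)), which peaks at θ = b, and total information is the sum over items. This explains why a set of easy items (low b) measures precisely only in the low range of θ. The sketch below uses the logistic metric without the 1.7 scaling constant. Its item parameters are hypothetical, not the PENS-B estimates.

```python
import math


def p_correct(theta, a, b):
    """2PL probability of a correct response at ability theta (no 1.7 scaling constant)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item: a^2 * P * (1 - P), largest at theta = b."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)


def total_information(theta, items):
    """Total (test) information: the sum of item information over (a, b) pairs."""
    return sum(item_information(theta, a, b) for a, b in items)


# Hypothetical easy items (low difficulty b); not PENS-B estimates.
easy_items = [(1.8, -1.5), (1.5, -1.2), (2.0, -1.0), (1.6, -1.3)]
for theta in (-2, -1, 0, 1, 2):
    print(theta, round(total_information(theta, easy_items), 3))
```

Printing the curve shows information concentrated around θ ≈ -1.2 and falling off quickly above θ = 0. This is the pattern the abstract describes as measuring precisely only a narrow region of the trait.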