Educational Measurement-Issues and Practice: Latest Articles

Issue Cover

IF 2, Zone 4 (Education), Q1 EDUCATION & EDUCATIONAL RESEARCH

Educational Measurement-Issues and Practice

Pub Date : 2024-05-20 DOI: 10.1111/emip.12562

Citations: 0

Blending Strategic Expertise and Technology: A Case Study for Practice Analysis

IF 2.7, Zone 4 (Education), Q1 EDUCATION & EDUCATIONAL RESEARCH

Educational Measurement-Issues and Practice

Pub Date : 2024-05-20 DOI: 10.1111/emip.12607

Bharati B. Belwalkar, Matthew Schultz, Christina Curnow, J. Carl Setzer

There is a growing integration of technology in the workplace (World Economic Forum), and with it, organizations are increasingly relying on advanced technological approaches for improving their human capital processes to stay relevant and competitive in complex environments. All professions must keep up with this transition and begin integrating technology into their tools and processes. This paper centers on how advanced technological approaches (such as natural language processing (NLP) and data mining) have complemented a traditional practice analysis of the accounting profession. We also discuss strategic selection and use of subject-matter experts (SMEs) for more efficient practice analysis. The authors have adopted a triangulation process—gathering information from traditional practice analysis, using selected SMEs, and confirming findings with a novel NLP-based approach. These methods collectively contributed to the revision of the Uniform CPA Exam blueprint and in understanding accounting trends.

Citations: 0

2023 NCME Presidential Address: Some Musings on Comparable Scores

IF 2, Zone 4 (Education), Q1 EDUCATION & EDUCATIONAL RESEARCH

Educational Measurement-Issues and Practice

Pub Date : 2024-05-12 DOI: 10.1111/emip.12609

Deborah J. Harris

This article is based on my 2023 NCME Presidential Address, where I talked a bit about my journey into the profession, and more substantively about comparable scores. Specifically, I discussed some of the different ways ‘comparable scores’ are defined, highlighted some areas I think we as a profession need to pay more attention to when considering score comparability, and emphasized that comparability in this context is a matter of degree which varies according to the decisions we plan to make on particular scores.

Citations: 0

Examining Gender Differences in TIMSS 2019 Using a Multiple-Group Hierarchical Speed-Accuracy-Revisits Model

IF 2.7, Zone 4 (Education), Q1 EDUCATION & EDUCATIONAL RESEARCH

Educational Measurement-Issues and Practice

Pub Date : 2024-04-24 DOI: 10.1111/emip.12606

Dihao Leng, Ummugul Bezirhan, Lale Khorramdel, Bethany Fishbein, Matthias von Davier

This study capitalizes on response and process data from the computer-based TIMSS 2019 Problem Solving and Inquiry tasks to investigate gender differences in test-taking behaviors and their association with mathematics achievement at the eighth grade. Specifically, a recently proposed hierarchical speed-accuracy-revisits (SAR) model was adapted to multiple country-by-gender groups to examine the extent to which mathematics ability, response speed, revisit propensity, and the relationship among them differ between boys and girls. Results across 10 countries showed that boys responded to items faster on average than girls, and there was greater variation in boys’ response speed across students. A mixture distribution of revisit propensity was found for all country-by-gender groups. Both genders had moderate to strong negative correlations between mathematics ability and response speed, supporting the speed-accuracy tradeoff pattern reported in the literature. Results are discussed in the context of low-stakes assessments and in relation to the utility of the multiple-group SAR model.

Vol. 43, No. 3, pp. 64-75

Citations: 0

Guesses and Slips as Proficiency-Related Phenomena and Impacts on Parameter Invariance

IF 2.7, Zone 4 (Education), Q1 EDUCATION & EDUCATIONAL RESEARCH

Educational Measurement-Issues and Practice

Pub Date : 2024-04-08 DOI: 10.1111/emip.12605

Xiangyi Liao, Daniel M Bolt

Traditional approaches to the modeling of multiple-choice item response data (e.g., 3PL, 4PL models) emphasize slips and guesses as random events. In this paper, an item response model is presented that characterizes both disjunctively interacting guessing and conjunctively interacting slipping processes as proficiency-related phenomena. We show how evidence for this perspective is seen in the systematic form of invariance violations for item slip and guess parameters under four-parameter IRT models when compared across populations of different mean proficiency levels. Specifically, higher proficiency populations tend to show higher guess and lower slip probabilities than lower proficiency populations. The results undermine the use of traditional models for IRT applications that require invariance and would suggest greater attention to alternatives.
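
For readers unfamiliar with the traditional four-parameter model the abstract refers to, the sketch below shows the standard 4PL item response function, in which the lower asymptote acts as a guess parameter and the gap below the upper asymptote acts as a slip parameter. It is not the authors' proposed model; the function name `four_pl` and the parameter values in the usage note are illustrative assumptions.

```python
import math


def four_pl(theta, a, b, c, d):
    """Probability of a correct response under the standard 4PL model.

    theta: examinee proficiency
    a: discrimination, b: difficulty
    c: lower asymptote (guess probability)
    d: upper asymptote (1 - slip probability)
    """
    return c + (d - c) / (1.0 + math.exp(-a * (theta - b)))


def slip_probability(d):
    """Slip is the complement of the upper asymptote."""
    return 1.0 - d
```

Calling `four_pl(0.0, a=1.0, b=0.0, c=0.2, d=0.9)` evaluates the curve at the item's difficulty, where the probability sits halfway between the two asymptotes. Under this model the guess and slip values are fixed per item, which is the invariance assumption the abstract questions.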

Citations: 0

Transforming Assessment: The Impacts and Implications of Large Language Models and Generative AI

IF 2, Zone 4 (Education), Q1 EDUCATION & EDUCATIONAL RESEARCH

Educational Measurement-Issues and Practice

Pub Date : 2024-04-04 DOI: 10.1111/emip.12602

Jiangang Hao, Alina A. von Davier, Victoria Yaneva, Susan Lottridge, Matthias von Davier, Deborah J. Harris

The remarkable strides in artificial intelligence (AI), exemplified by ChatGPT, have unveiled a wealth of opportunities and challenges in assessment. Applying cutting-edge large language models (LLMs) and generative AI to assessment holds great promise in boosting efficiency, mitigating bias, and facilitating customized evaluations. Conversely, these innovations raise significant concerns regarding validity, reliability, transparency, fairness, equity, and test security, necessitating careful thinking when applying them in assessments. In this article, we discuss the impacts and implications of LLMs and generative AI on critical dimensions of assessment with example use cases and call for a community effort to equip assessment professionals with the needed AI literacy to harness the potential effectively.

Citations: 0

Revisiting the Usage of Alpha in Scale Evaluation: Effects of Scale Length and Sample Size

IF 2, Zone 4 (Education), Q1 EDUCATION & EDUCATIONAL RESEARCH

Educational Measurement-Issues and Practice

Pub Date : 2024-03-20 DOI: 10.1111/emip.12604

Leifeng Xiao, Kit-Tai Hau, Melissa Dan Wang

Short scales are time-efficient for participants and cost-effective in research. However, researchers often mistakenly expect short scales to have the same reliability as long ones without considering the effect of scale length. We argue that applying a universal benchmark for alpha is problematic as the impact of low-quality items is greater on shorter scales. In this study, we proposed simple guidelines for item reduction using the “alpha-if-item-deleted” procedure in scale construction. An item can be removed if alpha increases or decreases by less than .02, especially for short scales. Conversely, an item should be retained if alpha decreases by more than .04 upon its removal. For reliability benchmarks, .80 is relatively safe in most conditions, but higher benchmarks are recommended for longer scales and smaller sample sizes. Supplementary analyses, including item content, face validity, and content coverage, are critical to ensure scale quality.
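
The "alpha-if-item-deleted" guideline in this abstract can be made concrete with a short sketch. The code below computes Cronbach's alpha with the usual variance formula and applies the .02 and .04 thresholds quoted above. The function names, the data layout (one list of scores per item), and the "unspecified by the abstract" label for changes between the two thresholds are assumptions for illustration, not the authors' implementation.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha; items is a list of columns, one list of scores per item."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(row) for row in zip(*items)]  # total score per respondent
    item_var = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


def alpha_if_item_deleted(items):
    """Alpha recomputed with each item left out in turn (needs >= 3 items)."""
    if len(items) < 3:
        raise ValueError("need at least three items to delete one")
    return [cronbach_alpha(items[:i] + items[i + 1:]) for i in range(len(items))]


def screen_items(items, remove_below=0.02, retain_above=0.04):
    """Label each item using the thresholds quoted in the abstract."""
    base = cronbach_alpha(items)
    decisions = []
    for i, alpha_without in enumerate(alpha_if_item_deleted(items)):
        drop = base - alpha_without  # positive: alpha falls when the item is removed
        if drop < remove_below:      # alpha rises, or falls by less than .02
            label = "candidate for removal"
        elif drop > retain_above:    # alpha falls by more than .04
            label = "retain"
        else:
            label = "unspecified by the abstract"
        decisions.append((i, alpha_without, label))
    return base, decisions
```

As the abstract stresses, such labels are only a starting point: item content, face validity, and content coverage still need to be reviewed before an item is dropped.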

Citations: 0

What Mathematics Content Do Teachers Teach? Optimizing Measurement of Opportunities to Learn in the Classroom

IF 2, Zone 4 (Education), Q1 EDUCATION & EDUCATIONAL RESEARCH

Educational Measurement-Issues and Practice

Pub Date : 2024-03-07 DOI: 10.1111/emip.12603

Jiahui Zhang, William H. Schmidt

Measuring opportunities to learn (OTL) is crucial for evaluating education quality and equity, but obtaining accurate and comprehensive OTL data at a large scale remains challenging. We attempt to address this issue by investigating measurement concerns in data collection and sampling. With the primary goal of estimating group-level OTLs for large populations of classrooms and the secondary goal of estimating classroom-level OTLs, we propose forming a teacher panel and using an online log-type survey to collect content and time data on sampled days throughout the school year. We compared various sampling schemes in a simulation study with real daily log data from 66 fourth-grade math teachers. The findings from this study indicate that sampling 1 day per week or 1 day every other week provided accurate group-level estimates, while sampling 1 day per week yielded satisfactory classroom-level estimates. The proposed approach aids in effectively monitoring large-scale classroom OTL.
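
The sampling schemes compared in this abstract (1 day per week, 1 day every other week) can be illustrated with a small sketch. It assumes a classroom's daily log is a flat list of values, such as minutes spent on a topic, with five school days per week. The function names, the seed handling, and the 36-week year in the usage note are hypothetical choices, not details from the study.

```python
import random


def sample_days(n_weeks, days_per_week=5, every=1, seed=0):
    """Pick one random school day from every `every`-th week.

    Returns indices into a flat daily log of length n_weeks * days_per_week.
    every=1 samples weekly; every=2 samples every other week.
    """
    rng = random.Random(seed)
    return [week * days_per_week + rng.randrange(days_per_week)
            for week in range(0, n_weeks, every)]


def estimate_mean(daily_log, sampled_days):
    """Estimate the yearly mean of a daily quantity from the sampled days only."""
    return sum(daily_log[d] for d in sampled_days) / len(sampled_days)
```

For a 36-week year, `sample_days(36)` yields 36 sampled days and `sample_days(36, every=2)` yields 18. Comparing `estimate_mean` on the sampled days against the mean of the full log gives a rough sense of how much accuracy each scheme gives up.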

Citations: 0

Reframing Research and Assessment Practices: Advancing an Antiracist and Anti-Ableist Research Agenda

IF 2.7, Zone 4 (Education), Q1 EDUCATION & EDUCATIONAL RESEARCH

Educational Measurement-Issues and Practice

Pub Date : 2024-03-03 DOI: 10.1111/emip.12601

Angela Johnson, Elizabeth Barker, Marcos Viveros Cespedes

Educators and researchers strive to build policies and practices on data and evidence, especially on academic achievement scores. When assessment scores are inaccurate for specific student populations or when scores are inappropriately used, even data-driven decisions will be misinformed. To maximize the impact of the research-practice-policy collaborative, every stage of the assessment and research process needs to be critically interrogated. In this paper, we highlight the need to reframe assessment and research for multilingual learners, students with disabilities, and multilingual students with disabilities. We outline a framework that integrates three critical perspectives (QuantCrit, DisCrit, and critical multiculturalism) and discuss how this framework can be applied to assessment creation and research.

Citations: 0

Issue Cover

IF 2, Zone 4 (Education), Q1 EDUCATION & EDUCATIONAL RESEARCH

Educational Measurement-Issues and Practice

Pub Date : 2024-02-28 DOI: 10.1111/emip.12560

Citations: 0
