
Educational and Psychological Measurement: Latest Publications

Field-Testing Multiple-Choice Questions With AI Examinees: English Grammar Items.
IF 2.3 | CAS Tier 3 (Psychology) | JCR Q2 (Mathematics, Interdisciplinary Applications) | Pub Date: 2025-04-01 | Epub Date: 2024-10-03 | DOI: 10.1177/00131644241281053
Hotaka Maeda

Field-testing is an essential yet often resource-intensive step in the development of high-quality educational assessments. I introduce an innovative method for field-testing newly written exam items by substituting human examinees with artificially intelligent (AI) examinees. The proposed approach is demonstrated using 466 four-option multiple-choice English grammar questions. Pre-trained transformer language models are fine-tuned based on the 2-parameter logistic (2PL) item response model to respond like human test-takers. Each AI examinee is associated with a latent ability θ, and the item text is used to predict response selection probabilities for each of the four response options. For the best modeling approach identified, the overall correlation between the true and predicted 2PL correct response probabilities was .82 (bias = 0.00, root mean squared error = 0.18). The study results were promising, showing that item response data generated from AI can be used to compute item proportion correct and item discrimination, calibrate items with anchors, and conduct distractor analysis, dimensionality analysis, and latent trait scoring. However, the proposed approach did not achieve the level of accuracy obtainable with human examinee response data. If further refined, the potential resource savings in transitioning from human to AI field-testing could be enormous. AI could shorten the field-testing timeline, prevent examinees from seeing low-quality field-test items in real exams, shorten test lengths, eliminate concerns about test security, item exposure, and sample size, reduce overall cost, and help expand the item bank. Example Python code from this study is available on GitHub: https://github.com/hotakamaeda/ai_field_testing1.

Citations: 0
Assessing the Speed-Accuracy Tradeoff in Psychological Testing Using Experimental Manipulations.
IF 2.3 | CAS Tier 3 (Psychology) | JCR Q2 (Mathematics, Interdisciplinary Applications) | Pub Date: 2025-04-01 | Epub Date: 2024-10-07 | DOI: 10.1177/00131644241271309
Tobias Alfers, Georg Gittler, Esther Ulitzsch, Steffi Pohl

The speed-accuracy tradeoff (SAT), where increased response speed often leads to decreased accuracy, is well established in experimental psychology. However, its implications for psychological assessments, especially in high-stakes settings, remain less understood. This study presents an experimental approach to investigate the SAT within a high-stakes spatial ability assessment. By manipulating instructions in a within-subjects design to induce speed variations in a large sample (N = 1,305) of applicants for an air traffic controller training program, we demonstrate the feasibility of manipulating working speed. Our findings confirm the presence of the SAT for most participants, suggesting that traditional ability scores may not fully reflect performance in high-stakes assessments. Importantly, we observed individual differences in the SAT, challenging the assumption of uniform SAT functions across test takers. These results highlight the complexity of interpreting high-stakes assessment outcomes and the influence of test conditions on performance dynamics. This study offers a valuable addition to the methodological toolkit for assessing the intraindividual relationship between speed and accuracy in psychological testing (including SAT research), providing a controlled approach while acknowledging the need to address potential confounders. Future research may apply this method across various cognitive domains, populations, and testing contexts to deepen our understanding of the SAT's broader implications for psychological measurement.

Citations: 0
Interpretation of the Standardized Mean Difference Effect Size When Distributions Are Not Normal or Homoscedastic.
IF 2.3 | CAS Tier 3 (Psychology) | JCR Q2 (Mathematics, Interdisciplinary Applications) | Pub Date: 2025-04-01 | Epub Date: 2024-10-06 | DOI: 10.1177/00131644241278928
Larry V Hedges

The standardized mean difference (sometimes called Cohen's d) is an effect size measure widely used to describe the outcomes of experiments. It is a mathematically natural way to describe differences between groups of data that are normally distributed with different means but the same standard deviation. In that context, it can be interpreted as determining several indices of overlap between the two distributions. If the data are not approximately normally distributed, or if they have substantially unequal standard deviations, the relation between d and the overlap between distributions can be very different, and interpretations of d that apply when the data are normal with equal variances are unreliable.

Citations: 0
Using ROC Analysis to Refine Cut Scores Following a Standard Setting Process.
IF 2.3 | CAS Tier 3 (Psychology) | JCR Q2 (Mathematics, Interdisciplinary Applications) | Pub Date: 2025-04-01 | Epub Date: 2024-09-24 | DOI: 10.1177/00131644241278925
Dongwei Wang, Lisa A Keller

In educational assessment, cut scores are often defined through standard setting by a group of subject matter experts. This study aims to investigate the impact of several factors on classification accuracy using receiver operating characteristic (ROC) analysis, to provide statistical and theoretical evidence for cases in which the cut score needs to be refined. Factors examined in the study include the sample distribution relative to the cut score, the prevalence of the positive event, and the cost ratio. Responses to 40 items were simulated for examinees drawn from four sample distributions. In addition, the prevalence and the cost ratio between false negatives and false positives were manipulated to examine their impact on classification accuracy. The optimal cut score was identified using the Youden Index J. The results showed that the optimal cut score identified by the evaluation criterion tended to pull the cut score closer to the mode of the proficiency distribution. In addition, the optimal cut score shifts with the prevalence of the positive event and the cost ratio. With the item parameters used to simulate the data and the simulated sample distributions, it was found that when passing the exam is a low-prevalence event in the population, increasing the cut score operationally improves the classification; when passing the exam is a high-prevalence event, the cut score should be reduced to achieve optimality. As the cost ratio increases, the optimal cut score suggested by the evaluation criterion decreases. In three of the four sample distributions examined in this study, increasing the cut score enhanced the classification irrespective of the cost ratio when the prevalence in the population was 50%. This study provides statistical evidence for cases in which the cut score needs to be refined for policy reasons.

Citations: 0
Detecting Differential Item Functioning Using Response Time.
IF 2.3 | CAS Tier 3 (Psychology) | JCR Q2 (Mathematics, Interdisciplinary Applications) | Pub Date: 2025-04-01 | Epub Date: 2024-10-26 | DOI: 10.1177/00131644241280400
Qizhou Duan, Ying Cheng

This study investigated uniform differential item functioning (DIF) detection in response times. We proposed a regression analysis approach with working speed and group membership as independent variables and log-transformed response times as the dependent variable. Effect size measures, such as ΔR² and the percentage change in regression coefficients, were used in conjunction with statistical significance tests to flag DIF items. A simulation study was conducted to assess the performance of three DIF detection criteria: (a) the significance test alone, (b) the significance test with ΔR², and (c) the significance test with the percentage change in regression coefficients. The simulation study considered factors such as sample size, the proportion of the focal group relative to the total sample size, the number of DIF items, and the amount of DIF. The results showed that the significance test alone was too strict; using the percentage change in regression coefficients as an effect size measure reduced the flagging rate when the sample size was large, but the effect was inconsistent across conditions; using ΔR² with the significance test reduced the flagging rate and was fairly consistent. The PISA 2018 data were used to illustrate the performance of the proposed method on a real dataset. Furthermore, we provide guidelines for conducting DIF studies with response time.

Citations: 0
Enhancing Effort-Moderated Item Response Theory Models by Evaluating a Two-Step Estimation Method and Multidimensional Variations on the Model.
IF 2.3 | CAS Tier 3 (Psychology) | JCR Q2 (Mathematics, Interdisciplinary Applications) | Pub Date: 2025-04-01 | Epub Date: 2024-10-06 | DOI: 10.1177/00131644241280727
Bowen Wang, Corinne Huggins-Manley, Huan Kuang, Jiawei Xiong

Rapid-guessing behavior in data can compromise our ability to estimate item and person parameters accurately. Consequently, it is crucial to model data with rapid-guessing patterns in a way that can produce unbiased ability estimates. This study proposes and evaluates three alternative modeling approaches that follow the logic of the effort-moderated item response theory (EM-IRT) model to analyze response data containing rapid-guessing responses. The first is the two-step EM-IRT model, which uses item parameters estimated from respondents without rapid-guessing behavior; it was initially proposed by Rios and Soland but has not been investigated further. The other two are effort-moderated multidimensional models (EM-MIRT), which we introduce in this study in both between-item and within-item variants. The advantage of the EM-MIRT model is that it accounts for the underlying relationship between rapid-guessing propensity and ability. The three models were compared with the traditional EM-IRT model with regard to the accuracy of parameter recovery under various simulated conditions. Results demonstrated that the two-step EM-IRT and between-item EM-MIRT models consistently outperformed the traditional EM-IRT model under various conditions, with the two-step EM-IRT estimation generally delivering the best performance, especially for the estimation of ability and item difficulty parameters. In addition, different rapid-guessing patterns (i.e., difficulty-based, changing state, and decreasing effort) did not affect the performance of the two-step EM-IRT model. Overall, the findings suggest that the EM-IRT model with the two-step parameter estimation method can be applied in practice to estimate ability in the presence of rapid-guessing responses, owing to its accuracy and efficiency. The between-item EM-MIRT model can be used as an alternative when there is no significant mean difference in ability estimates between examinees who exhibit rapid-guessing behavior and those who do not.

Citations: 0
Enhancing Precision in Predicting Magnitude of Differential Item Functioning: An M-DIF Pretrained Model Approach.
IF 2.3 | CAS Tier 3 (Psychology) | JCR Q2 (Mathematics, Interdisciplinary Applications) | Pub Date: 2025-04-01 | Epub Date: 2024-10-01 | DOI: 10.1177/00131644241279882
Shan Huang, Hidetoki Ishii

Despite numerous studies on the magnitude of differential item functioning (DIF), different DIF detection methods often define effect sizes inconsistently and fail to adequately account for testing conditions. To address these limitations, this study introduces the unified M-DIF model, which defines the magnitude of DIF as the difference in item difficulty parameters between the reference and focal groups. The M-DIF model can incorporate various DIF detection methods and test conditions to form a quantitative model. A pretrained approach was employed to leverage a sufficiently representative large sample as the training set and ensure the model's generalizability; once the pretrained model is constructed, it can be applied directly to new data. Specifically, a training dataset comprising 144 combinations of test conditions and 144,000 potential DIF items, each equipped with 29 statistical metrics, was used. We adopt the XGBoost method for modeling. Results show that, in terms of root mean square error (RMSE) and bias, the M-DIF model outperforms the baseline model in both validation sets: under test conditions consistent with the training set and under inconsistent ones. Across all 360 combinations of test conditions (144 consistent and 216 inconsistent with the training set), the M-DIF model demonstrates lower RMSE in 357 cases (99.2%), illustrating its robustness. Finally, we provide an empirical example to showcase the practical feasibility of implementing the M-DIF model.

Citations: 0
Assessing the Performance of Strategies for Handling Rapid Guessing Responses in Item Response Theory Equating.
IF 2.1 | CAS Tier 3 (Psychology) | JCR Q2 (Mathematics, Interdisciplinary Applications) | Pub Date: 2025-03-30 | DOI: 10.1177/00131644251329524
Juyoung Jung, Won-Chan Lee

This study assesses the performance of strategies for handling rapid guessing responses (RGs) within the context of item response theory observed-score equating. Four distinct approaches were evaluated: (1) ignoring RGs, (2) penalizing RGs as incorrect responses, (3) implementing list-wise deletion (LWD), and (4) treating RGs as missing data followed by imputation using logistic regression-based methodologies. These strategies were examined across a diverse array of testing scenarios. Results indicate that the performance of each strategy varied depending on the specific manipulated factors. Both ignoring and penalizing RGs were found to introduce substantial distortions in equating accuracy. LWD generally exhibited the lowest bias among the strategies evaluated but showed higher standard errors. Data imputation methods, particularly those employing lasso logistic regression and bootstrap techniques, demonstrated superior performance in minimizing equating errors compared to other approaches.

Citations: 0
Assessing the Properties and Functioning of Model-Based Sum Scores in Multidimensional Measures With Local Item Dependencies: A Comprehensive Proposal.
IF 2.1 | CAS Tier 3 (Psychology) | JCR Q2 (Mathematics, Interdisciplinary Applications) | Pub Date: 2025-03-13 | DOI: 10.1177/00131644251319286
Pere J Ferrando, David Navarro-González, Fabia Morales-Vives

A common problem in the assessment of noncognitive attributes is the presence of items with correlated residuals. Although most studies have focused on their effect at the structural level, they may also affect the accuracy and effectiveness of scores derived from extended factor analytic (FA) solutions that include correlated residuals. For this reason, several measures of reliability/factor saturation and information were developed in a previous study to assess this effect on sum scores derived from unidimensional measures based on both linear and nonlinear FA solutions. The current article extends these proposals to a second-order solution with a single general factor, and it also extends the added-value principle to the second-order scenario when local dependencies are operating. Related to the added value, a new coefficient (an effect-size index with confidence intervals) is developed. Overall, the proposal first allows the reliability and relative efficiency of the scores to be assessed at both the subscale and total-scale levels, and second, it provides information on the appropriateness of using subscale scores to predict their own factors relative to the predictive capacity of the total score. Everything proposed is implemented in a freely available R program. Its usefulness is illustrated with an empirical example, which shows the distortions that correlated residuals may cause and how the various measures included in this proposal should be interpreted.

Citations: 0
Shortening Psychological Scales: Semantic Similarity Matters.
IF 2.1 | CAS Tier 3 (Psychology) | JCR Q2 (Mathematics, Interdisciplinary Applications) | Pub Date: 2025-02-24 | DOI: 10.1177/00131644251319047
Sevilay Kilmen, Okan Bulut

In this study, we proposed a novel scale abbreviation method based on sentence embeddings and compared it to two established automatic scale abbreviation techniques. Scale abbreviation methods typically rely on administering the full scale to a large representative sample, which is often impractical in certain settings. Our approach leverages the semantic similarity among the items to select abbreviated versions of scales without requiring response data, offering a practical alternative for scale development. We found that the sentence embedding method performs comparably to the data-driven scale abbreviation approaches in terms of model fit, measurement accuracy, and ability estimates. In addition, our results reveal a moderate negative correlation between item discrimination parameters and semantic similarity indices, suggesting that semantically unique items may result in a higher discrimination power. This supports the notion that semantic features can be predictive of psychometric properties. However, this relationship was not observed for reverse-scored items, which may require further investigation. Overall, our findings suggest that the sentence embedding approach offers a promising solution for scale abbreviation, particularly in situations where large sample sizes are unavailable, and may eventually serve as an alternative to traditional data-driven methods.

Citations: 0