Educational and Psychological Measurement: Latest Articles

Evaluating Change in Adjusted R-Square and R-Square Indices: A Latent Variable Method Application.
IF 2.1 | Zone 3 (Psychology) | Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-04-11 | DOI: 10.1177/00131644251329178
Tenko Raykov, Christine DiStefano

A procedure for interval estimation of the difference in the adjusted R-square index for nested linear models is discussed. The method yields as a byproduct confidence intervals for their standard R-square difference, as well as for the adjusted and standard R-squares associated with each model. The resulting interval estimate of the difference in adjusted R-square represents a useful and informative complement to the commonly used R-square change statistic and its significance test in model selection and contains substantially more information than that test. The outlined procedure is readily employed with popular software in empirical educational and psychological studies and is illustrated with numerical data.
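
The two quantities whose change is being evaluated can be written down directly. The sketch below is only a point-estimate illustration in Python, assuming the usual textbook definition of adjusted R-square; it does not reproduce the latent variable interval-estimation procedure of the article, and the function names `adjusted_r2` and `r2_changes` are hypothetical.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R-square for a linear model with p predictors fit to n cases."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def r2_changes(r2_reduced: float, p_reduced: int,
               r2_full: float, p_full: int, n: int) -> tuple:
    """Return (change in R-square, change in adjusted R-square) for nested models."""
    delta_r2 = r2_full - r2_reduced
    delta_adj = adjusted_r2(r2_full, n, p_full) - adjusted_r2(r2_reduced, n, p_reduced)
    return delta_r2, delta_adj


# Illustrative (made-up) values: 2 vs. 4 predictors, n = 100.
delta_r2, delta_adj = r2_changes(0.30, 2, 0.35, 4, 100)
print(round(delta_r2, 4), round(delta_adj, 4))
```

With these made-up inputs the adjusted change is smaller than the raw change, because the adjustment penalizes the two added predictors.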

Citations: 0
Differential Item Functioning Effect Size Use for Validity Information.
IF 2.3 | Zone 3 (Psychology) | Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-04-01 | Epub Date: 2024-11-22 | DOI: 10.1177/00131644241293694
W Holmes Finch, Maria Dolores Hidalgo Montesinos, Brian F French, Maria Hernandez Finch

There has been an emphasis on effect sizes for differential item functioning (DIF) with the purpose to understand the magnitude of the differences that are detected through statistical significance testing. Several different effect sizes have been suggested that correspond to the method used for analysis, as have different guidelines for interpretation. The purpose of this simulation study was to compare the performance of the DIF effect size measures described for quantifying and comparing the amount of DIF in two assessments. Several factors were manipulated that were thought to influence the effect sizes or are known to influence DIF detection. This study asked the following two questions. First, do the effect sizes accurately capture aggregate DIF across items? Second, do effect sizes accurately identify which assessment has the least amount of DIF? We highlight effect sizes that had support for performing well across several simulated conditions. We also apply these effect sizes to a real data set to provide an example. Results of the study revealed that the log odds ratio of fixed effects (Ln $\overline{OR}_{FE}$) and the variance of the Mantel-Haenszel log odds ratio ($\hat{\tau}^2$) were most accurate for identifying which test contains more DIF. We point to future directions with this work to aid the continued focus on effect sizes to understand DIF magnitude.
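
For readers unfamiliar with the Mantel-Haenszel log odds ratio mentioned above, here is a minimal Python sketch of the standard common odds ratio estimator for a single item, pooled over matched score strata. It is not the study's code: the function name and the tuple layout of each stratum are assumptions made for illustration, and the aggregate indices compared in the article (the fixed-effects mean and the variance across items) are not computed here.

```python
import math


def mh_log_odds_ratio(strata):
    """Mantel-Haenszel log odds ratio for one item.

    Each stratum is a tuple (a, b, c, d) of counts at one matched score level:
    a = reference correct, b = reference incorrect,
    c = focal correct,     d = focal incorrect.
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # empty score level carries no information
        num += a * d / n
        den += b * c / n
    if num <= 0 or den <= 0:
        raise ValueError("odds ratio undefined for these counts")
    return math.log(num / den)


# Made-up counts: the reference group has higher odds of success in each stratum.
print(mh_log_odds_ratio([(20, 10, 10, 20), (30, 10, 15, 20)]))
```

A value near zero indicates little uniform DIF for the item; the sign shows which group is favored under this coding.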

Citations: 0
Field-Testing Multiple-Choice Questions With AI Examinees: English Grammar Items.
IF 2.3 | Zone 3 (Psychology) | Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-04-01 | Epub Date: 2024-10-03 | DOI: 10.1177/00131644241281053
Hotaka Maeda

Field-testing is an essential yet often resource-intensive step in the development of high-quality educational assessments. I introduce an innovative method for field-testing newly written exam items by substituting human examinees with artificially intelligent (AI) examinees. The proposed approach is demonstrated using 466 four-option multiple-choice English grammar questions. Pre-trained transformer language models are fine-tuned based on the 2-parameter logistic (2PL) item response model to respond like human test-takers. Each AI examinee is associated with a latent ability θ, and the item text is used to predict response selection probabilities for each of the four response options. For the best modeling approach identified, the overall correlation between the true and predicted 2PL correct response probabilities was .82 (bias = 0.00, root mean squared error = 0.18). The study results were promising, showing that item response data generated from AI can be used to calculate item proportion correct, item discrimination, conduct item calibration with anchors, distractor analysis, dimensionality analysis, and latent trait scoring. However, the proposed approach did not achieve the level of accuracy obtainable with human examinee response data. If further refined, potential resource savings in transitioning from human to AI field-testing could be enormous. AI could shorten the field-testing timeline, prevent examinees from seeing low-quality field-test items in real exams, shorten test lengths, eliminate test security, item exposure, and sample size concerns, reduce overall cost, and help expand the item bank. Example Python code from this study is available on Github: https://github.com/hotakamaeda/ai_field_testing1.
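
The 2PL item response model referred to in the abstract has a simple closed form. The Python sketch below shows only that response function, with hypothetical parameter names (`theta` for ability, `a` for discrimination, `b` for difficulty); it is not taken from the linked repository and says nothing about how the language models were fine-tuned.

```python
import math


def p_correct_2pl(theta: float, a: float, b: float) -> float:
    """2PL probability of a correct response: 1 / (1 + exp(-a * (theta - b)))."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


# Made-up item: discrimination 1.2, difficulty 0.5.
for theta in (-2.0, 0.5, 2.0):
    print(theta, round(p_correct_2pl(theta, a=1.2, b=0.5), 3))
```

When ability equals the item difficulty the probability is exactly 0.5, and it rises with ability whenever the discrimination is positive.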

Citations: 0
Assessing the Speed-Accuracy Tradeoff in Psychological Testing Using Experimental Manipulations.
IF 2.3 | Zone 3 (Psychology) | Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-04-01 | Epub Date: 2024-10-07 | DOI: 10.1177/00131644241271309
Tobias Alfers, Georg Gittler, Esther Ulitzsch, Steffi Pohl

The speed-accuracy tradeoff (SAT), where increased response speed often leads to decreased accuracy, is well established in experimental psychology. However, its implications for psychological assessments, especially in high-stakes settings, remain less understood. This study presents an experimental approach to investigate the SAT within a high-stakes spatial ability assessment. By manipulating instructions in a within-subjects design to induce speed variations in a large sample (N = 1,305) of applicants for an air traffic controller training program, we demonstrate the feasibility of manipulating working speed. Our findings confirm the presence of the SAT for most participants, suggesting that traditional ability scores may not fully reflect performance in high-stakes assessments. Importantly, we observed individual differences in the SAT, challenging the assumption of uniform SAT functions across test takers. These results highlight the complexity of interpreting high-stakes assessment outcomes and the influence of test conditions on performance dynamics. This study offers a valuable addition to the methodological toolkit for assessing the intraindividual relationship between speed and accuracy in psychological testing (including SAT research), providing a controlled approach while acknowledging the need to address potential confounders. Future research may apply this method across various cognitive domains, populations, and testing contexts to deepen our understanding of the SAT's broader implications for psychological measurement.

Citations: 0
Interpretation of the Standardized Mean Difference Effect Size When Distributions Are Not Normal or Homoscedastic.
IF 2.3 | Zone 3 (Psychology) | Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-04-01 | Epub Date: 2024-10-06 | DOI: 10.1177/00131644241278928
Larry V Hedges

The standardized mean difference (sometimes called Cohen's d) is an effect size measure widely used to describe the outcomes of experiments. It is mathematically natural to describe differences between groups of data that are normally distributed with different means but the same standard deviation. In that context, it can be interpreted as determining several indexes of overlap between the two distributions. If the data are not approximately normally distributed or if they have substantially unequal standard deviations, the relation between d and overlap between distributions can be very different, and interpretations of d that apply when the data are normal with equal variances are unreliable.
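
As an illustration of how d determines overlap in the normal, equal-variance case, the Python sketch below computes three commonly cited indexes: Cohen's U3, the probability of superiority, and the overlapping coefficient. Which indexes the article itself discusses is not stated in the abstract, so this selection is an assumption, and the function names are hypothetical. As the abstract warns, these conversions should not be trusted when the data are non-normal or heteroscedastic.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def overlap_indexes(d: float) -> dict:
    """Overlap measures implied by d for two normal groups with equal SDs."""
    return {
        # Share of the higher group lying above the lower group's mean.
        "cohens_u3": normal_cdf(d),
        # Chance that a random member of one group outscores one from the other.
        "prob_superiority": normal_cdf(d / math.sqrt(2.0)),
        # Shared area under the two density curves.
        "overlap_coefficient": 2.0 * normal_cdf(-abs(d) / 2.0),
    }


print(overlap_indexes(0.8))
```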

Citations: 0
Using ROC Analysis to Refine Cut Scores Following a Standard Setting Process.
IF 2.3 | Zone 3 (Psychology) | Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-04-01 | Epub Date: 2024-09-24 | DOI: 10.1177/00131644241278925
Dongwei Wang, Lisa A Keller

In educational assessment, cut scores are often defined through standard setting by a group of subject matter experts. This study aims to investigate the impact of several factors on classification accuracy using receiver operating characteristic (ROC) analysis to provide statistical and theoretical evidence when the cut score needs to be refined. Factors examined in the study include the sample distribution relative to the cut score, prevalence of the positive event, and cost ratio. Forty item responses were simulated for examinees of four sample distributions. In addition, the prevalence and cost ratio between false negatives and false positives were manipulated to examine their impacts on classification accuracy. The optimal cut score is identified using the Youden Index J. The results showed that the optimal cut score identified by the evaluation criterion tended to pull the cut score closer to the mode of the proficiency distribution. In addition, depending on the prevalence of the positive event and cost ratio, the optimal cut score shifts accordingly. With the item parameters used to simulate the data and the simulated sample distributions, it was found that when passing the exam is a low-prevalence event in the population, increasing the cut score operationally improves the classification; when passing the exam is a high-prevalence event, the cut score should be reduced to achieve optimality. As the cost ratio increases, the optimal cut score suggested by the evaluation criterion decreases. In three out of the four sample distributions examined in this study, increasing the cut score enhanced the classification, irrespective of the cost ratio when the prevalence in the population is 50%. This study provides statistical evidence when the cut score needs to be refined for policy reasons.
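
The Youden Index J used above is sensitivity plus specificity minus one, and the optimal cut score is the candidate that maximizes it. The Python sketch below shows that search on raw scores. It is an unweighted illustration only: the prevalence and cost-ratio adjustments studied in the article are not included, and the function name, the "pass if score >= cut" rule, and the tie-breaking toward the lowest cut are assumptions.

```python
def youden_optimal_cut(scores, is_master):
    """Return (cut, J) maximizing J = sensitivity + specificity - 1.

    An examinee is classified as passing when score >= cut.
    Ties in J are resolved in favor of the lowest cut score.
    """
    n_pos = sum(1 for m in is_master if m)
    n_neg = len(is_master) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one master and one non-master")
    best_cut, best_j = None, -2.0
    for cut in sorted(set(scores)):
        true_pos = sum(1 for s, m in zip(scores, is_master) if m and s >= cut)
        true_neg = sum(1 for s, m in zip(scores, is_master) if not m and s < cut)
        j = true_pos / n_pos + true_neg / n_neg - 1.0
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j


# Made-up scores with one non-master scoring above one master.
scores = [10, 12, 14, 15, 16, 18, 20, 22]
masters = [False, False, False, True, False, True, True, True]
print(youden_optimal_cut(scores, masters))
```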

Citations: 0
Detecting Differential Item Functioning Using Response Time.
IF 2.3 | Zone 3 (Psychology) | Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-04-01 | Epub Date: 2024-10-26 | DOI: 10.1177/00131644241280400
Qizhou Duan, Ying Cheng

This study investigated uniform differential item functioning (DIF) detection in response times. We proposed a regression analysis approach with both the working speed and the group membership as independent variables, and logarithm transformed response times as the dependent variable. Effect size measures such as $\Delta R^2$ and percentage change in regression coefficients in conjunction with the statistical significance tests were used to flag DIF items. A simulation study was conducted to assess the performance of three DIF detection criteria: (a) significance test, (b) significance test with $\Delta R^2$, and (c) significance test with the percentage change in regression coefficients. The simulation study considered factors such as sample sizes, proportion of the focal group in relation to total sample size, number of DIF items, and the amount of DIF. The results showed that the significance test alone was too strict; using the percentage change in regression coefficients as an effect size measure reduced the flagging rate when the sample size was large, but the effect was inconsistent across different conditions; using $\Delta R^2$ with significance test reduced the flagging rate and was fairly consistent. The PISA 2018 data were used to illustrate the performance of the proposed method in a real dataset. Furthermore, we provide guidelines for conducting DIF studies with response time.
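
The ΔR² effect size described above compares a regression of log response time on working speed alone with one that also includes group membership. The Python sketch below computes that difference with a small hand-rolled ordinary least squares routine so that it has no dependencies. It treats working speed as an observed covariate, which is an assumption (the abstract does not say how speed is obtained), and all function names and data are hypothetical.

```python
def r_squared(predictors, y):
    """R-square from OLS of y on the predictor columns, with an intercept."""
    rows = [[1.0] + [float(v) for v in row] for row in predictors]
    k = len(rows[0])
    # Normal equations: (X'X) beta = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(xtx[r][col]))
        if abs(xtx[piv][col]) < 1e-12:
            raise ValueError("singular design matrix")
        xtx[col], xtx[piv] = xtx[piv], xtx[col]
        xty[col], xty[piv] = xty[piv], xty[col]
        for r in range(col + 1, k):
            f = xtx[r][col] / xtx[col][col]
            for c in range(col, k):
                xtx[r][c] -= f * xtx[col][c]
            xty[r] -= f * xty[col]
    beta = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(xtx[i][j] * beta[j] for j in range(i + 1, k))
        beta[i] = (xty[i] - tail) / xtx[i][i]
    fitted = [sum(b * x for b, x in zip(beta, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return 1.0 - ss_res / ss_tot


def delta_r2(speed, group, log_rt):
    """Gain in R-square from adding group membership to the speed-only model."""
    reduced = r_squared([[s] for s in speed], log_rt)
    full = r_squared([[s, g] for s, g in zip(speed, group)], log_rt)
    return full - reduced


# Made-up data: log RT falls with speed and is 0.3 longer in the focal group.
speed = [-1.0, 0.0, 1.0, -1.0, 0.0, 1.0]
group = [0, 0, 0, 1, 1, 1]
log_rt = [1.0 - 0.5 * s + 0.3 * g for s, g in zip(speed, group)]
print(round(delta_r2(speed, group, log_rt), 4))
```

In practice this value would be read alongside the significance test for the group coefficient, as in criterion (b) of the abstract; the flagging threshold for ΔR² is not given there, so none is assumed here.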

Citations: 0
Enhancing Effort-Moderated Item Response Theory Models by Evaluating a Two-Step Estimation Method and Multidimensional Variations on the Model.
IF 2.3 | Zone 3 (Psychology) | Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-04-01 | Epub Date: 2024-10-06 | DOI: 10.1177/00131644241280727
Bowen Wang, Corinne Huggins-Manley, Huan Kuang, Jiawei Xiong

Rapid-guessing behavior in response data can compromise the accurate estimation of item and person parameters. Consequently, it is crucial to model data with rapid-guessing patterns in a way that can produce unbiased ability estimates. This study proposes and evaluates three alternative modeling approaches that follow the logic of the effort-moderated item response theory (EM-IRT) model to analyze data containing rapid-guessing responses. One is the two-step EM-IRT model, which uses item parameters estimated from respondents without rapid-guessing behavior; it was initially proposed by Rios and Soland but not investigated further. The other two are effort-moderated multidimensional models (EM-MIRT), introduced in this study, with between-item and within-item structures, respectively. The advantage of the EM-MIRT models is that they account for the underlying relationship between rapid-guessing propensity and ability. The three models were compared with the traditional EM-IRT model on the accuracy of parameter recovery under various simulated conditions. Results demonstrated that the two-step EM-IRT and between-item EM-MIRT models consistently outperformed the traditional EM-IRT model under various conditions, with the two-step EM-IRT estimation generally delivering the best performance, especially for the estimation of ability and item difficulty parameters. In addition, different rapid-guessing patterns (i.e., difficulty-based, changing state, and decreasing effort) did not affect the performance of the two-step EM-IRT model. Overall, the findings suggest that, owing to its accuracy and efficiency, the EM-IRT model with the two-step parameter estimation method can be applied in practice for estimating ability in the presence of rapid-guessing responses. The between-item EM-MIRT model can be used as an alternative when there is no significant mean difference in the ability estimates between examinees who exhibit rapid-guessing behavior and those who do not.
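
The second step of the two-step idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes a 2PL item model, item parameters already fixed from a first calibration step on respondents without rapid guessing, and a simple grid search for the ability estimate. The function names (`p_correct`, `em_log_likelihood`, `estimate_theta`) and all numeric values are hypothetical.

```python
import math

def p_correct(theta, a, b):
    """2PL probability of a correct response (assumed item model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def em_log_likelihood(theta, responses, effortful, items):
    """Effort-moderated log-likelihood of theta for one examinee.

    Responses flagged as rapid guesses are skipped: under the
    effort-moderated logic they carry no information about theta.
    """
    ll = 0.0
    for x, flag, (a, b) in zip(responses, effortful, items):
        if not flag:
            continue  # rapid guess: contributes nothing that depends on theta
        p = p_correct(theta, a, b)
        ll += math.log(p) if x == 1 else math.log(1.0 - p)
    return ll

def estimate_theta(responses, effortful, items, lo=-4.0, hi=4.0, steps=801):
    """Grid-search maximum-likelihood ability estimate with fixed item parameters."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda t: em_log_likelihood(t, responses, effortful, items))

# Hypothetical (a, b) item parameters, taken as given from step one.
items = [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0), (1.0, 0.5)]
responses = [1, 1, 0, 0]
effortful = [True, True, True, False]  # last response flagged as a rapid guess
print(estimate_theta(responses, effortful, items))
```

In this toy case the flagged incorrect response is not allowed to pull the ability estimate down, which is the behavior the effort-moderated approach is meant to provide.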

Pages: 401-423. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11562957/pdf/
Citations: 0
Enhancing Precision in Predicting Magnitude of Differential Item Functioning: An M-DIF Pretrained Model Approach.
IF 2.3 Zone 3 (Psychology) Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date : 2025-04-01 Epub Date: 2024-10-01 DOI: 10.1177/00131644241279882
Shan Huang, Hidetoki Ishii

Despite numerous studies on the magnitude of differential item functioning (DIF), different DIF detection methods often define effect sizes inconsistently and fail to adequately account for testing conditions. To address these limitations, this study introduces the unified M-DIF model, which defines the magnitude of DIF as the difference in item difficulty parameters between reference and focal groups. The M-DIF model can incorporate various DIF detection methods and test conditions to form a quantitative model. The pretrained approach was employed to leverage a sufficiently representative large sample as the training set and ensure the model's generalizability. Once the pretrained model is constructed, it can be directly applied to new data. Specifically, a training dataset comprising 144 combinations of test conditions and 144,000 potential DIF items, each equipped with 29 statistical metrics, was used. We adopt the XGBoost method for modeling. Results show that, based on root mean square error (RMSE) and BIAS metrics, the M-DIF model outperforms the baseline model in both validation sets: under consistent and inconsistent test conditions. Across all 360 combinations of test conditions (144 consistent and 216 inconsistent with the training set), the M-DIF model demonstrates lower RMSE in 357 cases (99.2%), illustrating its robustness. Finally, we provided an empirical example to showcase the practical feasibility of implementing the M-DIF model.
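
The evaluation criteria named in this abstract can be made concrete with a short Python sketch. It shows only how a DIF magnitude, RMSE, and bias might be computed from predicted and true values; it does not reproduce the M-DIF model or its XGBoost training. The sign convention (focal minus reference) and the function names are assumptions, since the abstract does not specify them.

```python
import math

def dif_magnitude(b_reference, b_focal):
    """DIF magnitude as a difference in item difficulty.

    Assumption: focal minus reference; the abstract does not state the sign.
    """
    return b_focal - b_reference

def rmse(predicted, true):
    """Root mean square error between predicted and true DIF magnitudes."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predicted, true)) / len(true))

def bias(predicted, true):
    """Mean signed error (predicted minus true)."""
    return sum(p - t for p, t in zip(predicted, true)) / len(true)

# Hypothetical values for illustration only.
true_dif = [dif_magnitude(0.0, 0.5), dif_magnitude(-0.2, 0.2)]
predicted_dif = [0.4, 0.5]
print(rmse(predicted_dif, true_dif), bias(predicted_dif, true_dif))
```

RMSE penalizes large misses regardless of direction, while bias shows whether predictions run systematically high or low, which is why the two are usually reported together.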

Pages: 384-400. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11562883/pdf/
Citations: 0
Assessing the Performance of Strategies for Handling Rapid Guessing Responses in Item Response Theory Equating.
IF 2.1 Zone 3 (Psychology) Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS Pub Date : 2025-03-30 DOI: 10.1177/00131644251329524
Juyoung Jung, Won-Chan Lee

This study assesses the performance of strategies for handling rapid guessing responses (RGs) within the context of item response theory observed-score equating. Four distinct approaches were evaluated: (1) ignoring RGs, (2) penalizing RGs as incorrect responses, (3) implementing list-wise deletion (LWD), and (4) treating RGs as missing data followed by imputation using logistic regression-based methodologies. These strategies were examined across a diverse array of testing scenarios. Results indicate that the performance of each strategy varied depending on the specific manipulated factors. Both ignoring and penalizing RGs were found to introduce substantial distortions in equating accuracy. LWD generally exhibited the lowest bias among the strategies evaluated but showed higher standard errors. Data imputation methods, particularly those employing lasso logistic regression and bootstrap techniques, demonstrated superior performance in minimizing equating errors compared to other approaches.
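
The four strategies listed in this abstract differ only in how flagged responses are prepared before equating, which the following Python sketch illustrates. It is a sketch under stated assumptions, not the study's procedure: responses are scored 0/1, rapid-guess flags are already available, and the fourth strategy stops at marking values as missing (`None`) without performing the regression-based imputation. The function name `handle_rapid_guesses` and the strategy labels are hypothetical.

```python
def handle_rapid_guesses(responses, rg_flags, strategy):
    """Return a prepared copy of a scored response matrix.

    responses: list of examinee rows of 0/1 item scores.
    rg_flags:  same shape; True where a response was a rapid guess.
    strategy:  "ignore", "penalize", "listwise", or "missing".
    """
    if strategy == "ignore":
        # (1) Leave rapid guesses in the data unchanged.
        return [row[:] for row in responses]
    if strategy == "penalize":
        # (2) Score every rapid guess as incorrect.
        return [[0 if f else x for x, f in zip(row, flags)]
                for row, flags in zip(responses, rg_flags)]
    if strategy == "listwise":
        # (3) Drop any examinee with at least one rapid guess.
        return [row[:] for row, flags in zip(responses, rg_flags) if not any(flags)]
    if strategy == "missing":
        # (4) Mark rapid guesses as missing; imputation would follow separately.
        return [[None if f else x for x, f in zip(row, flags)]
                for row, flags in zip(responses, rg_flags)]
    raise ValueError(f"unknown strategy: {strategy}")

# Hypothetical data: the second examinee rapid-guessed the second item.
responses = [[1, 0, 1], [1, 1, 0]]
rg_flags = [[False, False, False], [False, True, False]]
for name in ("ignore", "penalize", "listwise", "missing"):
    print(name, handle_rapid_guesses(responses, rg_flags, name))
```

The list-wise option shrinks the sample, which is consistent with the higher standard errors the abstract reports for LWD.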

Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11955993/pdf/
Citations: 0