
Latest Publications in Educational and Psychological Measurement

The Dominant Trait Profile Method of Scoring Multidimensional Forced-Choice Questionnaires.
IF 2.3 | CAS Tier 3 (Psychology) | Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-08-14 | DOI: 10.1177/00131644251360386
Dimiter M Dimitrov

Proposed is a new method of scoring multidimensional forced-choice (MFC) questionnaires, referred to as the dominant trait profile (DTP) method. The DTP method identifies a dominant response vector (DRV) for each trait: a vector of binary scores for preferences in item pairs within MFC blocks, taken from the perspective of a respondent for whom the trait under consideration dominates over the other traits being measured. The respondents' observed response vectors are matched to the DRV for each trait to produce (1/0) matching scores that are then analyzed via latent trait modeling, with two scaling options: (a) a bounded D-scale (from 0 to 1) or (b) an item response theory logit scale. The DTP method allows for the comparison of individuals on a trait of interest, as well as their standing in relation to a dominant trait "standard" (criterion). The study results indicate that DTP-based trait estimates are highly correlated with those produced by the popular Thurstonian item response theory model and the Zinnes and Griggs pairwise preference item response theory model, while avoiding the complexity of their designs and some computational issues.
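
The core of the scoring step is a simple vector match, which can be sketched in a few lines. The following is a minimal illustration under assumed data: the binary preference coding, the number of pairs, and the DRV values are all hypothetical, not taken from the article.

```python
import numpy as np

# Minimal sketch of the DTP matching step. Assumes MFC blocks have already
# been decomposed into item pairs coded as binary preferences
# (1 = first item of the pair preferred); all values here are illustrative.
rng = np.random.default_rng(0)
n_respondents, n_pairs = 5, 8

# Observed binary preference vectors, one row per respondent.
observed = rng.integers(0, 2, size=(n_respondents, n_pairs))

# Hypothetical dominant response vector (DRV) for one trait: how a
# respondent dominated by that trait would resolve each pair.
drv_trait = np.array([1, 0, 1, 1, 0, 1, 0, 1])

# Matching scores: 1 where the observed preference agrees with the DRV.
matching = (observed == drv_trait).astype(int)

# These 1/0 matching scores would then be analyzed with a latent trait
# model, scaled either on a bounded D-scale or an IRT logit scale.
print(matching.mean(axis=1))  # crude proportion-agreement summary per person
```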

Citations: 0
Human Expertise and Large Language Model Embeddings in the Content Validity Assessment of Personality Tests.
IF 2.3 | CAS Tier 3 (Psychology) | Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-08-14 | DOI: 10.1177/00131644251355485
Nicola Milano, Michela Ponticorvo, Davide Marocco

In this article, we explore the application of Large Language Models (LLMs) in assessing the content validity of psychometric instruments, focusing on the Big Five Questionnaire (BFQ) and Big Five Inventory (BFI). Content validity, a cornerstone of test construction, ensures that psychological measures adequately cover their intended constructs. Using both human expert evaluations and advanced LLMs, we compared the accuracy of semantic item-construct alignment. Graduate psychology students employed the Content Validity Ratio to rate test items, forming the human baseline. In parallel, state-of-the-art LLMs, including multilingual and fine-tuned models, analyzed item embeddings to predict construct mappings. The results reveal distinct strengths and limitations of human and AI approaches. Human validators excelled in aligning the behaviorally rich BFQ items, while LLMs performed better with the linguistically concise BFI items. Training strategies significantly influenced LLM performance, with models tailored for lexical relationships outperforming general-purpose LLMs. Here we highlight the complementary potential of hybrid validation systems that integrate human expertise and AI precision. The findings underscore the transformative role of LLMs in psychological assessment, paving the way for scalable, objective, and robust test development methodologies.
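
As a rough picture of the embedding-based side of the comparison, the sketch below assigns items to constructs by cosine similarity between sentence embeddings. The model name and the two example items are illustrative assumptions; the study's multilingual and fine-tuned models, and the actual BFQ/BFI items, are not reproduced here.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Sketch of embedding-based item-construct alignment with a generic
# sentence-embedding model; model name and items are illustrative
# stand-ins, not the paper's models or the BFQ/BFI item texts.
model = SentenceTransformer("all-MiniLM-L6-v2")

constructs = ["extraversion", "agreeableness", "conscientiousness",
              "neuroticism", "openness to experience"]
items = ["I am the life of the party.",
         "I sympathize with others' feelings."]

c_emb = model.encode(constructs, normalize_embeddings=True)
i_emb = model.encode(items, normalize_embeddings=True)

# Cosine similarity = dot product of normalized vectors; the predicted
# construct for each item is the highest-similarity column.
sims = i_emb @ c_emb.T
for item, row in zip(items, sims):
    print(f"{item} -> {constructs[int(np.argmax(row))]}")
```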

Citations: 0
The One-Parameter Logistic Model Can Be True With Zero Probability for a Unidimensional Measuring Instrument: How One Could Go Wrong Removing Items Not Satisfying the Model.
IF 2.3 | CAS Tier 3 (Psychology) | Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-08-06 | DOI: 10.1177/00131644251345120
Tenko Raykov, Bingsheng Zhang

This note is concerned with the chance of the one-parameter logistic (1PL) model or the Rasch model being true for a unidimensional multi-item measuring instrument. It is pointed out that if a single dimension underlies a scale consisting of dichotomous items, then the probability of either model being correct for that scale can be zero. The note then addresses what the consequences can be of removing items that do not follow these models. Using a large number of simulated data sets, a pair of empirically relevant settings is presented in which such item elimination can be problematic. Specifically, dropping items from a unidimensional instrument because they do not satisfy the 1PL model or the Rasch model can yield potentially seriously misleading ability estimates, with increased standard errors and prediction error with respect to the latent trait. Implications for educational and behavioral research are discussed.
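
A toy simulation conveys the note's warning. In this sketch (illustrative parameters only, not the authors' design), data follow a unidimensional 2PL model with varying discriminations; items whose discriminations deviate from 1 are treated as "not satisfying" the 1PL model and dropped, and the pruned score can track the latent trait less well than the full score.

```python
import numpy as np

# Toy simulation, with illustrative parameters: unidimensional 2PL data in
# which discriminations vary, so no subset is exactly Rasch; dropping
# "non-Rasch" items shortens the test and can worsen trait recovery.
rng = np.random.default_rng(1)
n, p = 2000, 20
theta = rng.normal(size=n)
a = rng.uniform(0.5, 2.0, size=p)        # varying discriminations
b = rng.normal(size=p)                   # difficulties

prob = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))
x = (rng.random((n, p)) < prob).astype(int)

keep = np.abs(a - 1.0) < 0.3             # crude "Rasch-conforming" subset
print("items kept:", int(keep.sum()), "of", p)
print("corr(theta, full-test score):  %.3f"
      % np.corrcoef(theta, x.mean(axis=1))[0, 1])
print("corr(theta, pruned-test score): %.3f"
      % np.corrcoef(theta, x[:, keep].mean(axis=1))[0, 1])
```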

Citations: 0
Model-Based Person Fit Statistics Applied to the Wechsler Adult Intelligence Scale IV.
IF 2.3 | CAS Tier 3 (Psychology) | Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-08-03 | DOI: 10.1177/00131644251339444
Jared M Block, Steven P Reise, Keith F Widaman, Amanda K Montoya, David W Loring, Laura Glass Umfleet, Russell M Bauer, Joseph M Gullett, Brittany Wolff, Daniel L Drane, Kristen Enriquez, Robert M Bilder

An important task in clinical neuropsychology is to evaluate whether scores obtained on a test battery, such as the Wechsler Adult Intelligence Scale Fourth Edition (WAIS-IV), can be considered "credible" or "valid" for a particular patient. Such evaluations are typically made based on responses to performance validity tests (PVTs). As a complement to PVTs, we propose that WAIS-IV profiles also be evaluated using a residual-based M-distance ($d_{ri}^2$) person fit statistic. Large $d_{ri}^2$ values flag profiles that are inconsistent with the factor analytic model underlying the interpretation of test scores. We first established a well-fitting model with four correlated factors for 10 core WAIS-IV subtests derived from the standardization sample. Based on this model, we then performed a Monte Carlo simulation to evaluate whether a hypothesized sampling distribution for $d_{ri}^2$ was accurate and whether $d_{ri}^2$ was computable, under different degrees of missing subtest scores. We found that when the number of subtests administered was less than 8, $d_{ri}^2$ could not be computed around 25% of the time. When computable, $d_{ri}^2$ conformed to a $\chi^2$ distribution with degrees of freedom equal to the number of tests minus the number of factors. Demonstration of the $d_{ri}^2$ index in a large sample of clinical cases was also provided. Findings highlight the potential utility of the $d_{ri}^2$ index as an adjunct to PVTs, offering clinicians an additional method to evaluate WAIS-IV test profiles and improve the accuracy of neuropsychological evaluations.
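
For intuition about the statistic, the sketch below computes a residual-based M-distance from a known factor model, using Bartlett factor scores so that the residual covariance has rank p - k and the quadratic form follows a chi-square distribution with p - k degrees of freedom. The setup is simulated and the particular residual construction is an assumption; the article's estimator may differ in detail.

```python
import numpy as np
from scipy import stats

# Sketch of a residual-based M-distance person-fit statistic under a known
# factor model (loadings L, factor covariance Phi, unique variances Psi),
# using Bartlett factor scores so the residual covariance has rank p - k.
rng = np.random.default_rng(2)
p, k, n = 10, 4, 2000

L = np.zeros((p, k))
L[np.arange(p), np.arange(p) % k] = rng.uniform(0.6, 0.9, size=p)
Phi = 0.5 * np.ones((k, k)) + 0.5 * np.eye(k)    # correlated factors
Psi = np.diag(rng.uniform(0.3, 0.6, size=p))

Sigma = L @ Phi @ L.T + Psi                      # model-implied covariance
y = rng.multivariate_normal(np.zeros(p), Sigma, size=n)

Psi_inv_L = np.linalg.solve(Psi, L)                 # p x k
W = np.linalg.solve(L.T @ Psi_inv_L, Psi_inv_L.T)  # Bartlett weights, k x p
E = y - y @ W.T @ L.T                               # person-level residuals
S_e = (np.eye(p) - L @ W) @ Psi @ (np.eye(p) - L @ W).T

# Quadratic form with the pseudo-inverse: chi-square, df = p - k
# (number of subtests minus number of factors).
d2 = np.einsum("ij,jk,ik->i", E, np.linalg.pinv(S_e), E)
print(np.quantile(d2, 0.95), stats.chi2(df=p - k).ppf(0.95))
```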

Citations: 0
Disentangling Qualitatively Different Faking Strategies in High-Stakes Personality Assessments: A Mixture Extension of the Multidimensional Nominal Response Model.
IF 2.3 | CAS Tier 3 (Psychology) | Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-07-29 | DOI: 10.1177/00131644251341843
Timo Seitz, Ö Emre C Alagöz, Thorsten Meiser

High-stakes personality assessments are often compromised by faking, where test-takers distort their responses according to social desirability. Many previous models have accounted for faking by modeling an additional latent dimension that quantifies each test-taker's degree of faking. Such models assume a homogeneous response strategy among all test-takers, reflected in a measurement model in which substantive traits and faking jointly influence item responses. However, such a model will be misspecified if, for some test-takers, item responding is only a function of substantive traits or only a function of faking. To address this limitation, we propose a mixture modeling extension of the multidimensional nominal response model (M-MNRM) that can be used to account for qualitatively different response strategies and to model relationships of strategy use with external variables. In a simulation study, the M-MNRM exhibited good parameter recovery and high classification accuracy across multiple conditions. Analyses of three empirical high-stakes datasets provided evidence for the consistent presence of the specified latent classes in different personnel selection contexts, emphasizing the importance of accounting for this kind of response-behavior heterogeneity in high-stakes assessment data. We end the article with a discussion of the model's utility for psychological measurement.
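
The mixture logic can be illustrated with a toy two-class example: one class responds from the substantive trait, the other from item desirability, and Bayes' rule yields each respondent's class responsibilities. Everything below (the response functions, weights, and priors) is a deliberately simplified stand-in, not the M-MNRM itself.

```python
import numpy as np

# Toy two-class illustration of the mixture idea (not the full M-MNRM):
# class 0 responds from the substantive trait, class 1 from item
# desirability ("faking"); all parameters below are illustrative.
rng = np.random.default_rng(3)
n_items = 12
desirability = rng.uniform(-1, 1, size=n_items)   # item desirability values
difficulty = rng.normal(size=n_items)

def p_endorse(theta, w):
    """Endorsement probabilities; w = 0 is trait-driven, w = 1 faking-driven."""
    logit = (1 - w) * (theta - difficulty) + w * 3.0 * desirability
    return 1.0 / (1.0 + np.exp(-logit))

# Simulate one "faker" and compute posterior class responsibilities.
x = (rng.random(n_items) < p_endorse(theta=0.0, w=1.0)).astype(int)

pi = np.array([0.7, 0.3])                          # class priors
lik = np.array([np.prod(p**x * (1 - p)**(1 - x))
                for p in (p_endorse(0.0, 0.0), p_endorse(0.0, 1.0))])
post = pi * lik / np.dot(pi, lik)
print("P(substantive), P(faking):", post.round(3))
```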

Citations: 0
Item Difficulty Modeling Using Fine-tuned Small and Large Language Models.
IF 2.1 | CAS Tier 3 (Psychology) | Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-07-06 | DOI: 10.1177/00131644251344973
Ming Li, Hong Jiao, Tianyi Zhou, Nan Zhang, Sydney Peters, Robert W Lissitz

This study investigates methods for item difficulty modeling in large-scale assessments using both small and large language models (LLMs). We introduce novel data augmentation strategies, including augmentation on the fly and distribution balancing, that surpass benchmark performances, demonstrating their effectiveness in mitigating data imbalance and improving model performance. Our results showed that fine-tuned small language models (SLMs) such as Bidirectional Encoder Representations from Transformers (BERT) and RoBERTa yielded lower root mean squared error than the first-place model in the BEA 2024 Shared Task competition, whereas domain-specific models like BioClinicalBERT and PubMedBERT did not provide significant improvements due to distributional gaps. Majority voting among SLMs enhanced prediction accuracy, reinforcing the benefits of ensemble learning. LLMs, such as GPT-4, exhibited strong generalization capabilities but struggled with item difficulty prediction, likely due to limited training data and the absence of explicit difficulty-related context. Chain-of-thought prompting and rationale generation approaches were explored but did not yield substantial improvements, suggesting that additional training data or more sophisticated reasoning techniques may be necessary. Embedding-based methods, particularly using NV-Embed-v2, showed promise but did not outperform our best augmentation strategies, indicating that capturing nuanced difficulty-related features remains a challenge.
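
A minimal version of the SLM fine-tuning setup is sketched below with Hugging Face transformers: a BERT encoder with a one-unit regression head trained with mean squared error on item text. The base checkpoint, items, and difficulty labels are placeholders; the study's augmentation strategies and tuning choices are not reproduced.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Sketch of fine-tuning a small language model for item difficulty
# regression; checkpoint, items, and labels are illustrative placeholders.
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=1, problem_type="regression")

items = ["Which organ produces insulin?", "Define homeostasis."]
difficulties = torch.tensor([[-0.4], [0.8]])     # hypothetical difficulty labels

batch = tok(items, padding=True, truncation=True, return_tensors="pt")
opt = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):                               # a few toy gradient steps
    out = model(**batch, labels=difficulties)    # MSE loss under regression head
    opt.zero_grad()
    out.loss.backward()
    opt.step()

model.eval()
with torch.no_grad():
    preds = model(**batch).logits.squeeze(-1)    # predicted difficulties
print(preds)
```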

Citations: 0
Historical Measurement Information Can Be Used to Improve Estimation of Structural Parameters in Structural Equation Models With Small Samples.
IF 2.1 | CAS Tier 3 (Psychology) | Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-06-13 | DOI: 10.1177/00131644251330851
James Ohisei Uanhoro, Olushola O Soyoye

This study investigates the incorporation of historical measurement information into structural equation models (SEM) with small samples to enhance the estimation of structural parameters. Given the availability of published factor analysis results with loading estimates and standard errors for popular scales, researchers may use this historical information as informative priors in Bayesian SEM (BSEM). We focus on estimating the correlation between two constructs using BSEM after generating data with significant bias in the Pearson correlation of their sum scores due to measurement error. Our findings indicate that incorporating historical information on measurement parameters as priors can improve the accuracy of correlation estimates, mainly when the true correlation is small, a common scenario in psychological research. Priors derived from meta-analytic estimates were especially effective, providing high accuracy and acceptable coverage. However, when the true correlation is large, weakly informative priors on all parameters yield the best results. These results suggest leveraging historical measurement information in BSEM can enhance structural parameter estimation.
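
A compact BSEM sketch of the idea in PyMC: published loadings and standard errors for two short scales enter as normal priors on the loadings, and the latent correlation is estimated from a small simulated sample. The historical estimates, scale lengths, and prior choices are all illustrative assumptions, not the study's exact model.

```python
import numpy as np
import pymc as pm

# BSEM sketch: informative priors on loadings from "historical"
# (published) estimates; all numbers are illustrative assumptions.
rng = np.random.default_rng(4)
n = 60                                   # small sample, the target scenario
eta = rng.multivariate_normal([0, 0], [[1.0, 0.3], [0.3, 1.0]], size=n)
lam_true = np.array([0.7, 0.6, 0.8, 0.7, 0.6, 0.8])
y = np.repeat(eta, 3, axis=1) * lam_true + rng.normal(0.0, 0.5, size=(n, 6))

pub_load = np.array([0.72, 0.61, 0.78, 0.69, 0.63, 0.81])  # historical loadings
pub_se = np.full(6, 0.05)                                  # historical SEs

with pm.Model():
    rho = pm.Uniform("rho", -1.0, 1.0)                 # latent correlation
    eta1 = pm.Normal("eta1", 0.0, 1.0, shape=n)
    z = pm.Normal("z", 0.0, 1.0, shape=n)
    eta2 = rho * eta1 + pm.math.sqrt(1.0 - rho**2) * z  # corr(eta1, eta2) = rho
    lam = pm.Normal("lam", mu=pub_load, sigma=pub_se, shape=6)  # informative priors
    sd = pm.HalfNormal("sd", 1.0, shape=6)
    mu = pm.math.concatenate(
        [eta1[:, None] * lam[:3], eta2[:, None] * lam[3:]], axis=1)
    pm.Normal("y", mu=mu, sigma=sd, observed=y)
    idata = pm.sample(500, tune=500, chains=2, target_accept=0.9,
                      progressbar=False)

print(float(idata.posterior["rho"].mean()))
```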

Citations: 0
The Effect of Modeling Missing Data With IRTree Approach on Parameter Estimates Under Different Simulation Conditions.
IF 2.3 | CAS Tier 3 (Psychology) | Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-06-01 | Epub Date: 2024-12-23 | DOI: 10.1177/00131644241306024
Yeşim Beril Soğuksu, Ergül Demir

This study explores the performance of the item response tree (IRTree) approach in modeling missing data, comparing its performance to the expectation-maximization (EM) algorithm and multiple imputation (MI) methods. Both simulation and empirical data were used to evaluate these methods across different missing data mechanisms, test lengths, sample sizes, and missing data proportions. Expected a posteriori was used for ability estimation, and bias and root mean square error (RMSE) were calculated. The findings indicate that IRTree provides more accurate ability estimates with lower RMSE than both EM and MI methods. Its overall performance was particularly strong under missing completely at random and missing not at random, especially with longer tests and lower proportions of missing data. However, IRTree was most effective with moderate levels of omitted responses and medium-ability test takers, though its accuracy decreased in cases of extreme omissions and abilities. The study highlights that IRTree is particularly well suited for low-stakes tests and has strong potential for providing deeper insights into the underlying missing data mechanisms within a data set.
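
The recoding behind a missing-data IRTree can be shown in a few lines: each item is expanded into a "respond vs. omit" pseudo-item and a "correct vs. incorrect given a response" pseudo-item, which are then calibrated jointly. The two-node tree below is the standard textbook mapping and a simplification of whatever tree the study specified.

```python
import numpy as np
import pandas as pd

# Sketch of the standard two-node IRTree recoding for omitted responses;
# the study's full tree specification may differ.
x = pd.DataFrame({"item1": [1, 0, np.nan, 1],
                  "item2": [np.nan, 1, 0, 0]})   # NaN = omitted response

# Node 1: did the examinee respond at all? (1 = responded, 0 = omitted)
node1 = x.notna().astype(int)

# Node 2: correct vs. incorrect, defined only where a response exists;
# it stays structurally missing (NaN) for omitted responses.
node2 = x.copy()

pseudo = pd.concat({"resp": node1, "score": node2}, axis=1)
print(pseudo)
# Both pseudo-item sets are then calibrated jointly with an IRT model,
# yielding a response propensity alongside the ability estimate.
```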

Citations: 0
Item Classification by Difficulty Using Functional Principal Component Clustering and Neural Networks.
IF 2.3 | CAS Tier 3 (Psychology) | Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-06-01 | Epub Date: 2025-01-04 | DOI: 10.1177/00131644241299834
James Zoucha, Igor Himelfarb, Nai-En Tang

Maintaining consistent item difficulty across test forms is crucial for accurately and fairly classifying examinees into pass or fail categories. This article presents a practical procedure for classifying items based on difficulty levels using functional data analysis (FDA). Methodologically, we clustered item characteristic curves (ICCs) into difficulty groups by analyzing their functional principal components (FPCs) and then employed a neural network to predict difficulty for ICCs. Given the degree of similarity between many ICCs, categorizing items by difficulty can be challenging. The strength of this method lies in its ability to provide an empirical and consistent process for item classification, as opposed to relying solely on visual inspection. The findings reveal that most discrepancies between visual classification and FDA results differed by only one adjacent difficulty level. Approximately 67% of these discrepancies involved items in the medium to hard range being categorized into higher difficulty levels by FDA, while the remaining third involved very easy to easy items being classified into lower levels. The neural network, trained on these data, achieved an accuracy of 79.6%, with misclassifications also differing by only one adjacent difficulty level compared to FDA clustering. The method demonstrates an efficient and practical procedure for classifying test items, especially beneficial in testing programs where smaller volumes of examinees tested at various times throughout the year.
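
The pipeline has a natural sketch: evaluate each item's ICC on a grid of theta values, approximate FPCA by a principal component analysis of the discretized curves, and cluster the component scores. Item parameters below are simulated; the operational ICCs and the exact FPCA implementation in the article are not reproduced.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Sketch of FPC clustering of ICCs: evaluate curves on a theta grid,
# approximate FPCA by PCA on the discretized curves, cluster the scores.
# Item parameters are simulated, not from any operational test.
rng = np.random.default_rng(5)
theta = np.linspace(-4, 4, 81)
a = rng.uniform(0.8, 2.0, size=100)              # discriminations
b = rng.normal(0, 1.2, size=100)                 # difficulties drive clusters

iccs = 1 / (1 + np.exp(-a[:, None] * (theta - b[:, None])))  # 100 x 81 curves

fpc = PCA(n_components=3).fit_transform(iccs)    # FPC scores per item
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(fpc)

for g in range(3):
    print(f"cluster {g}: mean difficulty = {b[labels == g].mean():+.2f}")
```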

Citations: 0
The Impact of Missing Data on Parameter Estimation: Three Examples in Computerized Adaptive Testing.
IF 2.3 | CAS Tier 3 (Psychology) | Q2 MATHEMATICS, INTERDISCIPLINARY APPLICATIONS | Pub Date: 2025-06-01 | Epub Date: 2025-01-07 | DOI: 10.1177/00131644241306990
Xiaowen Liu, Eric Loken

In computerized adaptive testing (CAT), examinees see items targeted to their ability level. Postoperational data have a high degree of missing information relative to designs where everyone answers all questions. Item responses are observed over a restricted range of abilities, reducing item-total score correlations. However, if the adaptive item selection depends only on observed responses, the data are missing at random (MAR). We simulated data from three different testing designs (common items, randomly selected items, and CAT) and found that it was possible to re-estimate both person and item parameters from postoperational CAT data. In a multidimensional CAT, we show that it is necessary to include all responses from the testing phase to avoid violating missing data assumptions. We also observed that some CAT designs produced "reversals" where item discriminations became negative, causing dramatic under- and over-estimation of abilities. Our results apply to situations where researchers work with data drawn from adaptive testing or from instructional tools with adaptive delivery. To avoid bias, researchers must make sure they use all the data necessary to meet the MAR assumptions.
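
The MAR argument can be made concrete with a small simulation: the item an examinee sees next depends only on responses already observed, never on unobserved ones. The Rasch pool, the nearest-difficulty selection rule, and the crude provisional-ability update below are all illustrative assumptions.

```python
import numpy as np

# Toy CAT simulation; Rasch items, nearest-difficulty ("max information")
# selection, and a crude provisional-ability update, all illustrative.
rng = np.random.default_rng(6)
n, pool, test_len = 1000, 50, 15
theta = rng.normal(size=n)
b = np.linspace(-2.5, 2.5, pool)

resp = np.full((n, pool), np.nan)                # NaN = item never shown
for i in range(n):
    available = list(range(pool))
    est = 0.0                                    # provisional ability
    for _ in range(test_len):
        j = min(available, key=lambda k: abs(b[k] - est))
        p = 1.0 / (1.0 + np.exp(-(theta[i] - b[j])))
        resp[i, j] = rng.random() < p
        available.remove(j)
        obs = ~np.isnan(resp[i])
        # Update depends only on observed responses -> missingness is MAR.
        est = float(np.clip(est + 2.0 * (resp[i, obs].mean() - 0.5), -3, 3))

print("items seen per examinee:", int((~np.isnan(resp)).sum(axis=1).mean()))
print("corr(theta, observed sum):",
      round(float(np.corrcoef(theta, np.nansum(resp, axis=1))[0, 1]), 2))
```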

Citations: 0