Can your darkness be measured? Analyzing the full and brief version of the Dark Factor of Personality in Swedish
Nico Streckert, Lara Kurtz, P. Kajonius
Pub Date: 2023-04-18 | DOI: 10.1080/15305058.2023.2195659 | International Journal of Testing, 23, 145–189
Abstract The Dark Factor of Personality (D) measures the latent core of antagonistic traits. The present study evaluated the psychometric properties of the Swedish full (D70) and brief (D16) versions with respect to structural validity, item information, and convergent validity. An online sample (N = 294) was analyzed using CFA (maximum likelihood estimation), IRT (graded response model), and SEM (latent correlations). First, the originally theorized bifactor model for the D70 and a single-factor model for the D16 showed good fit to the data. Moreover, reliability analyses based on FD and H indicated that the D70 can favorably be collapsed into a unidimensional measure, which is further discussed. Second, the IRT analyses indicated sound item quality and functioning and showed that items provide the most information at trait levels above the mean. Lastly, convergent SEM analyses showed that D had high latent correlations with psychopathy and Machiavellianism, but not with narcissism. Correlations with the Big Six personality factors (Mini-IPIP6) were, as expected, strongest with Agreeableness and Honesty-Humility. The Swedish translations of the full D70 and brief D16 are recommended for use in future research.
{"title":"Can your darkness be measured? Analyzing the full and brief version of the Dark Factor of Personality in Swedish","authors":"Nico Streckert, Lara Kurtz, P. Kajonius","doi":"10.1080/15305058.2023.2195659","DOIUrl":"https://doi.org/10.1080/15305058.2023.2195659","url":null,"abstract":"Abstract The Dark Factor of Personality (D) measures the latent core of antagonistic traits. The present study evaluated the psychometric properties of the Swedish version of the full (D70) and the brief (D16) versions, concerning structural validity, item information, and convergent validity. An online sample (N = 294) was analyzed using CFA (Maximum Likelihood Estimation), IRT (Graded Response Model) and SEM (latent correlations). Firstly, the original theorized bifactor model for D70 and a single-factor model for D16 showed good fit to the data. Moreover, new reliability-analyses based on FD and H indicated that the D70 favorably can be collapsed into a unidimensional measure, which is further discussed. Secondly, the IRT-analyses present valid item quality and functioning and showed that items provide the most information on trait levels above mean levels. Lastly, convergent SEM-analyses showed that D had high latent trait correlations to psychopathy and Machiavellianism, but not to narcissism. The correlations with the Big Six personality factors (mini-IPIP6) yielded expected high correlations with Agreeableness and Honesty-Humility. The Swedish translation of the full D70 and brief D16 is recommended for use in future research.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":"23 1","pages":"145 - 189"},"PeriodicalIF":1.7,"publicationDate":"2023-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45280757","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Investigating the overlap and predictive validity between Criterion A and B in the alternative model for personality disorders in DSM-5
Carla Martí Valls, Kitty Balazadeh, P. Kajonius
Pub Date: 2023-04-03 | DOI: 10.1080/15305058.2023.2195661 | International Journal of Testing, 23, 190–204
Abstract The Alternative DSM-5 Model for Personality Disorders (AMPD) consists of level of personality functioning (Criterion A) and maladaptive personality traits (Criterion B). The brief scale versions of these are understudied despite being widely used by clinicians and researchers. In this study, we investigated the overlap and predictive validity of Criteria A and B. Participants (N = 253) were assessed on level of personality functioning (LPFS-BF) and maladaptive personality traits (PID-5-BF), as well as on internalizing outcomes such as existential meaninglessness (EMS) and externalizing outcomes such as substance and behavioral addictions (SSAB). Data were analyzed with principal component analysis (PCA) and regression analyses. The results showed over 50% overlap between the brief versions of Criteria A and B, while Criterion B slightly outperformed Criterion A in predicting EMS and SSAB. We discuss the potential redundancy and usefulness of personality functioning and maladaptive personality traits.
{"title":"Investigating the overlap and predictive validity between Criterion A and B in the alternative model for personality disorders in DSM-5","authors":"Carla Martí Valls, Kitty Balazadeh, P. Kajonius","doi":"10.1080/15305058.2023.2195661","DOIUrl":"https://doi.org/10.1080/15305058.2023.2195661","url":null,"abstract":"Abstract The Alternative DSM-5 Model for Personality Disorders (AMPD) consists of level of personality functioning (Criterion A) and maladaptive personality traits (Criterion B). The brief scale versions of these are understudied, while often being used by clinicians and researchers. In this study, we wanted to investigate the overlap and predictive validity of Criterion A and B. Participants (N = 253) were measured on level of personality functioning (LPFS-BF) and maladaptive personality traits (PID-5-BF), as well as internalizing outcomes such existential meaninglessness (EMS) and externalizing outcomes such as substance and behavioral addictions (SSAB). Data analysis was conducted with principal component analysis (PCA) and regression analyses. The results showed over 50% overlap between the brief versions of Criterion A and B, while Criterion B slightly outperformed Criterion A in outcomes of EMS and SSAB. We discuss the potential redundancy and usefulness of personality functioning and maladaptive personality traits.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":"23 1","pages":"190 - 204"},"PeriodicalIF":1.7,"publicationDate":"2023-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48832344","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multidimensionality and measurement invariance of the revised developmental work personality scale
Rongxiu Wu, C. Chiu, David M. Dueber, Mirang Park, D. Lange, Emre Umucu, D. Strauser
Pub Date: 2023-01-18 | DOI: 10.1080/15305058.2023.2167084 | International Journal of Testing, 23, 135–144
Abstract The current study examined the factor structure, measurement invariance, and construct validity of the 14-item Revised Developmental Work Personality Scale (RDWPS) using a sample of 603 college students at a Midwestern university in the United States. Exploratory and confirmatory factor analyses indicated that an 11-item version of the RDWPS yielded a better-fitting measurement model. Partial measurement invariance was also detected across gender groups. In addition, the scale was weakly to moderately correlated with the Utrecht Work Engagement Scale-Student (UWES-S), self-reported effort, and GPA. Lastly, comparisons of latent means indicated that males scored lower than females on all three RDWPS subscales.
{"title":"Multidimensionality and measurement invariance of the revised developmental work personality scale","authors":"Rongxiu Wu, C. Chiu, David M. Dueber, Mirang Park, D. Lange, Emre Umucu, D. Strauser","doi":"10.1080/15305058.2023.2167084","DOIUrl":"https://doi.org/10.1080/15305058.2023.2167084","url":null,"abstract":"Abstract The current study examined the factor structure, measurement invariance, and construct validity of the 14-item Revised Developmental Work Personality Scale (RDWPS) using a sample of 603 college students in a Midwest university of the United States. Exploratory and confirmatory factor analysis results indicated that the 11-item RDWPS resulted in a better fit of the measurement model. Partial measurement invariance was also detected between gender groups. In addition, it was weakly to moderately correlated with the Utrecht Work Engagement Scale-Student (UWES-S), self-reported effort, and GPA among college students. Lastly, it was found that males scored lower than females in all three subscales of the RDWPS in comparison to the latent means of the gender groups.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":"23 1","pages":"135 - 144"},"PeriodicalIF":1.7,"publicationDate":"2023-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49347768","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Summative assessments in a multilingual context: What comparative judgment reveals about comparability across different languages in Literature
L.H.L. Badham, Antony Furlong
Pub Date: 2022-12-28 | DOI: 10.1080/15305058.2022.2149536 | International Journal of Testing, 23, 111–134
Abstract Multilingual summative assessments face significant challenges due to the tension between providing multiple languages and ensuring comparability. Yet conventional approaches for investigating comparability in multilingual assessments fail to accommodate assessments comprising extended responses that target complex constructs. This article discusses a study that investigated whether bilingual examiners could apply comparative judgment (CJ) to pairs of Literature essays across different languages (English and Spanish). Preliminary findings suggest that whilst there are some cross-language standardization benefits, bilingual CJ faces validity challenges when different language cohorts approach target constructs differently. Existing definitions of inter-subject and intra-subject comparability are insufficient when multilingual subjects share fundamental constructs but differ in academic approaches. It is therefore proposed that an overarching classification of intra-disciplinary comparability be introduced to frame discussions around multilingual assessments of this nature. Finally, it is recommended that further research into bilingual CJ be carried out to determine how the method can most effectively support investigations into multilingual assessment comparability.
{"title":"Summative assessments in a multilingual context: What comparative judgment reveals about comparability across different languages in Literature","authors":"L.H.L. Badham, Antony Furlong","doi":"10.1080/15305058.2022.2149536","DOIUrl":"https://doi.org/10.1080/15305058.2022.2149536","url":null,"abstract":"Abstract Multilingual summative assessments face significant challenges due to tensions that exist between multiple language provision and comparability. Yet, conventional approaches for investigating comparability in multilingual assessments fail to accommodate assessments that comprise extended responses that target complex constructs. This article discusses a study that investigated whether bilingual examiners could apply comparative judgment (CJ) to pairs of Literature essays across different languages (English and Spanish). Preliminary findings suggest that whilst there are some cross-language standardization benefits, bilingual CJ faces validity challenges when different language cohorts approach target constructs differently. Existing definitions of inter-subject and intra-subject comparability are insufficient when multilingual subjects share fundamental constructs but differ in academic approaches. It is therefore proposed that an overarching classification of intra-disciplinary comparability be introduced to frame discussions around multilingual assessments of this nature. Finally, it is recommended that further research into bilingual CJ be carried out to determine how the method can most effectively support investigations into multilingual assessment comparability.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":"23 1","pages":"111 - 134"},"PeriodicalIF":1.7,"publicationDate":"2022-12-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41885703","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Measuring pathological traits of the dependent personality disorder based on the HiTOP
Lucas de Francisco Carvalho, A. Gonçalves, Amanda Rizzieri Romano, Antônio da Conceição Montes, G. Machado, Giselle Pianowski
Pub Date: 2022-12-19 | DOI: 10.1080/15305058.2022.2148185 | International Journal of Testing, 23, 97–110
Abstract We developed and validated a self-report scale for screening pathological traits of dependent personality disorder (DPD) from the Hierarchical Taxonomy of Psychopathology (HiTOP) perspective. The sample comprised 693 adults who completed the new scale, the Dimensional Clinical Personality Inventory DPD (IDCP-DPD), along with the PID-5, the FFDI, and the FFBI. The IDCP-DPD was composed of six factors grouped into one general score. The scores showed associations with external measures in the expected directions, and mean comparisons showed large differences. Our findings indicate that the IDCP-DPD is a useful clinical measure, and the observed structure is consistent with the spectrum level of the HiTOP.
{"title":"Measuring pathological traits of the dependent personality disorder based on the HiTOP","authors":"Lucas de Francisco Carvalho, A. Gonçalves, Amanda Rizzieri Romano, Antônio da Conceição Montes, G. Machado, Giselle Pianowski","doi":"10.1080/15305058.2022.2148185","DOIUrl":"https://doi.org/10.1080/15305058.2022.2148185","url":null,"abstract":"Abstract We developed and validated a self-report scale for screening pathological traits of dependent personality disorder (DPD) from the Hierarchical Taxonomy of psychopathology (HiTOP) perspective. The sample was 693 adults who answered the new scale, the Dimensional Clinical Personality Inventory DPD (IDCP-DPD), the PID-5, the FFDI, and the FFBI. The IDCP-DPD was composed of six factors grouped in one general score. The scores showed associations with external measures in the expected direction, and the means comparisons showed large differences. Our findings indicated the IDCP-DPD as a useful clinical measure, and the structure observed confirms the spectrum level of the HiTOP.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":"23 1","pages":"97 - 110"},"PeriodicalIF":1.7,"publicationDate":"2022-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45945681","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Refining the antisocial subscale of the dimensional clinical personality inventory 2: Failed improvements or did we reach the mountain top
Lucas de Francisco Carvalho, Camila Grillo Santos, Nelson Fernandes Junior, Rafael Moreton Alves da Rocha, Talita Meireles Flores, Gisele Magarotto Machado
Pub Date: 2022-12-19 | DOI: 10.1080/15305058.2022.2147938 | International Journal of Testing, 23, 77–96
Abstract We aimed to refine the previously proposed antisocial subscale of the Dimensional Clinical Personality Inventory 2 (IDCP-ASPD). The sample comprised 628 Brazilian adults between 18 and 81 years old. We administered the revised ASPD subscale (IDCP-ASPD-R), the Affective and Cognitive Measure of Empathy (ACME), the Crime and Analogous Behavior Scale (CAB), and the Levenson Self-Report Psychopathy scale (LSRP). We confirmed the three-factor structure of the IDCP-ASPD-R. Both the IDCP-ASPD-R and its former version showed good capacity to distinguish the groups, with the largest effect size observed for the Affective factor of the IDCP-ASPD-R. Although the IDCP-ASPD-R performed well, we observed only a slight improvement over the previous version of the scale. Therefore, only a marginally higher contribution of the IDCP-ASPD-R can be expected in practical applications to group discrimination. From a theoretical perspective, however, the IDCP-ASPD-R supersedes its former version.
{"title":"Refining the antisocial subscale of the dimensional clinical personality inventory 2: Failed improvements or did we reach the mountain top","authors":"Lucas de Francisco Carvalho, Camila Grillo Santos, Nelson Fernandes Junior, Rafael Moreton Alves da Rocha, Talita Meireles Flores, Gisele Magarotto Machado","doi":"10.1080/15305058.2022.2147938","DOIUrl":"https://doi.org/10.1080/15305058.2022.2147938","url":null,"abstract":"Abstract We aimed to refine the previously proposed antisocial subscale for the Dimensional Clinical Personality Inventory 2 (IDCP-ASPD). The sample involved 628 Brazilian adults between 18 and 81 years old. We administered the revised ASPD subscale (IDCP-ASPD-R), the Affective and Cognitive Measure of Empathy (ACME), the Crime and Analogous Behavior Scale (CAB), and the Levenson Self-Report Psychopathy (LSRP). We confirmed the 3-factors structure for the IDCP-ASPD-R. The IDCP-ASPD-R and its former version presented a good capacity to distinguish the groups, with the largest effect size for the Affective factor (IDCP-ASPD-R). Although the IDCP-ASPD-R has shown good performance, we have observed only a slight increase over the previous version of the scale. Therefore, we can only expect a small higher contribution of IDCP-ASPD-R in its practical application to group discrimination. However, from a theoretical perspective, the IDCP-ASPD-R overrides its former version.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":"23 1","pages":"77 - 96"},"PeriodicalIF":1.7,"publicationDate":"2022-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42968250","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mobile sensing in psychological and educational research: Examples from two application fields
Efsun Birtwistle, Ramona Schoedel, Florian Bemmann, Astrid Wirth, Christoph Sürig, Clemens Stachl, M. Bühner, Frank Niklas
Pub Date: 2022-10-02 | DOI: 10.1080/15305058.2022.2036160 | International Journal of Testing, 22, 264–288
Abstract Digital technologies play an important role in our daily lives. Smartphones and tablet computers are common worldwide and are available to almost everybody from an early age. This trend offers the opportunity to track digital usage data for psychological and educational research purposes. The current paper introduces two research projects, PhoneStudy and Learning4Kids, both of which use mobile sensing software to collect ecologically valid data on the usage of applications installed on smartphones and tablets. These usage data are used for statistical analyses, for a reward system, and to provide feedback to study participants. The advantages and challenges of mobile sensing compared to conventional forms of assessment, and its potential applications in psychological and educational research, are discussed.
{"title":"Mobile sensing in psychological and educational research: Examples from two application fields","authors":"Efsun Birtwistle, Ramona Schoedel, Florian Bemmann, Astrid Wirth, Christoph Sürig, Clemens Stachl, M. Bühner, Frank Niklas","doi":"10.1080/15305058.2022.2036160","DOIUrl":"https://doi.org/10.1080/15305058.2022.2036160","url":null,"abstract":"Abstract Digital technologies play an important role in our daily lives. Smartphones and tablet computers are very common worldwide and are available for everybody from a very early age. This trend offers the opportunity to track digital usage data for psychological and educational research purposes. The current paper introduces two research projects, the PhoneStudy and Learning4Kids that both use mobile sensing software to collect ecologically valid data on the usage of applications installed on smartphones and tablets. This usage data is used for statistical analyses, for a reward system, and to provide feedback to the study participants. The advantages and challenges of using mobile sensing compared to conventional forms of assessments, and the potential applications of mobile sensing in psychological and educational research are discussed.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":"22 1","pages":"264 - 288"},"PeriodicalIF":1.7,"publicationDate":"2022-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43292336","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
You are what you click: using machine learning to model trace data for psychometric measurement
R. Landers, Elena M. Auer, Gabriel Mersy, Sebastian Marin, Jason Blaik
Pub Date: 2022-10-02 | DOI: 10.1080/15305058.2022.2134394 | International Journal of Testing, 22, 243–263
Abstract Assessment trace data, such as mouse positions and their timing, offer interesting and provocative reflections of individual differences yet are currently underutilized by testing professionals. In this article, we present a 10-step procedure to maximize the probability that a trace data modeling project will be successful: 1) grounding the project in psychometric theory, 2) building technical infrastructure to collect trace data, 3) designing a useful developmental validation study, 4) using a holdout validation approach with collected data, 5) using exploratory analysis to conduct meaningful feature engineering, 6) identifying useful machine learning algorithms to predict a thoughtfully chosen criterion, 7) engineering a machine learning model with meaningful internal cross-validation and hyperparameter selection, 8) conducting model diagnostics to assess if the resulting model is overfitted, underfitted, or within acceptable tolerance, and 9) testing the success of the final model in meeting conceptual, technical, and psychometric goals. If deemed successful, trace data model predictions could then be engineered into decision-making systems. We present this framework within the broader view of psychometrics, exploring the challenges of developing psychometrically valid models using such complex data with much weaker trait signals than assessment developers have typically attempted to model.
{"title":"You are what you click: using machine learning to model trace data for psychometric measurement","authors":"R. Landers, Elena M. Auer, Gabriel Mersy, Sebastian Marin, Jason Blaik","doi":"10.1080/15305058.2022.2134394","DOIUrl":"https://doi.org/10.1080/15305058.2022.2134394","url":null,"abstract":"Abstract Assessment trace data, such as mouse positions and their timing, offer interesting and provocative reflections of individual differences yet are currently underutilized by testing professionals. In this article, we present a 10-step procedure to maximize the probability that a trace data modeling project will be successful: 1) grounding the project in psychometric theory, 2) building technical infrastructure to collect trace data, 3) designing a useful developmental validation study, 4) using a holdout validation approach with collected data, 5) using exploratory analysis to conduct meaningful feature engineering, 6) identifying useful machine learning algorithms to predict a thoughtfully chosen criterion, 7) engineering a machine learning model with meaningful internal cross-validation and hyperparameter selection, 8) conducting model diagnostics to assess if the resulting model is overfitted, underfitted, or within acceptable tolerance, and 9) testing the success of the final model in meeting conceptual, technical, and psychometric goals. If deemed successful, trace data model predictions could then be engineered into decision-making systems. We present this framework within the broader view of psychometrics, exploring the challenges of developing psychometrically valid models using such complex data with much weaker trait signals than assessment developers have typically attempted to model.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":"22 1","pages":"243 - 263"},"PeriodicalIF":1.7,"publicationDate":"2022-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49369149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Generating reading comprehension items using automated processes
Jinnie Shin, Mark J. Gierl
Pub Date: 2022-10-02 | DOI: 10.1080/15305058.2022.2070755 | International Journal of Testing, 22, 289–311
Abstract Over the last five years, tremendous strides have been made in advancing the automatic item generation (AIG) methodology required to produce items in diverse content areas. However, one content area where substantial problems remain unsolved is language arts generally, and reading comprehension more specifically. While reading comprehension test items can be created using many different item formats, fill-in-the-blank remains one of the most common when the goal is to measure inferential knowledge. Currently, the item development process used to create fill-in-the-blank reading comprehension items is time-consuming and expensive. Hence, the purpose of this study is to introduce a new systematic method for generating fill-in-the-blank reading comprehension items using an item modeling approach. We describe the use of different unsupervised learning methods that can be paired with natural language processing techniques to identify salient item models within existing texts. To demonstrate the capacity of our method, 1,013 test items were generated from 100 input texts taken from fill-in-the-blank reading comprehension items used on a high-stakes college entrance exam in South Korea. Our validation results indicated that the generated items produced higher semantic similarity between item options while showing little to no syntactic difference from traditionally written test items.
{"title":"Generating reading comprehension items using automated processes","authors":"Jinnie Shin, Mark J. Gierl","doi":"10.1080/15305058.2022.2070755","DOIUrl":"https://doi.org/10.1080/15305058.2022.2070755","url":null,"abstract":"Abstract Over the last five years, tremendous strides have been made in advancing the AIG methodology required to produce items in diverse content areas. However, the one content area where enormous problems remain unsolved is language arts, generally, and reading comprehension, more specifically. While reading comprehension test items can be created using many different item formats, fill-in-the-blank remains one of the most common when the goal is to measure inferential knowledge. Currently, the item development process used to create fill-in-the-blank reading comprehension items is time-consuming and expensive. Hence, the purpose of the study is to introduce a new systematic method for generating fill-in-the-blank reading comprehension items using an item modeling approach. We describe the use of different unsupervised learning methods that can be paired with natural language processing techniques to identify the salient item models within existing texts. To demonstrate the capacity of our method, 1,013 test items were generated from 100 input texts taken from fill-in-the-blank reading comprehension items used on a high-stakes college entrance exam in South Korea. Our validation results indicated that the generated items produced higher semantic similarities between the item options while depicting little to no syntactic differences with the traditionally written test items.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":"22 1","pages":"289 - 311"},"PeriodicalIF":1.7,"publicationDate":"2022-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45142900","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Investigating the writing performance of educationally at-risk examinees using technology
Mo Zhang, S. Sinharay
Pub Date: 2022-10-02 | DOI: 10.1080/15305058.2022.2050734 | International Journal of Testing, 22, 312–347
Abstract This article demonstrates how recent advances in technology allow fine-grained analyses of candidate-produced essays, thus providing deeper insight into writing performance. We examined how essay features, automatically extracted using natural language processing and keystroke logging techniques, can predict various performance measures, using data from a large-scale, high-stakes assessment for awarding a high-school equivalency diploma. The features most predictive of writing proficiency and broader academic success were identified and interpreted. The suggested methodology promises to be practically useful because it has the potential to point to specific writing skills that are important for improving essay writing and academic performance in educationally at-risk adult populations like the one considered in this article.
{"title":"Investigating the writing performance of educationally at-risk examinees using technology","authors":"Mo Zhang, S. Sinharay","doi":"10.1080/15305058.2022.2050734","DOIUrl":"https://doi.org/10.1080/15305058.2022.2050734","url":null,"abstract":"Abstract This article demonstrates how recent advances in technology allow fine-grained analyses of candidate-produced essays, thus providing a deeper insight on writing performance. We examined how essay features, automatically extracted using natural language processing and keystroke logging techniques, can predict various performance measures using data from a large-scale and high-stakes assessment for awarding high-school equivalency diploma. The features that are the most predictive of writing proficiency and broader academic success were identified and interpreted. The suggested methodology promises to be practically useful because it has the potential to point to specific writing skills that are important for improving essay writing and academic performance for educationally at-risk adult populations like the one considered in this article.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":"22 1","pages":"312 - 347"},"PeriodicalIF":1.7,"publicationDate":"2022-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48678854","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}