Can your darkness be measured? Analyzing the full and brief version of the Dark Factor of Personality in Swedish
Nico Streckert, Lara Kurtz, P. Kajonius
Pub Date: 2023-04-18 | DOI: 10.1080/15305058.2023.2195659 | International Journal of Testing, 23, 145–189
Abstract The Dark Factor of Personality (D) measures the latent core of antagonistic traits. The present study evaluated the psychometric properties of the Swedish full (D70) and brief (D16) versions with respect to structural validity, item information, and convergent validity. An online sample (N = 294) was analyzed using CFA (maximum likelihood estimation), IRT (graded response model), and SEM (latent correlations). First, the originally theorized bifactor model for the D70 and a single-factor model for the D16 showed good fit to the data. Moreover, reliability analyses based on FD and H indicated that the D70 can favorably be collapsed into a unidimensional measure, which is further discussed. Second, the IRT analyses indicated sound item quality and functioning and showed that items provide the most information at trait levels above the mean. Lastly, convergent SEM analyses showed that D had high latent correlations with psychopathy and Machiavellianism, but not with narcissism. Correlations with the Big Six personality factors (Mini-IPIP6) were, as expected, strongest with Agreeableness and Honesty-Humility. The Swedish translations of the full D70 and brief D16 are recommended for use in future research.
{"title":"Can your darkness be measured? Analyzing the full and brief version of the Dark Factor of Personality in Swedish","authors":"Nico Streckert, Lara Kurtz, P. Kajonius","doi":"10.1080/15305058.2023.2195659","DOIUrl":"https://doi.org/10.1080/15305058.2023.2195659","url":null,"abstract":"Abstract The Dark Factor of Personality (D) measures the latent core of antagonistic traits. The present study evaluated the psychometric properties of the Swedish version of the full (D70) and the brief (D16) versions, concerning structural validity, item information, and convergent validity. An online sample (N = 294) was analyzed using CFA (Maximum Likelihood Estimation), IRT (Graded Response Model) and SEM (latent correlations). Firstly, the original theorized bifactor model for D70 and a single-factor model for D16 showed good fit to the data. Moreover, new reliability-analyses based on FD and H indicated that the D70 favorably can be collapsed into a unidimensional measure, which is further discussed. Secondly, the IRT-analyses present valid item quality and functioning and showed that items provide the most information on trait levels above mean levels. Lastly, convergent SEM-analyses showed that D had high latent trait correlations to psychopathy and Machiavellianism, but not to narcissism. The correlations with the Big Six personality factors (mini-IPIP6) yielded expected high correlations with Agreeableness and Honesty-Humility. The Swedish translation of the full D70 and brief D16 is recommended for use in future research.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":"23 1","pages":"145 - 189"},"PeriodicalIF":1.7,"publicationDate":"2023-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45280757","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Investigating the overlap and predictive validity between Criterion A and B in the alternative model for personality disorders in DSM-5
Carla Martí Valls, Kitty Balazadeh, P. Kajonius
Pub Date: 2023-04-03 | DOI: 10.1080/15305058.2023.2195661 | International Journal of Testing, 23, 190–204
Abstract The Alternative DSM-5 Model for Personality Disorders (AMPD) consists of level of personality functioning (Criterion A) and maladaptive personality traits (Criterion B). The brief scale versions of these are understudied despite being widely used by clinicians and researchers. In this study, we investigated the overlap and predictive validity of Criteria A and B. Participants (N = 253) were assessed on level of personality functioning (LPFS-BF) and maladaptive personality traits (PID-5-BF), as well as on internalizing outcomes such as existential meaninglessness (EMS) and externalizing outcomes such as substance and behavioral addictions (SSAB). Data were analyzed with principal component analysis (PCA) and regression analyses. The results showed over 50% overlap between the brief versions of Criteria A and B, while Criterion B slightly outperformed Criterion A in predicting EMS and SSAB. We discuss the potential redundancy and usefulness of personality functioning and maladaptive personality traits.
{"title":"Investigating the overlap and predictive validity between Criterion A and B in the alternative model for personality disorders in DSM-5","authors":"Carla Martí Valls, Kitty Balazadeh, P. Kajonius","doi":"10.1080/15305058.2023.2195661","DOIUrl":"https://doi.org/10.1080/15305058.2023.2195661","url":null,"abstract":"Abstract The Alternative DSM-5 Model for Personality Disorders (AMPD) consists of level of personality functioning (Criterion A) and maladaptive personality traits (Criterion B). The brief scale versions of these are understudied, while often being used by clinicians and researchers. In this study, we wanted to investigate the overlap and predictive validity of Criterion A and B. Participants (N = 253) were measured on level of personality functioning (LPFS-BF) and maladaptive personality traits (PID-5-BF), as well as internalizing outcomes such existential meaninglessness (EMS) and externalizing outcomes such as substance and behavioral addictions (SSAB). Data analysis was conducted with principal component analysis (PCA) and regression analyses. The results showed over 50% overlap between the brief versions of Criterion A and B, while Criterion B slightly outperformed Criterion A in outcomes of EMS and SSAB. We discuss the potential redundancy and usefulness of personality functioning and maladaptive personality traits.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":"23 1","pages":"190 - 204"},"PeriodicalIF":1.7,"publicationDate":"2023-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48832344","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multidimensionality and measurement invariance of the revised developmental work personality scale
Rongxiu Wu, C. Chiu, David M. Dueber, Mirang Park, D. Lange, Emre Umucu, D. Strauser
Pub Date: 2023-01-18 | DOI: 10.1080/15305058.2023.2167084 | International Journal of Testing, 23, 135–144
Abstract The current study examined the factor structure, measurement invariance, and construct validity of the 14-item Revised Developmental Work Personality Scale (RDWPS) using a sample of 603 college students at a Midwestern university in the United States. Exploratory and confirmatory factor analyses indicated that an 11-item version of the RDWPS yielded a better-fitting measurement model. Partial measurement invariance was also detected across gender groups. In addition, the scale was weakly to moderately correlated with the Utrecht Work Engagement Scale-Student (UWES-S), self-reported effort, and GPA. Lastly, comparisons of latent means indicated that males scored lower than females on all three RDWPS subscales.
{"title":"Multidimensionality and measurement invariance of the revised developmental work personality scale","authors":"Rongxiu Wu, C. Chiu, David M. Dueber, Mirang Park, D. Lange, Emre Umucu, D. Strauser","doi":"10.1080/15305058.2023.2167084","DOIUrl":"https://doi.org/10.1080/15305058.2023.2167084","url":null,"abstract":"Abstract The current study examined the factor structure, measurement invariance, and construct validity of the 14-item Revised Developmental Work Personality Scale (RDWPS) using a sample of 603 college students in a Midwest university of the United States. Exploratory and confirmatory factor analysis results indicated that the 11-item RDWPS resulted in a better fit of the measurement model. Partial measurement invariance was also detected between gender groups. In addition, it was weakly to moderately correlated with the Utrecht Work Engagement Scale-Student (UWES-S), self-reported effort, and GPA among college students. Lastly, it was found that males scored lower than females in all three subscales of the RDWPS in comparison to the latent means of the gender groups.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":"23 1","pages":"135 - 144"},"PeriodicalIF":1.7,"publicationDate":"2023-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49347768","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Summative assessments in a multilingual context: What comparative judgment reveals about comparability across different languages in Literature
L.H.L. Badham, Antony Furlong
Pub Date: 2022-12-28 | DOI: 10.1080/15305058.2022.2149536 | International Journal of Testing, 23, 111–134
Abstract Multilingual summative assessments face significant challenges due to the tension between providing multiple languages and ensuring comparability. Yet conventional approaches for investigating comparability in multilingual assessments fail to accommodate assessments comprising extended responses that target complex constructs. This article discusses a study that investigated whether bilingual examiners could apply comparative judgment (CJ) to pairs of Literature essays across different languages (English and Spanish). Preliminary findings suggest that whilst there are some cross-language standardization benefits, bilingual CJ faces validity challenges when different language cohorts approach target constructs differently. Existing definitions of inter-subject and intra-subject comparability are insufficient when multilingual subjects share fundamental constructs but differ in academic approaches. It is therefore proposed that an overarching classification of intra-disciplinary comparability be introduced to frame discussions around multilingual assessments of this nature. Finally, it is recommended that further research into bilingual CJ be carried out to determine how the method can most effectively support investigations into multilingual assessment comparability.
{"title":"Summative assessments in a multilingual context: What comparative judgment reveals about comparability across different languages in Literature","authors":"L.H.L. Badham, Antony Furlong","doi":"10.1080/15305058.2022.2149536","DOIUrl":"https://doi.org/10.1080/15305058.2022.2149536","url":null,"abstract":"Abstract Multilingual summative assessments face significant challenges due to tensions that exist between multiple language provision and comparability. Yet, conventional approaches for investigating comparability in multilingual assessments fail to accommodate assessments that comprise extended responses that target complex constructs. This article discusses a study that investigated whether bilingual examiners could apply comparative judgment (CJ) to pairs of Literature essays across different languages (English and Spanish). Preliminary findings suggest that whilst there are some cross-language standardization benefits, bilingual CJ faces validity challenges when different language cohorts approach target constructs differently. Existing definitions of inter-subject and intra-subject comparability are insufficient when multilingual subjects share fundamental constructs but differ in academic approaches. It is therefore proposed that an overarching classification of intra-disciplinary comparability be introduced to frame discussions around multilingual assessments of this nature. Finally, it is recommended that further research into bilingual CJ be carried out to determine how the method can most effectively support investigations into multilingual assessment comparability.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":"23 1","pages":"111 - 134"},"PeriodicalIF":1.7,"publicationDate":"2022-12-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41885703","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Measuring pathological traits of the dependent personality disorder based on the HiTOP
Lucas de Francisco Carvalho, A. Gonçalves, Amanda Rizzieri Romano, Antônio da Conceição Montes, G. Machado, Giselle Pianowski
Pub Date: 2022-12-19 | DOI: 10.1080/15305058.2022.2148185 | International Journal of Testing, 23, 97–110
Abstract We developed and validated a self-report scale for screening pathological traits of dependent personality disorder (DPD) from the Hierarchical Taxonomy of Psychopathology (HiTOP) perspective. The sample comprised 693 adults who completed the new scale, the Dimensional Clinical Personality Inventory DPD (IDCP-DPD), along with the PID-5, the FFDI, and the FFBI. The IDCP-DPD was composed of six factors grouped into one general score. The scores showed associations with external measures in the expected directions, and mean comparisons showed large differences. Our findings indicate that the IDCP-DPD is a useful clinical measure, and the observed structure is consistent with the spectrum level of the HiTOP.
{"title":"Measuring pathological traits of the dependent personality disorder based on the HiTOP","authors":"Lucas de Francisco Carvalho, A. Gonçalves, Amanda Rizzieri Romano, Antônio da Conceição Montes, G. Machado, Giselle Pianowski","doi":"10.1080/15305058.2022.2148185","DOIUrl":"https://doi.org/10.1080/15305058.2022.2148185","url":null,"abstract":"Abstract We developed and validated a self-report scale for screening pathological traits of dependent personality disorder (DPD) from the Hierarchical Taxonomy of psychopathology (HiTOP) perspective. The sample was 693 adults who answered the new scale, the Dimensional Clinical Personality Inventory DPD (IDCP-DPD), the PID-5, the FFDI, and the FFBI. The IDCP-DPD was composed of six factors grouped in one general score. The scores showed associations with external measures in the expected direction, and the means comparisons showed large differences. Our findings indicated the IDCP-DPD as a useful clinical measure, and the structure observed confirms the spectrum level of the HiTOP.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":"23 1","pages":"97 - 110"},"PeriodicalIF":1.7,"publicationDate":"2022-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45945681","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Refining the antisocial subscale of the dimensional clinical personality inventory 2: Failed improvements or did we reach the mountain top
Lucas de Francisco Carvalho, Camila Grillo Santos, Nelson Fernandes Junior, Rafael Moreton Alves da Rocha, Talita Meireles Flores, Gisele Magarotto Machado
Pub Date: 2022-12-19 | DOI: 10.1080/15305058.2022.2147938 | International Journal of Testing, 23, 77–96
Abstract We aimed to refine the previously proposed antisocial subscale of the Dimensional Clinical Personality Inventory 2 (IDCP-ASPD). The sample comprised 628 Brazilian adults between 18 and 81 years old. We administered the revised ASPD subscale (IDCP-ASPD-R), the Affective and Cognitive Measure of Empathy (ACME), the Crime and Analogous Behavior Scale (CAB), and the Levenson Self-Report Psychopathy scale (LSRP). We confirmed the three-factor structure of the IDCP-ASPD-R. Both the IDCP-ASPD-R and its former version showed good capacity to distinguish the groups, with the largest effect size observed for the Affective factor of the IDCP-ASPD-R. Although the IDCP-ASPD-R performed well, we observed only a slight improvement over the previous version of the scale. Therefore, only a marginally higher contribution of the IDCP-ASPD-R can be expected in practical applications to group discrimination. From a theoretical perspective, however, the IDCP-ASPD-R supersedes its former version.
{"title":"Refining the antisocial subscale of the dimensional clinical personality inventory 2: Failed improvements or did we reach the mountain top","authors":"Lucas de Francisco Carvalho, Camila Grillo Santos, Nelson Fernandes Junior, Rafael Moreton Alves da Rocha, Talita Meireles Flores, Gisele Magarotto Machado","doi":"10.1080/15305058.2022.2147938","DOIUrl":"https://doi.org/10.1080/15305058.2022.2147938","url":null,"abstract":"Abstract We aimed to refine the previously proposed antisocial subscale for the Dimensional Clinical Personality Inventory 2 (IDCP-ASPD). The sample involved 628 Brazilian adults between 18 and 81 years old. We administered the revised ASPD subscale (IDCP-ASPD-R), the Affective and Cognitive Measure of Empathy (ACME), the Crime and Analogous Behavior Scale (CAB), and the Levenson Self-Report Psychopathy (LSRP). We confirmed the 3-factors structure for the IDCP-ASPD-R. The IDCP-ASPD-R and its former version presented a good capacity to distinguish the groups, with the largest effect size for the Affective factor (IDCP-ASPD-R). Although the IDCP-ASPD-R has shown good performance, we have observed only a slight increase over the previous version of the scale. Therefore, we can only expect a small higher contribution of IDCP-ASPD-R in its practical application to group discrimination. However, from a theoretical perspective, the IDCP-ASPD-R overrides its former version.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":"23 1","pages":"77 - 96"},"PeriodicalIF":1.7,"publicationDate":"2022-12-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42968250","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mobile sensing in psychological and educational research: Examples from two application fields
Efsun Birtwistle, Ramona Schoedel, Florian Bemmann, Astrid Wirth, Christoph Sürig, Clemens Stachl, M. Bühner, Frank Niklas
Pub Date: 2022-10-02 | DOI: 10.1080/15305058.2022.2036160 | International Journal of Testing, 22, 264–288
Abstract Digital technologies play an important role in our daily lives. Smartphones and tablet computers are common worldwide and are available to almost everybody from an early age. This trend offers the opportunity to track digital usage data for psychological and educational research purposes. The current paper introduces two research projects, PhoneStudy and Learning4Kids, both of which use mobile sensing software to collect ecologically valid data on the usage of applications installed on smartphones and tablets. These usage data are used for statistical analyses, for a reward system, and to provide feedback to study participants. The advantages and challenges of mobile sensing compared to conventional forms of assessment, and its potential applications in psychological and educational research, are discussed.
{"title":"Mobile sensing in psychological and educational research: Examples from two application fields","authors":"Efsun Birtwistle, Ramona Schoedel, Florian Bemmann, Astrid Wirth, Christoph Sürig, Clemens Stachl, M. Bühner, Frank Niklas","doi":"10.1080/15305058.2022.2036160","DOIUrl":"https://doi.org/10.1080/15305058.2022.2036160","url":null,"abstract":"Abstract Digital technologies play an important role in our daily lives. Smartphones and tablet computers are very common worldwide and are available for everybody from a very early age. This trend offers the opportunity to track digital usage data for psychological and educational research purposes. The current paper introduces two research projects, the PhoneStudy and Learning4Kids that both use mobile sensing software to collect ecologically valid data on the usage of applications installed on smartphones and tablets. This usage data is used for statistical analyses, for a reward system, and to provide feedback to the study participants. The advantages and challenges of using mobile sensing compared to conventional forms of assessments, and the potential applications of mobile sensing in psychological and educational research are discussed.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":"22 1","pages":"264 - 288"},"PeriodicalIF":1.7,"publicationDate":"2022-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43292336","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
You are what you click: using machine learning to model trace data for psychometric measurement
R. Landers, Elena M. Auer, Gabriel Mersy, Sebastian Marin, Jason Blaik
Pub Date: 2022-10-02 | DOI: 10.1080/15305058.2022.2134394 | International Journal of Testing, 22, 243–263
Abstract Assessment trace data, such as mouse positions and their timing, offer interesting and provocative reflections of individual differences yet are currently underutilized by testing professionals. In this article, we present a 10-step procedure to maximize the probability that a trace data modeling project will be successful: 1) grounding the project in psychometric theory, 2) building technical infrastructure to collect trace data, 3) designing a useful developmental validation study, 4) using a holdout validation approach with collected data, 5) using exploratory analysis to conduct meaningful feature engineering, 6) identifying useful machine learning algorithms to predict a thoughtfully chosen criterion, 7) engineering a machine learning model with meaningful internal cross-validation and hyperparameter selection, 8) conducting model diagnostics to assess if the resulting model is overfitted, underfitted, or within acceptable tolerance, and 9) testing the success of the final model in meeting conceptual, technical, and psychometric goals. If deemed successful, trace data model predictions could then be engineered into decision-making systems. We present this framework within the broader view of psychometrics, exploring the challenges of developing psychometrically valid models using such complex data with much weaker trait signals than assessment developers have typically attempted to model.
{"title":"You are what you click: using machine learning to model trace data for psychometric measurement","authors":"R. Landers, Elena M. Auer, Gabriel Mersy, Sebastian Marin, Jason Blaik","doi":"10.1080/15305058.2022.2134394","DOIUrl":"https://doi.org/10.1080/15305058.2022.2134394","url":null,"abstract":"Abstract Assessment trace data, such as mouse positions and their timing, offer interesting and provocative reflections of individual differences yet are currently underutilized by testing professionals. In this article, we present a 10-step procedure to maximize the probability that a trace data modeling project will be successful: 1) grounding the project in psychometric theory, 2) building technical infrastructure to collect trace data, 3) designing a useful developmental validation study, 4) using a holdout validation approach with collected data, 5) using exploratory analysis to conduct meaningful feature engineering, 6) identifying useful machine learning algorithms to predict a thoughtfully chosen criterion, 7) engineering a machine learning model with meaningful internal cross-validation and hyperparameter selection, 8) conducting model diagnostics to assess if the resulting model is overfitted, underfitted, or within acceptable tolerance, and 9) testing the success of the final model in meeting conceptual, technical, and psychometric goals. If deemed successful, trace data model predictions could then be engineered into decision-making systems. We present this framework within the broader view of psychometrics, exploring the challenges of developing psychometrically valid models using such complex data with much weaker trait signals than assessment developers have typically attempted to model.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":"22 1","pages":"243 - 263"},"PeriodicalIF":1.7,"publicationDate":"2022-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49369149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Generating reading comprehension items using automated processes
Jinnie Shin, Mark J. Gierl
Pub Date: 2022-10-02 | DOI: 10.1080/15305058.2022.2070755 | International Journal of Testing, 22, 289–311
Abstract Over the last five years, tremendous strides have been made in advancing the automatic item generation (AIG) methodology required to produce items in diverse content areas. However, one content area where substantial problems remain unsolved is language arts generally, and reading comprehension more specifically. While reading comprehension test items can be created using many different item formats, fill-in-the-blank remains one of the most common when the goal is to measure inferential knowledge. Currently, the item development process used to create fill-in-the-blank reading comprehension items is time-consuming and expensive. Hence, the purpose of this study is to introduce a new systematic method for generating fill-in-the-blank reading comprehension items using an item modeling approach. We describe the use of different unsupervised learning methods that can be paired with natural language processing techniques to identify salient item models within existing texts. To demonstrate the capacity of our method, 1,013 test items were generated from 100 input texts taken from fill-in-the-blank reading comprehension items used on a high-stakes college entrance exam in South Korea. Our validation results indicated that the generated items produced higher semantic similarity between item options while showing little to no syntactic difference from traditionally written test items.
{"title":"Generating reading comprehension items using automated processes","authors":"Jinnie Shin, Mark J. Gierl","doi":"10.1080/15305058.2022.2070755","DOIUrl":"https://doi.org/10.1080/15305058.2022.2070755","url":null,"abstract":"Abstract Over the last five years, tremendous strides have been made in advancing the AIG methodology required to produce items in diverse content areas. However, the one content area where enormous problems remain unsolved is language arts, generally, and reading comprehension, more specifically. While reading comprehension test items can be created using many different item formats, fill-in-the-blank remains one of the most common when the goal is to measure inferential knowledge. Currently, the item development process used to create fill-in-the-blank reading comprehension items is time-consuming and expensive. Hence, the purpose of the study is to introduce a new systematic method for generating fill-in-the-blank reading comprehension items using an item modeling approach. We describe the use of different unsupervised learning methods that can be paired with natural language processing techniques to identify the salient item models within existing texts. To demonstrate the capacity of our method, 1,013 test items were generated from 100 input texts taken from fill-in-the-blank reading comprehension items used on a high-stakes college entrance exam in South Korea. Our validation results indicated that the generated items produced higher semantic similarities between the item options while depicting little to no syntactic differences with the traditionally written test items.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":"22 1","pages":"289 - 311"},"PeriodicalIF":1.7,"publicationDate":"2022-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45142900","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Investigating the writing performance of educationally at-risk examinees using technology
Mo Zhang, S. Sinharay
Pub Date: 2022-10-02 | DOI: 10.1080/15305058.2022.2050734 | International Journal of Testing, 22, 312–347
Abstract This article demonstrates how recent advances in technology allow fine-grained analyses of candidate-produced essays, thus providing deeper insight into writing performance. We examined how essay features, automatically extracted using natural language processing and keystroke logging techniques, can predict various performance measures, using data from a large-scale, high-stakes assessment for awarding a high-school equivalency diploma. The features most predictive of writing proficiency and broader academic success were identified and interpreted. The suggested methodology promises to be practically useful because it has the potential to point to specific writing skills that are important for improving essay writing and academic performance in educationally at-risk adult populations like the one considered in this article.
{"title":"Investigating the writing performance of educationally at-risk examinees using technology","authors":"Mo Zhang, S. Sinharay","doi":"10.1080/15305058.2022.2050734","DOIUrl":"https://doi.org/10.1080/15305058.2022.2050734","url":null,"abstract":"Abstract This article demonstrates how recent advances in technology allow fine-grained analyses of candidate-produced essays, thus providing a deeper insight on writing performance. We examined how essay features, automatically extracted using natural language processing and keystroke logging techniques, can predict various performance measures using data from a large-scale and high-stakes assessment for awarding high-school equivalency diploma. The features that are the most predictive of writing proficiency and broader academic success were identified and interpreted. The suggested methodology promises to be practically useful because it has the potential to point to specific writing skills that are important for improving essay writing and academic performance for educationally at-risk adult populations like the one considered in this article.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":"22 1","pages":"312 - 347"},"PeriodicalIF":1.7,"publicationDate":"2022-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48678854","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}