Animated videos in assessment: comparing validity evidence from and test-takers’ reactions to an animated and a text-based situational judgment test
Pub Date: 2021-06-15 | DOI: 10.1080/15305058.2021.1916505
Anastasios Karakolidis, M. O’Leary, Darina Scully
Abstract The linguistic complexity of many text-based tests can be a source of construct-irrelevant variance, as test-takers’ performance may be affected by factors that are beyond the focus of the assessment itself, such as reading comprehension skills. This experimental study examined the extent to which the use of animated videos, as opposed to written text, could (i) reduce construct-irrelevant variance attributed to language and reading skills and (ii) impact test-takers’ reactions to a situational judgment test. The results indicated that the variance attributed to construct-irrelevant factors was 9.5% lower in the animated version of the test. In addition, those who took the animated test perceived it to be more valid, fair, and enjoyable than those who took the text-based test. They also rated the language used as less difficult to understand. The implications of these findings are discussed.
{"title":"Animated videos in assessment: comparing validity evidence from and test-takers’ reactions to an animated and a text-based situational judgment test","authors":"Anastasios Karakolidis, M. O’Leary, Darina Scully","doi":"10.1080/15305058.2021.1916505","DOIUrl":"https://doi.org/10.1080/15305058.2021.1916505","url":null,"abstract":"Abstract The linguistic complexity of many text-based tests can be a source of construct-irrelevant variance, as test-takers’ performance may be affected by factors that are beyond the focus of the assessment itself, such as reading comprehension skills. This experimental study examined the extent to which the use of animated videos, as opposed to written text, could (i) reduce construct-irrelevant variance attributed to language and reading skills and (ii) impact test-takers’ reactions to a situational judgment test. The results indicated that the variance attributed to construct-irrelevant factors was lower by 9.5% in the animated version of the test. In addition, those who took the animated test perceived it to be more valid, fair, and enjoyable, than those who took the text-based test. They also rated the language used as less difficult to understand. The implications of these findings are discussed.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2021-06-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/15305058.2021.1916505","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41447909","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Post-COVID-19 perceived stigma-discrimination scale: psychometric development and evaluation
Pub Date: 2021-06-09 | DOI: 10.1080/15305058.2022.2042000
C. Cassiani-Miranda, J. Pedrozo-Pupo, A. Campo‐Arias
Abstract The study aimed to adapt and evaluate a scale to measure COVID-19-CED in COVID-19 survivors. A sample of 330 COVID-19 survivors completed the COVID-19 Perceived Discrimination Scale (C-19-PDS), which was adapted from the Tuberculosis Perceived Discrimination Scale (11 items). Confirmatory factor analysis showed poor goodness-of-fit indicators for the 11-item version. However, the 5-item version of the C-19-PDS showed better goodness-of-fit indicators, high internal consistency, and no differential item functioning (DIF) by gender. This instrument is recommended for evaluating COVID-19-CED in Colombian and other Spanish-speaking populations.
{"title":"Post-COVID-19 perceived stigma-discrimination scale: psychometric development and evaluation","authors":"C. Cassiani-Miranda, J. Pedrozo-Pupo, A. Campo‐Arias","doi":"10.1080/15305058.2022.2042000","DOIUrl":"https://doi.org/10.1080/15305058.2022.2042000","url":null,"abstract":"Abstract The study aimed to adapt and evaluate a scale to measure COVID-19-CED in COVID-19 survivors. A sample of 330 COVID-19 survivors filled out the COVID-19 Perceived Discrimination Scale (C-19-PDS). C-19-PDS was adapted from the Tuberculosis Perceived Discrimination Scale (11 items). Confirmatory factor analysis showed poor goodness-of-fit indicators. However, the 5-item version of the C-19-PDS showed better goodness-of-fit indicators, high internal consistency, and non-gender DIF. This instrument is recommended to evaluate COVID-19-CED in Colombian and other Spanish-speaking populations.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2021-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43353665","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Examining provision and sufficiency of testing accommodations for English learners
Pub Date: 2021-02-06 | DOI: 10.1080/15305058.2021.1884872
S. Roschmann, S. Witmer, Martin A. Volker
Abstract Accommodations are commonly provided to address language-related barriers students may experience during testing. Research on the validity of scores from accommodated test administrations remains somewhat inconclusive. The current study investigated item response patterns to understand whether accommodations, as used in practice among English learners (ELs) in the United States, allow for comparable measurement between ELs and non-ELs. Results indicated that, although significant differences were evident in overall test scores for ELs and non-ELs, only minimal measurement concerns emerged. Very few items displayed moderate or large differential item functioning (DIF); no tests showed small, medium, or large differential test functioning. The current study adds to the existing literature on measurement comparability and accommodations for ELs; implications for practice are provided.
{"title":"Examining provision and sufficiency of testing accommodations for English learners","authors":"S. Roschmann, S. Witmer, Martin A. Volker","doi":"10.1080/15305058.2021.1884872","DOIUrl":"https://doi.org/10.1080/15305058.2021.1884872","url":null,"abstract":"Abstract Accommodations are commonly provided to address language-related barriers students may experience during testing. Research on the validity of scores from accommodated test administrations remains somewhat inconclusive. The current study investigated item response patterns to understand whether accommodations, as used in practice among English learners (ELs) in the United States, allow for comparable measurement between ELs and non-ELs. Results indicated that although significant differences are evident in overall test scores for ELs and non-ELs, only minimal measurement concerns were evident. Very few items displayed moderate or large differential item functioning (DIF); no tests showed small, medium, or large differential test functioning. The current study adds to existing literature on measurement comparability and accommodation research on ELs; implications for practice are provided.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2021-02-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/15305058.2021.1884872","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46681048","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Goal orientation in job search: Psychometric characteristics and construct validation across job search contexts
Pub Date: 2021-02-03 | DOI: 10.1080/15305058.2021.1884871
Emmanuel Affum-Osei, H. Mensah, S. K. Forkuoh, Eric Adom Asante
Abstract The purpose of this study was to examine the psychometric properties of the goal orientation (GO) scale across job search contexts to facilitate its use in large and varied search settings. A sample of 720 job seekers in Ghana, comprising job losers and new entrants, completed the survey. Confirmatory factor analysis supported the three-factor theoretical structure (learning goal, performance-prove goal, and performance-avoid goal orientations) for both the new-entrant and job-loser samples. Invariance tests supported measurement equivalence across job search contexts and genders. Furthermore, GO dimensions correlated differently with several cognitive self-regulation criterion variables (employment commitment, self-control, learning from failure, and strategy awareness), thus providing evidence of convergent and discriminant validity. Overall, the study provides additional support for the job search GO measure for use across different job search contexts.
{"title":"Goal orientation in job search: Psychometric characteristics and construct validation across job search contexts","authors":"Emmanuel Affum-Osei, H. Mensah, S. K. Forkuoh, Eric Adom Asante","doi":"10.1080/15305058.2021.1884871","DOIUrl":"https://doi.org/10.1080/15305058.2021.1884871","url":null,"abstract":"Abstract The purpose of this study was to examine the psychometric properties of the goal orientation (GO) scale across job search contexts to facilitate its use in large and varied search settings. A sample of 720 job losers and new entrants’ job seekers in Ghana completed the survey. Confirmatory factor analysis supported the three-factor theoretical structure (Learning goal, Performance-prove goal, and Performance-avoid goal orientations) for both new entrants’ and job losers’ samples. Results of the invariance test reached measurement equivalence across job search contexts and genders. Furthermore, GO dimensions correlated differently with some cognitive self-regulation criterion variables (employment commitment, self-control, learning from failure, and strategy awareness) thus, providing evidence of convergent and discriminant validity. Overall, the study provides additional support for the job search GO measure for use across different job search contexts.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2021-02-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/15305058.2021.1884871","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43471412","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Survey mode and data quality: Careless responding across three modes in cross-cultural contexts
Pub Date: 2020-12-01 | DOI: 10.1080/15305058.2021.2019747
Zoe Magraw‐Mickelson, Harry Wang, M. Gollwitzer
Abstract Much psychological research depends on participants’ diligence in filling out materials such as surveys. However, not all participants are motivated to respond attentively, which leads to unintended issues with data quality, known as careless responding. Our question is: how do different modes of data collection—paper/pencil, computer/web-based, and smartphone—affect participants’ diligence vs. “careless responding” tendencies and, thus, data quality? Results from prior studies suggest that different data collection modes produce a comparable prevalence of careless responding tendencies. However, as technology develops and data are collected with increasingly diversified populations, this question needs to be readdressed and taken further. The present research examined the effect of survey mode on careless responding in a repeated-measures design with data from three different samples. First, in a sample of working adults from China, we found that participants were slightly more careless when completing computer/web-based survey materials than in paper/pencil mode. Next, in a German student sample, participants were slightly more careless when completing the paper/pencil mode compared to the smartphone mode. Finally, in a sample of Chinese-speaking students, we found no difference between modes. Overall, in a meta-analysis of the findings, we found minimal difference between modes across cultures. Theoretical and practical implications are discussed.
{"title":"Survey mode and data quality: Careless responding across three modes in cross-cultural contexts","authors":"Zoe Magraw‐Mickelson, Harry Wang, M. Gollwitzer","doi":"10.1080/15305058.2021.2019747","DOIUrl":"https://doi.org/10.1080/15305058.2021.2019747","url":null,"abstract":"Abstract Much psychological research depends on participants’ diligence in filling out materials such as surveys. However, not all participants are motivated to respond attentively, which leads to unintended issues with data quality, known as careless responding. Our question is: how do different modes of data collection—paper/pencil, computer/web-based, and smartphone—affect participants’ diligence vs. “careless responding” tendencies and, thus, data quality? Results from prior studies suggest that different data collection modes produce a comparable prevalence of careless responding tendencies. However, as technology develops and data are collected with increasingly diversified populations, this question needs to be readdressed and taken further. The present research examined the effect of survey mode on careless responding in a repeated-measures design with data from three different samples. First, in a sample of working adults from China, we found that participants were slightly more careless when completing computer/web-based survey materials than in paper/pencil mode. Next, in a German student sample, participants were slightly more careless when completing the paper/pencil mode compared to the smartphone mode. Finally, in a sample of Chinese-speaking students, we found no difference between modes. Overall, in a meta-analysis of the findings, we found minimal difference between modes across cultures. Theoretical and practical implications are discussed.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45224199","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cognitive diagnosis models and automated test assembly: an approach incorporating response times
Pub Date: 2020-10-23 | DOI: 10.1080/15305058.2020.1828427
M. Finkelman, J. de la Torre, Jeremy Karp
Abstract Cognitive diagnosis models (CDMs) have been studied as a means of providing detailed diagnostic information about which skills examinees have and have not mastered. Prior research has examined the use of automated test assembly (ATA) alongside CDMs; however, no previous study has investigated how to perform ATA when a CDM is employed and the total amount of time taken by the test must be controlled. The purpose of the current research was to develop an ATA procedure that selects tests that are highly informative while simultaneously satisfying constraints on key parameters of the total-time distribution. In a simulation study, the procedure successfully selected tests that met these dual goals.
{"title":"Cognitive diagnosis models and automated test assembly: an approach incorporating response times","authors":"M. Finkelman, J. de la Torre, Jeremy Karp","doi":"10.1080/15305058.2020.1828427","DOIUrl":"https://doi.org/10.1080/15305058.2020.1828427","url":null,"abstract":"Abstract Cognitive diagnosis models (CDMs) have been studied as a means of providing detailed diagnostic information about the skills that have been mastered, and the skills that have not, by examinees. Prior research has examined the use of automated test assembly (ATA) alongside CDMs; however, no previous study has investigated how to perform ATA when a CDM is employed and the total amount of time taken by the test must be controlled. The purpose of the current research was to develop an ATA procedure to select tests that are highly informative while simultaneously satisfying constraints on key parameters related to the total-time distribution. In a simulation study, the procedure successfully selected tests that met these dual goals.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2020-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/15305058.2020.1828427","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44001960","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Coaching β in admission test performance: a study of group differences
Pub Date: 2020-07-31 | DOI: 10.1080/15305058.2020.1786833
Anely Ramírez, Mladen Koljatic, Mónica Silva
Abstract The study addresses the association between coaching practices and university admission test performance in Chile. Estimates of coaching effects are reported for test-takers from the private and public school systems. Our results indicate that coaching is associated with variations in test scores. The estimated magnitude of coaching appears to vary by subject area, type of coaching strategy and type of high school attended.
{"title":"Coaching β in admission test performance: a study of group differences","authors":"Anely Ramírez, Mladen Koljatic, Mónica Silva","doi":"10.1080/15305058.2020.1786833","DOIUrl":"https://doi.org/10.1080/15305058.2020.1786833","url":null,"abstract":"Abstract The study addresses the association between coaching practices and university admission test performance in Chile. Estimates of coaching effects are reported for test-takers from the private and public school systems. Our results indicate that coaching is associated with variations in test scores. The estimated magnitude of coaching appears to vary by subject area, type of coaching strategy and type of high school attended.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2020-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/15305058.2020.1786833","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48759901","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Examining the simultaneous change in emotions during a test: relations with expended effort and test performance
Pub Date: 2020-07-24 | DOI: 10.1080/15305058.2020.1786834
S. Finney, B. Perkins, Paulius Satkus
Abstract Using a sample of 497 college students, we measured test-taking emotions (anger, worry, pride, enjoyment) after the first third, second third, and last third of a low-stakes cognitive test of sociocultural knowledge. We examined the simultaneous change in emotions and whether change in emotions predicted subsequent test-taking effort and test performance. Latent growth models indicated that, on average, enjoyment and anger increased, whereas pride and worry decreased during the test. There was significant variability in individual change about these averages. Positive correlations were observed between change in worry and anger and change in pride and enjoyment. Structural equation models indicated that all initial emotions and gains in pride during the test influenced subsequent effort, whereas initial worry, anger and enjoyment, change in pride and enjoyment, and effort influenced test scores. The findings emphasize the importance of assessing change in emotions and the mediation mechanism of effort when modeling test performance.
{"title":"Examining the simultaneous change in emotions during a test: relations with expended effort and test performance","authors":"S. Finney, B. Perkins, Paulius Satkus","doi":"10.1080/15305058.2020.1786834","DOIUrl":"https://doi.org/10.1080/15305058.2020.1786834","url":null,"abstract":"Abstract Using a sample of 497 college students, we measured test-taking emotions (anger, worry, pride, enjoyment) after the first third, second third, and last third of a low-stakes cognitive test of sociocultural knowledge. We examined the simultaneous change in emotions and whether change in emotions predicted subsequent test-taking effort and test performance. Latent growth models indicated that, on average, enjoyment and anger increased, whereas pride and worry decreased during the test. There was significant variability in individual change about these averages. Positive correlations were observed between change in worry and anger and change in pride and enjoyment. Structural equation models indicated that all initial emotions and gains in pride during the test influenced subsequent effort, whereas initial worry, anger and enjoyment, change in pride and enjoyment, and effort influenced test scores. The findings emphasize the importance of assessing change in emotions and the mediation mechanism of effort when modeling test performance.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2020-07-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/15305058.2020.1786834","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42495913","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Identifying Misfitting Achievement Estimates in Performance Assessments: An Illustration Using Rasch and Mokken Scale Analyses
Pub Date: 2020-07-02 | DOI: 10.1080/15305058.2019.1673758
A. Walker, Stefanie A. Wind
Researchers apply individual person fit analyses as a procedure for checking model-data fit for individual test-takers. When a test-taker misfits, it means that the inferences from their test score regarding what they know and can do may not be accurate. One problem in applying individual person fit procedures in practice is the question of how much misfit it takes to make the test score an untrustworthy estimate of achievement. In this paper, we argue that if a person’s responses generally follow a monotonic pattern, the resulting test score is “good enough” to be interpreted and used. We present an approach that applies statistical procedures from the Rasch and Mokken measurement perspectives to examine individual person fit based on this good enough criterion in real data from a performance assessment. We discuss how these perspectives may facilitate thinking about applying individual person fit procedures in practice.
{"title":"Identifying Misfitting Achievement Estimates in Performance Assessments: An Illustration Using Rasch and Mokken Scale Analyses","authors":"A. Walker, Stefanie A. Wind","doi":"10.1080/15305058.2019.1673758","DOIUrl":"https://doi.org/10.1080/15305058.2019.1673758","url":null,"abstract":"Researchers apply individual person fit analyses as a procedure for checking model-data fit for individual test-takers. When a test-taker misfits, it means that the inferences from their test score regarding what they know and can do may not be accurate. One problem in applying individual person fit procedures in practice is the question of how much misfit it takes to make the test score an untrustworthy estimate of achievement. In this paper, we argue that if a person’s responses generally follow a monotonic pattern, the resulting test score is “good enough” to be interpreted and used. We present an approach that applies statistical procedures from the Rasch and Mokken measurement perspectives to examine individual person fit based on this good enough criterion in real data from a performance assessment. We discuss how these perspectives may facilitate thinking about applying individual person fit procedures in practice.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2020-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/15305058.2019.1673758","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49272081","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Investigating Technology-Enhanced Item Formats Using Cognitive and Item Response Theory Approaches
Pub Date: 2020-04-02 | DOI: 10.1080/15305058.2019.1648270
J. Moon, S. Sinharay, M. Keehner, Irvin R. Katz
The current study examined the relationship between test-taker cognition and psychometric item properties in multiple-selection multiple-choice and grid items. In a study with content-equivalent mathematics items in alternative item formats, adult participants’ tendency to respond to an item was affected by the presence of a grid and variations of answer options. The results of an item response theory analysis were consistent with the hypothesized cognitive processes in alternative item formats. The findings suggest that seemingly subtle variations of item design could substantially affect test-taker cognition and psychometric outcomes, emphasizing the need for investigating item format effects at a fine-grained level.
{"title":"Investigating Technology-Enhanced Item Formats Using Cognitive and Item Response Theory Approaches","authors":"J. Moon, S. Sinharay, M. Keehner, Irvin R. Katz","doi":"10.1080/15305058.2019.1648270","DOIUrl":"https://doi.org/10.1080/15305058.2019.1648270","url":null,"abstract":"The current study examined the relationship between test-taker cognition and psychometric item properties in multiple-selection multiple-choice and grid items. In a study with content-equivalent mathematics items in alternative item formats, adult participants’ tendency to respond to an item was affected by the presence of a grid and variations of answer options. The results of an item response theory analysis were consistent with the hypothesized cognitive processes in alternative item formats. The findings suggest that seemingly subtle variations of item design could substantially affect test-taker cognition and psychometric outcomes, emphasizing the need for investigating item format effects at a fine-grained level.","PeriodicalId":46615,"journal":{"name":"International Journal of Testing","volume":null,"pages":null},"PeriodicalIF":1.7,"publicationDate":"2020-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1080/15305058.2019.1648270","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46485223","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}