Job and practice analysis is a commonly used method for determining examination content specifications. However, difficulties arise when many domains are present, because mainstream approaches do not fully adhere to the essence of the weighting process, namely a “comparison‐evaluation‐decision” framework for assigning percentage values to content areas. Grounded in the principle of comparing multiple criteria to reach a decision, the Analytic Hierarchy Process (AHP) offers an appropriate way around this obstacle. We propose using an extended version of AHP, Group AHP (GAHP), to weight content specifications for standardized medical education assessment. Specifically, GAHP is integrated with the Delphi method and is expected to help exam developers integrate feedback from diverse, experienced physicians when determining content specifications for the National Medical Licensing Examination (NMLE) in China. This study demonstrates the complete workflow of the proposed approach with an application to the NMLE.
Weighting Content Specifications for the National Medical Licensing Examination via Group Analytic Hierarchy Process. Xiaomei Hong, Zhehan Jiang, Hanyu Liu, Fen Cai. Educational Measurement: Issues and Practice, published 2024-07-19. https://doi.org/10.1111/emip.12620
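The abstract does not spell out the GAHP computation. As a hedged illustration of the standard AHP steps it builds on, the sketch below aggregates several experts' pairwise comparison matrices with the element-wise geometric mean (one common group-aggregation rule, assumed here rather than taken from the paper), derives normalized weights from the principal eigenvector by power iteration, and reports Saaty's consistency ratio. The function names, the example judgments, and the three content domains are hypothetical.

```python
import math

# Saaty's random consistency index (RI) for matrix orders 1 to 10
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def aggregate_judgments(matrices):
    """Combine experts' pairwise comparison matrices by element-wise geometric mean.

    The geometric mean keeps the group matrix reciprocal (a_ji == 1 / a_ij).
    """
    k = len(matrices)
    n = len(matrices[0])
    return [[math.exp(sum(math.log(m[i][j]) for m in matrices) / k)
             for j in range(n)] for i in range(n)]


def priority_weights(a, iters=1000, tol=1e-12):
    """Return (weights, consistency_ratio) for a positive reciprocal matrix.

    Weights are the principal eigenvector, found by power iteration and
    normalized to sum to 1.
    """
    n = len(a)
    w = [1.0 / n] * n
    for _ in range(iters):
        v = [sum(a[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        v = [x / total for x in v]
        converged = max(abs(x - y) for x, y in zip(v, w)) < tol
        w = v
        if converged:
            break
    # Estimate lambda_max as the mean ratio (A w)_i / w_i
    aw = [sum(a[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    if n <= 2:
        return w, 0.0  # reciprocal matrices of order 1 or 2 are always consistent
    ci = (lambda_max - n) / (n - 1)
    return w, ci / RANDOM_INDEX[n]


if __name__ == "__main__":
    # Two hypothetical experts comparing three hypothetical content domains (Saaty 1-9 scale)
    expert_a = [[1, 3, 5], [1 / 3, 1, 2], [1 / 5, 1 / 2, 1]]
    expert_b = [[1, 2, 4], [1 / 2, 1, 3], [1 / 4, 1 / 3, 1]]
    group = aggregate_judgments([expert_a, expert_b])
    weights, cr = priority_weights(group)
    print([round(100 * x, 1) for x in weights], round(cr, 3))
```

Multiplying the weights by 100 gives percentage values for the content areas; a consistency ratio below 0.1 is the conventional AHP threshold for acceptable judgments. In a Delphi setting, experts whose matrices are inconsistent would typically be asked to revise before aggregation.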
Xuelan Qiu, Jimmy de la Torre, You‐Gan Wang, Jinran Wu
Multidimensional forced‐choice (MFC) items have been found useful for reducing response biases in personality assessments. However, conventional scoring methods for MFC items produce ipsative data, which hinders wider application of the MFC format. In the last decade, a number of item response theory (IRT) models have been developed, the majority of which are for MFC items with binary responses. However, MFC items with polytomous responses are more informative and have many applications. This paper develops a polytomous Rasch ipsative model (pRIM) that can handle ipsative data and yield estimates of construct differentiation, a latent trait describing the degree to which personality constructs (e.g., interests) are distinguished from one another. The pRIM and its simpler form are applied to a career interests assessment containing four‐category MFC items, and the resulting measures of interest differentiation are used for both intra‐ and interpersonal comparisons. Simulations are conducted to examine parameter recovery under various conditions. The results show that the parameters of the pRIM can be well recovered, particularly when a complete linking design and a large sample are used. The implications and applications of the pRIM in personality assessment using MFC items are discussed.
Item Response Theory Models for Polytomous Multidimensional Forced‐Choice Items to Measure Construct Differentiation. Educational Measurement: Issues and Practice, published 2024-06-10. https://doi.org/10.1111/emip.12621
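The abstract does not give the pRIM equations. As a hedged illustration of why ipsative responses carry information only about relative trait levels, the sketch below assumes a generic partial-credit form in which each four-category forced-choice response depends only on the difference between the two traits being compared. This is not the pRIM specification; the function names, parameters, and threshold values are hypothetical.

```python
import math


def forced_choice_category_probs(theta_a, theta_b, thresholds):
    """Category probabilities P(X = 0..m) for one polytomous forced-choice pair.

    Hypothetical partial-credit form: category k has log-numerator
    sum_{h=1..k} (theta_a - theta_b - delta_h), with 0 for category 0.
    """
    diff = theta_a - theta_b
    log_numerators = [0.0]
    for delta in thresholds:
        log_numerators.append(log_numerators[-1] + (diff - delta))
    top = max(log_numerators)  # subtract the max before exponentiating for stability
    exps = [math.exp(x - top) for x in log_numerators]
    total = sum(exps)
    return [e / total for e in exps]


def expected_category(probs):
    """Expected response category under the given probabilities."""
    return sum(k * p for k, p in enumerate(probs))


if __name__ == "__main__":
    steps = [-1.0, 0.0, 1.0]  # hypothetical thresholds for a four-category item
    print(forced_choice_category_probs(0.5, -0.2, steps))
    # Shifting both traits by the same constant leaves the probabilities unchanged,
    # so only relative standing (differentiation) is identified from such data.
    print(forced_choice_category_probs(1.5, 0.8, steps))
```

The shift invariance shown in the usage lines is the property that makes such data ipsative: absolute trait levels cancel, while the spread of a person's traits relative to one another remains estimable.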
Bharati B. Belwalkar, Matthew Schultz, Christina Curnow, J. C. Setzer
Technology is becoming more integrated into the workplace (World Economic Forum), and organizations are increasingly relying on advanced technological approaches to improve their human capital processes and stay relevant and competitive in complex environments. All professions must keep up with this transition and begin integrating technology into their tools and processes. This paper centers on how advanced technological approaches, such as natural language processing (NLP) and data mining, have complemented a traditional practice analysis of the accounting profession. We also discuss the strategic selection and use of subject‐matter experts (SMEs) for a more efficient practice analysis. The authors adopted a triangulation process: gathering information from a traditional practice analysis, using selected SMEs, and confirming the findings with a novel NLP‐based approach. Together, these methods contributed to the revision of the Uniform CPA Exam blueprint and to a better understanding of accounting trends.
Blending Strategic Expertise and Technology: A Case Study for Practice Analysis. Educational Measurement: Issues and Practice, published 2024-05-20. https://doi.org/10.1111/emip.12607
Dihao Leng, Ummugul Bezirhan, Lale Khorramdel, Bethany Fishbein, M. Davier
This study draws on response and process data from the computer‐based TIMSS 2019 Problem Solving and Inquiry tasks to investigate gender differences in test‐taking behaviors and their association with mathematics achievement in eighth grade. Specifically, a recently proposed hierarchical speed‐accuracy‐revisits (SAR) model was adapted to multiple country‐by‐gender groups to examine the extent to which mathematics ability, response speed, revisit propensity, and the relationships among them differ between boys and girls. Results across 10 countries showed that boys responded to items faster than girls on average and that boys’ response speed varied more across students. A mixture distribution of revisit propensity was found for all country‐by‐gender groups. For both genders, mathematics ability and response speed were moderately to strongly negatively correlated, supporting the speed‐accuracy tradeoff pattern reported in the literature. Results are discussed in the context of low‐stakes assessments and in relation to the utility of the multiple‐group SAR model.
Examining Gender Differences in TIMSS 2019 Using a Multiple‐Group Hierarchical Speed‐Accuracy‐Revisits Model. Educational Measurement: Issues and Practice, published 2024-04-24. https://doi.org/10.1111/emip.12606
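The abstract names the SAR model's components without giving their equations. As a hedged sketch of the response-speed part only, the code below assumes the lognormal response-time model from van der Linden's hierarchical framework, on which speed-accuracy models of this kind are commonly built: log response time is normal with mean beta minus tau (item time intensity minus person speed) and precision alpha squared. The revisit component and the multiple-group structure are not shown, and all parameter values are hypothetical.

```python
import math


def log_rt_density(t, tau, beta, alpha):
    """Log-density of response time t under a lognormal model.

    Assumed form: ln t ~ Normal(beta - tau, 1 / alpha**2), where tau is person
    speed, beta is item time intensity, and alpha is item time discrimination.
    """
    z = alpha * (math.log(t) - (beta - tau))
    return math.log(alpha) - math.log(t) - 0.5 * math.log(2 * math.pi) - 0.5 * z * z


def median_rt(tau, beta):
    """Median response time exp(beta - tau); faster persons (larger tau) take less time."""
    return math.exp(beta - tau)


if __name__ == "__main__":
    beta, alpha = 4.0, 1.5  # hypothetical item: median time about 55 s when tau = 0
    for tau in (-0.5, 0.0, 0.5):
        print(tau, round(median_rt(tau, beta), 1), round(log_rt_density(60.0, tau, beta, alpha), 3))
```

In a hierarchical model, person speed tau and ability are then given a joint (typically multivariate normal) population distribution, and it is the correlation in that distribution that the reported negative ability-speed association refers to.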