The emergence and development of Bayesian psychometrics stem from psychometrics' drive to reduce measurement error. This book is the first to present a systematic account of the Bayesian approach in psychometric research. It consists of two parts: the first presents the main principles of the Bayesian approach (Foundations), and the second covers their application in psychometric modeling (Psychometrics). The reviewer believes the book will be useful for readers accustomed to working within the frequentist approach who would like to learn about the Bayesian one. At the same time, she recommends that readers unsure of their mathematical background also consult other sources that are less heavy on mathematical detail. The book has not been translated into Russian.
{"title":"Power of Probability in Psychometrics. Review of the book “Bayesian Psychometric Modeling“","authors":"Ирина Угланова","doi":"10.17323/vo-2023-17952","DOIUrl":"https://doi.org/10.17323/vo-2023-17952","url":null,"abstract":"The emergence and development of Bayesian psychometrics is a result of psychometrics' desire to reduce measurement error. This book is the first to present a systematic description of the Bayesian approach in psychometric research. The book consists of two parts: the first one presents the main principles of the Bayesian approach (Foundations), the second one includes their application in psychometric modeling (Psychometrics). The reviewer believes that the publication will be useful for those who used to work in the frequentist approach and would like to learn about the Bayesian approach. At the same time, she recommends those who are not sure of the quality of their mathematical background to additionally turn to other sources that are not equipped with such a detailed mathematical description. The book has not been translated into Russian.
","PeriodicalId":54119,"journal":{"name":"Voprosy Obrazovaniya-Educational Studies Moscow","volume":"51 8","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135545200","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Psychological theories regarding ability and personality traits often rely on the results of psychometric modelling. The latter is assumed to link responses to test items to an unobserved 'construct' (trait, ability), which is 'modelled' from the test data. However, does agreement between the data and the model indicate that the model represents a psychological construct? To what extent is 'psychometric modelling' modelling in the general scientific sense of the term? The validity of using modelling results to understand psychological phenomena depends on the answers to these questions. The article analyses the logic of psychometric modelling in comparison with modelling in other sciences and argues that psychological phenomena, as the subject of modelling, are involved neither in the construction nor in the correction of models. The article raises the problem of unjustified interpretations of modelling results in psychology and their undesirable consequences for psychological theory. At the same time, the use of psychometric modelling for human resource decision-making has yet to be evaluated.
{"title":"Is Psychometrics So Useful for Academic Psychology?","authors":"Юлия Тюменева","doi":"10.17323/vo-2023-16781","DOIUrl":"https://doi.org/10.17323/vo-2023-16781","url":null,"abstract":"Psychological theories regarding ability and personality traits often rely on the results of psychometric modelling. The latter is assumed to link responses to test items to an unobserved 'construct' (trait, ability), which is 'modelled' from the test data. However, does the agreement between the data and the model indicate that the model represents a psychological construct? To what extent is ‘psychometric modelling’ modelling in the general scientific sense of the term? The validity of using modelling data to understand psychological phenomena depends on the answer to these questions. The article analyses the logic of psychometric modelling in comparison with modelling in other sciences and argues that psychological phenomena as a subject of modelling are not involved neither in the construction nor in the correction of models. The problem of unjustified interpretations of modelling results in psychology and their undesirable consequences for psychological theory is raised. At the same time, the use of psychometric modelling for human resource decision-making is still waiting for its evaluation.","PeriodicalId":54119,"journal":{"name":"Voprosy Obrazovaniya-Educational Studies Moscow","volume":"44 5","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135724153","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The article considers several issues in the relationship between cognitive psychology and psychometrics. Cognitive psychology has developed mainly within the experimental paradigm in psychology, whereas psychometrics has developed within a different paradigm, that of assessing individual differences and of correlational studies. The article reviews the history of the relationship between experimental research and psychometrics from the end of the 19th century to the present. This historical view helps explain the problems in using experimental tasks to assess individual differences, as well as the obstacles to the widespread use of psychometric models in experimental studies. Several recommendations are proposed, from a psychometric perspective, to improve the accuracy of measurement of individual differences in cognitive abilities and processes.
{"title":"Psychometrics and Cognitive Research: Contradictions and Possibility for Cooperation","authors":"Юлия Кузьмина","doi":"10.17323/vo-2023-16875","DOIUrl":"https://doi.org/10.17323/vo-2023-16875","url":null,"abstract":"The article considered several issues of the relationships between cognitive psychology and psychometrics. Cognitive psychology has mainly developed within experimental paradigm in psychology, whereas psychometrics has developed within a different paradigm – assessment of individual differences and correlational studies. In the article it has been considered a brief history of the development of relationships between experimental studies and psychometrics, from the end of 19th century to the present. The historical view allows understanding problems in the use of experimental tasks for assessing individual differences and obstacles to the widespread of use psychometric models in experimental studies. Several recommendations are proposed to improve the accuracy of measurements of individual differences in cognitive abilities and processes, from psychometric perspectives. 
","PeriodicalId":54119,"journal":{"name":"Voprosy Obrazovaniya-Educational Studies Moscow","volume":"44 7","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135724151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The ability to cope with the challenges of cultural diversity is one of the key competencies that students need to develop in the contemporary world. This makes the question of intercultural learning, and of students' attitudes towards it, especially relevant. The study aimed to develop and validate an instrument measuring students' attitudes towards intercultural learning. Taking into account the "sources" and processes of intercultural learning available to students, as well as the structure of attitudes, an instrument consisting of four scales was created; each scale covers three aspects of attitudes: cognitive, affective, and behavioral. The reliability and validity of each scale of the instrument were confirmed on a sample of 399 students of Russian universities using statistical procedures such as exploratory and confirmatory factor analysis, internal-consistency analysis with Cronbach's alpha and McDonald's omega, the average variance extracted, and the heterotrait-monotrait ratio. It seems promising to examine various aspects of intercultural learning in further research, adapting and using the scales not only in the higher education system but also in other contexts and age groups.
{"title":"Approbation of the Scale of Attitudes towards Intercultural Learning in Higher Education","authors":"Мария Бульцева, Соня Алехандра Берриос Кальехас","doi":"10.17323/vo-2023-16819","DOIUrl":"https://doi.org/10.17323/vo-2023-16819","url":null,"abstract":"The ability to cope with the challenges of cultural diversity is one of the key competencies that students need to develop in the contemporary world. This actualizes the question of intercultural learning and students' attitudes towards it. The study aimed at development and validation of the instrument to measure students’ attitudes towards intercultural learning. Taking into account the “sources” and processes of intercultural learning available to students, as well as the structure of the attitude, an instrument consisting of four scales were created - each of them includes three aspects of the attitudes: cognitive, affective and behavioral ones. The reliability and validity of each of the instrument were confirmed on a sample of 399 students of Russian universities using statistical procedures such as exploratory and confirmatory factor analysis, analysis of the consistency of the scales according to Cronbach's alpha and McDonald’s omega, analysis of extracted mean variance and heterotrait-monotrait ratio. It seems promising to consider various aspects of intercultural learning in further research, adapting and using the scales not only in the higher education system, but also in other contexts and age groups.","PeriodicalId":54119,"journal":{"name":"Voprosy Obrazovaniya-Educational Studies Moscow","volume":"44 3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135724155","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Measuring students’ growth and change is considered one of the main ways for evidence-based development of educational systems. However, it is a non-trivial methodological task, despite the numerous approaches available for its conceptualization and statistical realization. In this article, we describe the main features of measuring students' growth and change using Item Response Theory (IRT) in detail. We then expand this approach to allow for the modeling of cognitive operations with the Linear Logistic Test Model (LLTM). We show that the synthesis of traditional IRT models for measuring growth and change with LLTM significantly enriches the interpretability of ability estimates while preserving the advantages of the traditional approach. To illustrate this approach, we use a set of monitoring tests to measure educational progress in mathematics in secondary school.
{"title":"Measuring Learning Progress Based on Cognitive Operations","authors":"Сергей Тарасов, Ирина Зуева, Денис Федерякин","doi":"10.17323/vo-2023-16902","DOIUrl":"https://doi.org/10.17323/vo-2023-16902","url":null,"abstract":"Measuring students’ growth and change is considered one of the main ways for evidence-based development of educational systems. However, it is a non-trivial methodological task, despite the numerous approaches available for its conceptualization and statistical realization. In this article, we describe the main features of measuring students' growth and change using Item Response Theory (IRT) in detail. We then expand this approach to allow for the modeling of cognitive operations with the Linear Logistic Test Model (LLTM). We show that the synthesis of traditional IRT models for measuring growth and change with LLTM significantly enriches the interpretability of ability estimates while preserving the advantages of the traditional approach. To illustrate this approach, we use a set of monitoring tests to measure educational progress in mathematics in secondary school.","PeriodicalId":54119,"journal":{"name":"Voprosy Obrazovaniya-Educational Studies Moscow","volume":"43 4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135724166","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
One significant shortcoming of questionnaires is the distortion of scores on the measured constructs associated with social desirability effects. Social desirability poses an even greater threat to the validity of decisions in high-stakes assessment, such as selection for a position. Moreover, the issue of the relationship between different components of social desirability and the most frequently measured personal constructs remains debatable. Using the authors' normative questionnaire of universal competencies as material, the article considers an approach to adjusting the final scores on the measured constructs by means of the developed scales of egoistic and moralistic social desirability. It also discusses the prospect of using statements that are neutral with respect to social desirability or that express the most positive degree of the measured indicators.
The empirical basis of the study is data gathered in a pilot conducted in the spring of 2022, which yielded responses from 579 respondents on 49 measured competencies. The analysis aimed to assess the quality of the developed social desirability scales; each of the universal competency scales was then modeled with a social desirability scale included. The data were analyzed within the framework of structural modeling, namely confirmatory factor analysis (CFA) using bifactor models for each of the measured competencies.
According to the results, using the egoistic social desirability scale as a measure for adjusting factor scores on the competencies shows generally satisfactory psychometric statistics, although the relatively large measurement error is a concern. The paper discusses the advantages and disadvantages of this approach and of other practices most often used to reduce the effects of social desirability in academic and business settings.
{"title":"Experience of Using Bifactor Models to Reduce the Effects of Social Desirability on the Normative Questionnaire of Universal Competencies","authors":"Егор Сагитов, Ирина Брун, Станислав Павлов","doi":"10.17323/vo-2023-16827","DOIUrl":"https://doi.org/10.17323/vo-2023-16827","url":null,"abstract":"One of the significant lack of questionnaires is a scores distortion for the measured constructs, associated with the social desirability effects. An even greater threat to the validity of decisions is social desirability in high-stakes evaluation, such as selection for a position. Moreover the issue of the relationship between different components of social desirability and the most frequently measured personal constructs remains debatable. In the material of the author's normative questionnaire of universal competencies, an approach is considered for making adjustments to the final scores for measured constructs using the developed scales of egoistic and moralistic social desirability. Also discussed the prospect of using statement formulations that are neutral to social desirability or express the most positive degree of measured indicators.
 The empirical basis of this study is data gathered within a pilot conducted in the spring of 2022, during which data were obtained from 579 respondents in 49 measurable competencies. The analysis was aimed at assessing the quality of the developed scales of social desirability and modeling of each of the universal competencies scales was carried out with the inclusion of a scale of social desirability. The data were analyzed in the framework of structural modeling - confirmatory factor analysis (CFA) using bifactor models for each of the measured competencies.
 According to the results of this study, the use of the scale of egoistic social desirability as a measure for adjusting factor scores for the competencies has generally satisfactory psychometric statistics, but there is concern about the relatively large measurement error. The paper discusses the advantages and disadvantages of this approach and other practices that are most often used to reduce the effects of social desirability in the academic and business environment.
","PeriodicalId":54119,"journal":{"name":"Voprosy Obrazovaniya-Educational Studies Moscow","volume":"44 12","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135724146","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This book continues a series of books on the methodology of educational testing and assessment written by leading psychometricians and researchers in the field of educational assessment. Computational psychometrics is defined as the combination of computer science methods and psychometric measurement principles for analysing data obtained as a result of testing using technologically advanced test formats. The first part of the book discusses the changes that have occurred in teaching and educational assessment under the influence of digital technologies. The second part provides an overview of computational psychometric methods: from traditional psychometric models to machine learning technologies.
The material in the book can be useful to students and researchers in psychometrics who are involved in the development, design and analysis of learning systems and of measurements that use complex test formats and data. A strength of the book is an electronic supplement containing R or Python code for the methodological chapters.
{"title":"Computational Psychometrics: Near Future or Reality","authors":"Дарья Грачева, Ксения Тарасова","doi":"10.17323/vo-2023-17938","DOIUrl":"https://doi.org/10.17323/vo-2023-17938","url":null,"abstract":"This book continues a series of books on the methodology of educational testing and assessment written by leading psychometricians and researchers in the field of educational assessment. Computational psychometrics is defined as the combination of computer science methods and psychometric measurement principles for analysing data obtained as a result of testing using technologically advanced test formats. The first part of the book discusses the changes that have occurred in teaching and educational assessment under the influence of digital technologies. The second part provides an overview of computational psychometric methods: from traditional psychometric models to machine learning technologies.
 The material in the book can be useful to students and researchers in the field of psychometrics who are involved in the development, design and analysis of learning systems and measurements using complex test formats and data. The strength of the book is an electronic application containing the code of the R or Python programming environment for the methodological chapters.","PeriodicalId":54119,"journal":{"name":"Voprosy Obrazovaniya-Educational Studies Moscow","volume":"44 6","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135724152","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
High-quality measurement is a fundamental requirement for research practice in the social sciences. The quality of measurement determines the validity of the interpretations, conclusions and decisions that can be made based on the data it yields. High-quality measurement in the social sciences requires assessment tools as well as data analysis methods that link observed measurement results to theoretical attributes. The scientific basis for their development is provided by psychometrics.
Introducing the special issue of the journal "Voprosy obrazovaniya / Educational Studies Moscow" devoted to psychometrics, the guest editors cover the main milestones in the history of psychometrics, highlight some significant publications, and note the professional institutions and authors who have made valuable contributions to the development of this branch of science. They pay special attention to the history of psychometrics in Russia. Assessing the possibilities, prospects and limitations of psychometrics, the guest editors express their own point of view on its debatable issues, which does not always coincide with the opinions of the authors of the special issue. The issue presents examples of using modern psychometric methods to solve pressing problems in education research, as well as in research at the intersection of education and psychology and of education and various spheres of business. All the authors of the presented articles are united by the desire to improve research practice in the social sciences through truly high-quality measurement.
{"title":"Psychometric Research: Modern Methods and New Opportunities for Education","authors":"Елена Карданова, Алина Иванова","doi":"10.17323/vo-2023-17951","DOIUrl":"https://doi.org/10.17323/vo-2023-17951","url":null,"abstract":"Qualitative measurement is a fundamental requirement for research practice in the social sciences. The quality of measurements determines the validity of the interpretations, conclusions and decisions we can make based on the data obtained from the measurements. Qualitative measurement in the social sciences requires assessment tools as well as data analysis methods to link observed measurement results to theoretical attributes. The scientific basis for their development is provided by psychometrics.
 Preceding the special issue of the journal \"Voprosy obrazovaniya / Educational Studies Moscow\" devoted to psychometrics, the guest editors of this issue cover the main milestones of the history of psychometrics, highlight some significant publications, note the professional institutions and authors who have made their valuable contribution to the development of this branch of science. The authors pay special attention to the history of psychometrics in Russia. Assessing the possibilities, prospects and limitations of psychometrics, the authors express their point of view on the debatable issues of psychometrics, and it does not always coincide with the opinion of the authors of the special issue. This issue presents examples of using modern psychometric methods to solve actual problems in education research, as well as in research at the intersection of education and psychology, education and different spheres of business. All the authors of the presented articles are united by the desire to improve research practice in the social sciences through truly qualitative measurements.
","PeriodicalId":54119,"journal":{"name":"Voprosy Obrazovaniya-Educational Studies Moscow","volume":"44 8","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135724150","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The study investigates the decomposition of test item difficulty depending on item characteristics (such as format and the type of text to which the item belongs) and on the reader's actions required to answer the item (searching for information in the text, making simple inferences, making complex inferences, and critically interpreting the text). The sample consisted of fourth-grade elementary school students in Krasnoyarsk who completed the computerized reading literacy test "Progress" in the spring of 2022. The research method is psychometric modeling with the LLTM+e model. The research hypothesis was that decomposing item difficulties would show that the reading actions required to complete the tasks form a hierarchy of difficulty similar to traditional taxonomies (B. Bloom): reading skills aimed at analyzing, synthesizing and interpreting information make tasks more difficult than simple inferences, which in turn make tasks more difficult than actions to find information in the text. The results show that the assignment of items to a group of reader's actions is a significant factor. The size of the effects does not allow us to speak of a strict hierarchy, but, with other attributes controlled, tasks requiring retrieval of explicitly stated information are easier for students than tasks requiring complex inferences or critical understanding of the text.
{"title":"Decomposing Difficulty of Reading Literacy Test Items","authors":"Алина Иванова, Инна Антипкина","doi":"10.17323/vo-2023-16925","DOIUrl":"https://doi.org/10.17323/vo-2023-16925","url":null,"abstract":"The current study investigates the question of test difficulty decomposition depending on the characteristics of items (such as: format, belonging to the type of text to which the item belongs) and the reader's actions required to answer it (search for information in the text, simple conclusions, complex conclusions, critical interpretation of the text). The sample of the study consisted of fourth grade elementary school students in Krasnoyarsk, who completed the computerized test of reading literacy \"Progress\" in the spring of 2022. Research method: psychometric modeling using the LLTM+e model. Research hypothesis: the decomposition of item difficulties will help to prove that the reading actions required to complete the tasks will form a hierarchy of difficulties similar to traditional taxonomies (B. Bloom), that is, reading skills aimed at analyzing, synthesizing, interpreting information will give tasks greater difficulty than simple conclusions, and those, in turn, will make tasks more difficult than the reader's actions to find information in the text. The results show that the assignment of items to the group of reader's actions is a significant factor. The size of the effects does not allow us to speak of a strict hierarchy, but when other attributes are controlled, the tasks for information retrieval in an explicit form are easier for students than the tasks for complex conclusions and for critical understanding of the text.","PeriodicalId":54119,"journal":{"name":"Voprosy Obrazovaniya-Educational Studies Moscow","volume":"44 4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135724154","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In education, much attention is paid to developing and assessing schoolchildren's universal skills. At the same time, assessing universal skills requires new test formats based on the student's observed actions in a digital environment. Scenario-based contextual tasks are a promising format. However, the contextual diversity of such tasks can make it difficult to compare results obtained from different scenario tasks. This article analyzes the role of scenario-task context in measuring two universal skills: critical thinking and communication. The work uses the methods of Generalizability Theory, which make it possible to analyze the extent to which results can be generalized to other scenario-task contexts and how, by changing the number of indicators or scenario contexts, satisfactory measurement reliability can be ensured. The study is based on data from fourth-grade students who were tested with various scenario-based tasks of the "4K" instrument. The results showed that test-takers' behavior differs across scenarios with different contexts, while the difficulties of the contexts are almost the same. To achieve satisfactory reliability, it is recommended to use at least two scenarios with different contexts; using three or more scenarios with different contexts makes it possible to reduce the number of indicators without loss of reliability. The study also evaluated the role of context when alternative forms of scenario-based tasks were used. The alternative forms shared the main problem and plot of the scenario but differed in topic (content). Changing only the content of the scenario makes it possible to generalize results across scenario forms, that is, alternative forms can be used interchangeably. The study demonstrates how Generalizability Theory can be used to optimize task development while taking measurement reliability requirements into account.
{"title":"The Role of Context in Scenario-Based Tasks for Measuring Universal Skills: The Use of Generalizability Theory","authors":"Дарья Грачева","doi":"10.17323/vo-2023-16901","DOIUrl":"https://doi.org/10.17323/vo-2023-16901","url":null,"abstract":"In education, much attention is paid to the development and evaluation of universal skills in schoolchildren. At the same time, the assessment of universal skills requires new test formats based on the observed actions of the student in the digital environment. Scenario-based contextual tasks serve as a promising format. However, the contextual diversity of such tasks can make it difficult to compare results obtained from different scenario tasks. This article aims to analyze the role of scenario task context in measuring two universal skills: critical thinking and communication. The work uses the methods of Generalizability Theory, which allows to analyze to what extent the results can be generalized for other contexts of scenario tasks, and how, by changing the number of indicators or scenario contexts, to ensure satisfied measurement reliability. The study is based on data from fourth-grade students who were tested with various scenario-based tasks of the “4K” instrument. The results of the analysis showed that the behavior of the test-takers differs in scenarios with different contexts, while the difficulties of the contexts are almost the same. To achieve satisfactory reliability, it is recommended to use at least two scenarios with different contexts, and the use of three or more scenarios with different contexts will reduce the number of indicators without loss of reliability. Also, the study evaluated the role of context when using alternative scenario-based tasks forms were used. The alternative forms were similar in the main problem and plot of the scenario, but differed in topic (content). Changing only the content of the scenario makes it possible to generalize the results across scenario forms, that is, alternative forms can be used interchangeably. This study demonstrates how Generalization Theory can be used to optimize the development of tasks, taking into account the requirements for measurement reliability.
 
","PeriodicalId":54119,"journal":{"name":"Voprosy Obrazovaniya-Educational Studies Moscow","volume":"43 3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135724035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}