Based on the academic performance grades of university students, various high-stakes decisions are made, including determinations of pass/fail status, the awarding of diplomas, and eligibility for placement in graduate education programs. According to the criteria used, assessments are divided into two types: criterion-referenced and norm-referenced. An examination of the grading systems of state universities in Turkish higher education shows that some universities use criterion-referenced assessment, some use norm-referenced assessment, and some use both. The purpose of this research is to examine whether inter-university grading systems show significant concordance in the context of university students' letter grades; in other words, to reveal whether there is skew in the grading systems of public universities. In this context, 250 individuals were simulated such that their class/group achievement level followed a normal distribution. Among the public universities in the 2021-2022 University Ranking by Academic Performance (URAP), four state universities were selected, one from each quartile (first, second, third, and last). Each simulated student's academic success grade was converted into the letter grade it would receive at each of these universities, and it was examined whether the resulting letter grades showed significant concordance. The study concluded that, in the context of university students' letter grades, inter-university grading systems generally do not show significant concordance. The findings are expected to contribute to the work of the Council of Higher Education and the University Education Commissions.
Recep Gür & Mustafa Köroğlu, "The complexity of the grading system in Turkish higher education," International Journal of Assessment Tools in Education, 2023-12-23. doi:10.21449/ijate.1266808
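The two grading approaches the abstract contrasts can be illustrated with a short sketch: criterion-referenced grading maps a score to a letter via fixed cut-offs, while norm-referenced grading maps it via the score's standing within the group. The cut-off table and z-score bands below are illustrative assumptions, not any university's actual grading scheme.

```python
import statistics

def criterion_referenced(score,
                         cutoffs=((90, "AA"), (80, "BA"), (70, "BB"),
                                  (60, "CB"), (50, "CC"))):
    """Map a 0-100 score to a letter grade using fixed cut-off points
    (the table is a hypothetical example)."""
    for cutoff, letter in cutoffs:
        if score >= cutoff:
            return letter
    return "FF"

def norm_referenced(scores):
    """Map each score in a group to a letter grade from its z-score
    (the z bands are hypothetical)."""
    mean = statistics.mean(scores)
    sd = statistics.pstdev(scores)
    def letter(z):
        if z >= 1.5:
            return "AA"
        if z >= 0.5:
            return "BA"
        if z >= -0.5:
            return "BB"
        if z >= -1.5:
            return "CC"
        return "FF"
    return [letter((s - mean) / sd) for s in scores]
```

Under these assumed rules the same score of 85 earns "BA" criterion-referenced, but its norm-referenced letter depends entirely on the rest of the class, which is exactly the source of the discordance the study investigates.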
Global competence is a comprehensive term referring to the interconnectedness of various constructs, ranging from knowledge to values, required to communicate, cooperate, and work towards the well-being of not only the local but also the global community. Teacher education has an important role in preparing teachers equipped with global competences. Therefore, tools that can validly and reliably measure whether, and to what extent, pre-service teachers are globally competent are a requisite. Hence, this study aimed at adapting the Global Competence scale developed by Liu et al. (2020) into Turkish to measure pre-service English language teachers’ global competences, and at obtaining evidence on the psychometric properties of the scale for measuring global competences in teaching and teacher education. The data collected from pre-service English language teachers (N=351) studying at various universities in Türkiye were divided into two equal halves. The first half was used for exploratory factor analysis, which revealed an eight-factor, 29-item structure. The second half, used for confirmatory factor analysis, yielded a good fit for a 25-item, eight-factor structure. Cronbach’s alpha (α = .88) and McDonald’s omega (ω = .89) indicated good internal consistency in the CFA dataset, and excellent internal consistency (α = .90, ω = .91) in a further independent dataset. Thus, the study showed that the Global Competence scale has sound psychometric properties and reliability for measuring pre-service English language teachers’ global competences in the Turkish context.
İsmail Emre Köş & H. Çelik, "Global competence scale: An adaptation to measure pre-service English teachers’ global competences," International Journal of Assessment Tools in Education, 2023-12-23. doi:10.21449/ijate.1260245
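The internal-consistency coefficients reported above have standard closed forms; as a minimal sketch, Cronbach's alpha can be computed from the item variances and the variance of the total score (each inner list below is one item's scores across respondents, invented for illustration):

```python
def cronbach_alpha(items):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals).

    items: list of k item-score columns, each a list of n respondent scores.
    """
    k = len(items)
    n = len(items[0])

    def var(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_vars = sum(var(col) for col in items)
    totals = [sum(items[i][p] for i in range(k)) for p in range(n)]
    return k / (k - 1) * (1 - item_vars / var(totals))
```

For two perfectly parallel items (identical columns), the formula returns exactly 1.0, which is a quick sanity check for any implementation.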
The proliferation of large language models represents a paradigm shift in the landscape of automated essay scoring (AES) systems, fundamentally elevating their accuracy and efficacy. This study presents an extensive examination of large language models, with a particular emphasis on the transformative influence of transformer-based models, such as BERT, mBERT, LaBSE, and GPT, in augmenting the accuracy of multilingual AES systems. The exploration of these advancements within the context of the Turkish language serves as a compelling illustration of the potential for harnessing large language models to elevate AES performance in low-resource linguistic environments. Our study provides valuable insights for the ongoing discourse on the intersection of artificial intelligence and educational assessment.
Tahereh Firoozi, Okan Bulut & Mark J. Gierl, "Language Models in Automated Essay Scoring: Insights for the Turkish Language," International Journal of Assessment Tools in Education, 2023-12-17. doi:10.21449/ijate.1394194
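AES systems such as those surveyed above are typically evaluated by their score agreement with human raters, most often with quadratic weighted kappa (QWK). A minimal sketch of QWK on an integer score scale (the scale bounds are assumptions for illustration):

```python
def quadratic_weighted_kappa(rater_a, rater_b, min_score, max_score):
    """QWK: 1 - (weighted observed disagreement / weighted expected disagreement),
    with quadratic weights w_ij = (i - j)^2 / (n - 1)^2."""
    n = max_score - min_score + 1
    obs = [[0.0] * n for _ in range(n)]
    for a, b in zip(rater_a, rater_b):
        obs[a - min_score][b - min_score] += 1
    total = len(rater_a)
    hist_a = [sum(row) for row in obs]                       # rater A marginals
    hist_b = [sum(obs[i][j] for i in range(n)) for j in range(n)]  # rater B marginals
    num = den = 0.0
    for i in range(n):
        for j in range(n):
            w = (i - j) ** 2 / (n - 1) ** 2
            num += w * obs[i][j]
            den += w * hist_a[i] * hist_b[j] / total  # chance-expected counts
    return 1 - num / den
```

Perfect agreement gives QWK = 1, chance-level agreement gives 0, and the quadratic weights penalize a two-point scoring error four times as heavily as a one-point error.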
The main purpose of this study is to examine the Type I error and statistical power rates of Differential Item Functioning (DIF) techniques based on different theories under different conditions. For this purpose, a simulation study was conducted using the Mantel-Haenszel (MH), Logistic Regression (LR), Lord’s χ2, and Raju’s Area Measures techniques. In the simulation-based research model, the two-parameter item response model, the groups’ ability distributions, and the DIF type were fixed conditions, while sample size (1800, 3000), sample size ratio (0.50, 1), test length (20, 80), and rate of DIF-containing items (0, 0.05, 0.10) were manipulated conditions. The total number of conditions is 24 (2×2×2×3), and statistical analysis was performed in the R software. The study found that the Type I error rates in all conditions were higher than the nominal error level. It also showed that MH had the highest error rate while Raju’s Area Measures had the lowest, and that MH produced the highest statistical power rates. Analysis of the Type I error and statistical power findings illustrated that techniques based on both theories performed better with the sample size of 1800. Furthermore, the increase in sample size affected techniques based on CTT more than those based on IRT. The findings also demonstrated that the techniques’ Type I error rates were lower, and their statistical power rates higher, under conditions where the test length was 80 and the sample sizes were not equal.
Ayşe Bilicioğlu Güneş & Bayram Biçak, "Type I error and power rates: A comparative analysis of techniques in differential item functioning," International Journal of Assessment Tools in Education, 2023-12-09. doi:10.21449/ijate.1368341
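Of the techniques compared above, Mantel-Haenszel has the simplest closed form: examinees are stratified by matched total score, a 2×2 table (group × correct/incorrect) is built per stratum, and the common odds ratio is transformed to the ETS delta scale. A minimal sketch, with the per-stratum tuple layout as an assumed input format:

```python
import math

def mantel_haenszel_delta(strata):
    """MH DIF statistic on the ETS delta scale.

    strata: list of (ref_correct, ref_wrong, foc_correct, foc_wrong)
    tuples, one per matched score level.
    """
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue
        num += a * d / n
        den += b * c / n
    alpha = num / den            # MH common odds ratio
    return -2.35 * math.log(alpha)  # |delta| >= 1.5 marks ETS "C"-level DIF
```

When the odds of success are equal in both groups at every score level, alpha is 1 and delta is 0; a large negative delta indicates the item favors the reference group.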
Mathematics, by its nature, has a spiral structure: when students fail to attain one achievement in the mathematics course, subsequent achievements are negatively affected, so students' failures in mathematics compound over time. In this context, it is important to identify and eliminate students’ weaknesses, if any, through formative assessment during the learning process. The present study aimed to improve students' level of success through a formative assessment-based teaching practice on the circle and circular region in seventh-grade secondary school mathematics, and to implement an exemplary application of formative assessment-based teaching. Since the study sought to evaluate students' learning process and eliminate the weaknesses identified along the way, an action research design, one of the qualitative research methods, was used. The study was carried out with 34 seventh-grade secondary school students. Data were collected through observation, interviews, and the various formative assessment tools used in the process, and were analyzed descriptively. At the end of the three-week implementation period, substantial improvement was observed in the achievements of low- and medium-achieving students, and the overall success of the students changed positively.
Hilal Özcan & Aytaç Kurtulus, "An examplary application of mathematics teaching based on formative assessment," International Journal of Assessment Tools in Education, 2023-11-20. doi:10.21449/ijate.1352605
The increasing volume of large-scale assessment data poses a challenge for testing organizations seeking to manage data and conduct psychometric analysis efficiently. Traditional psychometric software presents barriers, such as a lack of functionality for managing data and for conducting various standard psychometric analyses efficiently. These challenges have resulted in high costs to achieve the desired research and analysis outcomes. To address them, we have designed and implemented a modernized data pipeline that allows psychometricians and statisticians to efficiently manage the data, conduct psychometric analysis, generate technical reports, and perform quality assurance to validate the required outputs. This modernized pipeline has proven to scale with large databases, decrease human error by reducing manual processes, make complex workloads efficiently repeatable, ensure high quality of the outputs, and reduce the overall costs of psychometric analysis of large-scale assessment data. This paper aims to provide information to support the modernization of current psychometric analysis practices. We share details on the workflow design and functionalities of our modernized data pipeline, which provides a universal interface to large-scale assessments. The methods for developing non-technical and user-friendly interfaces are also discussed.
Ryan Schwarz, H. Bulut & Charles Anifowose, "A data pipeline for e-large-scale assessments: Better automation, quality assurance, and efficiency," International Journal of Assessment Tools in Education, 2023-11-20. doi:10.21449/ijate.1321061
This study aims to investigate the impact of visual arts activities on the socialization and stress management of individuals with special needs. It is a qualitative study employing an action research design, with data collected from teachers' observations. Over a 20-week period, visual arts activities were carried out with 27 individuals with special needs, including six with autism, seven with Down syndrome, and 14 with moderate to severe intellectual disabilities, who received education at the third level of the Fehmi Cerrahoğlu Special Education Practice School in Ordu province during the 2020-2021 and 2021-2022 academic years. The study group included a counselling teacher and 19 special education teachers, who observed the activities and their effects on the socialization levels and stress management of educable individuals with special needs. The data obtained from semi-structured interviews were analyzed using content analysis. Most of the participating teachers agreed that visual arts activities contributed to the socialization and stress management of individuals with special needs, and the study found that these activities played an important role in including individuals with special needs in society and led to a decrease in stress symptoms.
Kıymet Bayer & Seda Liman Turan, "The effect of visual art activities on socialization and stress management of individuals with special needs," International Journal of Assessment Tools in Education, 2023-11-06. doi:10.21449/ijate.1269977
In the current study, differential item functioning (DIF) detection was carried out on real data using the Mantel-Haenszel (MH), Simultaneous Item Bias Test (SIBTEST), Lord's chi-square, and Raju's area methods, both when item purification was performed and when it was not. After gender-related DIF was detected, expert opinions were obtained for the bias study. Conducting this gender bias research on the English test with and without purification is important because, while DIF studies exist in the literature, closely comparable bias studies do not. The sample of the research consists of 7389 students who took the Transition from Primary to Secondary Education Exam (TPSEE, known as "TEOG" in Turkey) administered in April 2017. When gender-related DIF analysis was performed with the four methods, the results were found to differ partially, and the DIF analysis results differed depending on whether item purification was performed. Detection of DIF was indicative of possible bias. In the second stage of the study, the opinions of seven experts were obtained for item 11, for which DIF was detected at at least the B level by MH and SIBTEST. Based on the expert opinions, no item in the English test was found to be biased with respect to gender. It is recommended that similar bias studies be conducted so that test developers become aware of the features that may lead to item bias and construct unbiased items.
Serap Büyükkidik, "Purification procedures used for the detection of gender DIF: Item bias in a foreign language test," International Journal of Assessment Tools in Education, 2023-11-03. doi:10.21449/ijate.1250358
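The purification procedure named in the title is typically an iterative loop: flag DIF items, drop them from the matching criterion, and repeat until the anchor set stabilizes. A sketch of that loop, where `flag_dif` is a hypothetical callback standing in for any of the four DIF tests above:

```python
def purify_items(items, flag_dif):
    """Iteratively remove DIF-flagged items from the matching criterion.

    items: collection of item ids.
    flag_dif(item, anchor) -> bool is a user-supplied DIF test (hypothetical
    signature) that matches examinees on the anchor items only; the studied
    item itself is excluded from its own anchor.
    Returns the set of items still flagged once the anchor set is stable.
    """
    anchor = set(items)
    while True:
        flagged = {i for i in items if flag_dif(i, anchor - {i})}
        new_anchor = set(items) - flagged
        if new_anchor == anchor:
            return flagged
        anchor = new_anchor
```

The design choice purification addresses is contamination of the matching score: if DIF items stay in the criterion, group differences on those items distort the matching and can mask or inflate DIF on other items.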
The purpose of this study is to generate non-verbal items for a visual reasoning test using template-based automatic item generation (AIG). The research followed the three stages of template-based AIG. An item from the 2016 4th-grade entrance exam of the Science and Art Center (known as BİLSEM) was chosen as the parent item. A cognitive model and an item model were developed for non-verbal reasoning, and the items were then generated using computer algorithms. For the first item model, 112 items were generated; for the second item model, 1728 items were produced. The items were evaluated by subject matter experts (SMEs). The SMEs indicated that the items met the criteria of having one right answer, a single content and behavior, non-trivial content, and homogeneous choices. The SMEs' opinions also indicated that the items vary in difficulty. The results demonstrate the feasibility of AIG for creating an extensive repository of non-verbal visual reasoning items.
Ayfer Sayin, Sabiha Bozdağ & Mark J. Gierl, "Automatic item generation for non-verbal reasoning items," International Journal of Assessment Tools in Education, 2023-10-31. doi:10.21449/ijate.1359348
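Template-based AIG of the kind described can be sketched as a combinatorial expansion of an item template over its manipulated elements. The template and element values below are invented for illustration (they are not the BİLSEM parent item), though a 4×4×7 element design does yield the 112 items reported for the first item model:

```python
from itertools import product

# Hypothetical item template with three manipulated elements.
template = ("A {shape} rotated {angle} degrees with {count} dots; "
            "which figure completes the pattern?")
elements = {
    "shape": ["square", "triangle", "hexagon", "circle"],
    "angle": [45, 90, 135, 180],
    "count": [1, 2, 3, 4, 5, 6, 7],
}

def generate_items(template, elements):
    """Instantiate the template once per combination of element values."""
    keys = list(elements)
    return [template.format(**dict(zip(keys, combo)))
            for combo in product(*(elements[k] for k in keys))]

items = generate_items(template, elements)  # 4 x 4 x 7 = 112 distinct items
```

In practice the cognitive model constrains which combinations are valid, so a generator like this is usually followed by filters that discard combinations violating the model.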
Students from elementary to tertiary levels commonly hold misunderstandings and face challenges in acquiring statistical concepts and skills. However, existing statistics assessment frameworks are difficult to put into practice in a classroom setting. The purpose of this research is to develop and validate a tool for assessing the statistical thinking of form one (Grade 7) students. The SOLO model was applied to develop five testlet tasks, each involving four components. The study employed a survey methodology to assess the statistical thinking of 356 form one students. Content validity was established using the Content Validity Index (CVI), and construct validity was established through Rasch analysis. The results demonstrated that the instrument was valid and reliable for assessing form one students' statistical thinking. The findings also provided new evidence that the hierarchical ordering of item levels within the testlet format allowed teachers to track students' progress effectively. The instrument was useful in identifying students' statistical thinking levels: a student's ability to respond appropriately to a task at a particular level reveals their degree of cognitive development. The testlet tasks also made it easy to diagnose strengths and weaknesses in learning statistics topics.
{"title":"Determining the psychometric properties of middle school statistical thinking testlet-based assessment tool","authors":"Lim Hooi Lian, W. Yew","doi":"10.21449/ijate.1255859","journal":"International Journal of Assessment Tools in Education","publicationDate":"2023-10-23"}
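The Content Validity Index used in the study above has a simple arithmetic form. The sketch below assumes the common 4-point relevance scale, where ratings of 3 or 4 count as "relevant"; the expert ratings shown are illustrative, not the study's data.

```python
# Minimal sketch of the Content Validity Index (CVI) computation,
# assuming a 4-point relevance scale (3 or 4 = relevant).

def item_cvi(ratings):
    """I-CVI: proportion of experts rating the item 3 or 4."""
    return sum(r >= 3 for r in ratings) / len(ratings)

def scale_cvi_ave(ratings_by_item):
    """S-CVI/Ave: mean of the item-level CVIs."""
    cvis = [item_cvi(r) for r in ratings_by_item]
    return sum(cvis) / len(cvis)

expert_ratings = [  # rows = items, columns = five hypothetical experts
    [4, 4, 3, 4, 3],
    [3, 4, 4, 2, 4],
    [4, 3, 4, 4, 4],
]
print([round(item_cvi(r), 2) for r in expert_ratings])  # [1.0, 0.8, 1.0]
print(round(scale_cvi_ave(expert_ratings), 2))          # 0.93
```

An I-CVI at or above a cutoff (often 0.78 or higher with several experts) is typically taken as evidence of item-level content validity; the Rasch analysis reported in the abstract addresses construct validity separately.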