
Chinese/English journal of educational measurement and evaluation: latest articles

Comparing Performance of Feature Extraction Methods and Machine Learning Models in Automatic Essay Scoring
Pub Date : 2023-09-01 DOI: 10.59863/dqiz8440
Lihua Yao, Hong Jiao
This study used the ASAP data set from Kaggle and applied natural language processing (NLP) and Bidirectional Encoder Representations from Transformers (BERT) for corpus processing and feature extraction. It compared different machine learning models, both traditional machine-learning classifiers and neural-network-based approaches. Supervised learning models were used for the scoring system, where six out of the eight essay prompts were trained separately and concatenated. Compared with previous research, we found that adding more features, such as readability scores using Spacy Textsta, improved the prediction results for the essay scoring system. The neural network model, trained on all prompt data and using NLP for corpus processing and feature extraction, performed better than the other models, with an overall test quadratic weighted kappa (QWK) of 0.9724. It achieved the highest QWK score of 0.859 for prompt 1 and an average QWK of 0.771 across all 6 prompts, making it the best-performing machine learning model tested.
Citations: 0
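The quadratic weighted kappa (QWK) reported in the essay-scoring abstract above can be illustrated with a minimal sketch. It follows the standard definition of QWK and is not taken from the authors' code; the function name, argument names, and the integer rating range are assumptions made for illustration.

```python
import numpy as np

def quadratic_weighted_kappa(y_true, y_pred, min_rating, max_rating):
    """Standard QWK between two integer score vectors (illustrative helper).

    Assumes at least two rating categories and that the two raters do not
    both assign one identical score to every essay (otherwise the expected
    disagreement is zero and the ratio is undefined).
    """
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)
    n = max_rating - min_rating + 1

    # Observed confusion matrix: rows are human scores, columns are model scores.
    observed = np.zeros((n, n))
    for t, p in zip(y_true, y_pred):
        observed[t - min_rating, p - min_rating] += 1

    # Expected matrix under independence, scaled to the same total count.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / len(y_true)

    # Quadratic disagreement weights: 0 on the diagonal, 1 at the extremes.
    idx = np.arange(n)
    weights = (idx[:, None] - idx[None, :]) ** 2 / (n - 1) ** 2

    return 1.0 - (weights * observed).sum() / (weights * expected).sum()
```

A value of 1 indicates perfect agreement, 0 indicates agreement at chance level, and negative values indicate agreement worse than chance. Because disagreements are penalized by squared distance, a prediction that is off by two score points costs four times as much as one that is off by one.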
Comparing the Performance of Feature Extraction Methods and Machine Learning Models in Automatic Essay Scoring
Pub Date : 2023-09-01 DOI: 10.59863/vlgu9815
Li Yao, Hongzan Jiao
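This study applies feature extraction and machine learning methods to Kaggle data, namely the ASAP data set. Specifically, natural language processing (NLP) and Bidirectional Encoder Representations from Transformers (BERT) are used for corpus processing and feature extraction, and a range of machine learning models is covered, including traditional machine-learning classifiers and neural-network-based approaches. Supervised learning models are used for the scoring system, with six of the eight essay prompts trained separately or jointly. Compared with previous research, the study finds that (1) adding more features (such as readability scores from Spacy Textsta) improves the predictive power of the essay scoring system; and (2) the neural network model using NLP for corpus processing and feature extraction outperforms the other models when trained on all prompts together, with an overall quadratic weighted kappa (QWK) of 0.9724. Prompt 1 has the highest QWK, at 0.859, and the average QWK across all six prompts is 0.771.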
Citations: 0
Lessons Learned about Evaluating Fairness from a Data Challenge to Automatically Score NAEP Reading Items
Pub Date : 2023-09-01 DOI: 10.59863/nzbo8811
Maggie Beiting-Parrish, John Whitmer
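Natural language processing (NLP) is widely used across fields to predict human scores for students' open-ended responses (Johnson et al., 2022). Ensuring algorithmic fairness with respect to student demographic factors is crucial (Madnani et al., 2017). This study presents a fairness analysis of the six top-performing entries in a data challenge involving 20 NAEP reading comprehension items, which were initially analyzed for fairness based on race and gender. It describes additional fairness evaluations covering English Language Learner status (ELLs), Individual Education Plans, and free/reduced-price lunch. Many items showed lower accuracy in score prediction, most notably for ELLs. The study recommends including additional demographic factors in scoring fairness evaluations and notes that fairness analysis needs to consider multiple factors and contexts.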
Citations: 0
Lessons Learned about Evaluating Fairness from a Data Challenge to Automatically Score NAEP Reading Items
Pub Date : 2023-09-01 DOI: 10.59863/nkcj9608
Maggie Beiting-Parrish, John Whitmer
Natural language processing (NLP) is widely used to predict human scores for open-ended student assessment responses in various content areas (Johnson et al., 2022). Ensuring algorithmic fairness based on student demographic background factors is crucial (Madnani et al., 2017). This study presents a fairness analysis of six top-performing entries from a data challenge involving 20 NAEP reading comprehension items that were initially analyzed for fairness based on race/ethnicity and gender. This study describes additional fairness evaluation including English Language Learner Status (ELLs), Individual Education Plans, and Free/Reduced-Price Lunch. Several items showed lower accuracy for predicted scores, particularly for ELLs. This study recommends considering additional demographic factors in fairness scoring evaluations and that fairness analysis should consider multiple factors and contexts.
Citations: 0
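The subgroup comparison described in the NAEP fairness abstract above can be illustrated with a minimal sketch. It assumes exact agreement between human and predicted scores as the accuracy measure; the abstract does not state which metric the study used, and the function name, group labels, and scores below are made up for illustration.

```python
from collections import defaultdict

def accuracy_by_group(human, predicted, groups):
    """Exact-agreement accuracy of predicted scores within each subgroup."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for h, p, g in zip(human, predicted, groups):
        totals[g] += 1
        hits[g] += int(h == p)
    return {g: hits[g] / totals[g] for g in totals}

# Hypothetical toy data, not taken from the study.
human = [1, 2, 3, 2, 0, 1]
predicted = [1, 2, 1, 2, 0, 0]
groups = ["non-ELL", "non-ELL", "ELL", "non-ELL", "non-ELL", "ELL"]

per_group = accuracy_by_group(human, predicted, groups)
# Gap between the best- and worst-served subgroup; a large gap flags an
# item for closer fairness review.
gap = max(per_group.values()) - min(per_group.values())
```

Repeating the same comparison for each demographic variable (for example ELL status, Individual Education Plan, and free/reduced-price lunch) gives one gap per item and factor, which is the kind of multi-factor view the abstract recommends.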
Detecting Careless Cases in Practice Tests
Pub Date : 2023-09-01 DOI: 10.59863/ahsa2170
Steven Nydick
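This paper proposes a novel approach that uses machine learning models to detect careless responding in low-stakes practice tests. Rather than classifying test takers' responses as careless based on model fit statistics or known ground truth, we build a model that predicts significant changes in test scores between the practice test and the official test from attributes of the practice test items. Drawing on hypotheses about how careless test takers respond to items, we extract features from practice test items, use cross-validation to optimize the model's out-of-sample prediction, and reduce heteroscedasticity when predicting the closest official test. All analyses use data from the practice and official versions of the Duolingo English Test. We discuss the implications of using machine learning models to predict careless responding compared with other popular methods.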
Citations: 0
An Application of Theil Indexes for Interrater Reliability: A Comparison with Intraclass Correlations
Pub Date : 2023-08-01 DOI: 10.59863/bner9428
Tianshu Pan, Yue Yin
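This paper proposes applying Theil index ratios to interrater reliability. We discuss the theoretical foundations and examine the approach using real data. The results show that intraclass correlation coefficients and Theil index ratios are highly correlated. However, intraclass correlation estimates may underestimate interrater reliability because of some extreme disagreements among raters, and they are more susceptible to such extreme disagreements than Theil index ratios. Because Theil index ratios overcome the limitations of intraclass correlations to some degree, they provide an alternative way to evaluate interrater reliability, at least under certain conditions, for example when outliers exist in the data, when variance components are difficult to estimate, or when intraclass correlations underestimate interrater reliability.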
Citations: 0
The Language of 21st Century Skills: Next Directions for Closing the Skills Gap Between Employers and Postsecondary Graduates
Pub Date : 2023-08-01 DOI: 10.59863/oivi3767
G. Orona, O. Liu, Richard Arum
The onus of preparing skilled employees for the modern workforce is largely placed on institutions of higher education. However, recent surveys consistently show a skills gap between what employers desire and what graduates possess. This review engages this discussion in the context of measuring and assessing 21st century skills. We begin by succinctly reviewing literature pertaining to the skills gap, including what types of skills are commonly referenced, before moving to examine literature indicating the relations between current 21st century skills and job-related outcomes. Finally, we conclude with recommendations for higher education researchers examining skill development. Our recommendations cover three key corresponding areas: theories of cognitive development, intervention design, and measurement and assessment.
Citations: 0
Perspectives on 21st Century Skills: Next Directions for Closing the Skills Gap Between Employers and Postsecondary Graduates
Pub Date : 2023-08-01 DOI: 10.59863/wzuf7282
G. Orona, O. Liu, Richard Arum
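Institutions of higher education bear the responsibility of preparing skilled employees for the modern workforce. However, recent surveys consistently show a gap between the skills employers expect and those graduates possess. This review discusses that gap in the context of measuring and assessing 21st century skills. We first briefly review the literature on the skills gap (including which types of skills are most often mentioned), and then examine the literature on the relations between current 21st century skills and job-related outcomes. Finally, we offer recommendations for higher education researchers exploring skill development. Our recommendations cover three key related areas: theories of cognitive development, intervention design, and measurement and assessment.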
Citations: 0
An Application of Theil Indexes for the Interrater Reliability: A Comparison with Intraclass Correlations
Pub Date : 2023-08-01 DOI: 10.59863/wddk7257
Tianshu Pan, Yue Yin
This study proposes applying Theil-index ratios to interrater reliability. We discuss the theoretical foundations and examine their function empirically using real data. Our analyses show that Theil-index ratios and intraclass correlation (ICC) estimates are highly correlated. However, ICC may underestimate interrater reliability when some raters disagree in the extreme, and it is more likely to be influenced by such extreme disagreement. As Theil-index ratios overcome the limitations of ICC to some degree, they appear to provide an alternative for evaluating interrater reliability, at least under certain conditions, e.g., when outliers exist in the data, when variance component estimates are difficult to obtain, or when ICC underestimates interrater reliability.
Citations: 0
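The two building blocks named in the Theil-index abstract above can be sketched as follows. The abstract does not define the Theil-index ratio itself, so this sketch shows only the standard Theil T index and a one-way random-effects ICC; the function names and the toy ratings are assumptions made for illustration, not the authors' procedure.

```python
import numpy as np

def theil_t(x):
    """Standard Theil T index of a vector of strictly positive values."""
    x = np.asarray(x, dtype=float)
    share = x / x.mean()
    return float(np.mean(share * np.log(share)))

def icc_oneway(ratings):
    """One-way random-effects ICC(1) for a subjects-by-raters matrix.

    Assumes at least two subjects and at least two raters.
    """
    r = np.asarray(ratings, dtype=float)
    n, k = r.shape
    row_means = r.mean(axis=1)
    # Between-subject and within-subject mean squares from one-way ANOVA.
    msb = k * np.sum((row_means - r.mean()) ** 2) / (n - 1)
    msw = np.sum((r - row_means[:, None]) ** 2) / (n * (k - 1))
    return float((msb - msw) / (msb + (k - 1) * msw))

# Hypothetical ratings: five subjects, two raters in perfect agreement.
clean = [[1, 1], [2, 2], [3, 3], [4, 4], [5, 5]]
# Same data with one extreme disagreement on the last subject.
with_outlier = [[1, 1], [2, 2], [3, 3], [4, 4], [5, 1]]

icc_clean = icc_oneway(clean)
icc_outlier = icc_oneway(with_outlier)
```

In this toy case a single extreme disagreement pulls the ICC well below its value for the otherwise identical data, which is the sensitivity the abstract describes. The Theil T index is zero when all values are equal and grows as values become more unequal.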
flexCDMs: A Web-based Platform for Cognitive Diagnostic Data Analysis
Pub Date : 2023-06-28 DOI: 10.59863/osdb8732
Dongbo Tu, Yong Liu, Xuliang Gao, Yan Cai
Cognitive diagnosis is an important component of modern measurement theory and has received widespread attention from researchers in the fields of education and psychological measurement. Existing cognitive diagnosis analysis tools rely on professional software packages (such as R packages), which creates significant challenges for users, especially those who are not familiar with computer programming. To remove this technical barrier, our team has developed a web-based, user-friendly platform, named flexCDMs, for cognitive diagnosis data analysis. This article describes the features of the platform, the functional modules, the implemented cognitive diagnosis models (CDMs) and algorithms, and illustrates the operations of the platform. This platform can be used to analyze data based on various cognitive diagnosis models, carry out Q-matrix theory, model-data fit tests, parameter estimation, quality analysis of cognitive diagnostic tests, differential item functioning (DIF) detection, and Q-matrix modification. It produces various charts and graphs to report results. It is a powerful, yet easy to use cognitive diagnosis data analysis tool. The website for the flexCDMs platform is: http://111.230.233.68:1001/?Id=false&Block
Citations: 0