Rubric Development and Validation for Assessing Tasks' Solving via AI Chatbots

IF 2.4 · Q1 · Education & Educational Research · Electronic Journal of e-Learning · Pub Date: 2024-05-17 · DOI: 10.34190/ejel.22.6.3292
Mohammad Hmoud, Hadeel Swaity, Eman Anjass, Eva María Aguaded-Ramírez
{"title":"通过人工智能聊天机器人评估任务解决情况的评分标准开发与验证","authors":"Mohammad Hmoud, Hadeel Swaity, Eman Anjass, Eva María Aguaded-Ramírez","doi":"10.34190/ejel.22.6.3292","DOIUrl":null,"url":null,"abstract":"This research aimed to develop and validate a rubric to assess Artificial Intelligence (AI) chatbots' effectiveness in accomplishing tasks, particularly within educational contexts. Given the rapidly growing integration of AI in various sectors, including education, a systematic and robust tool for evaluating AI chatbot performance is essential. This investigation involved a rigorous process including expert involvement to ensure content validity, as well as the application of statistical tests for assessing internal consistency and reliability. Factor analysis also revealed two significant domains, \"Quality of Content\" and \"Quality of Expression\", which further enhanced the construct validity of the evaluation scale. The results from this investigation robustly affirm the reliability and validity of the developed rubric, thus marking a significant advancement in the sphere of AI chatbot performance evaluation within educational contexts. Nonetheless, the study simultaneously emphasizes the requirement for additional validation research, specifically those entailing a variety of tasks and diverse AI chatbots, to further corroborate these findings. The ramifications of this research are profound, offering both researchers and practitioners engaged in chatbot development and evaluation a comprehensive and validated framework for the assessment of chatbot performance.","PeriodicalId":46105,"journal":{"name":"Electronic Journal of e-Learning","volume":null,"pages":null},"PeriodicalIF":2.4000,"publicationDate":"2024-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Rubric Development and Validation for Assessing Tasks' Solving via AI Chatbots\",\"authors\":\"Mohammad Hmoud, Hadeel Swaity, Eman Anjass, Eva María Aguaded-Ramírez\",\"doi\":\"10.34190/ejel.22.6.3292\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This research aimed to develop and validate a rubric to assess Artificial Intelligence (AI) chatbots' effectiveness in accomplishing tasks, particularly within educational contexts. Given the rapidly growing integration of AI in various sectors, including education, a systematic and robust tool for evaluating AI chatbot performance is essential. This investigation involved a rigorous process including expert involvement to ensure content validity, as well as the application of statistical tests for assessing internal consistency and reliability. Factor analysis also revealed two significant domains, \\\"Quality of Content\\\" and \\\"Quality of Expression\\\", which further enhanced the construct validity of the evaluation scale. The results from this investigation robustly affirm the reliability and validity of the developed rubric, thus marking a significant advancement in the sphere of AI chatbot performance evaluation within educational contexts. Nonetheless, the study simultaneously emphasizes the requirement for additional validation research, specifically those entailing a variety of tasks and diverse AI chatbots, to further corroborate these findings. 
The ramifications of this research are profound, offering both researchers and practitioners engaged in chatbot development and evaluation a comprehensive and validated framework for the assessment of chatbot performance.\",\"PeriodicalId\":46105,\"journal\":{\"name\":\"Electronic Journal of e-Learning\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":2.4000,\"publicationDate\":\"2024-05-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Electronic Journal of e-Learning\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.34190/ejel.22.6.3292\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"EDUCATION & EDUCATIONAL RESEARCH\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Electronic Journal of e-Learning","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.34190/ejel.22.6.3292","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"EDUCATION & EDUCATIONAL RESEARCH","Score":null,"Total":0}
Citations: 0

Abstract

This research aimed to develop and validate a rubric to assess Artificial Intelligence (AI) chatbots' effectiveness in accomplishing tasks, particularly within educational contexts. Given the rapidly growing integration of AI in various sectors, including education, a systematic and robust tool for evaluating AI chatbot performance is essential. This investigation involved a rigorous process including expert involvement to ensure content validity, as well as the application of statistical tests for assessing internal consistency and reliability. Factor analysis also revealed two significant domains, "Quality of Content" and "Quality of Expression", which further enhanced the construct validity of the evaluation scale. The results from this investigation robustly affirm the reliability and validity of the developed rubric, thus marking a significant advancement in the sphere of AI chatbot performance evaluation within educational contexts. Nonetheless, the study simultaneously emphasizes the requirement for additional validation research, specifically those entailing a variety of tasks and diverse AI chatbots, to further corroborate these findings. The ramifications of this research are profound, offering both researchers and practitioners engaged in chatbot development and evaluation a comprehensive and validated framework for the assessment of chatbot performance.
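The abstract names the statistical checks (internal consistency and a two-factor solution) but gives no computational detail. Below is a minimal, hypothetical Python sketch of how such checks are commonly run: Cronbach's alpha for internal consistency and a two-factor exploratory factor analysis over a response-by-criterion rating matrix. The simulated ratings, the number of rubric criteria (8), and the use of scikit-learn's FactorAnalysis are illustrative assumptions, not taken from the paper.

```python
# Hypothetical sketch (not the authors' code): internal consistency and a
# two-factor exploratory factor analysis on rubric ratings.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
# Placeholder data: 60 rated chatbot responses x 8 rubric criteria (scores 1-4).
ratings = rng.integers(1, 5, size=(60, 8)).astype(float)

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k = scores.shape[1]
    item_var = scores.var(axis=0, ddof=1).sum()
    total_var = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var / total_var)

alpha = cronbach_alpha(ratings)

# Two-factor solution, mirroring the two reported domains
# ("Quality of Content" and "Quality of Expression").
fa = FactorAnalysis(n_components=2, random_state=0).fit(ratings)
loadings = fa.components_.T  # one row per rubric criterion, one column per factor

print(f"Cronbach's alpha: {alpha:.2f}")
print("Factor loadings (criteria x 2 factors):")
print(np.round(loadings, 2))
```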
Source journal: Electronic Journal of e-Learning (Education & Educational Research)
CiteScore: 5.90
Self-citation rate: 18.20%
Articles published per year: 34
Review time: 20 weeks