AI-assisted Automated Short Answer Grading of Handwritten University Level Mathematics Exams

Tianyi Liu, Julia Chatain, Laura Kobel-Keller, Gerd Kortemeyer, Thomas Willwacher, Mrinmaya Sachan

arXiv:2408.11728 · arXiv - MATH - History and Overview · Published 2024-08-21
Effective and timely feedback in educational assessments is essential but
labor-intensive, especially for complex tasks. Recent developments in automated
feedback systems, ranging from deterministic response grading to the evaluation
of semi-open and open-ended essays, have been facilitated by advances in
machine learning. The emergence of pre-trained Large Language Models, such as
GPT-4, offers promising new opportunities for efficiently processing diverse
response types with minimal customization. This study evaluates the
effectiveness of a pre-trained GPT-4 model in grading semi-open handwritten
responses in a university-level mathematics exam. Our findings indicate that
GPT-4 provides surprisingly reliable and cost-effective initial grading,
subject to subsequent human verification. Future research should focus on
refining grading rules and enhancing the extraction of handwritten responses to
further leverage these technologies.
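The workflow the abstract describes — an LLM produces an initial grade against a rubric, with a subsequent human-verification pass — can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: all names here (`build_grading_prompt`, `needs_human_review`, the confidence threshold) are hypothetical, and the actual model call is omitted.

```python
# Hypothetical sketch of an LLM-assisted grading pipeline with a
# human-verification gate. Illustrative only; names are not from the paper.
from dataclasses import dataclass


@dataclass
class RubricItem:
    description: str
    points: float


def build_grading_prompt(question: str, rubric: list[RubricItem],
                         transcript: str) -> str:
    """Assemble the prompt an LLM grader would receive: the exam question,
    the point-by-point rubric, and the transcribed handwritten response."""
    rubric_text = "\n".join(f"- {r.description} ({r.points} pt)" for r in rubric)
    return (
        f"Question:\n{question}\n\n"
        f"Rubric:\n{rubric_text}\n\n"
        f"Student response (transcribed from handwriting):\n{transcript}\n\n"
        "Award points per rubric item and report a total."
    )


def needs_human_review(llm_score: float, max_points: float,
                       confidence: float, min_confidence: float = 0.8) -> bool:
    """Route a graded response to the human-verification step when the
    model reports low confidence or the score falls outside the valid range."""
    return confidence < min_confidence or not (0.0 <= llm_score <= max_points)
```

In such a setup, every response below the confidence threshold (or with an out-of-range score) is queued for a human grader, so the LLM serves only as a cost-saving first pass rather than the final authority — consistent with the "initial grading, subject to subsequent human verification" framing above.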