AI-assisted Automated Short Answer Grading of Handwritten University Level Mathematics Exams

Tianyi Liu, Julia Chatain, Laura Kobel-Keller, Gerd Kortemeyer, Thomas Willwacher, Mrinmaya Sachan

arXiv:2408.11728 (arXiv - MATH - History and Overview), published 2024-08-21
Citation count: 0
Abstract
Effective and timely feedback in educational assessments is essential but
labor-intensive, especially for complex tasks. Recent developments in automated
feedback systems, ranging from deterministic response grading to the evaluation
of semi-open and open-ended essays, have been facilitated by advances in
machine learning. The emergence of pre-trained Large Language Models, such as
GPT-4, offers promising new opportunities for efficiently processing diverse
response types with minimal customization. This study evaluates the
effectiveness of a pre-trained GPT-4 model in grading semi-open handwritten
responses in a university-level mathematics exam. Our findings indicate that
GPT-4 provides surprisingly reliable and cost-effective initial grading,
subject to subsequent human verification. Future research should focus on
refining grading rules and enhancing the extraction of handwritten responses to
further leverage these technologies.