Can question-texts improve the recognition of handwritten mathematical expressions in respondents’ solutions?

IF 7.2 | Zone 1 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Knowledge-Based Systems | Pub Date: 2024-11-20 | DOI: 10.1016/j.knosys.2024.112731

Ting Zhang, Xinxin Jin, Xiaoyang Ma, Xinzi Peng, Yiyang Zhao, Jinzheng Liu, Xinguo Yu

Knowledge-Based Systems, vol. 307, Article 112731 (Journal Article). https://www.sciencedirect.com/science/article/pii/S0950705124013650

Citations: 0

Abstract

Accurate recognition of respondents’ handwritten solutions is important for intelligent diagnosis and tutoring. The task is challenging because the writing is often scribbled and irregular, especially for primary and secondary students whose handwriting is not yet fully developed. In such cases, recognition is difficult even for humans when they rely only on the visual signal of the handwritten content, without any context. Yet despite decades of work on handwriting recognition, few studies have explored using external information (question priors) to improve accuracy. Building on the correlation between questions and solutions, this study explores whether question-texts can improve the recognition of handwritten mathematical expressions (HMEs) in respondents’ solutions. Using the encoder–decoder framework, the mainstream approach to HME recognition, we propose two models that fuse question-text signals and handwriting-vision signals at the encoder and decoder stages, respectively. The first, called encoder-fusion, uses a static query to implement the interaction between the two modalities at the encoder stage. The second, called decoder-attend, uses a dynamic query at the decoder stage to better capture and interpret that interaction. The two models were evaluated on a self-collected dataset of approximately 7k samples and achieved expression-level accuracies of 62.61% and 64.20%, respectively. Both models outperformed the baseline model, which used only visual information. The encoder-fusion model achieved results similar to those of other state-of-the-art methods.
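The abstract does not give implementation details, so the following is only a minimal sketch of one plausible reading of the two fusion points, written with plain scaled dot-product attention. Every name here (`attend`, `encoder_fusion`, `decoder_attend`, the toy feature vectors) is hypothetical and not taken from the paper. In particular, treating the visual features as the "static query" and the decoder hidden state as the "dynamic query" is an assumption.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention of one query vector over key/value rows."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return context, weights


def encoder_fusion(visual_feats, text_feats):
    """Assumed 'static query' fusion: each visual feature attends to the
    question-text features once, before decoding starts, and the resulting
    text context is added to the visual feature."""
    fused = []
    for feat in visual_feats:
        context, _ = attend(feat, text_feats, text_feats)
        fused.append([a + b for a, b in zip(feat, context)])
    return fused


def decoder_attend(decoder_state, text_feats):
    """Assumed 'dynamic query' fusion: the decoder state at the current step
    is the query, so the attention over the question text changes per step."""
    return attend(decoder_state, text_feats, text_feats)


# Hypothetical toy inputs: three question-text token embeddings and
# two visual region features, all of dimension 2.
text_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
visual_feats = [[0.5, 0.1], [0.2, 0.9]]

fused_visual = encoder_fusion(visual_feats, text_feats)
step_context, step_weights = decoder_attend([2.0, 0.0], text_feats)
```

In this sketch the encoder-side fusion is computed once per image, whereas the decoder-side attention is recomputed for each output symbol, which is one way to interpret why a dynamic query might capture the question-solution interaction more closely.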




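Source journal: Knowledge-Based Systems (Engineering & Technology, Computer Science: Artificial Intelligence)

CiteScore: 14.80

Self-citation rate: 12.50%

Articles published: 1245

Review time: 7.8 months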

Journal description: Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on systems based on knowledge-based and other artificial intelligence techniques. The journal aims to support human prediction and decision-making through data science and computation techniques, provide balanced coverage of theory and practical study, and encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.