Can question-texts improve the recognition of handwritten mathematical expressions in respondents’ solutions?

IF 7.2 | Zone 1 (Computer Science) | JCR Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE) | Knowledge-Based Systems | Pub Date: 2024-11-20 | DOI: 10.1016/j.knosys.2024.112731
Ting Zhang, Xinxin Jin, Xiaoyang Ma, Xinzi Peng, Yiyang Zhao, Jinzheng Liu, Xinguo Yu
Knowledge-Based Systems, Volume 307, Article 112731 (Journal Article). Full text: https://www.sciencedirect.com/science/article/pii/S0950705124013650
Citations: 0

Abstract

Accurate recognition of respondents’ handwritten solutions is important for intelligent diagnosis and tutoring. The task is challenging because the writing is often scribbled and irregular, especially for primary and secondary school students whose handwriting is not yet fully developed. In such cases recognition is difficult even for humans when they rely only on the visual signal of the handwritten content, without any context. Yet despite decades of work on handwriting recognition, few studies have explored using external information (question priors) to improve accuracy. Motivated by the correlation between questions and solutions, this study explores whether question-texts can improve the recognition of handwritten mathematical expressions (HMEs) in respondents’ solutions. Building on the encoder–decoder framework, the mainstream approach to HME recognition, we propose two models that fuse question-text signals and handwriting-vision signals at the encoder and decoder stages, respectively. The first, called encoder-fusion, uses a static query to implement the interaction between the two modalities at the encoder stage. To better capture and interpret this interaction, we also propose decoder-attend, a fusion method based on a dynamic query at the decoder stage. The two models were evaluated on a self-collected dataset of approximately 7k samples and achieved expression-level accuracies of 62.61% and 64.20%, respectively. Both models outperformed the baseline model, which uses only visual information, and encoder-fusion achieved results similar to those of other state-of-the-art methods.
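The abstract does not spell out how the static and dynamic queries are built, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes both fusion points use scaled dot-product cross-attention over question-text token embeddings: at the encoder stage the visual features serve as queries once, before decoding (the "static" case), while at the decoder stage the current decoder hidden state serves as the query and changes at every step (the "dynamic" case). The function names, the residual addition, and all shapes are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(query, keys, values):
    """Scaled dot-product attention: each query row attends over keys/values."""
    d = query.shape[-1]
    scores = query @ keys.T / np.sqrt(d)   # (n_query, n_keys)
    weights = softmax(scores, axis=-1)     # each row sums to 1
    return weights @ values, weights


def encoder_fusion(visual, question):
    """Assumed 'static query' fusion: computed once, before decoding starts."""
    text_ctx, weights = cross_attention(visual, question, question)
    # Residual addition is an illustrative choice, not taken from the paper.
    return visual + text_ctx, weights


def decoder_attend(hidden, question):
    """Assumed 'dynamic query' fusion: the query is the current decoder state."""
    ctx, weights = cross_attention(hidden[None, :], question, question)
    return ctx[0], weights[0]


rng = np.random.default_rng(0)
d = 8
visual = rng.normal(size=(5, d))          # hypothetical visual feature vectors
question = rng.normal(size=(4, d))        # hypothetical question-token embeddings
decoder_states = rng.normal(size=(3, d))  # hypothetical decoder states, one per step

# Encoder stage: one fused memory, fixed for the whole decoding run.
fused_memory, static_weights = encoder_fusion(visual, question)

# Decoder stage: attention over the question is recomputed at every step.
dynamic_weights = np.stack(
    [decoder_attend(h, question)[1] for h in decoder_states]
)
```

In this sketch the encoder-stage weights are computed once and reused, whereas the decoder-stage weights form one distribution over question tokens per output step, which is one way a decoder-side model could attend to different parts of the question as it emits different symbols.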
Source journal
Knowledge-Based Systems (Engineering & Technology; Computer Science: Artificial Intelligence)
CiteScore: 14.80
Self-citation rate: 12.50%
Articles published: 1245
Review time: 7.8 months
About the journal: Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on systems built on knowledge-based and other artificial intelligence techniques. The journal aims to support human prediction and decision-making through data science and computation techniques, provide balanced coverage of theory and practical study, and encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.
Latest articles from the journal
Progressive de-preference task-specific processing for generalizable person re-identification
GKA-GPT: Graphical knowledge aggregation for multiturn dialog generation
A novel spatio-temporal feature interleaved contrast learning neural network from a robustness perspective
PSNet: A non-uniform illumination correction method for underwater images based pseudo-siamese network
A novel domain-private-suppress meta-recognition network based universal domain generalization for machinery fault diagnosis