Enhancing Complex Formula Recognition with Hierarchical Detail-Focused Network

Jiale Wang, Junhui Yu, Huanyong Liu, Chenanran Kong
Source: arXiv - CS - Computation and Language
Published: 2024-09-18
DOI: arxiv-2409.11677 (https://doi.org/arxiv-2409.11677)
Citations: 0

Abstract

Mathematical Expression Recognition (MER) for hierarchical, complex formulas is challenging because a single formula can have multiple possible interpretations, which complicates both parsing and evaluation. In this paper, we introduce the Hierarchical Detail-Focused Recognition dataset (HDR), the first dataset specifically designed to address these issues. It consists of a large-scale training set, HDR-100M, which offers unprecedented scale and diversity with one hundred million training instances, and a test set, HDR-Test, which includes multiple interpretations of complex hierarchical formulas for comprehensive evaluation of model performance. In addition, the parsing of complex formulas often suffers from errors in fine-grained details. To address this, we propose the Hierarchical Detail-Focused Recognition Network (HDNet), a framework that incorporates a hierarchical sub-formula module focused on the precise handling of formula details, thereby significantly enhancing MER performance. Experimental results demonstrate that HDNet outperforms existing MER models across various datasets.
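The abstract notes that HDR-Test stores multiple interpretations per formula. As a rough illustration of what scoring against several references could look like, the sketch below counts a prediction as correct when its LaTeX token sequence equals that of any reference. This is an assumption for illustration only: the function names, the tokenizer, and the exact-match criterion are hypothetical and are not taken from the paper, which may use a different metric.

```python
import re
from typing import List, Sequence

# Hypothetical tokenizer: a LaTeX command (\frac), an escaped symbol (\{),
# or any single non-whitespace character. Whitespace is discarded.
_TOKEN = re.compile(r"\\[A-Za-z]+|\\.|\S")


def tokenize(latex: str) -> List[str]:
    """Split a LaTeX string into tokens, ignoring whitespace."""
    return _TOKEN.findall(latex)


def matches_any(prediction: str, references: Sequence[str]) -> bool:
    """True if the prediction's tokens equal those of at least one reference."""
    pred_tokens = tokenize(prediction)
    return any(pred_tokens == tokenize(ref) for ref in references)


def multi_reference_accuracy(
    predictions: Sequence[str],
    references_per_sample: Sequence[Sequence[str]],
) -> float:
    """Fraction of predictions that match any of their reference interpretations."""
    if len(predictions) != len(references_per_sample):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    hits = sum(
        matches_any(pred, refs)
        for pred, refs in zip(predictions, references_per_sample)
    )
    return hits / len(predictions)


# Illustrative data only; these formulas are not drawn from HDR-Test.
example_predictions = [r"\frac a b", r"x^2"]
example_references = [
    [r"\frac{a}{b}", r"\frac a b"],  # two accepted writings of one formula
    [r"x^{2}"],                      # only the braced form is accepted
]
example_score = multi_reference_accuracy(example_predictions, example_references)
```

In this toy setup the first prediction matches its second reference, while the second prediction fails because the only reference uses braces. Exact token matching is the simplest possible criterion; a real evaluation would likely need a more tolerant comparison.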