Efficient extraction of experimental data from line charts using advanced machine learning techniques

Graphical Models (IF 2.2; CAS Zone 4, Computer Science; JCR Q2, Computer Science, Software Engineering). Pub Date: 2025-03-25. DOI: 10.1016/j.gmod.2025.101259
Wenjin Yang, Jie He, Xiaotong Zhang
Graphical Models, Volume 139, Article 101259. https://www.sciencedirect.com/science/article/pii/S1524070325000062
Citations: 0

Abstract

Line charts are a common data visualization tool in scientific research and business analysis, and they encapsulate rich experimental data. However, existing data extraction tools suffer from low levels of automation and struggle with complex charts. This paper proposes a novel method for extracting data from line charts. It reformulates the extraction problem as an instance segmentation task and introduces a Mamba-enhanced Transformer mask query method, together with a curve mask-guided training approach, to address challenges in curve detection such as long-range dependencies and intersections. YOLOv9 is used to detect and classify chart elements, and a text recognition dataset comprising approximately 100K charts is constructed. An LSTM-based attention mechanism is employed for precise recognition of scale values. Finally, we present a method for automatically converting image data into structured JSON data, which significantly improves the efficiency and accuracy of data extraction. Experimental results show that the method handles complex charts efficiently and accurately, achieving an average extraction accuracy of 93% on public datasets and significantly surpassing current state-of-the-art methods. This research provides an efficient foundation for large-scale scientific data analysis and machine learning model development, advancing the field of automated data extraction.
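The last step of the pipeline, turning detected curves and recognized scale values into structured JSON, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not describe the conversion, so the function names (`fit_axis`, `curves_to_json`), the JSON layout, and the assumption of linear (non-logarithmic) axes are all hypothetical. The sketch fits a pixel-to-value line through the recognized tick labels of each axis and applies it to the pixel coordinates of each curve.

```python
import json


def fit_axis(pixels, values):
    """Least-squares fit of value = a * pixel + b through recognized tick labels.

    Assumes a linear axis; a log-scaled axis would need values transformed first.
    """
    n = len(pixels)
    mean_p = sum(pixels) / n
    mean_v = sum(values) / n
    var_p = sum((p - mean_p) ** 2 for p in pixels)
    if var_p == 0:
        raise ValueError("need at least two distinct tick positions")
    cov = sum((p - mean_p) * (v - mean_v) for p, v in zip(pixels, values))
    a = cov / var_p
    return a, mean_v - a * mean_p


def curves_to_json(curves, x_ticks, y_ticks):
    """Map curve pixels to data values and serialize them (hypothetical schema).

    curves:  {curve_name: [(px, py), ...]} in image pixel coordinates
    x_ticks: [(pixel_x, recognized_value), ...]
    y_ticks: [(pixel_y, recognized_value), ...]
    """
    ax, bx = fit_axis(*zip(*x_ticks))
    ay, by = fit_axis(*zip(*y_ticks))
    series = []
    for name, pts in curves.items():
        points = [[round(ax * px + bx, 6), round(ay * py + by, 6)] for px, py in pts]
        series.append({"name": name, "points": points})
    return json.dumps({"series": series})


if __name__ == "__main__":
    # Illustrative values only; image y grows downward, so the y slope is negative.
    x_ticks = [(100, 0.0), (300, 10.0), (500, 20.0)]
    y_ticks = [(400, 0.0), (200, 50.0), (0, 100.0)]
    curves = {"curve_0": [(100, 400), (300, 200), (500, 0)]}
    print(curves_to_json(curves, x_ticks, y_ticks))
```

Fitting through all recognized ticks, rather than using only two of them, spreads out the effect of a single misread scale value, which is one reason precise scale recognition matters for the final accuracy.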

Source journal
Graphical Models (Engineering & Technology, Computer Science: Software Engineering)
CiteScore: 3.60
Self-citation rate: 5.90%
Articles published: 15
Review time: 47 days
Journal description: Graphical Models is recognized internationally as a highly rated, top tier journal and is focused on the creation, geometric processing, animation, and visualization of graphical models and on their applications in engineering, science, culture, and entertainment. GMOD provides its readers with thoroughly reviewed and carefully selected papers that disseminate exciting innovations, that teach rigorous theoretical foundations, that propose robust and efficient solutions, or that describe ambitious systems or applications in a variety of topics. We invite papers in five categories: research (contributions of novel theoretical or practical approaches or solutions), survey (opinionated views of the state-of-the-art and challenges in a specific topic), system (the architecture and implementation details of an innovative architecture for a complete system that supports model/animation design, acquisition, analysis, visualization), application (description of a novel application of known techniques and evaluation of its impact), or lecture (an elegant and inspiring perspective on previously published results that clarifies them and teaches them in a new way). GMOD offers its authors an accelerated review, feedback from experts in the field, immediate online publication of accepted papers, no restriction on color and length (when justified by the content) in the online version, and a broad promotion of published papers. A prestigious group of editors selected from among the premier international researchers in their fields oversees the review process.