{"title":"C3E: A framework for chart classification and content extraction","authors":"Muhammad Suhaib Kanroo , Hadia Showkat Kawoosa , Kapil Rana , Puneet Goyal","doi":"10.1016/j.compeleceng.2024.109861","DOIUrl":null,"url":null,"abstract":"<div><div>Incorporating charts into technical documents enhances richness by simplifying complex data representation and improving comprehension. However, automated chart content extraction (CCE) presents a significant challenge within the domain of document analysis and understanding. The CCE problem can be viewed through a series of six sub-tasks: chart classification (CC), text detection and recognition (TDR), text role classification (TRC), axis analysis, legend analysis, and data extraction. Improving these sub-tasks is important for enhancing the effectiveness of CCE. This paper introduces the chart classification and content extraction (C3E) framework, with a primary focus on the first three sub-tasks of CCE: CC, TDR, and TRC. We propose a ChartVision model for the CC, an EfficientNet-based model coupled with a dual-branch architecture incorporating a novel hybrid convolutional and dilated attention module. For text detection and TRC, we introduce a novel CCE method based on YOLOv5, CCE-YOLO, designed for localizing and classifying textual components of varying sizes. Further, for text recognition, we employ a convolutional recurrent neural network with connectionist temporal classification loss. We conducted experimental analysis on benchmark datasets to assess the effectiveness of our approach across each sub-task. Specifically, we evaluated CC, TDR, and TRC methods using the UB-PMC 2020 and UB-PMC 2022 datasets from the ICPR2020 and ICPR2022 CHART-Infographics competitions. 
The C3E framework achieved notable F1-scores of 94.26%, 92.44%, and 80.64% for CC, TDR, and TRC, respectively on the UB-PMC 2020 dataset and 94.0%, 91.98%, and 84.48% for CC, TDR, and TRC, respectively on the UB-PMC 2022 dataset.</div></div>","PeriodicalId":50630,"journal":{"name":"Computers & Electrical Engineering","volume":"121 ","pages":"Article 109861"},"PeriodicalIF":4.0000,"publicationDate":"2024-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computers & Electrical Engineering","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0045790624007882","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
Citations: 0
Abstract
Incorporating charts into technical documents simplifies the representation of complex data and improves comprehension. However, automated chart content extraction (CCE) remains a significant challenge in document analysis and understanding. The CCE problem can be divided into six sub-tasks: chart classification (CC), text detection and recognition (TDR), text role classification (TRC), axis analysis, legend analysis, and data extraction. Improving these sub-tasks is important for enhancing the effectiveness of CCE. This paper introduces the chart classification and content extraction (C3E) framework, which focuses on the first three sub-tasks of CCE: CC, TDR, and TRC. For CC, we propose ChartVision, an EfficientNet-based model coupled with a dual-branch architecture incorporating a novel hybrid convolutional and dilated attention module. For text detection and TRC, we introduce CCE-YOLO, a novel method based on YOLOv5 designed for localizing and classifying textual components of varying sizes. For text recognition, we employ a convolutional recurrent neural network with connectionist temporal classification loss. We evaluated the CC, TDR, and TRC methods on the UB-PMC 2020 and UB-PMC 2022 benchmark datasets from the ICPR2020 and ICPR2022 CHART-Infographics competitions. The C3E framework achieved F1-scores of 94.26%, 92.44%, and 80.64% for CC, TDR, and TRC, respectively, on the UB-PMC 2020 dataset, and 94.0%, 91.98%, and 84.48%, respectively, on the UB-PMC 2022 dataset.
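As a rough illustration of how a recognizer trained with connectionist temporal classification loss turns per-frame predictions into text, the following sketch shows standard greedy CTC decoding: merge consecutive repeated labels, then drop the blank symbol. It is not taken from the paper. The function name `ctc_greedy_decode`, the blank index, and the lowercase alphabet are all assumptions made for this example, and the authors may use a different decoder (for instance beam search).

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol (hypothetical choice)

# Hypothetical alphabet: index 0 is reserved for the blank, 1..26 map to a..z.
ALPHABET = ["<blank>"] + list("abcdefghijklmnopqrstuvwxyz")


def ctc_greedy_decode(frame_label_ids, alphabet, blank=BLANK):
    """Decode per-frame argmax label ids with the greedy CTC rule.

    Step 1: merge runs of identical consecutive labels into one label.
    Step 2: remove every blank label.
    """
    collapsed = [label for label, _ in groupby(frame_label_ids)]
    return "".join(alphabet[label] for label in collapsed if label != blank)


# Per-frame predictions spelling "axis": a a _ x _ i i s
print(ctc_greedy_decode([1, 1, 0, 24, 0, 9, 9, 19], ALPHABET))

# A blank between two identical labels keeps both characters ("all"),
# while an unbroken run collapses to a single character ("al").
print(ctc_greedy_decode([1, 12, 0, 12], ALPHABET))
print(ctc_greedy_decode([1, 12, 12], ALPHABET))
```

The blank symbol is what lets the decoder distinguish a genuinely doubled character from one character spread across several frames, which matters for chart text such as tick labels with repeated digits.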
About the journal:
The impact of computers has nowhere been more revolutionary than in electrical engineering. The design, analysis, and operation of electrical and electronic systems are now dominated by computers, a transformation that has been motivated by the natural ease of interface between computers and electrical systems, and the promise of spectacular improvements in speed and efficiency.
Published since 1973, Computers & Electrical Engineering provides rapid publication of topical research into the integration of computer technology and computational techniques with electrical and electronic systems. The journal publishes papers featuring novel implementations of computers and computational techniques in areas like signal and image processing, high-performance computing, parallel processing, and communications. Special attention will be paid to papers describing innovative architectures, algorithms, and software tools.