Guided Anchoring Cascade R-CNN: An intensive improvement of R-CNN in Vietnamese Document Detection

Hai Le, Truong-Hai Nguyen, Vy Le, Trong-Thuan Nguyen, Nguyen D. Vo, Khang Nguyen
Published in: 2021 8th NAFOSTED Conference on Information and Computer Science (NICS), 2021-12-21
DOI: 10.1109/NICS54270.2021.9701510 (https://doi.org/10.1109/NICS54270.2021.9701510)
Citations: 4

Abstract

As digital documents gradually replace paper ones, the need to extract information from them is growing, and it has become a major interest in computer vision, particularly in document image understanding. Detecting page objects (figures, tables, formulas) in document images is a prerequisite for analyzing and extracting information from documents. Previous studies have mostly focused on English documents. In this study, we experiment on UIT-DODV, a Vietnamese document image dataset with four classes: Table, Figure, Caption, and Formula. We evaluate common state-of-the-art object detection models, including Double-Head R-CNN, Libra R-CNN, and Guided Anchoring, and obtain the best result, 73.6% mAP, with Guided Anchoring. Because we assume that high-quality anchor boxes are key to the success of anchor-based object detection models, we adopt Guided Anchoring in our research. We further try to raise the quality of the predicted bounding boxes with the Cascade R-CNN architecture, whose multi-stage design supports this, so that as many confusing bounding boxes as possible are filtered out. Based on these initial evaluation results, we propose an object detection model for Vietnamese document images that combines Cascade R-CNN and Guided Anchoring. The proposed model achieves up to 76.6% mAP, 2.1% higher than the baseline model on the UIT-DODV dataset.
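
The abstract's point about box quality can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes boxes in (x1, y1, x2, y2) form and uses the IoU thresholds 0.5, 0.6 and 0.7 from the original Cascade R-CNN paper, which the abstract itself does not state. The helper names iou and positives_per_stage are hypothetical. The sketch only shows how a rising IoU threshold keeps fewer, better-localized boxes at each stage. It omits the box regression that a real cascade performs between stages.

```python
from typing import List, Sequence, Tuple

# A box is (x1, y1, x2, y2) in pixel coordinates (assumed convention).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def positives_per_stage(
    proposals: Sequence[Box],
    ground_truth: Box,
    thresholds: Sequence[float] = (0.5, 0.6, 0.7),  # assumed, not from the abstract
) -> List[List[Box]]:
    """For each stage threshold, list the proposals that count as positives.

    Simplification: the proposals are not refined between stages.
    """
    return [
        [p for p in proposals if iou(p, ground_truth) >= t]
        for t in thresholds
    ]


# Hypothetical example: one ground-truth table and four proposals.
gt: Box = (0.0, 0.0, 10.0, 10.0)
proposals: List[Box] = [
    (0.0, 0.0, 10.0, 8.0),    # IoU 0.80
    (0.0, 0.0, 10.0, 6.5),    # IoU 0.65
    (0.0, 0.0, 10.0, 5.5),    # IoU 0.55
    (5.0, 5.0, 15.0, 15.0),   # IoU about 0.14
]
stages = positives_per_stage(proposals, gt)
```

With these made-up boxes, each later stage accepts fewer proposals, which is the filtering effect the abstract attributes to the cascade.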