A Multi-Stage Method for Logo Detection in Scanned Official Documents Based on Image Processing

Algorithms. Pub Date: 2024-04-22. DOI: 10.3390/a17040170
María Guijarro, Juan Bayon, Daniel Martín-Carabias, Joaquín Recas

Abstract

A logotype is a rectangular region defined by a set of characteristics, derived from pixel information and region shape, that differ from those of text. In this paper, a new method for automatic logo detection is proposed and tested on the public Tobacco800 database. The method outputs a set of regions of an official document that have a high probability of containing a logo, using a new approach based on a variation of the feature rectangles method available in the literature. Candidate regions are computed by applying the longest increasing run algorithm to the indices of the document's blank lines. Those regions are further refined with a feature-rectangle-expansion method with forward checking, in which the rectangle expansion can proceed in parallel in each region. Finally, a C4.5 decision tree was trained and tested against a set of 1291 official documents to evaluate its performance. The combination of these three steps yields a precision and recall for logo detection of 98.9% and 89.9%, respectively, and is resistant to noise and low-quality documents. The method also reduces the area of the document that must be processed while maintaining a low percentage of false negatives.
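The abstract does not spell out how the candidate-region step works, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a binarized page given as a list of rows (1 = ink, 0 = background), reads a "run" as a maximal sequence of consecutive blank row indices, and splits the page into horizontal bands wherever such a run is long enough. The function names and the `max_ink` and `min_gap` parameters are hypothetical.

```python
from typing import List, Tuple

Run = Tuple[int, int]  # inclusive (start_row, end_row)


def blank_row_indices(image: List[List[int]], max_ink: int = 0) -> List[int]:
    """Indices of rows holding at most max_ink ink pixels (assumed 1 = ink)."""
    return [i for i, row in enumerate(image) if sum(row) <= max_ink]


def consecutive_runs(indices: List[int]) -> List[Run]:
    """Group sorted indices into maximal runs of consecutive values."""
    runs: List[Run] = []
    for idx in indices:
        if runs and idx == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], idx)  # extend the current run
        else:
            runs.append((idx, idx))  # start a new run
    return runs


def longest_run(runs: List[Run]) -> Run:
    """The run covering the most rows (first one wins on ties)."""
    return max(runs, key=lambda r: r[1] - r[0])


def candidate_bands(image: List[List[int]], min_gap: int = 2) -> List[Run]:
    """Row bands of content separated by blank runs of at least min_gap rows."""
    separators = [
        r for r in consecutive_runs(blank_row_indices(image))
        if r[1] - r[0] + 1 >= min_gap
    ]
    bands: List[Run] = []
    cursor = 0
    for start, end in separators:
        if start > cursor:
            bands.append((cursor, start - 1))
        cursor = end + 1
    if cursor < len(image):
        bands.append((cursor, len(image) - 1))
    return bands


# Hypothetical 10-row page, 3 pixels wide.
INK, BLANK = [0, 1, 0], [0, 0, 0]
page = [BLANK, BLANK, INK, INK, BLANK, INK, BLANK, BLANK, BLANK, INK]
print(candidate_bands(page))
```

In this sketch a single blank row (row 4 of the toy page) is too short to act as a separator, so it stays inside its band. Each resulting band would then be handed to the later refinement and classification stages that the abstract describes.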