A New Database of Digits Extracted from Coins with Hard-to-Segment Foreground for Optical Character Recognition Evaluation

Frontiers in ICT (Q1, Computer Science) | Pub Date: 2017-05-09 | DOI: 10.3389/fict.2017.00009
Xingyu Pan, L. Tougne
Citation count: 5

Abstract

The release date struck on a coin is an important indicator of its monetary type, so recognizing digits extracted from coins can help identify that type. However, such digit images are challenging for conventional optical character recognition (OCR) methods because the foreground of the digits very often has the same color as the background. Other sources of noise, including wear of the coin metal, make it even harder to segment the character shape correctly. To address these challenges, this paper presents CoinNUMS, a database for automatic digit recognition. CoinNUMS contains 3006 digit images divided into three subsets. The first, CoinNUMS_geni, consists of 606 digit images manually cropped from high-resolution photos of well-conserved coins from the GENI coin photos; the second, CoinNUMS_pcgs_a, consists of 1200 digit images automatically extracted from a subset of the USA_Grading numismatic database, which contains coins of varying quality; the third, CoinNUMS_pcgs_m, consists of 1200 digit images manually extracted from the same coin photos as CoinNUMS_pcgs_a. In CoinNUMS_pcgs_a and CoinNUMS_pcgs_m, the digit images are extracted from the release date. In CoinNUMS_geni, the digit images may come from the date, the face value, or any other legend on the coin that contains digits. To show the difficulty of these subsets, we tested recognition algorithms from the literature. The database and the results of the tested algorithms will be made freely available on a dedicated website.
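As a reading aid, the composition described in the abstract can be written down as a small manifest. This is only a sketch: the subset names, image counts, extraction modes, and sources come from the abstract, while the `Subset` class, its field names, and the helper functions are hypothetical and do not reflect the actual file layout of the released database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subset:
    """One CoinNUMS subset as described in the abstract (field names are illustrative)."""
    name: str
    n_images: int
    extraction: str    # "manual" or "automatic"
    source: str        # where the coin photos come from
    digit_origin: str  # which part of the coin the digits were taken from


# Counts and descriptions are taken from the abstract; the structure is an assumption.
COINNUMS = (
    Subset("CoinNUMS_geni", 606, "manual",
           "GENI coin photos", "date, face value, or other legends"),
    Subset("CoinNUMS_pcgs_a", 1200, "automatic",
           "subset of USA_Grading", "release date"),
    Subset("CoinNUMS_pcgs_m", 1200, "manual",
           "same coin photos as CoinNUMS_pcgs_a", "release date"),
)


def total_images(subsets):
    """Sum the image counts over all subsets."""
    return sum(s.n_images for s in subsets)


def names_by_extraction(subsets, mode):
    """Return the names of the subsets extracted in the given mode."""
    return [s.name for s in subsets if s.extraction == mode]


if __name__ == "__main__":
    print(total_images(COINNUMS))
    print(names_by_extraction(COINNUMS, "manual"))
```

The three subset sizes add up to the 3006 images stated in the abstract, and the two PCGS subsets pair an automatic and a manual extraction of the same coin photos.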