Forged Character Detection Datasets: Passports, Driving Licences and Visa Stickers

Teerath Kumar, Muhammad Turab, Shahnawaz Talpur, Rob Brennan, Malika Bendechache
International Journal of Artificial Intelligence & Applications. Published 2022-03-31. DOI: 10.5121/ijaia.2022.13202 (https://doi.org/10.5121/ijaia.2022.13202)
Citations: 7

Abstract

Forged documents, specifically passports, driving licences and visa stickers, are used for fraud, including robbery, theft and more. Detecting forged characters in documents is therefore an important and challenging task in digital forensic imaging. Forged character detection faces two major challenges. First, data for forged character detection is extremely difficult to obtain, for reasons including limited access to data, unlabeled data, and prior work being done on private data. Second, deep learning (DL) algorithms require labeled data, and labeling is tedious, time-consuming, expensive and requires domain expertise. To address these issues, in this paper we propose a novel algorithm that generates three datasets: forged characters detection for passports (FCD-P), forged characters detection for driving licences (FCD-D) and forged characters detection for visa stickers (FCD-V). To the best of our knowledge, we are the first to release these datasets. The proposed algorithm starts by reading plain document images, then simulates forging on passports, driving licences and visa stickers from five different countries. It records the bounding boxes of the forged characters, and these boxes serve as the labels. Furthermore, to reflect real-world scenarios, we apply selected data augmentations. Each dataset consists of 15,000 images, each of size 950 x 550. For further research, we release our algorithm code and the datasets FCD-P, FCD-D and FCD-V.
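The generate-and-label loop described in the abstract can be illustrated with a minimal sketch. The abstract does not say which forging operations the authors simulate, so the pixel-overwrite step below is an assumption made purely for illustration, and every name (`Box`, `blank_document`, `forge_characters`) is hypothetical rather than taken from the released code. Only the 950 x 550 image size comes from the abstract.

```python
import random
from dataclasses import dataclass

WIDTH, HEIGHT = 950, 550  # image size stated in the abstract


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box: top-left corner plus width and height."""
    x: int
    y: int
    w: int
    h: int


def blank_document(width=WIDTH, height=HEIGHT, value=255):
    """Stand-in for a plain document image: a white grayscale pixel grid."""
    return [[value] * width for _ in range(height)]


def forge_characters(image, char_boxes, n_forged, rng):
    """Assumed forging step (not the authors' method): overwrite randomly
    chosen character boxes with dark pixels, and return those boxes so
    they can be used directly as labels."""
    forged = [row[:] for row in image]  # leave the input image untouched
    chosen = rng.sample(char_boxes, n_forged)
    for box in chosen:
        for yy in range(box.y, box.y + box.h):
            for xx in range(box.x, box.x + box.w):
                forged[yy][xx] = rng.randint(0, 128)
    return forged, chosen


# Usage: ten hypothetical character positions on one text line, three forged.
rng = random.Random(0)
doc = blank_document()
char_boxes = [Box(100 + 30 * i, 200, 20, 30) for i in range(10)]
forged_img, labels = forge_characters(doc, char_boxes, n_forged=3, rng=rng)
```

Because the labels are produced by the same step that alters the pixels, no manual annotation is needed, which is the point the abstract makes about avoiding tedious labeling. A real pipeline would presumably load actual document templates and render replacement glyphs rather than writing random pixels.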