IDNet: A Novel Dataset for Identity Document Analysis and Fraud Detection

Hong Guan, Yancheng Wang, Lulu Xie, Soham Nag, Rajeev Goel, Niranjan Erappa Narayana Swamy, Yingzhen Yang, Chaowei Xiao, Jonathan Prisby, Ross Maciejewski, Jia Zou
{"title":"IDNet: A Novel Dataset for Identity Document Analysis and Fraud Detection","authors":"Hong Guan, Yancheng Wang, Lulu Xie, Soham Nag, Rajeev Goel, Niranjan Erappa Narayana Swamy, Yingzhen Yang, Chaowei Xiao, Jonathan Prisby, Ross Maciejewski, Jia Zou","doi":"arxiv-2408.01690","DOIUrl":null,"url":null,"abstract":"Effective fraud detection and analysis of government-issued identity\ndocuments, such as passports, driver's licenses, and identity cards, are\nessential in thwarting identity theft and bolstering security on online\nplatforms. The training of accurate fraud detection and analysis tools depends\non the availability of extensive identity document datasets. However, current\npublicly available benchmark datasets for identity document analysis, including\nMIDV-500, MIDV-2020, and FMIDV, fall short in several respects: they offer a\nlimited number of samples, cover insufficient varieties of fraud patterns, and\nseldom include alterations in critical personal identifying fields like\nportrait images, limiting their utility in training models capable of detecting\nrealistic frauds while preserving privacy. In response to these shortcomings, our research introduces a new benchmark\ndataset, IDNet, designed to advance privacy-preserving fraud detection efforts.\nThe IDNet dataset comprises 837,060 images of synthetically generated identity\ndocuments, totaling approximately 490 gigabytes, categorized into 20 types from\n$10$ U.S. states and 10 European countries. We evaluate the utility and present\nuse cases of the dataset, illustrating how it can aid in training\nprivacy-preserving fraud detection methods, facilitating the generation of\ncamera and video capturing of identity documents, and testing schema\nunification and other identity document management functionalities.","PeriodicalId":501480,"journal":{"name":"arXiv - CS - Multimedia","volume":"59 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-08-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Multimedia","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2408.01690","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Effective fraud detection and analysis of government-issued identity documents, such as passports, driver's licenses, and identity cards, are essential in thwarting identity theft and bolstering security on online platforms. The training of accurate fraud detection and analysis tools depends on the availability of extensive identity document datasets. However, current publicly available benchmark datasets for identity document analysis, including MIDV-500, MIDV-2020, and FMIDV, fall short in several respects: they offer a limited number of samples, cover insufficient varieties of fraud patterns, and seldom include alterations in critical personal identifying fields like portrait images, limiting their utility in training models capable of detecting realistic frauds while preserving privacy. In response to these shortcomings, our research introduces a new benchmark dataset, IDNet, designed to advance privacy-preserving fraud detection efforts. The IDNet dataset comprises 837,060 images of synthetically generated identity documents, totaling approximately 490 gigabytes, categorized into 20 types from $10$ U.S. states and 10 European countries. We evaluate the utility and present use cases of the dataset, illustrating how it can aid in training privacy-preserving fraud detection methods, facilitating the generation of camera and video capturing of identity documents, and testing schema unification and other identity document management functionalities.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
IDNet:用于身份文件分析和欺诈检测的新型数据集
对护照、驾照和身份证等政府签发的身份证件进行有效的欺诈检测和分析,对于打击身份盗窃和加强在线平台的安全性至关重要。准确欺诈检测和分析工具的培训依赖于大量身份证件数据集的可用性。然而,目前公开的用于身份文件分析的基准数据集(包括 MIDV-500、MIDV-2020 和 FMIDV)在几个方面存在不足:它们提供的样本数量有限,涵盖的欺诈模式种类不足,而且很少包含关键个人身份识别字段(如肖像图像)的更改,这限制了它们在训练模型以检测真实欺诈行为的同时保护隐私方面的实用性。针对这些缺陷,我们的研究引入了一个新的基准数据集 IDNet,旨在推进保护隐私的欺诈检测工作。IDNet 数据集包括 837,060 张合成生成的身份证件图像,总计约 490 千兆字节,分为 20 种类型,分别来自 10 美元的美国各州和 10 个欧洲国家。我们评估了该数据集的实用性并介绍了其使用案例,说明了它如何帮助训练保护隐私的欺诈检测方法、促进身份证件的摄像头和视频捕捉生成,以及测试模式统一和其他身份证件管理功能。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Vista3D: Unravel the 3D Darkside of a Single Image MoRAG -- Multi-Fusion Retrieval Augmented Generation for Human Motion Efficient Low-Resolution Face Recognition via Bridge Distillation Enhancing Few-Shot Classification without Forgetting through Multi-Level Contrastive Constraints NVLM: Open Frontier-Class Multimodal LLMs
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1