Dataset of Urduud1k from Natural Scenes

U. Zaki, D. Hakro, M. Memon, F. H. Khoso, Khalil-ur-Rehman Khoumbati, M. A. Zaki, M. Hameed, G. Nabi
Sindh University Research Journal, published 2019-12-09. DOI: 10.26692/surj/2019.12.95

Abstract

In recent years, research has increasingly focused on text analysis in natural scenes. Datasets play a significant role in evaluating the performance of text recognition algorithms. A dataset of natural scene text images in six languages was recently released at the International Conference on Document Analysis and Recognition (ICDAR). That dataset covers multiple languages but not Urdu, and no standard database of Urdu text in natural scene images exists. The main aim of this research is therefore to build a database of Urdu text in natural scenes. The dataset is large and diverse: its images were captured with 10 different cameras, at different resolutions, angles, and distances, and under different lighting conditions. The dataset comprises Urdu words, ligatures, and characters in natural scenes. It contains 16k word images and 32k ligature and character images, drawn from 1k scene images of signboards, shop names, banners, and similar sources. In addition, the Urdu dataset is compared with existing datasets, including ICDAR 2003, ARASTI, and Chars74K. Because it consists of natural scene images, the dataset can be used to develop and evaluate Urdu text recognition in natural environments.
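The abstract notes that datasets are used to assess text recognition algorithms but does not say which metrics the authors use. As an illustration only, the sketch below scores a recognizer's outputs against ground-truth labels for cropped word images using two common measures: word accuracy (exact match) and character error rate (Levenshtein distance divided by the number of ground-truth characters). The function names and the example labels are hypothetical. Strings are NFC-normalized and compared code point by code point, which is a simplifying assumption, since a single Urdu ligature spans several code points.

```python
import unicodedata
from typing import Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, counted in Unicode code points."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when characters match)
            ))
        prev = curr
    return prev[-1]


def evaluate(ground_truth: Sequence[str], predictions: Sequence[str]) -> dict:
    """Hypothetical scorer: word accuracy and character error rate (CER)."""
    if not ground_truth or len(ground_truth) != len(predictions):
        raise ValueError("need equal-length, non-empty label and prediction lists")
    # Normalize so visually identical Urdu strings compare equal.
    gts = [unicodedata.normalize("NFC", s) for s in ground_truth]
    preds = [unicodedata.normalize("NFC", s) for s in predictions]
    correct = sum(g == p for g, p in zip(gts, preds))
    errors = sum(edit_distance(g, p) for g, p in zip(gts, preds))
    total_chars = sum(len(g) for g in gts)
    if total_chars == 0:
        raise ValueError("ground-truth labels must not all be empty")
    return {"word_accuracy": correct / len(gts), "cer": errors / total_chars}


labels = ["پاکستان", "کراچی"]
outputs = ["پاکستن", "کراچی"]  # first prediction drops one letter
print(evaluate(labels, outputs))  # word_accuracy = 0.5, cer = 1/12
```

Word accuracy rewards only exact matches, while CER gives partial credit, which matters for Urdu, where a single misread dot or ligature component would otherwise count as a whole-word failure.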