Dataset of Urduud1k from Natural Scenes

Sindh University Research Journal. Pub Date: 2019-12-09. DOI: 10.26692/surj/2019.12.95

U. Zaki, D. Hakro, M. Memon, F. H. Khoso, Khalil-ur-Rehman Khoumbati, M. A. Zaki, M. Hameed, G. Nabi



Abstract



Dataset of Urduud1k from Natural Scenes

In recent years, text analysis in natural scenes has drawn growing research attention. Databases play a significant part in assessing the efficiency of text recognition algorithms. A dataset of natural scene text images in six languages was recently released at the International Conference on Document Analysis and Recognition (ICDAR), but it does not cover Urdu, and no standard database of Urdu text in natural scene images exists. This research therefore aims to build a database of Urdu text in natural scenes. The dataset is large and varied: images were captured with 10 different cameras at different resolutions, angles, and distances, and under different lighting conditions. It comprises Urdu words, ligatures, and characters in natural scenes, with 16k word images and 32k ligature and character images. The dataset contains 1k images, including signboards, store names, banners, and so on. In addition, the Urdu dataset is compared with existing datasets, including ICDAR 2003, ARASTI, and Chars74K. Because it includes many natural scene images, the dataset can be used to identify Urdu text in natural environments.
