Data Preprocessing for Learning, Analyzing and Detecting Scene Text Video based on Rotational Gradient

Journal: Data | IF: 2.0 | JCR: Q3 (Computer Science, Information Systems) | Published: 2021-04-05 | DOI: 10.1145/3460620.3460621

Manasa Devi Mortha, S. Maddala, V. Raju


Citations: 0

Abstract

Challenging annotated video datasets are in high demand among researchers and the embedded-systems industry for training and building artificial intelligence that detects, localizes and classifies objects of interest in pattern recognition and computer vision applications, so producing such annotated sets for the respective communities is important. This paper treats text in video as annotated data for detection, localization, tracking and classification, in order to address several optical character recognition (OCR)-based problems. Text is essential for understanding the content of a video and underpins many popular applications, such as video retrieval and search, driverless cars, industrial goods automation and geocoding. Hence, it is important to understand how to create, prepare and load datasets so that they are ready for machine learning.

First, we apply a bilateral filter to preserve edge information. Then, a rotational gradient approach is proposed to detect text under varying viewpoints. Finally, morphological operations are combined with contour analysis to generate blobs with bounding boxes around the detected regions, eliminating quasi-text areas. Simulation results show better performance than traditional techniques, with a higher detection rate on the ICDAR Robust Reading Competition "Text in Video" datasets (2013-2015).
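The three preprocessing stages named in the abstract (bilateral filtering, a rotational gradient, then morphology plus blob extraction) can be sketched on a single grayscale frame as below. This is a minimal illustration under stated assumptions, not the authors' method. The abstract does not define the rotational gradient, so the hypothetical `rotational_gradient` takes the maximum absolute directional derivative over a few angles. Contour tracing is swapped for plain 4-connected component labelling. "Quasi-text" elimination is approximated by a simple area threshold. All function names and parameter values are illustrative.

```python
import math
from collections import deque

Image = list[list[float]]        # grayscale frame, intensities roughly in [0, 255]
Mask = list[list[bool]]
Box = tuple[int, int, int, int]  # (x0, y0, x1, y1), inclusive


def bilateral_filter(img: Image, radius: int = 2,
                     sigma_space: float = 2.0, sigma_color: float = 25.0) -> Image:
    """Edge-preserving smoothing: each pixel becomes a weighted mean of its
    neighbours, weighted by spatial distance AND intensity difference, so
    neighbours across a strong edge contribute almost nothing."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            centre, num, den = img[y][x], 0.0, 0.0
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    v = img[yy][xx]
                    weight = math.exp(
                        -((yy - y) ** 2 + (xx - x) ** 2) / (2 * sigma_space ** 2)
                        - (v - centre) ** 2 / (2 * sigma_color ** 2))
                    num += weight * v
                    den += weight
            out[y][x] = num / den  # den >= 1 because the centre pixel has weight 1
    return out


def rotational_gradient(img: Image,
                        angles_deg: tuple[float, ...] = (0, 45, 90, 135)) -> Image:
    """HYPOTHETICAL stand-in: max |directional derivative| over several angles,
    so strokes at different orientations all give a strong response."""
    h, w = len(img), len(img[0])
    dirs = [(math.cos(math.radians(a)), math.sin(math.radians(a))) for a in angles_deg]
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Central differences, clamped at the image border.
            gx = (img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]) / 2
            gy = (img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]) / 2
            out[y][x] = max(abs(gx * c + gy * s) for c, s in dirs)
    return out


def dilate(mask: Mask, radius: int = 1) -> Mask:
    """Binary dilation with a (2*radius+1)^2 square structuring element."""
    h, w = len(mask), len(mask[0])
    return [[any(mask[yy][xx]
                 for yy in range(max(0, y - radius), min(h, y + radius + 1))
                 for xx in range(max(0, x - radius), min(w, x + radius + 1)))
             for x in range(w)] for y in range(h)]


def blob_boxes(mask: Mask, min_area: int = 30) -> list[Box]:
    """4-connected components -> bounding boxes; blobs smaller than min_area
    are dropped as a crude proxy for removing quasi-text regions."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes: list[Box] = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, ys, xs = deque([(y, x)]), [], []
            while queue:  # breadth-first flood fill of one blob
                cy, cx = queue.popleft()
                ys.append(cy)
                xs.append(cx)
                for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(ys) >= min_area:
                boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes


def detect_text_boxes(frame: Image, grad_thresh: float = 20.0,
                      min_area: int = 30) -> list[Box]:
    """Filter -> gradient -> threshold -> dilate -> boxes."""
    grad = rotational_gradient(bilateral_filter(frame))
    mask = [[g > grad_thresh for g in row] for row in grad]
    return blob_boxes(dilate(mask), min_area)
```

For example, on a synthetic 20x20 frame with a bright 4x10 "text-like" bar at rows 6-9, columns 5-14, plus one isolated bright pixel at (16, 16), `detect_text_boxes` returns a single box around the bar; the blob produced by the isolated pixel falls below `min_area` and is discarded. The pure-Python loops are only there to keep the sketch dependency-free. In practice OpenCV's `cv2.bilateralFilter`, `cv2.Sobel`, `cv2.morphologyEx`, `cv2.findContours` and `cv2.boundingRect` would do the same jobs far faster.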


