Pub Date: 2018-12-01 | DOI: 10.1109/DICTA.2018.8615860
Frederic Ringsleben, Maik Benndorf, T. Haenselmann, R. Boiger, Manfred Mücke, M. Fehr, Dirk Motthes
Observing production processes is a typical task for sensors in industrial environments. This paper deals with the use of camera systems as a sensor array to compare similar production processes with one another. The aim is to detect anomalies in production processes, such as the motion of robots or the flow of liquids. Since the comparison of high-resolution and long videos is very resource-intensive, we propose clustering the video into areas and shots. To this end, we suggest interpreting each pixel of a video as a signal varying in time. To do this without any background knowledge, and to remain useful for any production environment involving motion, we use an unsupervised clustering procedure. We present three different preprocessing approaches to avoid faulty clustering of static image areas and of areas relevant to production, and finally compare the results.
{"title":"A New Approach using Characteristic Video Signals to Improve the Stability of Manufacturing Processes","authors":"Frederic Ringsleben, Maik Benndorf, T. Haenselmann, R. Boiger, Manfred Mücke, M. Fehr, Dirk Motthes","doi":"10.1109/DICTA.2018.8615860","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615860","url":null,"abstract":"Observing production processes is a typical task for sensors in industrial environments. This paper deals with the use of camera systems as a sensor array to compare similar production processes with one another. The aim is to detect anomalies in production processes, such as the motion of robots or the flow of liquids. Since the comparison of high-resolution and long videos is very resource-intensive, we propose clustering the video into areas and shots. Therefore, we suggest interpreting each pixel of a video as a signal varying in time. In order to do that without any background knowledge and to be useful for any production environment with motion involved, we use an unsupervised clustering procedure. We show three different preprocessing approaches to avoid faulty clustering of static image areas and those relevant for the production and finally compare the results.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131646013","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2018-12-01 | DOI: 10.1109/DICTA.2018.8615815
Ying Liao, Weihong Li, Jinkai Cui, W. Gong
This paper proposes a blur kernel estimation model based on combined constraints, involving both image and blur kernel constraints, for blind image deblurring. We adopt an L0 regularization term constraining the image gradient and the dark channel of the image gradient to protect strong image edges and suppress noise, and use L2 regularization terms as hybrid constraints on the blur kernel and its gradient to preserve the kernel's sparsity and continuity, respectively. Within the combined constraints, the constrained dark channel of the image gradient, which acts as a dark channel prior, also effectively helps blind image deblurring in various scenarios, such as natural, face and text images. Moreover, we introduce a half-quadratic splitting optimization algorithm to solve the proposed model. Extensive experiments demonstrate that the proposed method estimates the blur kernel more accurately and achieves better visual quality of deblurring on both synthetic and real-life blurred images.
{"title":"Blur Kernel Estimation Model with Combined Constraints for Blind Image Deblurring","authors":"Ying Liao, Weihong Li, Jinkai Cui, W. Gong","doi":"10.1109/DICTA.2018.8615815","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615815","url":null,"abstract":"This paper proposes a blur kernel estimation model based on combined constraints involving both image and blur kernel constraints for blind image deblurring. We adopt L0 regularization term for constraining image gradient and dark channel of image gradient to protect image strong edges and suppress noise in image, and use L2 regularization term as hybrid constraints for blur kernel and its gradient to preserve blur kernel's sparsity and continuity respectively. In combined constraints, the constrained dark channel of image gradient, which is a dark channel prior, can also effectively help blind image deblurring in various scenarios, such as natural, face and text images. Moreover, we introduce a half-quadratic splitting optimization algorithm for solving the proposed model. We conduct extensive experiments and results demonstrate that the proposed method can better estimate blur kernel and achieve better visual quality of image deblurring on both synthetic and real-life blurred images.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130608336","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2018-12-01 | DOI: 10.1109/DICTA.2018.8615810
Mirza Muhammad Ali Baig, Mian Ihtisham Shah, Muhammad Abdullah Wajahat, Nauman Zafar, Omar Arif
Image captioning is a field within artificial intelligence that is progressing rapidly and holds a lot of potential. A major problem in this field is the limited amount of data available. The only dataset considered suitable enough for the task is the Microsoft Common Objects in Context (MSCOCO) dataset, which contains about 120,000 training images. This covers about 80 object classes, which is insufficient if we want to create robust solutions that are not limited to the constraints of the data at hand. To overcome this problem, we propose a solution that incorporates zero-shot learning concepts to identify unknown objects and classes, using semantic word embeddings and existing state-of-the-art object identification algorithms. Our proposed model, Image Captioning using Novel Word Injection, uses a pre-trained caption generator and works on the output of the generator to inject objects that are not present in the dataset into the caption. We evaluate the model on standardized metrics, namely BLEU, CIDEr and ROUGE-L. The results, both qualitatively and quantitatively, outperform the underlying model.
{"title":"Image Caption Generator with Novel Object Injection","authors":"Mirza Muhammad Ali Baig, Mian Ihtisham Shah, Muhammad Abdullah Wajahat, Nauman Zafar, Omar Arif","doi":"10.1109/DICTA.2018.8615810","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615810","url":null,"abstract":"Image captioning is a field within artificial intelligence that is progressing rapidly and it has a lot of potentials. A major problem when working in this field is the limited amount of data that is available to us as is. The only dataset considered suitable enough for the task is the Microsoft: Common Objects in Context (MSCOCO) dataset, which contains about 120,000 training images. This covers about 80 object classes, which is an insufficient amount if we want to create robust solutions that aren't limited to the constraints of the data at hand. In order to overcome this problem, we propose a solution that incorporates Zero-Shot Learning concepts in order to identify unknown objects and classes by using semantic word embeddings and existing state-of-the-art object identification algorithms. Our proposed model, Image Captioning using Novel Word Injection, uses a pre-trained caption generator and works on the output of the generator to inject objects that are not present in the dataset into the caption. We evaluate the model on standardized metrics, namely, BLEU, CIDEr and ROUGE-L. The results, qualitatively and quantitatively, outperform the underlying model.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120848108","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2018-12-01 | DOI: 10.1109/DICTA.2018.8615785
Nima Zarbakht, J. Zou
A high level of situational awareness is essential to an advanced driver assistance system. One of the most important duties of such a system is to detect lane markings on the road and to distinguish them from the road surface and other objects such as shadows and traffic. A robust lane detection algorithm is critical to a lane departure warning system: it must determine the relative lane position reliably and rapidly from captured images. The available literature provides methods to address adverse conditions such as precipitation, glare and blurred lane markings; however, the reliability of these methods can be adversely affected by lighting conditions. In this paper, a new method is proposed that combines two distinct color spaces to reduce interference in a pre-processing step. The method is adaptive to different lighting situations. The directional gradient is used to detect lane marking edges. The method can detect lane markings under the complications imposed by shadows, rain, reflections, and strong light sources such as headlights and tail lights.
{"title":"Lane Detection Under Adverse Conditions Based on Dual Color Space","authors":"Nima Zarbakht, J. Zou","doi":"10.1109/DICTA.2018.8615785","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615785","url":null,"abstract":"A high level of situational awareness is essential to an advanced driver assistance system. One of the most important duties of such a system is the detection of lane markings on the road and to distinguish them from the road and other objects such as shadows, traffic, etc. A robust lane detection algorithm is critical to a lane departure warning system. It must determine the relative lane position reliably and rapidly using captured images. The available literature provides some methods to solve problems associated with adverse conditions such as precipitation, glare and blurred lane markings. However, the reliability of these methods can be adversely affected by the lighting conditions. In this paper, a new method is proposed that combines two distinct color spaces to reduce interference in a pre-processing step. The method is adaptive to different lighting situations. The directional gradient is used to detect the lane marking edges. The method can detect lane markings with different complexities imposed by shadows, rain, reflection, strong sources of light such as headlights and tail lights.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129661082","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2018-12-01 | DOI: 10.1109/DICTA.2018.8615771
C. Zhang, Yang Song, Sidong Liu, S. Lill, Chenyu Wang, Zihao Tang, Yuyi You, Yang Gao, A. Klistorner, M. Barnett, Weidong (Tom) Cai
Automated segmentation of multiple sclerosis (MS) lesions in brain imaging is challenging due to the high variability in lesion characteristics. Based on the generative adversarial network (GAN) framework, we propose a semantic segmentation framework, MS-GAN, to localize MS lesions in multimodal brain magnetic resonance imaging (MRI); it consists of one multimodal encoder-decoder generator G and multiple discriminators D corresponding to the input modalities. For the generator, we adopt an encoder-decoder deep learning architecture with a bypass of spatial information from the encoder to the corresponding decoder, which helps reduce the number of network parameters while improving localization performance. Our generator is also designed to integrate multimodal imaging data in end-to-end learning with multi-path encoding and cross-modality fusion. An additional classification-related constraint is proposed for the adversarial training process of the GAN model, with the aim of alleviating the hard-to-converge issue in classification-based image-to-image translation problems. For evaluation, we collected a database of 126 cases from patients with relapsing MS. We also experimented with other semantic segmentation models as well as patch-based deep learning methods for performance comparison. The results show that our method provides more accurate segmentation than the state-of-the-art techniques.
{"title":"MS-GAN: GAN-Based Semantic Segmentation of Multiple Sclerosis Lesions in Brain Magnetic Resonance Imaging","authors":"C. Zhang, Yang Song, Sidong Liu, S. Lill, Chenyu Wang, Zihao Tang, Yuyi You, Yang Gao, A. Klistorner, M. Barnett, Weidong (Tom) Cai","doi":"10.1109/DICTA.2018.8615771","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615771","url":null,"abstract":"Automated segmentation of multiple sclerosis (MS) lesions in brain imaging is challenging due to the high variability in lesion characteristics. Based on the generative adversarial network (GAN), we propose a semantic segmentation framework MS-GAN to localize MS lesions in multimodal brain magnetic resonance imaging (MRI), which consists of one multimodal encoder-decoder generator G and multiple discriminators D corresponding to the multiple input modalities. For the design of the generator, we adopt an encoder-decoder deep learning architecture with bypass of spatial information from encoder to the corresponding decoder, which helps to reduce the network parameters while improving the localization performance. Our generator is also designed to integrate multimodal imaging data in end-to-end learning with multi-path encoding and cross-modality fusion. An additional classification-related constraint is proposed for the adversarial training process of the GAN model, with the aim of alleviating the hard-to-converge issue in classification-based image-to-image translation problems. For evaluation, we collected a database of 126 cases from patients with relapsing MS. We also experimented with other semantic segmentation models as well as patch-based deep learning methods for performance comparison. The results show that our method provides more accurate segmentation than the state-of-the-art techniques.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130664531","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2018-12-01 | DOI: 10.1109/DICTA.2018.8615824
Haoyuan Du, Liquan Dong, Ming Liu, Yuejin Zhao, W. Jia, Xiaohua Liu, Mei Hui, Lingqin Kong, Q. Hao
Wavefront coding (WFC) is a promising technology for extending the depth of field (DOF) of incoherent imaging systems. Digital restoration in WFC, which must remove the blurring effect while suppressing noise, is a classical ill-conditioned problem. Traditional approaches relying on image heuristics suffer from high-frequency noise amplification and processing artifacts. This paper investigates a general neural-network framework for restoring images in WFC. To our knowledge, this is the first attempt to apply convolutional networks to WFC. Blur and additive noise are considered simultaneously. Two solutions are presented, exploiting fully convolutional networks (FCN) and conditional generative adversarial networks (CGAN), respectively. The FCN, which minimizes the mean squared reconstruction error (MSE) in pixel space, attains high PSNR; the CGAN, which optimizes a perceptual loss criterion, retrieves more texture. We conduct comparison experiments to demonstrate performance at noise levels that differ from the training configuration. We also examine image quality on a non-natural test target image and under defocus. The results indicate that the proposed networks outperform traditional approaches in restoring high-frequency details and suppressing noise.
{"title":"Image Restoration Based on Deep Convolutional Network in Wavefront Coding Imaging System","authors":"Haoyuan Du, Liquan Dong, Ming Liu, Yuejin Zhao, W. Jia, Xiaohua Liu, Mei Hui, Lingqin Kong, Q. Hao","doi":"10.1109/DICTA.2018.8615824","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615824","url":null,"abstract":"Wavefront coding (WFC) is a prosperous technology for extending depth of field (DOF) in the incoherent imaging system. Digital recovery of the WFC technique is a classical ill-conditioned problem by removing the blurring effect and suppressing the noise. Traditional approaches relying on image heuristics suffer from high frequency noise amplification and processing artifacts. This paper investigates a general framework of neural networks for restoring images in WFC. To our knowledge, this is the first attempt for applying convolutional networks in WFC. The blur and additive noise are considered simultaneously. Two solutions respectively exploiting fully convolutional networks (FCN) and conditional Generative Adversarial Networks (CGAN) are presented. The FCN based on minimizing the mean squared reconstruction error (MSE) in pixel space gets high PSNR. On the other side, the CGAN based on perceptual loss optimization criterion retrieves more textures. We conduct comparison experiments to demonstrate the performance at different noise levels from the training configuration. We also reveal the image quality on non-natural test target image and defocused situation. The results indicate that the proposed networks outperform traditional approaches for restoring high frequency details and suppressing noise effectively.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133891562","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2018-12-01 | DOI: 10.1109/DICTA.2018.8615792
Hichem Abdellali, Z. Kato
We propose a new algorithm for estimating the absolute and relative pose of a multi-view camera system. The algorithm relies on two solvers: a direct solver using a minimal set of 6 line pairs and a least squares solver that uses all inlier 2D-3D line pairs. The algorithm has been validated on a large synthetic dataset; experimental results confirm stable, real-time performance under realistic noise on the line parameters as well as on the vertical direction. Furthermore, the algorithm performs well on real data, with less than half a degree of rotation error and less than 25 cm of translation error.
{"title":"Absolute and Relative Pose Estimation of a Multi-View Camera System using 2D-3D Line Pairs and Vertical Direction","authors":"Hichem Abdellali, Z. Kato","doi":"10.1109/DICTA.2018.8615792","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615792","url":null,"abstract":"We propose a new algorithm for estimating the absolute and relative pose of a multi-view camera system. The algorithm relies on two solvers: a direct solver using a minimal set of 6 line pairs and a least squares solver which uses all inlier 2D-3D line pairs. The algorithm have been validated on a large synthetic dataset, experimental results confirm the stable and real-time performance under realistic noise on the line parameters as well as on the vertical direction. Furthermore, the algorithm performs well on real data with less then half degree rotation error and less than 25 cm translation error.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133297758","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2018-12-01 | DOI: 10.1109/DICTA.2018.8615794
Choon Giap Goh, Wee Han Lim, Justus Chua, I. Atmosukarto
Overcrowding is a common problem faced by train commuters in many countries. While waiting for the train at a station, commuters tend to cluster and queue at the doors closest to the escalators and elevators leading to the station entrances and exits. As a result, trains are not fully utilized in terms of capacity, since cabins at certain door positions tend to be more crowded than the rest. The objective of this paper is to provide a methodology for estimating the crowd density within the cabins of incoming trains while leveraging the existing on-train CCTV infrastructure. Providing cabin density information to commuters waiting for an incoming train allows them to better select which cabin to board. This facilitates a better commuting experience without incurring a high cost for the train operator. To achieve this objective, we adopt deep convolutional neural networks to analyze footage from the existing security cameras inside the trains and to classify image frames according to the crowd level of the train cabins. Three different experiments were conducted to train and test different convolutional neural network models. All models achieve a classification accuracy of over 90%.
{"title":"Image Analytics for Train Crowd Estimation","authors":"Choon Giap Goh, Wee Han Lim, Justus Chua, I. Atmosukarto","doi":"10.1109/DICTA.2018.8615794","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615794","url":null,"abstract":"Overcrowding is a common problem faced by train commuters in many countries. While waiting for the train at the stations, commuters tend to cluster and queue at doors that are closest to escalators and elevators that lead towards the station entrances and exits. This scenario results in trains not being fully utilized in terms of their capacity. As cabins with certain door positions tend to be more crowded than the rest of the cabins. The objective of this paper is to provide a methodology to estimate the crowd density within cabins of incoming trains, while leveraging on the existing train CCTV infrastructures. Providing the train cabin density information to commuters who are waiting for the incoming train allows the commuters to better select which cabin to board based on the provided density information. This will facilitate a better commuting experience without incurring a high cost for the train operator. To achieve this objective, we have adopted the usage of deep convolutional neural networks to analyze the footage from the existing security camera inside the trains and classify the images frames based the crowd level of train cabins. Three different experiments were conducted to train and test different convolutional neural network models. All models are able to make classification with an accuracy rate of over 90%.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130177876","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2018-12-01 | DOI: 10.1109/DICTA.2018.8615865
F. Kamran, M. Shahzad, F. Shafait
Detection and identification of military vehicles from aerial images is of great practical interest, particularly for the defense sector, as it aids in predicting the enemy's moves and hence in taking early precautionary measures. Owing to advances in the domain of self-driving cars, a vast literature of published algorithms exists that uses terrestrial data to solve the problem of vehicle detection in natural scenes. However, directly translating these algorithms to the detection of both military and non-military vehicles in aerial images is not straightforward, owing to high variability in scale, illumination and orientation, together with articulations in both shape and structure. Moreover, unlike terrestrial benchmarks such as the Baidu Research Open-Access Dataset, there are no well-annotated datasets encompassing both military and non-military vehicles in aerial images, which limits the applicability of the state-of-the-art deep-learning-based object detection algorithms that have shown great success in recent years. To this end, we have prepared a dataset of low-altitude aerial images that comprises both real data (taken from videos of military shows) and toy data (downloaded from YouTube videos). The dataset is categorized into three main types: military vehicle, non-military vehicle and other non-vehicular objects. In total, there are 15,086 (11,733 toy and 3,353 real) vehicle images exhibiting a variety of shapes, scales and orientations. To analyze the adequacy of the prepared dataset, we employed state-of-the-art object detection algorithms to distinguish military and non-military vehicles. The experimental results show that training deep architectures on the customized dataset allows seven types of military and four types of non-military vehicles to be recognized.
{"title":"Automated Military Vehicle Detection from Low-Altitude Aerial Images","authors":"F. Kamran, M. Shahzad, F. Shafait","doi":"10.1109/DICTA.2018.8615865","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615865","url":null,"abstract":"Detection and identification of military vehicles from aerial images is of great practical interest particularly for defense sector as it aids in predicting enemys move and hence, build early precautionary measures. Although due to advancement in the domain of self-driving cars, a vast literature of published algorithms exists that use the terrestrial data to solve the problem of vehicle detection in natural scenes. Directly translating these algorithms towards detection of both military and non-military vehicles in aerial images is not straight forward owing to high variability in scale, illumination and orientation together with articulations both in shape and structure. Moreover, unlike availability of terrestrial benchmark datasets such as Baidu Research Open-Access Dataset etc., there does not exist well-annotated datasets encompassing both military and non-military vehicles in aerial images which as a consequence limit the applicability of the state-of-the-art deep learning based object detection algorithms that have shown great success in the recent years. To this end, we have prepared a dataset of low-altitude aerial images that comprises of both real data (taken from military shows videos) and toy data (downloaded from YouTube videos). The dataset has been categorized into three main types, i.e., military vehicle, non-military vehicle and other non-vehicular objects. In total, there are 15,086 (11,733 toy and 3,353 real) vehicle images exhibiting a variety of different shapes, scales and orientations. To analyze the adequacy of the prepared dataset, we employed the state-of-the-art object detection algorithms to distinguish military and non-military vehicles. The experimental results show that the training of deep architectures using the customized/prepared dataset allows to recognize seven types of military and four types of non-military vehicles.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128540476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}