Pub Date: 2018-12-01 | DOI: 10.1109/DICTA.2018.8615860
Frederic Ringsleben, Maik Benndorf, T. Haenselmann, R. Boiger, Manfred Mücke, M. Fehr, Dirk Motthes
Observing production processes is a typical task for sensors in industrial environments. This paper deals with the use of camera systems as a sensor array to compare similar production processes with one another. The aim is to detect anomalies in production processes, such as the motion of robots or the flow of liquids. Since the comparison of high-resolution and long videos is very resource-intensive, we propose clustering the video into areas and shots. To this end, we suggest interpreting each pixel of a video as a signal varying in time. To do this without any background knowledge, and to remain useful for any production environment involving motion, we use an unsupervised clustering procedure. We present three different preprocessing approaches to avoid faulty clustering of static image areas and of areas relevant to production, and finally compare the results.
{"title":"A New Approach using Characteristic Video Signals to Improve the Stability of Manufacturing Processes","authors":"Frederic Ringsleben, Maik Benndorf, T. Haenselmann, R. Boiger, Manfred Mücke, M. Fehr, Dirk Motthes","doi":"10.1109/DICTA.2018.8615860","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615860","url":null,"abstract":"Observing production processes is a typical task for sensors in industrial environments. This paper deals with the use of camera systems as a sensor array to compare similar production processes with one another. The aim is to detect anomalies in production processes, such as the motion of robots or the flow of liquids. Since the comparison of high-resolution and long videos is very resource-intensive, we propose clustering the video into areas and shots. Therefore, we suggest interpreting each pixel of a video as a signal varying in time. In order to do that without any background knowledge and to be useful for any production environment with motion involved, we use an unsupervised clustering procedure. We show three different preprocessing approaches to avoid faulty clustering of static image areas and those relevant for the production and finally compare the results.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"69 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131646013","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2018-12-01 | DOI: 10.1109/DICTA.2018.8615815
Ying Liao, Weihong Li, Jinkai Cui, W. Gong
This paper proposes a blur kernel estimation model based on combined constraints, involving both image and blur kernel constraints, for blind image deblurring. We adopt an L0 regularization term constraining the image gradient and the dark channel of the image gradient to protect strong image edges and suppress noise, and use L2 regularization terms as hybrid constraints on the blur kernel and its gradient to preserve the kernel's sparsity and continuity, respectively. Within the combined constraints, the constrained dark channel of the image gradient, which acts as a dark channel prior, also effectively helps blind image deblurring in various scenarios, such as natural, face and text images. Moreover, we introduce a half-quadratic splitting optimization algorithm to solve the proposed model. Extensive experiments demonstrate that the proposed method estimates the blur kernel more accurately and achieves better visual quality of deblurring on both synthetic and real-life blurred images.
{"title":"Blur Kernel Estimation Model with Combined Constraints for Blind Image Deblurring","authors":"Ying Liao, Weihong Li, Jinkai Cui, W. Gong","doi":"10.1109/DICTA.2018.8615815","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615815","url":null,"abstract":"This paper proposes a blur kernel estimation model based on combined constraints involving both image and blur kernel constraints for blind image deblurring. We adopt L0 regularization term for constraining image gradient and dark channel of image gradient to protect image strong edges and suppress noise in image, and use L2 regularization term as hybrid constraints for blur kernel and its gradient to preserve blur kernel's sparsity and continuity respectively. In combined constraints, the constrained dark channel of image gradient, which is a dark channel prior, can also effectively help blind image deblurring in various scenarios, such as natural, face and text images. Moreover, we introduce a half-quadratic splitting optimization algorithm for solving the proposed model. We conduct extensive experiments and results demonstrate that the proposed method can better estimate blur kernel and achieve better visual quality of image deblurring on both synthetic and real-life blurred images.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130608336","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2018-12-01 | DOI: 10.1109/DICTA.2018.8615810
Mirza Muhammad Ali Baig, Mian Ihtisham Shah, Muhammad Abdullah Wajahat, Nauman Zafar, Omar Arif
Image captioning is a field within artificial intelligence that is progressing rapidly and holds a lot of potential. A major problem in this field is the limited amount of data available. The only dataset considered suitable enough for the task is the Microsoft Common Objects in Context (MSCOCO) dataset, which contains about 120,000 training images. This covers about 80 object classes, which is insufficient if we want to create robust solutions that are not limited to the constraints of the data at hand. To overcome this problem, we propose a solution that incorporates zero-shot learning concepts to identify unknown objects and classes, using semantic word embeddings and existing state-of-the-art object identification algorithms. Our proposed model, Image Captioning using Novel Word Injection, uses a pre-trained caption generator and works on the output of the generator to inject objects that are not present in the dataset into the caption. We evaluate the model on standardized metrics, namely BLEU, CIDEr and ROUGE-L. The results, both qualitatively and quantitatively, outperform the underlying model.
{"title":"Image Caption Generator with Novel Object Injection","authors":"Mirza Muhammad Ali Baig, Mian Ihtisham Shah, Muhammad Abdullah Wajahat, Nauman Zafar, Omar Arif","doi":"10.1109/DICTA.2018.8615810","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615810","url":null,"abstract":"Image captioning is a field within artificial intelligence that is progressing rapidly and it has a lot of potentials. A major problem when working in this field is the limited amount of data that is available to us as is. The only dataset considered suitable enough for the task is the Microsoft: Common Objects in Context (MSCOCO) dataset, which contains about 120,000 training images. This covers about 80 object classes, which is an insufficient amount if we want to create robust solutions that aren't limited to the constraints of the data at hand. In order to overcome this problem, we propose a solution that incorporates Zero-Shot Learning concepts in order to identify unknown objects and classes by using semantic word embeddings and existing state-of-the-art object identification algorithms. Our proposed model, Image Captioning using Novel Word Injection, uses a pre-trained caption generator and works on the output of the generator to inject objects that are not present in the dataset into the caption. We evaluate the model on standardized metrics, namely, BLEU, CIDEr and ROUGE-L. The results, qualitatively and quantitatively, outperform the underlying model.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120848108","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2018-12-01 | DOI: 10.1109/DICTA.2018.8615785
Nima Zarbakht, J. Zou
A high level of situational awareness is essential to an advanced driver assistance system. One of the most important duties of such a system is to detect lane markings on the road and to distinguish them from the road surface and other objects such as shadows and traffic. A robust lane detection algorithm is critical to a lane departure warning system: it must determine the relative lane position reliably and rapidly from captured images. The available literature provides methods to address adverse conditions such as precipitation, glare and blurred lane markings; however, the reliability of these methods can be adversely affected by lighting conditions. In this paper, a new method is proposed that combines two distinct color spaces to reduce interference in a pre-processing step. The method is adaptive to different lighting situations. The directional gradient is used to detect lane marking edges. The method can detect lane markings under the complications imposed by shadows, rain, reflections, and strong light sources such as headlights and tail lights.
{"title":"Lane Detection Under Adverse Conditions Based on Dual Color Space","authors":"Nima Zarbakht, J. Zou","doi":"10.1109/DICTA.2018.8615785","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615785","url":null,"abstract":"A high level of situational awareness is essential to an advanced driver assistance system. One of the most important duties of such a system is the detection of lane markings on the road and to distinguish them from the road and other objects such as shadows, traffic, etc. A robust lane detection algorithm is critical to a lane departure warning system. It must determine the relative lane position reliably and rapidly using captured images. The available literature provides some methods to solve problems associated with adverse conditions such as precipitation, glare and blurred lane markings. However, the reliability of these methods can be adversely affected by the lighting conditions. In this paper, a new method is proposed that combines two distinct color spaces to reduce interference in a pre-processing step. The method is adaptive to different lighting situations. The directional gradient is used to detect the lane marking edges. The method can detect lane markings with different complexities imposed by shadows, rain, reflection, strong sources of light such as headlights and tail lights.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129661082","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2018-12-01 | DOI: 10.1109/DICTA.2018.8615771
C. Zhang, Yang Song, Sidong Liu, S. Lill, Chenyu Wang, Zihao Tang, Yuyi You, Yang Gao, A. Klistorner, M. Barnett, Weidong (Tom) Cai
Automated segmentation of multiple sclerosis (MS) lesions in brain imaging is challenging due to the high variability in lesion characteristics. Based on the generative adversarial network (GAN) framework, we propose a semantic segmentation framework, MS-GAN, to localize MS lesions in multimodal brain magnetic resonance imaging (MRI); it consists of one multimodal encoder-decoder generator G and multiple discriminators D corresponding to the input modalities. For the generator, we adopt an encoder-decoder deep learning architecture with a bypass of spatial information from the encoder to the corresponding decoder, which helps reduce the number of network parameters while improving localization performance. Our generator is also designed to integrate multimodal imaging data in end-to-end learning with multi-path encoding and cross-modality fusion. An additional classification-related constraint is proposed for the adversarial training process of the GAN model, with the aim of alleviating the hard-to-converge issue in classification-based image-to-image translation problems. For evaluation, we collected a database of 126 cases from patients with relapsing MS. We also experimented with other semantic segmentation models as well as patch-based deep learning methods for performance comparison. The results show that our method provides more accurate segmentation than the state-of-the-art techniques.
{"title":"MS-GAN: GAN-Based Semantic Segmentation of Multiple Sclerosis Lesions in Brain Magnetic Resonance Imaging","authors":"C. Zhang, Yang Song, Sidong Liu, S. Lill, Chenyu Wang, Zihao Tang, Yuyi You, Yang Gao, A. Klistorner, M. Barnett, Weidong (Tom) Cai","doi":"10.1109/DICTA.2018.8615771","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615771","url":null,"abstract":"Automated segmentation of multiple sclerosis (MS) lesions in brain imaging is challenging due to the high variability in lesion characteristics. Based on the generative adversarial network (GAN), we propose a semantic segmentation framework MS-GAN to localize MS lesions in multimodal brain magnetic resonance imaging (MRI), which consists of one multimodal encoder-decoder generator G and multiple discriminators D corresponding to the multiple input modalities. For the design of the generator, we adopt an encoder-decoder deep learning architecture with bypass of spatial information from encoder to the corresponding decoder, which helps to reduce the network parameters while improving the localization performance. Our generator is also designed to integrate multimodal imaging data in end-to-end learning with multi-path encoding and cross-modality fusion. An additional classification-related constraint is proposed for the adversarial training process of the GAN model, with the aim of alleviating the hard-to-converge issue in classification-based image-to-image translation problems. For evaluation, we collected a database of 126 cases from patients with relapsing MS. We also experimented with other semantic segmentation models as well as patch-based deep learning methods for performance comparison. The results show that our method provides more accurate segmentation than the state-of-the-art techniques.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130664531","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2018-12-01 | DOI: 10.1109/DICTA.2018.8615824
Haoyuan Du, Liquan Dong, Ming Liu, Yuejin Zhao, W. Jia, Xiaohua Liu, Mei Hui, Lingqin Kong, Q. Hao
Wavefront coding (WFC) is a promising technology for extending the depth of field (DOF) of incoherent imaging systems. Digital restoration in WFC, which must remove the blurring effect while suppressing noise, is a classical ill-conditioned problem. Traditional approaches relying on image heuristics suffer from high-frequency noise amplification and processing artifacts. This paper investigates a general neural-network framework for restoring images in WFC. To our knowledge, this is the first attempt to apply convolutional networks to WFC. Blur and additive noise are considered simultaneously. Two solutions are presented, exploiting fully convolutional networks (FCN) and conditional generative adversarial networks (CGAN), respectively. The FCN, which minimizes the mean squared reconstruction error (MSE) in pixel space, attains high PSNR; the CGAN, which optimizes a perceptual loss criterion, retrieves more texture. We conduct comparison experiments to demonstrate performance at noise levels that differ from the training configuration. We also examine image quality on a non-natural test target image and under defocus. The results indicate that the proposed networks outperform traditional approaches in restoring high-frequency details and suppressing noise.
{"title":"Image Restoration Based on Deep Convolutional Network in Wavefront Coding Imaging System","authors":"Haoyuan Du, Liquan Dong, Ming Liu, Yuejin Zhao, W. Jia, Xiaohua Liu, Mei Hui, Lingqin Kong, Q. Hao","doi":"10.1109/DICTA.2018.8615824","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615824","url":null,"abstract":"Wavefront coding (WFC) is a prosperous technology for extending depth of field (DOF) in the incoherent imaging system. Digital recovery of the WFC technique is a classical ill-conditioned problem by removing the blurring effect and suppressing the noise. Traditional approaches relying on image heuristics suffer from high frequency noise amplification and processing artifacts. This paper investigates a general framework of neural networks for restoring images in WFC. To our knowledge, this is the first attempt for applying convolutional networks in WFC. The blur and additive noise are considered simultaneously. Two solutions respectively exploiting fully convolutional networks (FCN) and conditional Generative Adversarial Networks (CGAN) are presented. The FCN based on minimizing the mean squared reconstruction error (MSE) in pixel space gets high PSNR. On the other side, the CGAN based on perceptual loss optimization criterion retrieves more textures. We conduct comparison experiments to demonstrate the performance at different noise levels from the training configuration. We also reveal the image quality on non-natural test target image and defocused situation. The results indicate that the proposed networks outperform traditional approaches for restoring high frequency details and suppressing noise effectively.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133891562","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2018-12-01 | DOI: 10.1109/DICTA.2018.8615792
Hichem Abdellali, Z. Kato
We propose a new algorithm for estimating the absolute and relative pose of a multi-view camera system. The algorithm relies on two solvers: a direct solver using a minimal set of 6 line pairs and a least squares solver that uses all inlier 2D-3D line pairs. The algorithm has been validated on a large synthetic dataset; experimental results confirm stable, real-time performance under realistic noise on the line parameters as well as on the vertical direction. Furthermore, the algorithm performs well on real data, with less than half a degree of rotation error and less than 25 cm of translation error.
{"title":"Absolute and Relative Pose Estimation of a Multi-View Camera System using 2D-3D Line Pairs and Vertical Direction","authors":"Hichem Abdellali, Z. Kato","doi":"10.1109/DICTA.2018.8615792","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615792","url":null,"abstract":"We propose a new algorithm for estimating the absolute and relative pose of a multi-view camera system. The algorithm relies on two solvers: a direct solver using a minimal set of 6 line pairs and a least squares solver which uses all inlier 2D-3D line pairs. The algorithm have been validated on a large synthetic dataset, experimental results confirm the stable and real-time performance under realistic noise on the line parameters as well as on the vertical direction. Furthermore, the algorithm performs well on real data with less then half degree rotation error and less than 25 cm translation error.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133297758","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2018-12-01 | DOI: 10.1109/DICTA.2018.8615794
Choon Giap Goh, Wee Han Lim, Justus Chua, I. Atmosukarto
Overcrowding is a common problem faced by train commuters in many countries. While waiting for the train at a station, commuters tend to cluster and queue at the doors closest to the escalators and elevators leading to the station entrances and exits. As a result, trains are not fully utilized in terms of capacity, since cabins at certain door positions tend to be more crowded than the rest. The objective of this paper is to provide a methodology for estimating the crowd density within the cabins of incoming trains while leveraging the existing on-train CCTV infrastructure. Providing cabin density information to commuters waiting for an incoming train allows them to better select which cabin to board. This facilitates a better commuting experience without incurring a high cost for the train operator. To achieve this objective, we adopt deep convolutional neural networks to analyze footage from the existing security cameras inside the trains and to classify image frames according to the crowd level of the train cabins. Three different experiments were conducted to train and test different convolutional neural network models. All models achieve a classification accuracy of over 90%.
{"title":"Image Analytics for Train Crowd Estimation","authors":"Choon Giap Goh, Wee Han Lim, Justus Chua, I. Atmosukarto","doi":"10.1109/DICTA.2018.8615794","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615794","url":null,"abstract":"Overcrowding is a common problem faced by train commuters in many countries. While waiting for the train at the stations, commuters tend to cluster and queue at doors that are closest to escalators and elevators that lead towards the station entrances and exits. This scenario results in trains not being fully utilized in terms of their capacity. As cabins with certain door positions tend to be more crowded than the rest of the cabins. The objective of this paper is to provide a methodology to estimate the crowd density within cabins of incoming trains, while leveraging on the existing train CCTV infrastructures. Providing the train cabin density information to commuters who are waiting for the incoming train allows the commuters to better select which cabin to board based on the provided density information. This will facilitate a better commuting experience without incurring a high cost for the train operator. To achieve this objective, we have adopted the usage of deep convolutional neural networks to analyze the footage from the existing security camera inside the trains and classify the images frames based the crowd level of train cabins. Three different experiments were conducted to train and test different convolutional neural network models. All models are able to make classification with an accuracy rate of over 90%.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130177876","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2018-12-01 | DOI: 10.1109/DICTA.2018.8615865
F. Kamran, M. Shahzad, F. Shafait
Detection and identification of military vehicles from aerial images is of great practical interest, particularly for the defense sector, as it aids in predicting the enemy's moves and hence in taking early precautionary measures. Owing to advances in the domain of self-driving cars, a vast literature of published algorithms exists that uses terrestrial data to solve the problem of vehicle detection in natural scenes. However, directly translating these algorithms to the detection of both military and non-military vehicles in aerial images is not straightforward, owing to high variability in scale, illumination and orientation, together with articulations in both shape and structure. Moreover, unlike terrestrial benchmarks such as the Baidu Research Open-Access Dataset, there are no well-annotated datasets encompassing both military and non-military vehicles in aerial images, which limits the applicability of the state-of-the-art deep-learning-based object detection algorithms that have shown great success in recent years. To this end, we have prepared a dataset of low-altitude aerial images that comprises both real data (taken from videos of military shows) and toy data (downloaded from YouTube videos). The dataset is categorized into three main types: military vehicle, non-military vehicle and other non-vehicular objects. In total, there are 15,086 (11,733 toy and 3,353 real) vehicle images exhibiting a variety of shapes, scales and orientations. To analyze the adequacy of the prepared dataset, we employed state-of-the-art object detection algorithms to distinguish military and non-military vehicles. The experimental results show that training deep architectures on the customized dataset allows seven types of military and four types of non-military vehicles to be recognized.
{"title":"Automated Military Vehicle Detection from Low-Altitude Aerial Images","authors":"F. Kamran, M. Shahzad, F. Shafait","doi":"10.1109/DICTA.2018.8615865","DOIUrl":"https://doi.org/10.1109/DICTA.2018.8615865","url":null,"abstract":"Detection and identification of military vehicles from aerial images is of great practical interest particularly for defense sector as it aids in predicting enemys move and hence, build early precautionary measures. Although due to advancement in the domain of self-driving cars, a vast literature of published algorithms exists that use the terrestrial data to solve the problem of vehicle detection in natural scenes. Directly translating these algorithms towards detection of both military and non-military vehicles in aerial images is not straight forward owing to high variability in scale, illumination and orientation together with articulations both in shape and structure. Moreover, unlike availability of terrestrial benchmark datasets such as Baidu Research Open-Access Dataset etc., there does not exist well-annotated datasets encompassing both military and non-military vehicles in aerial images which as a consequence limit the applicability of the state-of-the-art deep learning based object detection algorithms that have shown great success in the recent years. To this end, we have prepared a dataset of low-altitude aerial images that comprises of both real data (taken from military shows videos) and toy data (downloaded from YouTube videos). The dataset has been categorized into three main types, i.e., military vehicle, non-military vehicle and other non-vehicular objects. In total, there are 15,086 (11,733 toy and 3,353 real) vehicle images exhibiting a variety of different shapes, scales and orientations. To analyze the adequacy of the prepared dataset, we employed the state-of-the-art object detection algorithms to distinguish military and non-military vehicles. The experimental results show that the training of deep architectures using the customized/prepared dataset allows to recognize seven types of military and four types of non-military vehicles.","PeriodicalId":130057,"journal":{"name":"2018 Digital Image Computing: Techniques and Applications (DICTA)","volume":"10 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128540476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}