Detecting the Presence of Vehicles and Equipment in SAR Imagery Using Image Texture Features
Pub Date: 2019-10-01 | DOI: 10.1109/AIPR47015.2019.9174598
Michael Harner, A. Groener, M. D. Pritt
In this work, we present a methodology for monitoring man-made, construction-like activities in low-resolution SAR imagery. Our source of data is the European Space Agency’s Sentinel-1 satellite, which provides global coverage at a 12-day revisit rate. Despite limitations in resolution, our methodology enables us to monitor activity levels (i.e., the presence of vehicles and equipment) at a pre-defined location by analyzing the texture of detected SAR imagery. Using an exploratory dataset, we trained a support vector machine (SVM), a random binary forest, and a fully-connected neural network for classification. We use Haralick texture features in the VV and VH polarization channels as the input features to our classifiers. Each classifier showed promising results in distinguishing between two possible construction-site activity levels. This paper documents a case study centered on monitoring the construction process for oil and gas fracking wells.
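A minimal sketch of the texture-feature-plus-classifier idea described above, assuming image chips are already extracted per site and polarization; it uses GLCM statistics (a Haralick-style subset) rather than the paper's exact feature set, and the `chips` variable, thresholds, and labels are illustrative placeholders.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

PROPS = ("contrast", "homogeneity", "energy", "correlation")

def texture_vector(chip_8bit):
    """GLCM texture statistics (a Haralick-style subset) for one polarization chip."""
    glcm = graycomatrix(chip_8bit, distances=[1], angles=[0, np.pi / 2],
                        levels=256, symmetric=True, normed=True)
    return np.hstack([graycoprops(glcm, p).ravel() for p in PROPS])

def features(vv_chip, vh_chip):
    # Concatenate VV and VH texture statistics into one feature vector.
    return np.hstack([texture_vector(vv_chip), texture_vector(vh_chip)])

# `chips` is an assumed list of (vv_chip, vh_chip, label) tuples of 8-bit patches.
X = np.array([features(vv, vh) for vv, vh, _ in chips])
y = np.array([label for _, _, label in chips])   # e.g. 0 = low activity, 1 = high
print(cross_val_score(SVC(kernel="rbf"), X, y, cv=5).mean())
```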
{"title":"Detecting the Presence of Vehicles and Equipment in SAR Imagery Using Image Texture Features","authors":"Michael Harner, A. Groener, M. D. Pritt","doi":"10.1109/AIPR47015.2019.9174598","DOIUrl":"https://doi.org/10.1109/AIPR47015.2019.9174598","url":null,"abstract":"In this work, we present a methodology for monitoring man-made, construction-like activities in low-resolution SAR imagery. Our source of data is the European Space Agency’s Sentinel-l satellite which provides global coverage at a 12-day revisit rate. Despite limitations in resolution, our methodology enables us to monitor activity levels (i.e. presence of vehicles, equipment) of a pre-defined location by analyzing the texture of detected SAR imagery. Using an exploratory dataset, we trained a support vector machine (SVM), a random binary forest, and a fully-connected neural network for classification. We use Haralick texture features in the VV and VH polarization channels as the input features to our classifiers. Each classifier showed promising results in being able to distinguish between two possible types of construction-site activity levels. This paper documents a case study that is centered around monitoring the construction process for oil and gas fracking wells.","PeriodicalId":167075,"journal":{"name":"2019 IEEE Applied Imagery Pattern Recognition Workshop (AIPR)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129783443","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Performance Evaluation of Semantic Video Compression using Multi-cue Object Detection
Pub Date: 2019-10-01 | DOI: 10.1109/AIPR47015.2019.9174601
Noor M. Al-Shakarji, F. Bunyak, H. Aliakbarpour, G. Seetharaman, K. Palaniappan
Video compression is a critical task in real-time aerial surveillance scenarios, where limited communication bandwidth and on-board storage greatly restrict air-to-ground and air-to-air communications. In these cases, efficient handling of video data is needed to ensure optimal storage, smoother video transmission, and fast, reliable video analysis. Conventional video compression schemes were typically designed for human visual perception rather than automated video analytics. Information loss and artifacts introduced during image/video compression impose serious limitations on the performance of automated video analytics tasks. These limitations are compounded in aerial imagery by complex backgrounds and the small size of objects. In this paper, we describe and evaluate a salient region estimation pipeline for aerial imagery to enable adaptive bit-rate allocation during video compression. The salient regions are estimated using a multi-cue moving vehicle detection pipeline that synergistically fuses complementary appearance and motion cues using deep learning-based object detection and flux tensor-based spatio-temporal filtering. Adaptive compression results using the described multi-cue saliency estimation pipeline are compared against conventional MPEG and JPEG encoding in terms of compression ratio, image quality, and impact on automated video analytics operations. Experimental results on the ABQ urban aerial video dataset [1] show that incorporating contextual information enables high semantic compression ratios of over 2000:1 while preserving image quality in the regions of interest. The proposed pipeline enables better utilization of the limited bandwidth of air-to-ground and air-to-air network links.
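A minimal sketch of saliency-driven adaptive compression under the assumption that a binary saliency mask is already available (the paper derives it from the multi-cue detector); this toy version uses per-region JPEG quality rather than the paper's MPEG pipeline, and the quality values are illustrative.

```python
import cv2
import numpy as np

def semantic_jpeg(frame_bgr, mask, q_roi=90, q_bg=10):
    """Re-encode salient regions at high quality and the background at low quality."""
    def recode(img, q):
        ok, buf = cv2.imencode(".jpg", img, [cv2.IMWRITE_JPEG_QUALITY, q])
        return cv2.imdecode(buf, cv2.IMREAD_COLOR)

    hi = recode(frame_bgr, q_roi)      # quality reserved for detected vehicles
    lo = recode(frame_bgr, q_bg)       # aggressive compression elsewhere
    m = (mask > 0)[..., None]          # broadcast the 0/1 mask over color channels
    return np.where(m, hi, lo)
```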
{"title":"Performance Evaluation of Semantic Video Compression using Multi-cue Object Detection","authors":"Noor M. Al-Shakarji, F. Bunyak, H. Aliakbarpour, G. Seetharaman, K. Palaniappan","doi":"10.1109/AIPR47015.2019.9174601","DOIUrl":"https://doi.org/10.1109/AIPR47015.2019.9174601","url":null,"abstract":"Video compression becomes a very important task during real-time aerial surveillance scenarios where limited communication bandwidth and on-board storage greatly restrict air-to-ground and air-to-air communications. In these cases, efficient handling of video data is needed to ensure optimum storage, smoother video transmission, fast and reliable video analysis. Conventional video compression schemes were typically designed for human visual perception rather than automated video analytics. Information loss and artifacts introduced during image/video compression impose serious limitations on the performance of automated video analytics tasks. These limitations are further increased in aerial imagery due to complex background and small size of objects. In this paper, we describe and evaluate a salient region estimation pipeline for aerial imagery to enable adaptive bit-rate allocation during video compression. The salient regions are estimated using a multi-cue moving vehicle detection pipeline, which synergistically fuses complementary appearance and motion cues using deep learning-based object detection and flux tensor-based spatio-temporal filtering approaches. Adaptive compression results using the described multi-cue saliency estimation pipeline are compared against conventional MPEG and JPEG encoding in terms of compression ratio, image quality, and impact on automated video analytics operations. Experimental results on ABQ urban aerial video dataset [1] show that incorporation of contextual information enables high semantic compression ratios of over 2000:1 while preserving image quality for the regions of interest. The proposed pipeline enables better utilization of the limited bandwidth of the air-to-ground or air-to-air network links.","PeriodicalId":167075,"journal":{"name":"2019 IEEE Applied Imagery Pattern Recognition Workshop (AIPR)","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133093404","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi Stage Common Vector Space for Multimodal Embeddings
Pub Date: 2019-10-01 | DOI: 10.1109/AIPR47015.2019.9174583
Sabarish Gopalakrishnan, Premkumar Udaiyar, Shagan Sah, R. Ptucha
Deep learning frameworks have proven to be very effective at tasks like classification, segmentation, detection, and translation. Before being processed by a deep learning model, objects are first encoded into a suitable vector representation. For example, images are typically encoded using convolutional neural networks, whereas text typically uses recurrent neural networks. Similarly, other modalities of data, such as 3D point clouds, audio signals, and videos, can be transformed into vectors using appropriate encoders. Although deep learning architectures do a good job of learning these vector representations in isolation, learning a single common representation across multiple modalities is a challenging task. In this work, we develop a Multi Stage Common Vector Space (M-CVS) that is suitable for encoding multiple modalities. The M-CVS is an efficient low-dimensional vector representation in which the contextual similarity of data is preserved across all modalities through the use of contrastive loss functions. Our vector space can support tasks like multimodal retrieval, search, and generation, where, for example, images can be retrieved from text or audio input. Adding a new modality would generally mean resetting and retraining the entire network. Instead, we introduce a stagewise learning technique in which each modality is compared to a reference modality before being projected to the M-CVS. Our method ensures that a new modality can be mapped into the M-CVS without changing existing encodings, allowing extension to any number of modalities. We build and evaluate M-CVS on the XMedia and XMediaNet multimodal datasets. Extensive ablation experiments using image, text, audio, video, and 3D point cloud modalities demonstrate the complexity vs. accuracy tradeoff under a wide variety of real-world use cases.
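A minimal sketch of the cross-modal contrastive objective mentioned above, assuming paired embeddings from two modality encoders; the margin, shapes, and the margin-based loss form are illustrative assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(emb_a, emb_b, same_pair, margin=0.5):
    """Pull matching cross-modal pairs together, push mismatched pairs apart."""
    d = F.pairwise_distance(emb_a, emb_b)               # per-pair L2 distance
    pos = same_pair * d.pow(2)                          # matching pairs: shrink distance
    neg = (1 - same_pair) * F.relu(margin - d).pow(2)   # mismatches: enforce a margin
    return (pos + neg).mean()

# Stagewise extension (as described above): a new modality's encoder is trained
# against a frozen reference-modality encoder with this loss, so the existing
# encodings in the common space are left unchanged.
```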
{"title":"Multi Stage Common Vector Space for Multimodal Embeddings","authors":"Sabarish Gopalakrishnan, Premkumar Udaiyar, Shagan Sah, R. Ptucha","doi":"10.1109/AIPR47015.2019.9174583","DOIUrl":"https://doi.org/10.1109/AIPR47015.2019.9174583","url":null,"abstract":"Deep learning frameworks have proven to be very effective at tasks like classification, segmentation, detection, and translation. Before being processed by a deep learning model, objects are first encoded into a suitable vector representation. For example, images are typically encoded using convolutional neural networks whereas texts typically use recurrent neural networks. Similarly, other modalities of data like 3D point clouds, audio signals, and videos can be transformed into vectors using appropriate encoders. Although deep learning architectures do a good job of learning these vector representations in isolation, learning a single common representation across multiple modalities is a challenging task. In this work, we develop a Multi Stage Common Vector Space (M-CVS) that is suitable for encoding multiple modalities. The M-CVS is an efficient low-dimensional vector representation in which the contextual similarity of data is preserved across all modalities through the use of contrastive loss functions. Our vector space can perform tasks like multimodal retrieval, searching and generation, where for example, images can be retrieved from text or audio input. The addition of a new modality would generally mean resetting and training the entire network. However, we introduce a stagewise learning technique where each modality is compared to a reference modality before being projected to the M-CVS. Our method ensures that a new modality can be mapped into the MCVS without changing existing encodings, allowing the extension to any number of modalities. We build and evaluate M-CVS on the XMedia and XMedianet multimodal dataset. Extensive ablation experiments using images, text, audio, video, and 3D point cloud modalities demonstrate the complexity vs. accuracy tradeoff under a wide variety of real-world use cases.","PeriodicalId":167075,"journal":{"name":"2019 IEEE Applied Imagery Pattern Recognition Workshop (AIPR)","volume":"100 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133190620","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Using Robust Networks to Inform Lightweight Models in Semi-Supervised Learning for Object Detection
Pub Date: 2019-10-01 | DOI: 10.1109/AIPR47015.2019.9174592
Jonathan Worobey, S. Recker, C. Gribble
A common trade-off among object detection algorithms is accuracy-for-speed (or vice versa). To meet our application’s real-time requirement, we use a Single Shot MultiBox Detector (SSD) model. This architecture meets our latency requirements; however, a large amount of training data is required to achieve an acceptable accuracy level. While unusable for our end application, more robust network architectures, such as Regions with CNN features (R-CNN), provide an important advantage over SSD models—they can be more reliably trained on small datasets. By fine-tuning R-CNN models on a small number of hand-labeled examples, we create new, larger training datasets by running inference on the remaining unlabeled data. We show that these new, inferenced labels are beneficial to the training of lightweight models. These inferenced datasets are imperfect, and we explore various methods of dealing with the errors, including hand-labeling mislabeled data, discarding poor examples, and simply ignoring errors. Further, we explore the total cost, measured in human and computer time, required to execute this workflow compared to a hand-labeling baseline.
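A minimal sketch of the pseudo-labeling step described above, assuming the R-CNN "teacher" detections arrive as dicts with "box", "score", and "class" keys; the field names and confidence threshold are illustrative assumptions, and discarding low-confidence examples corresponds to one of the error-handling strategies mentioned in the abstract.

```python
def build_pseudo_labels(teacher_detections, min_score=0.8):
    """Keep only confident teacher detections as training labels for the lightweight SSD."""
    dataset = []
    for image_id, dets in teacher_detections.items():
        kept = [d for d in dets if d["score"] >= min_score]
        if kept:  # discard images with no confident detections
            dataset.append({"image_id": image_id,
                            "boxes": [d["box"] for d in kept],
                            "classes": [d["class"] for d in kept]})
    return dataset
```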
{"title":"Using Robust Networks to Inform Lightweight Models in Semi-Supervised Learning for Object Detection","authors":"Jonathan Worobey, S. Recker, C. Gribble","doi":"10.1109/AIPR47015.2019.9174592","DOIUrl":"https://doi.org/10.1109/AIPR47015.2019.9174592","url":null,"abstract":"A common trade-off among object detection algorithms is accuracy-for-speed (or vice versa). To meet our application’s real-time requirement, we use a Single Shot MultiBox Detector (SSD) model. This architecture meets our latency requirements; however, a large amount of training data is required to achieve an acceptable accuracy level. While unusable for our end application, more robust network architectures, such as Regions with CNN features (R-CNN), provide an important advantage over SSD models—they can be more reliably trained on small datasets. By fine-tuning R-CNN models on a small number of hand-labeled examples, we create new, larger training datasets by running inference on the remaining unlabeled data. We show that these new, inferenced labels are beneficial to the training of lightweight models. These inferenced datasets are imperfect, and we explore various methods of dealing with the errors, including hand-labeling mislabeled data, discarding poor examples, and simply ignoring errors. Further, we explore the total cost, measured in human and computer time, required to execute this workflow compared to a hand-labeling baseline.","PeriodicalId":167075,"journal":{"name":"2019 IEEE Applied Imagery Pattern Recognition Workshop (AIPR)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121040075","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Estimating the Population of Large Animals in the Wild Using Satellite Imagery: A Case Study of Hippos in Zambia’s Luangwa River
Pub Date: 2019-10-01 | DOI: 10.1109/AIPR47015.2019.9174564
J. Irvine, J. Nolan, Nathaniel Hofmann, D. Lewis, Twakundine Simpamba, P. Zyambo, A. Travis, S. Hemami
Degradation of natural ecosystems, driven by increasing human activity and climate change, is threatening many animal populations in the wild. Zambia’s hippo population in the Luangwa Valley is one example, where declining forest cover from increased farming pressure has the potential to limit hippo range and numbers by reducing water flow in this population’s critical habitat, the Luangwa River. COMACO applies economic incentives through a farmer-based business model to mitigate threats of watershed loss and has identified hippos as a key indicator species for assessing its work and the health of Luangwa’s watershed. The goal of this effort is to develop automated machine learning tools that can process fine-resolution commercial satellite imagery to estimate the hippo population and associated characteristics of the habitat. The focus is the Luangwa River in Zambia, where the ideal time for imagery acquisition is the dry season of June through September. This study leverages historical commercial satellite imagery to identify selected areas with observable hippo groupings, develop an image-based signature for hippo detection, and construct an initial image classifier to support larger-scale assessment of the hippo population over broad regions. We begin by characterizing the nature of the problem and the challenges inherent in applying remote sensing methods to the estimation of animal populations. To address these challenges, spectral signatures were constructed from analysis of historical imagery. The initial approach to classifier development relied on spectral angle to distinguish hippos from background, where background conditions included water, bare soil, low vegetation, trees, and mixtures of these materials. We present the approach and the initial classifier results. We conclude with a discussion of next steps toward producing an image-based estimate of the hippo population and lessons learned from this study.
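A minimal sketch of the spectral-angle test used to separate candidate hippo pixels from background, assuming `reference` is the signature derived from historical imagery; the angle threshold is an illustrative placeholder, not a value from the study.

```python
import numpy as np

def spectral_angle(pixels, reference):
    """Angle (radians) between each pixel spectrum and the reference signature."""
    pixels = np.asarray(pixels, dtype=float)    # shape (N, bands)
    ref = np.asarray(reference, dtype=float)    # shape (bands,)
    cos = pixels @ ref / (np.linalg.norm(pixels, axis=1) * np.linalg.norm(ref))
    return np.arccos(np.clip(cos, -1.0, 1.0))

def classify_hippo(pixels, reference, max_angle=0.10):
    # True where the pixel spectrum is close (in angle) to the hippo signature.
    return spectral_angle(pixels, reference) < max_angle
```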
{"title":"Estimating the Population of Large Animals in the Wild Using Satellite Imagery: A Case Study of Hippos in Zambia’s Luangwa River","authors":"J. Irvine, J. Nolan, Nathaniel Hofmann, D. Lewis, Twakundine Simpamba, P. Zyambo, A. Travis, S. Hemami","doi":"10.1109/AIPR47015.2019.9174564","DOIUrl":"https://doi.org/10.1109/AIPR47015.2019.9174564","url":null,"abstract":"Degradation of natural ecosystems as influenced by increasing human activity and climate change is threatening many animal populations in the wild. Zambia’s hippo population in Luangwa Valley is one example where declining forest cover from increased farming pressures has the potential of limiting hippo range and numbers by reducing water flow in this population’s critical habitat, the Luangwa River. COMACO applies economic incentives through a farmer-based business model to mitigate threats of watershed loss and has identified hippos as a key indicator species for assessing its work and the health of Luangwa’s watershed. The goal of this effort is to develop automated machine learning tools that can process fine resolution commercial satellite imagery to estimate the hippo population and associated characteristics of the habitat. The focus is the Luangwa River in Zambia, where the ideal time for imagery acquisition is the dry season of June through September. This study leverages historical commercial satellite imagery to identify selected areas with observable hippo groupings, develop an-image-based signature for hippo detection, and construct an initial image classifier to support larger-scale assessment of the hippo population over broad regions. We begin by characterizing the nature of the problem and the challenges inherent in applying remote sensing methods to the estimation of animal populations. To address these challenges, spectral signatures were constructed from analysis of historical imagery. The initial approach to classifier development relied on spectral angle to distinguish hippos from background, where background conditions included water, bare soil, low vegetation, trees, and mixtures of these materials. We present the approach and the initial classifier results. We conclude with a discussion of next steps to produce an imagebased estimate of the hippo populations and discuss lessons learned from this study.","PeriodicalId":167075,"journal":{"name":"2019 IEEE Applied Imagery Pattern Recognition Workshop (AIPR)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125069484","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
3-D Scene Reconstruction Using Depth from Defocus and Deep Learning
Pub Date: 2019-10-01 | DOI: 10.1109/AIPR47015.2019.9174568
David R. Emerson, Lauren A. Christopher
Depth estimation is becoming increasingly important in computer vision applications. As the commercial industry moves forward with autonomous vehicle research and development, there is a demand for these systems to be able to gauge their 3D surroundings in order to avoid obstacles and react to threats. This need requires depth estimation systems, and current research in self-driving vehicles now uses LIDAR for 3D awareness. However, as LIDAR becomes more prevalent, there is an increased risk of interference between this type of active measurement system on multiple vehicles. Passive methods, on the other hand, do not require the transmission of a signal in order to measure depth. Instead, they estimate depth using specific cues in the scene. Previous research, using a Depth from Defocus (DfD) single passive camera system, has shown that an in-focus image and an out-of-focus image can be used to produce a depth measure. This research introduces a new Deep Learning (DL) architecture that ingests these image pairs to produce a depth map of the given scene, improving both speed and performance over a range of lighting conditions. Compared to the previous state-of-the-art multi-label graph cut algorithms, the new DfD-Net produces a 63.7% and 33.6% improvement in Normalized Root Mean Square Error (NRMSE) for the darkest and brightest images, respectively. In addition to NRMSE, an image quality metric, the Structural Similarity Index (SSIM), was also used to assess DfD-Net performance. The DfD-Net produced a 3.6% increase (improvement) and a 2.3% reduction (slight decrease) in the SSIM metric for the darkest and brightest images, respectively.
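A minimal sketch of the two evaluation metrics quoted above (NRMSE and SSIM), assuming `truth` and `pred` are depth maps as float arrays of the same shape; this is only the scoring step, not the DfD-Net itself.

```python
import numpy as np
from skimage.metrics import normalized_root_mse, structural_similarity

def evaluate_depth(truth, pred):
    """Return (NRMSE, SSIM) for a predicted depth map against ground truth."""
    nrmse = normalized_root_mse(truth, pred)                 # lower is better
    ssim = structural_similarity(truth, pred,
                                 data_range=truth.max() - truth.min())  # higher is better
    return nrmse, ssim
```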
{"title":"3-D Scene Reconstruction Using Depth from Defocus and Deep Learning","authors":"David R. Emerson, Lauren A. Christopher","doi":"10.1109/AIPR47015.2019.9174568","DOIUrl":"https://doi.org/10.1109/AIPR47015.2019.9174568","url":null,"abstract":"Depth estimation is becoming increasingly important in computer vision applications. As the commercial industry moves forward with autonomous vehicle research and development, there is a demand for these systems to be able to gauge their 3D surroundings in order to avoid obstacles, and react to threats. This need requires depth estimation systems, and current research in self-driving vehicles now use LIDAR for 3D awareness. However, as LIDAR becomes more prevalent there is the potential for an increased risk of interference between this type of active measurement system on multiple vehicles. Passive methods, on the other hand, do not require the transmission of a signal in order to measure depth. Instead, they estimate the depth by using specific cues in the scene. Previous research, using a Depth from Defocus (DfD) single passive camera system, has shown that an in-focus image and an out-of-focus image can be used to produce a depth measure. This research introduces a new Deep Learning (DL) architecture that is capable of ingesting these image pairs to produce a depth map of the given scene improving both speed and performance over a range of lighting conditions. Compared to the previous state-of-the-art multi-label graph cut algorithms; the new DfD-Net produces a 63.7% and 33.6% improvement in the Normalized Root Mean Square Error (NRMSE) for the darkest and brightest images respectively. In addition to the NRMSE, an image quality metric (Structural Similarity Index (SSIM)) was also used to assess the DfD-Net performance. The DfD-Net produced a 3.6% increase (improvement) and a 2.3% reduction (slight decrease) in the SSIM metric for the darkest and brightest images respectively.","PeriodicalId":167075,"journal":{"name":"2019 IEEE Applied Imagery Pattern Recognition Workshop (AIPR)","volume":"02 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129631829","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Quantifying Socio-economic Context from Overhead Imagery
Pub Date: 2019-10-01 | DOI: 10.1109/AIPR47015.2019.9174576
Brigid Angelini, Michael R. Crystal, J. Irvine
Discerning regional political volatility is valuable for successful policy development by government and commercial entities, and it requires an understanding of the underlying economic, social, and political environment. Some methods of obtaining this information, such as global public opinion surveys, are expensive and slow to complete. We explore the feasibility of gleaning comparable information through automated image processing, with a premium on freely available commercial satellite imagery. Previous work demonstrated success in predicting survey responses related to wealth, poverty, and crime in rural Afghanistan and Botswana by utilizing spatially coinciding high-resolution satellite images to develop models. We extend these findings by using similar image features to predict survey responses regarding political and economic sentiment. We also explore the feasibility of predicting survey responses with models built from Sentinel-2 satellite imagery, which is coarser in resolution but freely available. Our findings reiterate the potential for cheaply and quickly discerning the socio-politico-economic context of a region solely through satellite image features. We present a number of models and their cross-validated performance in predicting survey responses, and conclude with comments and recommendations for future work.
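A minimal sketch of the cross-validated modeling step, assuming `X` is a matrix of per-region image features and `y` the corresponding survey responses; the choice of a ridge regressor and of R^2 as the score is an illustrative assumption, not the paper's specific estimator.

```python
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

def cv_score(X, y, folds=5):
    """Mean cross-validated R^2 of a linear model mapping image features to survey responses."""
    return cross_val_score(Ridge(alpha=1.0), X, y, cv=folds, scoring="r2").mean()
```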
{"title":"Quantifying Socio-economic Context from Overhead Imagery","authors":"Brigid Angelini, Michael R. Crystal, J. Irvine","doi":"10.1109/AIPR47015.2019.9174576","DOIUrl":"https://doi.org/10.1109/AIPR47015.2019.9174576","url":null,"abstract":"Discerning regional political volatility is valuable for successful policy development by government and commercial entities, and necessitates having an understanding of the underlying economic, social, and political environment. Some methods of obtaining the environment information, such as global public opinion surveys, are expensive and slow to complete. We explore the feasibility of gleaning comparable information through automated image processing with a premium on freely available commercial satellite imagery. Previous work demonstrated success in predicting survey responses related to wealth, poverty, and crime in rural Afghanistan and Botswana, by utilizing spatially coinciding high resolution satellite images to develop models. We extend these findings by using similar image features to predict survey responses regarding political and economic sentiment. We also explore the feasibility of predicting survey responses with models built from Sentinel 2 satellite imagery, which is coarser-resolution, but freely available. Our fidings reiterate the potential for cheaply and quickly discerning the socio-politico-economic context of a region solely through satellite image features. We show a number of models and their cross-validated performance in predicting survey responses, and conclude with comments and recommendations for future work.","PeriodicalId":167075,"journal":{"name":"2019 IEEE Applied Imagery Pattern Recognition Workshop (AIPR)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122272016","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PSIG-GAN: A Parameterized Synthetic Image Generator Optimized via Non-Differentiable GAN
Pub Date: 2019-10-01 | DOI: 10.1109/AIPR47015.2019.9174588
Hussain I. Khajanchi, Jake Bezold, M. Kilcher, Alexander Benasutti, Brian Rentsch, Larry Pearlstein, S. Maxwell
Deep convolutional neural networks have been successfully deployed by large, well-funded teams, but their wider adoption is often limited by the cost and schedule ramifications of their requirement for massive amounts of labeled data. We address this problem through the use of a parameterized synthetic image generator. Our approach is particularly novel in that we are able to fine-tune the generator’s parameters through the use of a generative adversarial network. We describe our approach and present results that demonstrate its potential benefits. We demonstrate the PSIG-GAN by creating images for training a DCNN to detect the existence and location of weeds in lawn grass.
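A minimal sketch of the core idea: because the synthetic-image generator's parameters are not differentiable, they can be tuned with a gradient-free search that scores each candidate by how well its rendered images fool a discriminator. The hooks `render_batch` and `discriminator_realism`, and the perturbation-based search itself, are hypothetical placeholders, not the paper's optimizer.

```python
import numpy as np

def tune_generator(init_params, render_batch, discriminator_realism,
                   iters=200, sigma=0.1, seed=0):
    """Gradient-free tuning of generator parameters using discriminator feedback."""
    rng = np.random.default_rng(seed)
    best = np.asarray(init_params, dtype=float)
    best_score = discriminator_realism(render_batch(best))      # mean "looks real" score
    for _ in range(iters):
        cand = best + rng.normal(0.0, sigma, size=best.shape)   # perturb the parameters
        score = discriminator_realism(render_batch(cand))
        if score > best_score:                                   # keep improvements only
            best, best_score = cand, score
    return best
```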
{"title":"PSIG-GAN: A Parameterized Synthetic Image Generator Optimized via Non-Differentiable GAN","authors":"Hussain I. Khajanchi, Jake Bezold, M. Kilcher, Alexander Benasutti, Brian Rentsch, Larry Pearlstein, S. Maxwell","doi":"10.1109/AIPR47015.2019.9174588","DOIUrl":"https://doi.org/10.1109/AIPR47015.2019.9174588","url":null,"abstract":"Deep convolutional neural networks have been successfully deployed by large, well-funded teams, but their wider adoption is often limited by the cost and schedule ramifications of their requirement for massive amounts of labeled data. We address this problem through the use of a parameterized synthetic image generator. Our approach is particularly novel in that we have been able to fine tune the generator’s parameters through the use of a generative adversarial network. We describe our approach, and present results that demonstrate its potential benefits. We demonstrate the PSIG-GAN by creating images for training a DCNN to detect the existence and location of weeds in lawn grass.","PeriodicalId":167075,"journal":{"name":"2019 IEEE Applied Imagery Pattern Recognition Workshop (AIPR)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131671168","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Globally-scalable Automated Target Recognition (GATR)
Pub Date: 2019-10-01 | DOI: 10.1109/AIPR47015.2019.9174585
Gary Chern, A. Groener, Michael Harner, Tyler Kuhns, A. Lam, Stephen O’Neill, M. D. Pritt
GATR (Globally-scalable Automated Target Recognition) is a Lockheed Martin software system for real-time object detection and classification in satellite imagery on a worldwide basis. GATR uses GPU-accelerated deep learning software to quickly search large geographic regions. On a single GPU it processes imagery at a rate of over 16 km²/sec (or more than 10 Mpixels/sec), and it requires only two hours to search the entire state of Pennsylvania for gas fracking wells. The search time scales linearly with the geographic area, and the processing rate scales linearly with the number of GPUs. GATR has a modular, cloud-based architecture that uses Maxar’s GBDX platform and provides an ATR analytic as a service. Applications include broad area search, watch boxes for monitoring ports and airfields, and site characterization. ATR is performed by deep learning models including RetinaNet and Faster R-CNN. Results are presented for the detection of aircraft and fracking wells and show that recall exceeds 90% even in geographic regions never seen before. GATR is extensible to new targets, such as cars and ships, and it also handles radar and infrared imagery.
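A quick sanity check on the quoted throughput: at roughly 16 km²/sec on one GPU, covering Pennsylvania (about 119,280 km², an external figure not taken from the paper) works out to approximately two hours, consistent with the abstract.

```python
area_km2 = 119_280           # approximate area of Pennsylvania (assumed figure)
rate_km2_per_s = 16          # single-GPU processing rate quoted in the abstract
hours = area_km2 / rate_km2_per_s / 3600
print(f"{hours:.2f} hours")  # ~2.07 hours
```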
{"title":"Globally-scalable Automated Target Recognition (GATR)","authors":"Gary Chern, A. Groener, Michael Harner, Tyler Kuhns, A. Lam, Stephen O’Neill, M. D. Pritt","doi":"10.1109/AIPR47015.2019.9174585","DOIUrl":"https://doi.org/10.1109/AIPR47015.2019.9174585","url":null,"abstract":"GATR (Globally-scalable Automated Target Recognition) is a Lockheed Martin software system for real-time object detection and classification in satellite imagery on a worldwide basis. GATR uses GPU-accelerated deep learning software to quickly search large geographic regions. On a single GPU it processes imagery at a rate of over 16 km2/sec (or more than 10 Mpixels/sec), and it requires only two hours to search the entire state of Pennsylvania for gas fracking wells. The search time scales linearly with the geographic area, and the processing rate scales linearly with the number of GPUs. GATR has a modular, cloud-based architecture that uses Maxar’s GBDX platform and provides an ATR analytic as a service. Applications include broad area search, watch boxes for monitoring ports and airfields, and site characterization. ATR is performed by deep learning models including RetinaNet and Faster R-CNN. Results are presented for the detection of aircraft and fracking wells and show that the recalls exceed 90% even in geographic regions never seen before. GATR is extensible to new targets, such as cars and ships, and it also handles radar and infrared imagery.","PeriodicalId":167075,"journal":{"name":"2019 IEEE Applied Imagery Pattern Recognition Workshop (AIPR)","volume":"56 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125091403","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GLSNet: Global and Local Streams Network for 3D Point Cloud Classification
Pub Date: 2019-10-01 | DOI: 10.1109/AIPR47015.2019.9174587
Rina Bao, K. Palaniappan, Yunxin Zhao, G. Seetharaman, Wenjun Zeng
We propose a novel deep architecture for semantic labeling of 3D point clouds, referred to as the Global and Local Streams Network (GLSNet), which is designed to capture both global and local structure and contextual information for large-scale 3D point cloud classification. GLSNet tackles a hard problem: the large differences in object size in large-scale point cloud segmentation, from extremely large objects like water to small objects like buildings and trees. We design a two-branch deep network architecture that decomposes this complex problem into separate processing problems at global and local scales and then fuses their predictions. GLSNet combines the strength of the Submanifold Sparse Convolutional Network [1] for learning global structure with the strength of PointNet++ [2] for incorporating local information. The first branch of GLSNet processes a full point cloud in the global stream, capturing long-range information about the geometric structure by using a U-Net structured Submanifold Sparse Convolutional Network (SSCN-U) architecture. The second branch of GLSNet processes a point cloud in the local stream; it partitions the 3D points into slices and processes one slice at a time using the PointNet++ architecture. The two streams of information are fused by max pooling over their classification prediction vectors. Our results on the IEEE GRSS Data Fusion Contest Urban Semantic 3D, Track 4 (DFT4) [3] [4] [5] point cloud classification dataset show that GLSNet achieved performance gains of almost 4% in mIOU and 1% in overall accuracy over the individual streams on the held-back testing dataset.
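A minimal sketch of the fusion step described above: per-point class predictions from the global (SSCN-U) and local (PointNet++) streams are combined by element-wise max pooling before taking the final label. The tensors here are placeholders for the two streams' outputs; shapes are assumed.

```python
import torch

def fuse_predictions(global_scores, local_scores):
    """Element-wise max over two (num_points, num_classes) prediction tensors."""
    fused = torch.maximum(global_scores, local_scores)  # max pooling across streams
    return fused.argmax(dim=1)                          # final per-point class labels
```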
{"title":"GLSNet: Global and Local Streams Network for 3D Point Cloud Classification","authors":"Rina Bao, K. Palaniappan, Yunxin Zhao, G. Seetharaman, Wenjun Zeng","doi":"10.1109/AIPR47015.2019.9174587","DOIUrl":"https://doi.org/10.1109/AIPR47015.2019.9174587","url":null,"abstract":"We propose a novel deep architecture for semantic labeling of 3D point clouds referred to as Global and Local Streams Network (GLSNet) which is designed to capture both global and local structures and contextual information for large scale 3D point cloud classification. Our GLSNet tackles a hard problem – large differences of object sizes in large-scale point cloud segmentation including extremely large objects like water, and small objects like buildings and trees, and we design a two-branch deep network architecture to decompose the complex problem to separate processing problems at global and local scales and then fuse their predictions. GLSNet combines the strength of Submanifold Sparse Convolutional Network [1] for learning global structure with the strength of PointNet++ [2] for incorporating local information.The first branch of GLSNet processes a full point cloud in the global stream, and it captures long range information about the geometric structure by using a U-Net structured Submanifold Sparse Convolutional Network (SSCN-U) architecture. The second branch of GLSNet processes a point cloud in the local stream, and it partitions 3D points into slices and processes one slice of the cloud at a time by using the PointNet ++ architecture. The two streams of information are fused by max pooling over their classification prediction vectors. Our results on the IEEE GRSS Data Fusion Contest Urban Semantic 3D, Track 4 (DFT4) [3] [4] [5] point cloud classification dataset have shown that GLSNet achieved performance gains of almost 4% in mIOU and 1% in overall accuracy over the individual streams on the held-back testing dataset.","PeriodicalId":167075,"journal":{"name":"2019 IEEE Applied Imagery Pattern Recognition Workshop (AIPR)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117088400","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}