Matt Brown, Keith Fieldhouse, E. Swears, Paul Tunison, Adam Romlein, A. Hoogs
We introduce a self-contained, mobile surveillance system designed to remotely detect and track people in real time, at long ranges, and over a wide field of view in cluttered urban and natural settings. The system is integrated with an unmanned ground vehicle, which hosts an array of four IR and four high-resolution RGB cameras, navigational sensors, and onboard processing computers. High-confidence, low-false-alarm-rate person tracks are produced by fusing motion detections and single-frame CNN person detections between co-registered RGB and IR video streams. Processing speeds are increased by using semantic scene segmentation and a tiered inference scheme to focus processing on the most salient regions of the 43° x 7.8° composite field of view. The system autonomously produces alerts of human presence and movement within the field of view, which are disseminated over a radio network and remotely viewed on a tablet computer. We present an ablation study quantifying the benefits that multi-sensor, multi-detector fusion brings to the problem of detecting people in challenging outdoor environments with shadows, occlusions, clutter, and variable weather conditions.
{"title":"Multi-Modal Detection Fusion on a Mobile UGV for Wide-Area, Long-Range Surveillance","authors":"Matt Brown, Keith Fieldhouse, E. Swears, Paul Tunison, Adam Romlein, A. Hoogs","doi":"10.1109/WACV.2019.00207","DOIUrl":"https://doi.org/10.1109/WACV.2019.00207","url":null,"abstract":"We introduce a self-contained, mobile surveillance system designed to remotely detect and track people in real time, at long ranges, and over a wide field of view in cluttered urban and natural settings. The system is integrated with an unmanned ground vehicle, which hosts an array of four IR and four high-resolution RGB cameras, navigational sensors, and onboard processing computers. High-confidence, low-false-alarm-rate person tracks are produced by fusing motion detections and single-frame CNN person detections between co-registered RGB and IR video streams. Processing speeds are increased by using semantic scene segmentation and a tiered inference scheme to focus processing on the most salient regions of the 43° x 7.8° composite field of view. The system autonomously produces alerts of human presence and movement within the field of view, which are disseminated over a radio network and remotely viewed on a tablet computer. We present an ablation study quantifying the benefits that multi-sensor, multi-detector fusion brings to the problem of detecting people in challenging outdoor environments with shadows, occlusions, clutter, and variable weather conditions.","PeriodicalId":436637,"journal":{"name":"2019 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"202 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125732000","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Puneet Mathur, Meghna P. Ayyar, R. Shah, S. Sharma
Identification of diseased kidney glomeruli and fibrotic regions remains subjective and time-consuming due to complete dependence on an expert kidney pathologist. In an attempt to automate the classification of glomeruli into normal and abnormal morphology, and the classification of fibrosis patches into mild, moderate, and severe categories, we investigate three deep learning techniques: traditional transfer learning, pre-trained deep neural networks for feature extraction followed by supervised classification, and a novel Multi-Gaze Attention Network (MGANet) that uses multi-headed self-attention through parallel residual skip connections in a CNN architecture. Empirically, while transfer learning models such as ResNet50, InceptionResNetV2, VGG19 and InceptionV3 acutely underperform on the classification tasks, a Logistic Regression model augmented with features extracted from InceptionResNetV2 shows promising results. Additionally, the experiments establish that the proposed MGANet architecture outperforms both of the former baseline techniques, achieving state-of-the-art accuracies of 87.25% and 81.47% for glomeruli and fibrosis classification, respectively, on the Renal Glomeruli Fibrosis Histopathological (RGFH) database.
{"title":"Exploring Classification of Histological Disease Biomarkers From Renal Biopsy Images","authors":"Puneet Mathur, Meghna P. Ayyar, R. Shah, S. Sharma","doi":"10.1109/WACV.2019.00016","DOIUrl":"https://doi.org/10.1109/WACV.2019.00016","url":null,"abstract":"Identification of diseased kidney glomeruli and fibrotic regions remains subjective and time-consuming due to complete dependence on an expert kidney pathologist. In an attempt to automate the classification of glomeruli into normal and abnormal morphology and classification of fibrosis patches into mild, moderate and severe categories, we investigate three deep learning techniques: traditional transfer learning, pre-trained deep neural networks for feature extraction followed by supervised classification, and a novel Multi-Gaze Attention Network (MGANet) that uses multi-headed self-attention through parallel residual skip connections in a CNN architecture. Emperically, while the transfer learning models such as ResNet50, InceptionResNetV2, VGG19 and InceptionV3 acutely under-perform in the classification tasks, the Logistic Regression model augmented with features extracted from the InceptionResNetV2 shows promising results. Additionally, the experiments effectively ascertain that the proposed MGANet architecture outperforms both the former baseline techniques to establish the state of the art accuracy of 87.25% and 81.47% for glomerluli and fibrosis classification, respectively on the Renal Glomeruli Fibrosis Histopathological (RGFH) database.","PeriodicalId":436637,"journal":{"name":"2019 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125953356","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Talha Ahmad Siddiqui, Samarth Bharadwaj, S. Kalyanaraman
Ahead-of-time forecasting of the incident solar irradiance on a panel is indicative of expected energy yield and is essential for efficient grid distribution and planning. Traditionally, these forecasts are based on meteorological physics models whose parameters are tuned with coarse-grained radiometric tiles sensed from geo-satellites. This research presents a novel application of deep neural networks to observing and estimating short-term weather effects from videos. Specifically, we use time-lapsed videos (sky-videos) obtained from upward-facing wide-lensed cameras (sky-cameras) to directly estimate and forecast solar irradiance. We introduce and present results on two large publicly available datasets obtained from weather stations in two regions of North America using relatively inexpensive optical hardware. These datasets contain over a million images that span 1 and 12 years, respectively, the largest such collection to our knowledge. Compared to satellite-based approaches, the proposed deep learning approach significantly reduces the normalized mean-absolute-percentage error for both nowcasting, i.e., prediction of the solar irradiance at the instant the frame is captured, and forecasting, i.e., ahead-of-time irradiance prediction for durations of up to 4 hours.
{"title":"A Deep Learning Approach to Solar-Irradiance Forecasting in Sky-Videos","authors":"Talha Ahmad Siddiqui, Samarth Bharadwaj, S. Kalyanaraman","doi":"10.1109/WACV.2019.00234","DOIUrl":"https://doi.org/10.1109/WACV.2019.00234","url":null,"abstract":"Ahead-of-time forecasting of incident solar-irradiance on a panel is indicative of expected energy yield and is essential for efficient grid distribution and planning. Traditionally, these forecasts are based on meteorological physics models whose parameters are tuned by coarse-grained radiometric tiles sensed from geo-satellites. This research presents a novel application of deep neural network approach to observe and estimate short-term weather effects from videos. Specifically, we use time-lapsed videos (sky-videos) obtained from upward facing wide-lensed cameras (sky-cameras) to directly estimate and forecast solar irradiance. We introduce and present results on two large publicly available datasets obtained from weather stations in two regions of North America using relatively inexpensive optical hardware. These datasets contain over a million images that span for 1 and 12 years respectively, the largest such collection to our knowledge. Compared to satellite based approaches, the proposed deep learning approach significantly reduces the normalized mean-absolute-percentage error for both nowcasting, i.e. prediction of the solar irradiance at the instance the frame is captured, as well as forecasting, ahead-of-time irradiance prediction for a duration for upto 4 hours.","PeriodicalId":436637,"journal":{"name":"2019 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130046565","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recent progress in model-free single object tracking (SOT) algorithms has largely inspired applying SOT to multi-object tracking (MOT) to improve robustness and to relieve the dependency on an external detector. However, SOT algorithms are generally designed to distinguish a target from its environment, and hence run into problems when a target is spatially mixed with similar objects, as is observed frequently in MOT. To address this issue, in this paper we propose an instance-aware tracker that integrates SOT techniques into MOT by encoding awareness both within and between target models. In particular, we construct each target model by fusing information for distinguishing the target both from the background and from other instances (tracking targets). To preserve the uniqueness of all target models, our instance-aware tracker considers response maps from all target models and assigns spatial locations exclusively to optimize the overall accuracy. Another contribution we make is a dynamic model refreshing strategy learned by a convolutional neural network. This strategy helps to eliminate initialization noise and to adapt to variation in target size and appearance. To show the effectiveness of the proposed approach, we evaluate it on the popular MOT15 and MOT16 challenge benchmarks. On both benchmarks, our approach achieves the best overall performance in comparison with published results.
{"title":"Online Multi-Object Tracking With Instance-Aware Tracker and Dynamic Model Refreshment","authors":"Peng Chu, Heng Fan, C. C. Tan, Haibin Ling","doi":"10.1109/WACV.2019.00023","DOIUrl":"https://doi.org/10.1109/WACV.2019.00023","url":null,"abstract":"Recent progresses in model-free single object tracking (SOT) algorithms have largely inspired applying SOT to multi-object tracking (MOT) to improve the robustness as well as relieving dependency on external detector. However, SOT algorithms are generally designed for distinguishing a target from its environment, and hence meet problems when a target is spatially mixed with similar objects as observed frequently in MOT. To address this issue, in this paper we propose an instance-aware tracker to integrate SOT techniques for MOT by encoding awareness both within and between target models. In particular, we construct each target model by fusing information for distinguishing target both from background and other instances (tracking targets). To conserve uniqueness of all target models, our instance-aware tracker considers response maps from all target models and assigns spatial locations exclusively to optimize the overall accuracy. Another contribution we make is a dynamic model refreshing strategy learned by a convolutional neural network. This strategy helps to eliminate initialization noise as well as to adapt to variation of target size and appearance. To show the effectiveness of the proposed approach, it is evaluated on the popular MOT15 and MOT16 challenge benchmarks. On both benchmarks, our approach achieves the best overall performances in comparison with published results.","PeriodicalId":436637,"journal":{"name":"2019 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"148 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132449987","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recent research in pixel-wise semantic segmentation has increasingly focused on the development of very complicated deep neural networks, which require a large amount of computational resources. The ability to perform dense predictions in real time therefore becomes as important as achieving high accuracy. This real-time demand turns out to be fundamental, particularly on mobile platforms and other GPU-powered embedded systems such as the NVIDIA Jetson TX series. In this paper, we present a fast and efficient lightweight network called the Turbo Unified Network (ThunderNet). With a minimal backbone truncated from ResNet18, ThunderNet unifies the pyramid pooling module with our customized decoder. Our experimental results show that ThunderNet achieves 64.0% mIoU on Cityscapes, with real-time performance of 96.2 fps on a Titan XP GPU (512x1024 input) and 20.9 fps on a Jetson TX2 (256x512 input).
{"title":"ThunderNet: A Turbo Unified Network for Real-Time Semantic Segmentation","authors":"Wei Xiang, Hongda Mao, V. Athitsos","doi":"10.1109/WACV.2019.00195","DOIUrl":"https://doi.org/10.1109/WACV.2019.00195","url":null,"abstract":"Recent research in pixel-wise semantic segmentation has increasingly focused on the development of very complicated deep neural networks, which require a large amount of computational resources. The ability to perform dense predictions in real-time, therefore, becomes tantamount to achieving high accuracies. This real-time demand turns out to be fundamental particularly on the mobile platform and other GPU-powered embedded systems like NVIDIA Jetson TX series. In this paper, we present a fast and efficient lightweight network called Turbo Unified Network (ThunderNet). With a minimum backbone truncated from ResNet18, ThunderNet unifies the pyramid pooling module with our customized decoder. Our experimental results show that ThunderNet can achieve 64.0% mIoU on CityScapes, with real-time performance of 96.2 fps on a Titan XP GPU (512x1024), and 20.9 fps on Jetson TX2 (256x512).","PeriodicalId":436637,"journal":{"name":"2019 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131720453","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Vishal Kaushal, Rishabh K. Iyer, S. Kothawade, Rohan Mahadev, Khoshrav Doctor, Ganesh Ramakrishnan
State-of-the-art computer vision techniques based on supervised machine learning are, in general, data hungry. Curating their data poses the challenges of expensive human labeling, inadequate computing resources, and long experiment turnaround times. Training-data subset selection and active learning techniques have been proposed as possible solutions to these challenges. A special class of subset selection functions naturally models notions of diversity, coverage, and representation, and can be used to eliminate redundancy, thus lending itself well to training-data subset selection. These functions can also improve the efficiency of active learning by further reducing human labeling effort: they select a subset of the examples obtained with conventional uncertainty-sampling-based techniques. In this work, we empirically demonstrate the effectiveness of two diversity models, namely the Facility-Location and Dispersion models, for training-data subset selection and reducing labeling effort. We demonstrate this across the board for a variety of computer vision tasks, including Gender Recognition, Face Recognition, Scene Recognition, Object Detection and Object Recognition. Our results show that diversity-based subset selection, done the right way, can increase accuracy by up to 5-10% over existing baselines, particularly in settings where less training data is available. This allows complex machine learning models such as Convolutional Neural Networks to be trained with much less training data and lower labeling costs while incurring minimal performance loss.
{"title":"Learning From Less Data: A Unified Data Subset Selection and Active Learning Framework for Computer Vision","authors":"Vishal Kaushal, Rishabh K. Iyer, S. Kothawade, Rohan Mahadev, Khoshrav Doctor, Ganesh Ramakrishnan","doi":"10.1109/WACV.2019.00142","DOIUrl":"https://doi.org/10.1109/WACV.2019.00142","url":null,"abstract":"Supervised machine learning based state-of-the-art computer vision techniques are in general data hungry. Their data curation poses the challenges of expensive human labeling, inadequate computing resources and larger experiment turn around times. Training data subset selection and active learning techniques have been proposed as possible solutions to these challenges. A special class of subset selection functions naturally model notions of diversity, coverage and representation and can be used to eliminate redundancy thus lending themselves well for training data subset selection. They can also help improve the efficiency of active learning in further reducing human labeling efforts by selecting a subset of the examples obtained using the conventional uncertainty sampling based techniques. In this work, we empirically demonstrate the effectiveness of two diversity models, namely the Facility-Location and Dispersion models for training-data subset selection and reducing labeling effort. We demonstrate this across the board for a variety of computer vision tasks including Gender Recognition, Face Recognition, Scene Recognition, Object Detection and Object Recognition. Our results show that diversity based subset selection done in the right way can increase the accuracy by upto 5 - 10% over existing baselines, particularly in settings in which less training data is available. This allows the training of complex machine learning models like Convolutional Neural Networks with much less training data and labeling costs while incurring minimal performance loss.","PeriodicalId":436637,"journal":{"name":"2019 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121712239","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
T. Tsesmelis, Irtiza Hasan, M. Cristani, A. D. Bue, Fabio Galasso
Lighting design in indoor environments is of primary importance for at least two reasons: 1) people should perceive adequate light; 2) an effective lighting design means consistent energy saving. We present the Invisible Light Switch (ILS) to address both aspects. ILS dynamically adjusts the room illumination level to save energy while keeping the users' perceived light level constant, so the energy saving is invisible to them. Our proposed ILS leverages a radiosity model to estimate the light level perceived by a person within an indoor environment, taking into account the person's position and viewing frustum (head pose). ILS may therefore dim luminaires that are not seen by the user, resulting in effective energy saving, especially in large open offices (where the lights might otherwise be on everywhere for a single person). To quantify the system's performance, we have collected a new dataset in which people wear luxmeter devices while working in office rooms. The luxmeters measure the amount of light (in lux) reaching the people's gaze, which we consider a proxy for their perceived illumination level. Our initial results are promising: in a room with 8 LED luminaires, the daily energy consumption may be reduced from 18585 to 6206 watts with ILS (which currently needs 1560 watts to operate). In doing so, the perceived lighting drops by just 200 lux, a value considered negligible when the original illumination level is above 1200 lux, as is normally the case in offices.
{"title":"Human-Centric Light Sensing and Estimation From RGBD Images: The Invisible Light Switch","authors":"T. Tsesmelis, Irtiza Hasan, M. Cristani, A. D. Bue, Fabio Galasso","doi":"10.1109/WACV.2019.00050","DOIUrl":"https://doi.org/10.1109/WACV.2019.00050","url":null,"abstract":"Lighting design in indoor environments is of primary importance for at least two reasons: 1) people should perceive an adequate light; 2) an effective lighting design means consistent energy saving. We present the Invisible Light Switch (ILS) to address both aspects. ILS dynamically adjusts the room illumination level to save energy while maintaining constant the light level perception of the users. So the energy saving is invisible to them. Our proposed ILS leverages a radiosity model to estimate the light level which is perceived by a person within an indoor environment, taking into account the person position and her/his viewing frustum (head pose). ILS may therefore dim those luminaires, which are not seen by the user, resulting in an effective energy saving, especially in large open offices (where light may otherwise be ON everywhere for a single person). To quantify the system performance, we have collected a new dataset where people wear luxmeter devices while working in office rooms. The luxmeters measure the amount of light (in Lux) reaching the people gaze, which we consider a proxy to their illumination level perception. Our initial results are promising: in a room with 8 LED luminaires, the energy consumption in a day may be reduced from 18585 to 6206 watts with ILS (currently needing 1560 watts for operations). While doing so, the drop in perceived lighting decreases by just 200 lux, a value considered negligible when the original illumination level is above 1200 lux, as is normally the case in offices.","PeriodicalId":436637,"journal":{"name":"2019 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123876306","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper considers the problem of video inpainting, i.e., removing specified objects from an input video. Many methods have been developed for this problem so far, but they exhibit a trade-off between image quality and computational time: no existing method can generate high-quality images at video rate. The key to video inpainting is how to establish correspondences from scene regions occluded in one frame to those observed in other frames. To break the trade-off, we propose to use CNNs as a solution to this key problem. We extend existing CNNs for the standard task of optical flow estimation so that they can estimate the flow of occluded background regions. The extension includes augmentation of their architecture and changes to their training method. We experimentally show that this approach works well despite its simplicity, and that a simple video inpainting method integrating this flow estimator runs at video rate (e.g., 32 fps for 832 × 448 pixel videos on a standard PC with a GPU) while achieving image quality close to the state of the art.
{"title":"Video-Rate Video Inpainting","authors":"Rito Murase, Yan Zhang, Takayuki Okatani","doi":"10.1109/WACV.2019.00170","DOIUrl":"https://doi.org/10.1109/WACV.2019.00170","url":null,"abstract":"This paper considers the problem of video inpainting, i.e., to remove specified objects from an input video. Many methods have been developed for the problem so far, in which there is a trade-off between image quality and computational time. There was no method that can generate high-quality images in video rate. The key to video inpainting is how to establish correspondences from scene regions occluded in a frame to those observed in other frames. To break the trade-off, we propose to use CNNs as a solution to this key problem. We extend existing CNNs for the standard task of optical flow estimation to be able to estimate the flow of occluded background regions. The extension includes augmentation of their architecture and changes of their training method. We experimentally show that this approach works well despite its simplicity, and that a simple video inpainting method integrating this flow estimator runs in video rate (e.g., 32fps for 832 × 448 pixel videos on a standard PC with a GPU) while achieving image quality close to the state-of-the-art.","PeriodicalId":436637,"journal":{"name":"2019 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126961090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In cardiac interventions, localization of the guiding catheter tip in 2D fluoroscopic images is important for specifying vessel branches and calibrating vessels with stenosis. While detection of the guiding catheter tip is not trivial in contrast-free images, due to low-dose radiation as well as occlusion by other devices, it is even more challenging in contrast-filled images. As contrast-filled vessels become visible in X-ray imaging, the guiding catheter tip landmark can often be completely occluded by the contrast medium. It is difficult even for human eyes to precisely localize the catheter tip from a single angiography image; physicians have to rely on information from before the injection of contrast medium to localize the occluded tip. Automatic landmark detection under occlusion is therefore important and can significantly simplify the intervention workflow. To address this problem, we propose a novel Cascade Attention Machine (CAM) model. It borrows the idea of how human experts localize the catheter tip: first performing landmark detection when occlusion does not happen, then leveraging this information as prior knowledge to assist detection under occlusion. Attention maps computed from the non-occluded detection refine the heatmaps for the occluded detection, guiding inference to focus on the relevant regions. Experiments on X-ray angiography demonstrate promising performance compared with state-of-the-art baselines, showing that CAM can capture the relation between situations with and without occlusion to achieve precise detection of the occluded landmark.
{"title":"Cascade Attention Machine for Occluded Landmark Detection in 2D X-Ray Angiography","authors":"Liheng Zhang, V. Singh, Guo-Jun Qi, Terrence Chen","doi":"10.1109/WACV.2019.00017","DOIUrl":"https://doi.org/10.1109/WACV.2019.00017","url":null,"abstract":"In cardiac interventions, localization of guiding catheter tip in 2D fluoroscopic images is important to specify ves-sel branches and calibrate vessels with stenosis. While detection of guiding catheter tip is not trivial in contrast-free images due to low dose radiation as well as occlusion by other devices, it is even more challenging in contrast-filled images. As contrast-filled vessels become visible in X-ray imaging, the landmark of guiding catheter tip can often be completely occluded by the contrast medium. It is difficult even for human eyes to precisely localize the catheter tip from a single angiography image. Physicians have to rely on information before the inject of contrast medium to localize the guiding catheter tip occluded by contrast medium. Automatic landmark detection when occlusion happens is important and can significantly simplify the intervention workflow. To address this problem, we propose a novel Cascade Attention Machine (CAM) model. It borrows the idea of how human experts localize the catheter tip by first per-forming landmark detection when occlusion does not hap-pen, then leveraging this information as prior knowledge to assist the occluded detection. Attention maps are computed from non-occluded detection to further refine the heatmaps for occluded detection to guide the inference focusing on related regions. Experiments on X-ray angiography demonstrate the promising performance compared with the state-of-the-art baselines. It shows that the CAM can capture the relation between situations with and without occlusion to achieve precise detection of occluded landmark.","PeriodicalId":436637,"journal":{"name":"2019 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126964853","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Harshala Gammulle, Tharindu Fernando, S. Denman, S. Sridharan, C. Fookes
We propose a novel conditional GAN (cGAN) model for continuous fine-grained human action segmentation that utilises multi-modal data and learned scene context information. The proposed approach utilises two GANs, termed the Action GAN and the Auxiliary GAN, where the Action GAN is trained to operate over the current RGB frame while the Auxiliary GAN utilises supplementary information such as depth or optical flow. The goal of both GANs is to generate similar 'action codes', a vector representation of the current action. To facilitate this process, a context extractor that incorporates data and recent outputs from both modes is used to extract context information that aids recognition performance. The result is a recurrent GAN architecture which learns a task-specific loss function from multiple feature modalities. Extensive evaluations of variants of the proposed model show the importance of utilising different streams of information, such as context and auxiliary information, in the proposed network, and show that our model is capable of outperforming state-of-the-art methods on three widely used datasets: 50 Salads, MERL Shopping and Georgia Tech Egocentric Activities, comprising both static and dynamic camera settings.
{"title":"Coupled Generative Adversarial Network for Continuous Fine-Grained Action Segmentation","authors":"Harshala Gammulle, Tharindu Fernando, S. Denman, S. Sridharan, C. Fookes","doi":"10.1109/WACV.2019.00027","DOIUrl":"https://doi.org/10.1109/WACV.2019.00027","url":null,"abstract":"We propose a novel conditional GAN (cGAN) model for continuous fine-grained human action segmentation, that utilises multi-modal data and learned scene context information. The proposed approach utilises two GANs: termed Action GAN and Auxiliary GAN, where the Action GAN is trained to operate over the current RGB frame while the Auxiliary GAN utilises supplementary information such as depth or optical flow. The goal of both GANs is to generate similar 'action codes', a vector representation of the current action. To facilitate this process a context extractor that incorporates data and recent outputs from both modes is used to extract context information to aids recognition performance. The result is a recurrent GAN architecture which learns a task specific loss function from multiple feature modalities. Extensive evaluations on variants of the proposed model to show the importance of utilising different streams of information such as context and auxiliary information in the proposed network; and show that our model is capable of outperforming state-of-the-art methods for three widely used datasets: 50 Salads, MERL Shopping and Georgia Tech Egocentric Activities, comprising both static and dynamic camera settings.","PeriodicalId":436637,"journal":{"name":"2019 IEEE Winter Conference on Applications of Computer Vision (WACV)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114549403","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}