
2020 IEEE International Conference on Image Processing (ICIP): Latest Papers

Image Dehazing With Contextualized Attentive U-NET
Pub Date : 2020-10-01 DOI: 10.1109/ICIP40778.2020.9190725
Yean-Wei Lee, L. Wong, John See
Haze, caused by the accumulation of fine dust or smoke particles in the atmosphere, degrades outdoor images, reducing both the appeal of outdoor photography and the effectiveness of vision-based systems. In this paper, we present an end-to-end convolutional neural network for image dehazing. Our U-Net-based architecture places Squeeze-and-Excitation (SE) blocks at the skip connections to apply channel-wise attention, and parallel dilated convolution blocks at the bottleneck to capture both local and global context, yielding a richer representation of image features. Experimental results show that the proposed method achieves state-of-the-art performance on the SOTS benchmark dataset.
Citations: 7
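The channel-wise attention described in the dehazing abstract can be illustrated with a generic Squeeze-and-Excitation block. The pure-Python sketch below assumes a C x H x W feature map stored as nested lists and hand-picked toy weights; it shows only the squeeze, excitation and scale steps and is not the authors' implementation.

```python
import math


def squeeze_excite(features, w1, b1, w2, b2):
    """Squeeze-and-Excitation over a C x H x W feature map stored as nested lists.

    w1, b1: reduction FC layer, shapes (C/r) x C and C/r.
    w2, b2: expansion FC layer, shapes C x (C/r) and C.
    Returns the rescaled feature map and the per-channel attention weights.
    """
    # Squeeze: global average pooling yields one descriptor per channel.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in features]
    # Excitation: FC -> ReLU -> FC -> sigmoid gives a weight in (0, 1) per channel.
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z)) + b) for row, b in zip(w1, b1)]
    scale = [1.0 / (1.0 + math.exp(-(sum(w * h for w, h in zip(row, hidden)) + b)))
             for row, b in zip(w2, b2)]
    # Scale: multiply every position of each channel by that channel's weight.
    rescaled = [[[v * s for v in row] for row in ch] for ch, s in zip(features, scale)]
    return rescaled, scale


# Toy example: C = 2 channels of size 2 x 2, reduction ratio r = 2 (hidden size 1).
features = [[[1.0, 1.0], [1.0, 1.0]], [[3.0, 3.0], [3.0, 3.0]]]
out, scale = squeeze_excite(features, w1=[[0.5, 0.5]], b1=[0.0],
                            w2=[[1.0], [-1.0]], b2=[0.0, 0.0])
# With these weights, channel 0 is emphasized and channel 1 is suppressed.
```

In a trained network both FC layers are learned, and the reduction ratio (2 here) is a hyperparameter.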
Adaptive Convolutionally Enchanced Bi-Directional Lstm Networks For Choreographic Modeling
Pub Date : 2020-10-01 DOI: 10.1109/ICIP40778.2020.9191307
N. Bakalos, I. Rallis, N. Doulamis, A. Doulamis, A. Voulodimos, Eftychios E. Protopapadakis
In this paper, we present a deep learning scheme for classifying choreographic primitives from RGB images. The proposed framework combines the representational power of feature maps extracted by Convolutional Neural Networks with the long-term dependency modeling capabilities of Long Short-Term Memory (LSTM) recurrent neural networks. In addition, it incorporates an AutoRegressive Moving Average (ARMA) filter into the convolutionally enriched LSTM to cope with the dynamic characteristics of dance. Finally, an adaptive weight-updating strategy is introduced to improve classification performance. The framework is used to recognize dance primitives (basic dance postures) and is experimentally validated on real-world sequences of traditional Greek folk dances.
Citations: 0
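The ARMA filtering mentioned in the choreography abstract can be sketched as a plain ARMA recursion over a 1-D sequence, for example one feature dimension tracked across video frames. How the authors couple the filter to the LSTM is not described in the abstract, so the coefficients and the per-feature use below are assumptions for illustration.

```python
def arma_filter(x, ar, ma):
    """Filter a 1-D sequence with an ARMA(p, q) recursion.

    y[t] = sum_i ar[i] * y[t-1-i] + sum_j ma[j] * x[t-j]
    ar: autoregressive coefficients applied to past outputs.
    ma: moving-average coefficients applied to the current and past inputs.
    Samples before t = 0 are treated as zero.
    """
    y = []
    for t in range(len(x)):
        acc = sum(ma[j] * x[t - j] for j in range(len(ma)) if t - j >= 0)
        acc += sum(ar[i] * y[t - 1 - i] for i in range(len(ar)) if t - 1 - i >= 0)
        y.append(acc)
    return y


# Impulse response of an AR(1) filter: a single spike decays geometrically.
impulse = arma_filter([1.0, 0.0, 0.0, 0.0, 0.0], ar=[0.5], ma=[1.0])
# Pure moving average (no AR part): a two-tap smoother over a feature trajectory.
smoothed = arma_filter([0.0, 2.0, 4.0, 6.0], ar=[], ma=[0.5, 0.5])
```

The AR part gives the filter memory of its own past outputs, while the MA part smooths the raw inputs; which of the two matters more for dance dynamics is not something this sketch can tell.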
Blind Natural Image Quality Prediction Using Convolutional Neural Networks And Weighted Spatial Pooling
Pub Date : 2020-10-01 DOI: 10.1109/ICIP40778.2020.9190789
Yicheng Su, J. Korhonen
Typically, some regions of an image are more relevant to its perceived quality than others. On the other hand, subjective image quality is also affected by low-level characteristics, such as sensor noise and sharpness. This is why image rescaling, as often used in object recognition, is not a feasible way to produce input images for convolutional neural networks (CNNs) used for blind image quality prediction. Convolutional layers can generally accept images of arbitrary resolution as input, whereas a fully connected (FC) layer accepts only a fixed-length feature vector. To solve this problem, we propose weighted spatial pooling (WSP), which aggregates spatial information using a weight map of any size and can replace global average pooling (GAP). In this paper, we present a blind image quality assessment (BIQA) method based on a CNN and WSP. Our experimental results show that the prediction accuracy of the proposed method is competitive with state-of-the-art image quality assessment methods.
Citations: 9
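Weighted spatial pooling can be sketched under the assumption that it takes the common normalized form, a weighted sum of features divided by the sum of the weights; the paper's exact formulation may differ, and in the paper the weight map comes from the CNN rather than being given by hand. What the sketch illustrates is that the pooled vector always has C entries whatever the input resolution, and that GAP is the special case of a uniform weight map.

```python
def weighted_spatial_pool(features, weights, eps=1e-8):
    """Pool a C x H x W feature map into C values using an H x W weight map.

    Each output is sum(w * f) / sum(w) over all positions, so the output length
    is always C regardless of the input resolution H x W.
    """
    total_w = sum(sum(row) for row in weights) + eps  # eps avoids division by zero
    return [sum(f * w for frow, wrow in zip(ch, weights) for f, w in zip(frow, wrow)) / total_w
            for ch in features]


def global_average_pool(features):
    """GAP is the special case of a uniform weight map."""
    h, w = len(features[0]), len(features[0][0])
    return weighted_spatial_pool(features, [[1.0] * w for _ in range(h)])


# One channel on a 2 x 3 grid; the weight map focuses on the bottom-right position.
fmap = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]]
focus = [[0.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
wsp = weighted_spatial_pool(fmap, focus)   # approximately [6.0]
gap = global_average_pool(fmap)            # approximately [3.5]
# A different resolution (1 x 2) still yields a length-C vector.
small = weighted_spatial_pool([[[2.0, 4.0]]], [[1.0, 3.0]])  # approximately [3.5]
```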
Multiple Events Detection In Seismic Structures Using A Novel U-Net Variant
Pub Date : 2020-10-01 DOI: 10.1109/ICIP40778.2020.9190682
M. Alfarhan, M. Deriche, A. Maalej, G. AlRegib, H. Al-Marzouqi
Seismic data interpretation is a fundamental step in the pipeline for identifying hydrocarbon structural traps such as salt domes and faults. The process is highly demanding in terms of expert knowledge, time, and effort, and it becomes even more challenging when multiple seismic events occur simultaneously. In recent years, the trend has been toward automating seismic interpretation with advanced computational techniques, in particular deep learning (DL) networks. In this paper, we present a DL solution for the concurrent identification of salt domes and faults, with very promising preliminary results on real-world seismic data. The proposed workflow yields excellent detection results even with small training datasets. Furthermore, the resulting probability maps can be extended to a larger number of structure types. Precision above 96% was obtained on real data with three types of seismic structures present concurrently.
Citations: 4
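The seismic abstract reports precision with several structure types present at once. As a rough sketch, assuming an argmax decision over the per-class probability maps and pixel-wise counting, per-class precision could be computed as follows; the class indices (background, salt dome, fault) are hypothetical and the authors' evaluation protocol may differ.

```python
def argmax_label_map(prob_maps):
    """Convert K per-class probability maps (K x H x W) into one H x W label map."""
    k, h, w = len(prob_maps), len(prob_maps[0]), len(prob_maps[0][0])
    return [[max(range(k), key=lambda c: prob_maps[c][i][j]) for j in range(w)]
            for i in range(h)]


def per_class_precision(pred, truth, num_classes):
    """Pixel-wise precision per class: TP / (TP + FP); NaN if a class is never predicted."""
    tp = [0] * num_classes
    predicted = [0] * num_classes
    for prow, trow in zip(pred, truth):
        for p, t in zip(prow, trow):
            predicted[p] += 1
            if p == t:
                tp[p] += 1
    return [tp[c] / predicted[c] if predicted[c] else float("nan")
            for c in range(num_classes)]


# Hypothetical classes: 0 = background, 1 = salt dome, 2 = fault (2 x 2 toy section).
probs = [
    [[0.8, 0.1], [0.2, 0.1]],  # background
    [[0.1, 0.7], [0.1, 0.6]],  # salt dome
    [[0.1, 0.2], [0.7, 0.3]],  # fault
]
pred = argmax_label_map(probs)                   # [[0, 1], [2, 1]]
truth = [[0, 1], [2, 2]]
precision = per_class_precision(pred, truth, 3)  # [1.0, 0.5, 1.0]
```

Adding a structure type only adds one probability map and one class index, which is one way the extension to more structure types mentioned in the abstract could look.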
Visual Relationship Detection With A Deep Convolutional Relationship Network
Pub Date : 2020-10-01 DOI: 10.1109/ICIP40778.2020.9190642
Yao Peng, D. Chen, Lanfen Lin
Visual relationships are crucial to image understanding and can be applied to many tasks (e.g., image captioning and visual question answering). Despite great progress on many vision tasks, relationship detection remains challenging because the distribution of {subject – predicate – object} triplets is widely spread and imbalanced, which makes it complex to model. In this paper, we propose a new framework that captures the relative positions and sizes of the subject and object in the feature map, and adds a new branch to filter out object pairs that are unlikely to be related. In addition, an activation function is trained to increase the probability of certain feature maps given an object pair. Experiments on two large datasets, Visual Relationship Detection (VRD) and Visual Genome (VG), demonstrate the superiority of our approach over state-of-the-art methods. An ablation study further verifies the effectiveness of our techniques.
Citations: 3
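To make "relative positions and sizes" concrete, the sketch below uses one common hand-crafted encoding borrowed from bounding-box regression targets: center offsets normalized by the object size, plus log size ratios. This is an assumption for illustration; the paper captures this information in the feature map, and its exact encoding is not given in the abstract.

```python
import math


def relative_box_features(subj, obj):
    """Relative position and size of a subject box with respect to an object box.

    Boxes are (x_min, y_min, x_max, y_max) in image coordinates (y grows downward).
    Returns (dx / w_o, dy / h_o, log(w_s / w_o), log(h_s / h_o)), where dx and dy
    are the offsets between box centers. Normalizing by the object size makes the
    features invariant to a uniform rescaling of the image.
    """
    def center_and_size(box):
        w, h = box[2] - box[0], box[3] - box[1]
        return box[0] + w / 2, box[1] + h / 2, w, h

    xs, ys, ws, hs = center_and_size(subj)
    xo, yo, wo, ho = center_and_size(obj)
    return ((xs - xo) / wo, (ys - yo) / ho, math.log(ws / wo), math.log(hs / ho))


# Hypothetical "person rides horse" pair: the person box sits above the horse box.
person = (10, 0, 30, 40)
horse = (0, 20, 40, 60)
feats = relative_box_features(person, horse)  # (0.0, -0.5, log 0.5, 0.0)
scaled = relative_box_features(tuple(2 * v for v in person), tuple(2 * v for v in horse))
```

A negative second component means the subject's center lies above the object's; doubling every coordinate leaves all four features unchanged.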
Scalable Learned Image Compression With A Recurrent Neural Networks-Based Hyperprior
Pub Date : 2020-10-01 DOI: 10.1109/ICIP40778.2020.9190704
Rige Su, Zhengxue Cheng, Heming Sun, J. Katto
Learned image compression has recently made great progress, for example the representative hyperprior model and its variants based on convolutional neural networks (CNNs). However, CNNs are not well suited to scalable coding, and multiple models must be trained separately to achieve variable rates. In this paper, we incorporate differentiable quantization and accurate entropy models into recurrent neural network (RNN) architectures to achieve scalable learned image compression. First, we present an RNN architecture with quantization and entropy coding. To realize scalable coding, we allocate bits to multiple layers by adjusting layer-wise lambda values in a rate-distortion optimization function based on Lagrange multipliers. Second, we add an RNN-based hyperprior to improve the accuracy of the entropy models for multi-layer residual representations. Experimental results demonstrate that our performance is comparable to recent CNN-based hyperprior methods on the Kodak dataset. Moreover, our method is a scalable and flexible coding approach that achieves multiple rates with a single model, which is very appealing.
Citations: 9
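The layer-wise rate-distortion trade-off can be sketched by assuming each layer k contributes its own Lagrangian cost R_k + λ_k D_k, where R_k is the estimated bit count of that layer's symbols under an entropy model and D_k is the MSE after decoding layers 0 to k. The toy probability table stands in for the RNN-based hyperprior, and the exact objective used by the authors may differ.

```python
import math


def rate_bits(symbols, probs):
    """Estimated code length in bits: sum of -log2 p(s) under the entropy model."""
    return sum(-math.log2(probs[s]) for s in symbols)


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def layered_rd_loss(original, recons, symbols, probs, lambdas):
    """Sum over layers k of R_k + lambda_k * D_k.

    recons[k]: reconstruction after decoding layers 0..k (cumulative).
    symbols[k]: quantized symbols transmitted in layer k.
    probs: probability of each symbol value (a stand-in for the entropy model).
    """
    return sum(rate_bits(s, probs) + lam * mse(original, r)
               for s, r, lam in zip(symbols, recons, lambdas))


# Toy two-layer example: the base layer is coarse, the enhancement layer fixes it.
original = [1.0, 2.0]
recons = [[1.0, 1.0], [1.0, 2.0]]
symbols = [[0, 0], [0, 1]]
probs = {0: 0.5, 1: 0.25, 2: 0.25}
loss = layered_rd_loss(original, recons, symbols, probs, lambdas=[4.0, 16.0])  # 7.0
```

A larger λ_k puts more weight on distortion at that layer, so increasing λ_k across layers pushes later layers toward higher quality at a higher rate.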
Precise Cerebrovascular Segmentation
Pub Date : 2020-10-01 DOI: 10.1109/ICIP40778.2020.9191077
F. Taher, A. Soliman, Heba Kandil, Ali M. Mahmoud, A. Shalaby, G. Gimel'farb, A. El-Baz
Analyzing cerebrovascular changes in Time-of-Flight Magnetic Resonance Angiography (ToF–MRA) images can detect serious diseases, e.g., hypertension, and track their progress. Such analysis requires accurate segmentation of the vasculature from its surroundings, which motivated us to propose a fully automated cerebral vasculature segmentation approach based on extracting both prior and current appearance features that capture macro- and micro-vessels. The appearance prior is modeled with a novel translation- and rotation-invariant Markov-Gibbs Random Field (MGRF) of voxel intensities, whose pairwise interactions are identified analytically from a set of training data sets. The current appearance is represented by the marginal probability distribution of voxel intensities, using a Linear Combination of Discrete Gaussians (LCDG) whose parameters are estimated by a modified Expectation-Maximization (EM) algorithm. The proposed approach was validated on 190 data sets using three metrics, which showed high accuracy compared with existing approaches.
Citations: 0
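The LCDG model of the marginal intensity distribution can be sketched as follows, assuming 8-bit intensities and the usual definition of a discrete Gaussian as a continuous Gaussian integrated over a unit-width bin around each integer level, with the tails folded into the end bins. The component parameters are illustrative only; the paper estimates them with a modified EM algorithm, which is not reproduced here.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def discrete_gaussian(q, mu, sigma, q_max):
    """Probability of integer intensity q in {0, ..., q_max} under a discrete Gaussian.

    The continuous Gaussian is integrated over [q - 0.5, q + 0.5]; the tails
    below 0 and above q_max are folded into the end bins so the q_max + 1
    probabilities sum to 1.
    """
    upper = 1.0 if q == q_max else normal_cdf((q + 0.5 - mu) / sigma)
    lower = 0.0 if q == 0 else normal_cdf((q - 0.5 - mu) / sigma)
    return upper - lower


def lcdg(q, positive, negative, q_max):
    """Linear combination of discrete Gaussians with positive and negative components.

    positive, negative: lists of (weight, mean, std). For a valid distribution the
    positive weights minus the negative weights must sum to 1 and the result must
    stay non-negative for every q.
    """
    pos = sum(w * discrete_gaussian(q, m, s, q_max) for w, m, s in positive)
    neg = sum(w * discrete_gaussian(q, m, s, q_max) for w, m, s in negative)
    return pos - neg


# Hypothetical two-mode model over 8-bit intensities (darker tissue, brighter vessels).
positive = [(0.6, 40.0, 15.0), (0.5, 200.0, 20.0)]
negative = [(0.1, 40.0, 5.0)]
density = [lcdg(q, positive, negative, 255) for q in range(256)]
```

The negative component here is narrower and lower than the positive one it overlaps, which keeps the combined density non-negative; arbitrary LCDG parameters do not guarantee this.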
3d Object Detection For Autonomous Driving Using Temporal Lidar Data
Pub Date : 2020-10-01 DOI: 10.1109/ICIP40778.2020.9191134
S. McCrae, A. Zakhor
3D object detection is a fundamental problem in autonomous driving, and pedestrians are among the most important objects to detect. The recently introduced PointPillars architecture has been shown to be effective for object detection. It voxelizes 3D LiDAR point clouds to produce a 2D pseudo-image used for object detection. In this work, we modify PointPillars into a recurrent network that uses fewer LiDAR frames per forward pass. Specifically, whereas the original PointPillars model uses 10 LiDAR frames per forward pass, our recurrent model uses 3 frames plus recurrent memory. With this modification, we observe an 8% increase in pedestrian detection performance and a slight decline in vehicle detection performance in a coarsely voxelized setting. Furthermore, when both models are given 3 frames of data as input, our recurrent architecture outperforms PointPillars by 21% and 1% in pedestrian and vehicle detection, respectively.
Citations: 26
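The pillar-based pseudo-image can be sketched as a bird's-eye-view scatter of LiDAR points into a 2-D grid. This is a simplified illustration rather than the PointPillars implementation: the grid ranges, the cell size and the two hand-crafted per-pillar features (point count and maximum height) are assumptions standing in for PointPillars' learned per-pillar features, and the recurrent memory is not modeled.

```python
def pillar_pseudo_image(points, x_range, y_range, cell):
    """Scatter (x, y, z) LiDAR points into a bird's-eye-view grid of pillars.

    Returns a 2-channel pseudo-image [count, max_z], each channel rows x cols.
    PointPillars learns per-pillar features with a small PointNet; this sketch
    uses two hand-crafted statistics instead.
    """
    cols = int((x_range[1] - x_range[0]) / cell)
    rows = int((y_range[1] - y_range[0]) / cell)
    count = [[0] * cols for _ in range(rows)]
    max_z = [[0.0] * cols for _ in range(rows)]
    for x, y, z in points:
        if not (x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]):
            continue  # ignore points outside the grid
        c = min(int((x - x_range[0]) / cell), cols - 1)  # guard against float rounding
        r = min(int((y - y_range[0]) / cell), rows - 1)
        if count[r][c] == 0 or z > max_z[r][c]:
            max_z[r][c] = z
        count[r][c] += 1
    return [count, max_z]


# Toy sweep on a 4 m x 2 m grid with 1 m cells; the last point is out of range.
points = [(0.5, 0.5, 1.2), (0.7, 0.2, 1.8), (3.5, 1.5, 0.4), (5.0, 0.0, 2.0)]
count, max_z = pillar_pseudo_image(points, x_range=(0.0, 4.0), y_range=(0.0, 2.0), cell=1.0)
```

One simple way to feed several sweeps is to concatenate their points before scattering; the recurrent variant in the abstract instead uses fewer frames per pass and carries information across passes in memory.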
Image Compression with Laplacian Guided Scale Space Inpainting
Pub Date : 2020-10-01 DOI: 10.1109/ICIP40778.2020.9191041
Lingzhi Zhang, P. Kumar, Manuj R. Sabharwal, Andy Kuzma, Jianbo Shi
We present an image compression algorithm that preserves high-frequency details and information about rare occurrences. Our approach can be thought of as image inpainting in frequency scale space. Given an image, we construct a Laplacian image pyramid and store only the finest and coarsest levels, thereby removing the image's middle frequencies. Using a network backbone borrowed from an image super-resolution algorithm, we train our network to hallucinate the missing middle-level Laplacian image. We introduce a novel training paradigm in which we train our algorithm only on a face dataset whose faces are correctly aligned and scaled. We demonstrate that image compression learned on this restricted dataset leads to better GAN [1] convergence and better generalization to completely different image domains. We also show that Laplacian inpainting can be simplified further by using a few selected pixels as seeds.
Citations: 0
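The pyramid manipulation behind this compression scheme can be illustrated with a 1-D Laplacian pyramid. The sketch assumes pair-averaging for downsampling and sample repetition for upsampling (image pyramids typically use Gaussian filtering in 2-D); where the paper trains a network to hallucinate the missing middle level, the sketch simply zeroes it to show what is lost.

```python
def downsample(signal):
    """Halve the length by averaging neighboring pairs (length must be even)."""
    return [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]


def upsample(signal):
    """Double the length by repeating each sample."""
    return [v for v in signal for _ in range(2)]


def laplacian_pyramid(signal, levels):
    """Return [L_0, ..., L_{levels-1}, G_levels]: detail bands (finest first), then the coarsest level."""
    pyramid, current = [], list(signal)
    for _ in range(levels):
        coarse = downsample(current)
        pyramid.append([a - b for a, b in zip(current, upsample(coarse))])
        current = coarse
    pyramid.append(current)
    return pyramid


def reconstruct(pyramid):
    """Invert the pyramid: upsample the coarsest level and add back each detail band."""
    current = pyramid[-1]
    for detail in reversed(pyramid[:-1]):
        current = [d + u for d, u in zip(detail, upsample(current))]
    return current


signal = [4, 6, 8, 8, 2, 0, 1, 3]
pyr = laplacian_pyramid(signal, levels=2)  # finest band, middle band, coarsest level
exact = reconstruct(pyr)                   # equals the input
# Keep only the finest and coarsest levels; zero the middle band.
dropped = [pyr[0], [0.0] * len(pyr[1]), pyr[2]]
approx = reconstruct(dropped)              # [5.5, 7.5, 6.5, 6.5, 2.5, 0.5, 0.5, 2.5]
```

The gap between `approx` and `signal` is exactly the middle band, which is the part the paper's network is trained to predict.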
Efficient Detection of Pixel-Level Adversarial Attacks
Pub Date : 2020-10-01 DOI: 10.1109/ICIP40778.2020.9191084
S. A. A. Shah, Moise Bougre, Naveed Akhtar, Bennamoun, Liang Zhang
Deep learning has achieved unprecedented performance in object recognition and scene understanding. However, deep models have also been found vulnerable to adversarial attacks. Of particular relevance to robotic systems are pixel-level attacks, which can completely fool a neural network by altering very few pixels (e.g., 1-5) in an image. We present the first technique to detect the presence of adversarial pixels in images for robotic systems, employing an Adversarial Detection Network (ADNet). The proposed network efficiently recognizes an input as adversarial or clean by distinguishing the characteristic activation signals of adversarial samples from those of clean ones. It acts as a defense mechanism for the robotic vision system by detecting and rejecting adversarial samples. We thoroughly evaluate our technique on three benchmark datasets: CIFAR-10, CIFAR-100 and Fashion MNIST. Results demonstrate effective detection of adversarial samples by ADNet.
Citations: 2
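To make the "very few pixels" budget concrete, the sketch below overwrites one pixel of a CIFAR-sized 32 x 32 image and counts the changed pixels (the L0 distance). The pixel location and color are arbitrary; an actual attack searches for values that fool the classifier, and this sketch does not implement ADNet.

```python
def one_pixel_perturb(image, row, col, value):
    """Return a copy of an H x W image (pixels as RGB tuples) with one pixel overwritten."""
    out = [list(r) for r in image]
    out[row][col] = value
    return out


def l0_distance(a, b):
    """Number of pixels that differ between two images (the L0 'norm' of the perturbation)."""
    return sum(1 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb) if pa != pb)


# A CIFAR-sized 32 x 32 image of uniform gray; location and color are arbitrary here.
clean = [[(128, 128, 128)] * 32 for _ in range(32)]
adv = one_pixel_perturb(clean, row=5, col=7, value=(255, 0, 0))
changed = l0_distance(clean, adv)  # 1
fraction = changed / (32 * 32)     # under 0.1% of the pixels
```

Because such a perturbation touches so little of the input, pixel-space checks alone say little about it, which is consistent with the abstract's choice to detect attacks from the network's activation signals instead.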