
2020 IEEE International Conference on Visual Communications and Image Processing (VCIP): Latest Papers

Learning Photometric Stereo via Manifold-based Mapping
Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301860
Yakun Ju, Muwei Jian, Junyu Dong, K. Lam
Three-dimensional reconstruction is a fundamental problem in computer vision. Photometric stereo recovers the surface normals of a 3D object from varying shading cues and stands out for its ability to generate fine surface normals. In recent years, deep learning-based photometric stereo methods have improved surface-normal estimation on general non-Lambertian surfaces, owing to their powerful fitting ability. However, these state-of-the-art methods usually regress the surface normal directly from high-dimensional features without exploring the embedded structural information, which leaves the information available in the features underutilized. In this paper, we therefore propose an efficient manifold-based framework for learning-based photometric stereo, which better maps combined high-dimensional feature spaces to low-dimensional manifolds. Extensive experiments show that our method, learning with the low-dimensional manifolds, achieves more accurate surface-normal estimation and outperforms other state-of-the-art methods on the challenging DiLiGenT benchmark dataset.
Citations: 8
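For context on the photometric stereo entry above, the following is a minimal sketch of the classical Lambertian least-squares baseline, not the manifold-based network the paper proposes. It assumes a purely Lambertian surface with no shadows, known light directions, and the model I = L (albedo * n); the function name and the synthetic data are hypothetical.

```python
import numpy as np

def lambertian_normals(intensities, light_dirs):
    """Classical Lambertian photometric stereo (baseline, not the paper's method).

    intensities: (m, p) array, m lights by p pixels.
    light_dirs:  (m, 3) array of light directions.
    Returns unit normals of shape (p, 3) and albedo of shape (p,).
    """
    # Least-squares solve of light_dirs @ b = intensities, where b = albedo * normal.
    b, *_ = np.linalg.lstsq(light_dirs, intensities, rcond=None)  # (3, p)
    albedo = np.linalg.norm(b, axis=0)
    normals = b / np.maximum(albedo, 1e-12)
    return normals.T, albedo

# Synthetic check: two pixels with known normals and a constant albedo of 0.7.
true_n = np.array([[0.0, 0.0, 1.0], [0.6, 0.0, 0.8]])
light_dirs = np.array([
    [0.0, 0.0, 1.0],
    [0.5, 0.0, 0.866],
    [0.0, 0.5, 0.866],
    [-0.5, 0.0, 0.866],
])
intensities = light_dirs @ (0.7 * true_n.T)
est_n, est_albedo = lambertian_normals(intensities, light_dirs)
```

On non-Lambertian materials this linear model breaks down, which is the regime the learning-based methods in the abstract target.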
Power/QoS-Adaptive HEVC FME Hardware using Machine Learning-Based Approximation Control
Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301797
Wagner Penny, D. Palomino, M. Porto, B. Zatt
This paper presents a machine learning-based adaptive approximate hardware design targeting the fractional motion estimation (FME) stage of the HEVC encoder. Hardware designs with multiple levels of approximation are proposed, obtained by changing the FME filter coefficients and/or discarding taps. The level of approximation is chosen by a decision tree, built from the behavior of several encoding parameters in order to predict homogeneous blocks, which tolerate more aggressive approximation without significant loss in quality of service (QoS). Instead of applying one fixed level of approximation to the whole video, different approximate FME accelerators are selected dynamically. This strategy provides up to 50.54% power reduction while keeping the QoS loss at 1.18% BD-BR.
Citations: 0
Sensitivity-Aware Bit Allocation for Intermediate Deep Feature Compression
Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301807
Yuzhang Hu, Sifeng Xia, Wenhan Yang, Jiaying Liu
In this paper, we focus on compressing and transmitting deep intermediate features to efficiently support the growing number of applications on the cloud side, and propose a sensitivity-aware bit allocation algorithm for deep intermediate feature compression. Since different channels may contribute very differently to the final inference result of a deep learning model, we design a channel-wise bit allocation mechanism that maintains accuracy while reducing the bit-rate cost. The algorithm consists of two passes. In the first pass, only one channel is exposed to compression degradation while the other channels are kept in their original form, in order to measure that channel's sensitivity to compression degradation. This process is repeated until the sensitivity of every channel is obtained. In the second pass, the bits allocated to each channel are decided automatically from the sensitivities obtained in the first pass, so that channels with higher sensitivity receive more bits and accuracy is preserved as much as possible. With this algorithm, our method surpasses state-of-the-art compression tools with an average BD-rate saving of 6.4%.
Citations: 4
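The two-pass procedure described in the bit allocation abstract above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the uniform quantizer, the `head` function standing in for the rest of the network, the probe bit depth, and the linear sensitivity-to-bits mapping are all hypothetical choices.

```python
import numpy as np

def quantize(x, bits):
    """Uniformly quantize one channel to 2**bits levels over its own range."""
    lo, hi = float(x.min()), float(x.max())
    if hi == lo:
        return x.copy()
    levels = 2 ** bits - 1
    return np.round((x - lo) / (hi - lo) * levels) / levels * (hi - lo) + lo

def channel_sensitivity(features, head, probe_bits=2):
    """Pass 1: degrade one channel at a time and measure the output change."""
    reference = head(features)
    sens = np.zeros(features.shape[0])
    for c in range(features.shape[0]):
        degraded = features.copy()
        degraded[c] = quantize(features[c], probe_bits)
        sens[c] = np.abs(head(degraded) - reference).mean()
    return sens

def allocate_bits(sens, min_bits=2, max_bits=8):
    """Pass 2: give more bits to more sensitive channels (linear mapping, assumed)."""
    span = sens.max() - sens.min()
    if span == 0:
        return np.full(sens.shape, max_bits, dtype=int)
    scaled = (sens - sens.min()) / span
    return np.rint(min_bits + scaled * (max_bits - min_bits)).astype(int)

# Toy setup: 4 channels of 8x8 features; the stand-in head weights channels unequally.
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 8, 8))
weights = np.array([0.0, 0.1, 1.0, 5.0])
head = lambda f: np.tensordot(weights, f, axes=1)

sens = channel_sensitivity(features, head)
bits = allocate_bits(sens)
```

In this toy setup the channel with zero weight ends up with the minimum bit depth and the most heavily weighted channel with the maximum, which mirrors the intent stated in the abstract.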
Network Update Compression for Federated Learning
Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301815
B. Kathariya, Li Li, Zhu Li, Ling-yu Duan, Shan Liu
In a federated learning setting, models are trained on a variety of edge devices with locally generated data, and in each round only the updates to the current model, rather than the model itself, are sent to the server, where they are aggregated to compose an improved model. These edge devices, however, sit on highly uneven networks with higher-latency, lower-throughput connections and are only intermittently available for training. In addition, network connections are asymmetric between downlink and uplink. All of these factors make synchronizing the updates to the server a major challenge. In this work, we propose an efficient coding solution that significantly reduces uplink communication cost by reducing the total number of parameters required for updates. This is achieved by applying a Gaussian Mixture Model (GMM) to localize the Karhunen–Loève Transform (KLT) on the inter-model subspace and representing it with two low-rank matrices. Experiments on convolutional neural network (CNN) models show that the proposed model can significantly reduce the uplink communication cost in federated learning while preserving reasonable accuracy.
Citations: 0
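To illustrate why representing an update with two low-rank matrices reduces uplink cost, as in the federated learning entry above, here is a small sketch. It swaps in a plain truncated SVD for the GMM-localized KLT the abstract describes, so it shows only the parameter-count argument; the function name and the synthetic update are hypothetical.

```python
import numpy as np

def low_rank_factors(update, rank):
    """Factor an (m, n) update into (m, r) and (r, n) matrices via truncated SVD."""
    u, s, vt = np.linalg.svd(update, full_matrices=False)
    a = u[:, :rank] * s[:rank]  # scale the kept left singular vectors
    b = vt[:rank]
    return a, b

# Synthetic update that is exactly rank 4, so the factorization is lossless here.
rng = np.random.default_rng(0)
update = rng.normal(size=(256, 4)) @ rng.normal(size=(4, 128))

a, b = low_rank_factors(update, rank=4)
sent_params = a.size + b.size   # parameters uploaded: 256*4 + 4*128
full_params = update.size       # parameters in the dense update: 256*128
```

Real updates are not exactly low rank, so the rank becomes a trade-off between uplink cost and reconstruction error.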
Adaptive Color Transform in VVC Standard
Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301798
Hong-Jheng Jhu, Xiaoyu Xiu, Yi-Wen Chen, Tsung-Chuan Ma, Xianglin Wang
This paper provides an in-depth overview of the adaptive color transform (ACT) tool adopted into the emerging versatile video coding (VVC) standard. With ACT, prediction residuals in the original color space are adaptively converted into another color space to reduce the correlation among the three color components of video sequences in 4:4:4 chroma format. The residuals after color space conversion are then transformed, quantized and entropy-coded, following the VVC framework. YCgCo-R transforms, which can be implemented with only shift and addition operations, are selected as the ACT core transforms for the color space conversion. Additionally, to simplify implementation, ACT is disabled in cases where the three color components do not share the same block partition, e.g. under separate tree partition mode or intra sub-partition prediction mode. Simulation results based on the VVC reference software show that ACT can provide significant coding gains with negligible impact on encoding and decoding runtime.
Citations: 0
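The shift-and-add property mentioned in the ACT abstract above can be seen in the generic lifting form of the reversible YCgCo-R transform, sketched below on integer values. The function names are hypothetical, and the mapping of VVC residual components onto the `r`, `g`, `b` arguments is not taken from the abstract.

```python
def rgb_to_ycgco_r(r, g, b):
    """Forward YCgCo-R lifting steps: only additions, subtractions and shifts."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, cg, co

def ycgco_r_to_rgb(y, cg, co):
    """Inverse lifting steps, undoing the forward steps in reverse order."""
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = b + co
    return r, g, b

# Residuals can be negative; Python's >> floors, so the round trip stays exact.
print(rgb_to_ycgco_r(10, -3, 7))                    # (2, -11, 3)
print(ycgco_r_to_rgb(*rgb_to_ycgco_r(10, -3, 7)))   # (10, -3, 7)
```

Because each lifting step is undone exactly by its inverse, the transform is lossless on integers.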
No-Reference Stereoscopic Image Quality Assessment Based on Convolutional Neural Network with A Long-Term Feature Fusion
Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301854
Sumei Li, Mingyi Wang
With the rapid development of three-dimensional (3D) technology, effective stereoscopic image quality assessment (SIQA) methods are in great demand. Stereoscopic images contain depth information, which makes it much more challenging to find a reliable SIQA model that fits the human visual system. In this paper, a no-reference SIQA method is proposed that better simulates binocular fusion and binocular rivalry. The proposed method uses a convolutional neural network to build a dual-channel model and carries out a long-term process of feature extraction, fusion, and processing. Moreover, both high- and low-frequency information are used effectively. Experimental results demonstrate that the proposed model outperforms state-of-the-art no-reference SIQA methods and has promising generalization ability.
Citations: 6
A Discrete Cosine Model of Light Field Sampling for Improving Rendering Quality of Views
Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301838
Ying Wei, Changjian Zhu, You Yang, Yan Liu
A number of theories have been proposed for reducing the sampling rate of the light field, but these theories still need a large number of samples (images) to obtain sufficient geometric information. In this paper, we use the sparse representation of the light field in the Discrete Cosine Transform domain to present a Discrete Cosine Sparse Basis (DCSB). We can then find the zeros of the DCSB to reduce the sampling requirement of the light field for alias-free rendering. Finally, experimental results demonstrate the effectiveness of our approach without losing information.
Citations: 1
A Marked Point Process Model For Visual Perceptual Groups Extraction
Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301776
A. Mbarki, M. Naouai
Perceptual organization is the process of assigning each part of a scene to a specified association of features, so that it becomes part of the same organization. In the twentieth century, Gestalt psychologists formalized how image features tend to be grouped by giving a set of organizing principles. In this paper, we propose an approach for detecting perceptual groups in an image. We are mainly interested in features grouped by the Gestalt law of proximity. We build an object-based model within a stochastic framework using a marked point process (MPP), and use a Bayesian learning method to extract perceptual groups in a scene. Tests on synthetic images show that the proposed model detects perceptual groups efficiently in noisy images.
Citations: 0
Two recent advances on normalization methods for deep neural network optimization
Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301751
Lei Zhang
Normalization methods are very important for the effective and efficient optimization of deep neural networks (DNNs). Statistics such as the mean and variance can be used to normalize the network activations or weights to make the training process more stable. Among the activation normalization techniques, batch normalization (BN) is the most popular. However, BN performs poorly when the training batch size is small. We found that the formulation of BN in the inference stage is problematic, and consequently present a corrected one. Without any change to the training stage, the corrected BN significantly improves inference performance when training with a small batch size.
Citations: 0
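As background for the batch normalization entry above, the sketch below shows the standard BN formulation only: batch statistics during training and running averages at inference. The corrected inference formulation mentioned in the abstract is not given there and is not reproduced here. The function names are hypothetical, and the biased variance is used for the running estimate as a simplification.

```python
import numpy as np

def bn_train(x, gamma, beta, running_mean, running_var, momentum=0.1, eps=1e-5):
    """Standard BN training step on a (batch, features) array."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)  # biased batch variance
    y = gamma * (x - mu) / np.sqrt(var + eps) + beta
    # Exponential moving averages, later used at inference time.
    new_mean = (1 - momentum) * running_mean + momentum * mu
    new_var = (1 - momentum) * running_var + momentum * var
    return y, new_mean, new_var

def bn_infer(x, gamma, beta, running_mean, running_var, eps=1e-5):
    """Standard BN inference: normalize with the stored running statistics."""
    return gamma * (x - running_mean) / np.sqrt(running_var + eps) + beta

# A small batch: its statistics are noisy estimates of the population statistics.
rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(4, 3))
y_train, run_mean, run_var = bn_train(x, 1.0, 0.0, np.zeros(3), np.ones(3))
y_infer = bn_infer(x, 1.0, 0.0, run_mean, run_var)
```

With only four samples per batch, the training and inference paths normalize the same input with different statistics, which is the small-batch mismatch the abstract refers to.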
Learning Matching Behavior Differences for Compressing Vehicle Re-identification Models
Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301869
Yi Xie, Jianqing Zhu, Huanqiang Zeng, C. Cai, Lixin Zheng
Vehicle re-identification, which matches vehicles captured by different cameras, has great potential in the field of public security. However, recent vehicle re-identification approaches rely on complex networks, which makes their testing phases computationally expensive. In this paper, we propose a matching behavior difference learning (MBDL) method that compresses vehicle re-identification models to save testing computation. To represent how matching behavior evolves across two different layers of a deep network, a matching behavior difference (MBD) matrix is designed. Our MBDL method then minimizes the L1 loss between the MBD matrices of a small student network and a complex teacher network, so that the student network uses fewer computations to mimic the teacher network's matching behavior. During the testing phase, only the small student network is used, so testing computation is significantly reduced. Experiments on the VeRi776 and VehicleID datasets show that MBDL outperforms many state-of-the-art approaches in terms of accuracy and testing time.
Citations: 3
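One possible reading of the MBD idea in the entry above is sketched below. The abstract does not define the MBD matrix, so this sketch assumes that matching behavior at a layer is the pairwise Euclidean distance matrix over a batch, and that the MBD matrix is the difference of those matrices between two layers; all function names and shapes are hypothetical.

```python
import numpy as np

def pairwise_dist(feats):
    """Pairwise Euclidean distances within a (batch, dim) feature array."""
    diff = feats[:, None, :] - feats[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def mbd_matrix(shallow, deep):
    """Assumed MBD: change in pairwise matching distances from one layer to another."""
    return pairwise_dist(deep) - pairwise_dist(shallow)

def mbd_l1_loss(student_layers, teacher_layers):
    """Mean absolute (L1) difference between student and teacher MBD matrices."""
    return np.abs(mbd_matrix(*student_layers) - mbd_matrix(*teacher_layers)).mean()

# Student and teacher may use different feature widths; MBD matrices are batch x batch.
rng = np.random.default_rng(0)
teacher = (rng.normal(size=(6, 64)), rng.normal(size=(6, 128)))
student = (rng.normal(size=(6, 16)), rng.normal(size=(6, 32)))

loss = mbd_l1_loss(student, teacher)
self_loss = mbd_l1_loss(teacher, teacher)
```

Because the matrices are indexed by sample pairs rather than feature dimensions, a narrow student can be compared against a wide teacher without any projection layer.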