
2020 IEEE International Conference on Visual Communications and Image Processing (VCIP): Latest Publications

No-Reference Stereoscopic Image Quality Assessment Based On Visual Attention Mechanism
Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301770
Sumei Li, Ping Zhao, Yongli Chang
In this paper, we propose an optimized model based on the visual attention mechanism (VAM) for no-reference stereoscopic image quality assessment (SIQA). A CNN model is designed based on a dual attention mechanism (DAM), which comprises a channel attention mechanism and a spatial attention mechanism. The channel attention mechanism assigns high weights to features that contribute strongly to the final quality score and low weights to features that contribute little. The spatial attention mechanism considers the inner regions of a feature and assigns different weights to different regions according to their importance within the feature. In addition, a data selection strategy is designed for the CNN model. Following the VAM, visual saliency is used to guide data selection, and a certain proportion of salient patches is used to fine-tune the network. The same operation is performed on the test set, which removes data redundancy and improves algorithm performance. Experimental results on two public databases show that the proposed model outperforms state-of-the-art SIQA methods. Cross-database validation shows that the model generalizes well and remains effective.
Citations: 2
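As a rough illustration of the channel and spatial attention described in the abstract above, the following Python sketch reweights a small feature map stored as nested lists of shape C x H x W. It is a parameter-free simplification in the spirit of CBAM-style attention, not the authors' DAM: here the weights are just sigmoids of average-pooled activations, whereas a trained network would compute them with learned layers. All function names are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(fmap):
    """fmap is a list of C channels, each an H x W list of lists.
    Each channel is rescaled by sigmoid(global average of that channel)."""
    out = []
    for ch in fmap:
        vals = [v for row in ch for v in row]
        weight = sigmoid(sum(vals) / len(vals))
        out.append([[v * weight for v in row] for row in ch])
    return out


def spatial_attention(fmap):
    """Each position (i, j) is rescaled by sigmoid(mean over channels at (i, j))."""
    c, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    mask = [[sigmoid(sum(fmap[k][i][j] for k in range(c)) / c) for j in range(w)]
            for i in range(h)]
    return [[[fmap[k][i][j] * mask[i][j] for j in range(w)] for i in range(h)]
            for k in range(c)]


def dual_attention(fmap):
    """Channel attention followed by spatial attention (CBAM-style ordering)."""
    return spatial_attention(channel_attention(fmap))
```

Because each channel weight is derived from that channel's global average, a channel with larger activations keeps a larger share of its values, which is the behavior the abstract attributes to channel attention.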
Learning Redundant Sparsifying Transform based on Equi-Angular Frame
Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301836
Min Zhang, Yunhui Shi, Xiaoyan Sun, N. Ling, Na Qi
Because sparse coding in the redundant sparse dictionary learning model is NP-hard, interest has turned to non-redundant sparsifying transforms, whose sparse coding is computationally cheap. However, natural images typically contain diverse textures that a non-redundant system cannot sparsify well. In this paper, we propose a new approach for learning a redundant sparsifying transform based on an equi-angular frame, where the frame and its dual frame correspond to the forward and backward transforms, respectively. The equi-angular constraint enforces uniform mutual coherence in the sparsifying transform, which better sparsifies diverse textures. In addition, an efficient algorithm is proposed for learning the redundant transform. Experimental results on image representation illustrate the superiority of the proposed method over non-redundant sparsifying transforms. Image denoising results show that the proposed method achieves superior denoising performance, in terms of both subjective and objective quality, compared with K-SVD, the data-driven tight frame method, the learning-based sparsifying transform, and the overcomplete transform model with block cosparsity (OCTOBOS).
Citations: 0
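The abstract above relies on two frame-theoretic ideas: equal pairwise coherence among the frame vectors (the equi-angular property) and reconstruction through a dual frame. The sketch below uses a fixed textbook example rather than the learned transform of the paper, namely the three-vector "Mercedes-Benz" frame in R^2, and checks both ideas: its mutual coherence meets the Welch bound, and the forward transform (analysis with the frame) followed by the backward transform (synthesis with the canonical dual frame) returns the input. Function names are illustrative.

```python
import math
from itertools import combinations


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mutual_coherence(frame):
    """Largest |<f_i, f_j>| over pairs of distinct unit-norm frame vectors."""
    return max(abs(dot(u, v)) for u, v in combinations(frame, 2))


def welch_bound(n, d):
    """Lower bound on the coherence of n > d unit vectors in R^d."""
    return math.sqrt((n - d) / (d * (n - 1)))


def forward(x, frame):
    """Analysis operator: n redundant coefficients <x, f_i> for x in R^d."""
    return [dot(x, f) for f in frame]


def backward(coeffs, frame, bound):
    """Synthesis with the canonical dual frame f_i / bound (valid for a tight frame)."""
    d = len(frame[0])
    return [sum(c * f[k] for c, f in zip(coeffs, frame)) / bound for k in range(d)]


# "Mercedes-Benz" frame: 3 unit vectors in R^2 spaced 120 degrees apart.
# It is an equiangular tight frame with frame bound n / d = 3 / 2.
MB = [(math.cos(2 * math.pi * k / 3), math.sin(2 * math.pi * k / 3)) for k in range(3)]
```

A two-dimensional input yields three coefficients, which is the redundancy the abstract refers to; a learned transform would instead fit the frame vectors to data while keeping them equiangular.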
No-Reference Objective Quality Assessment Method of Display Products
Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301894
Huiqing Zhang, Donghao Li, Lifang Wu, Zhifang Xia
Recent years have witnessed the spread of electronic devices, especially mobile phones, which have become almost a necessity in people's daily lives. An effective and efficient technique for blindly assessing the quality of display products can greatly help improve user experience, for example by displaying pictures or text in a more comfortable manner. In this paper, we put forward a novel no-reference (NR) quality metric for display products, dubbed NQMDP. First, we established a new subjective photo quality database: 50 photos shown on three different types of display products were captured, giving a total of 150 photos, which were then scored by more than 40 inexperienced observers. Second, 19 effective image features are extracted based on six factors that influence the quality of display products (complexity, contrast, sharpness, brightness, colorfulness, and naturalness), and these features are then learned with a support vector regressor (SVR) to estimate the objective quality score of each photo. Experimental results show that the proposed method outperforms state-of-the-art algorithms.
Citations: 0
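To make the feature-plus-regression pipeline in the abstract above more concrete, here is a minimal sketch of three simple features of the kinds it lists (brightness, contrast, and colorfulness), computed on a photo given as a list of (R, G, B) tuples. The formulas are assumptions: BT.601 luma for brightness, RMS luma contrast, and the Hasler-Süsstrunk colorfulness metric. The abstract does not say which of its 19 features these correspond to, if any.

```python
import math


def _mean(xs):
    return sum(xs) / len(xs)


def _std(xs):
    m = _mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))


def luma(pixel):
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b  # ITU-R BT.601 weights


def brightness(pixels):
    return _mean([luma(p) for p in pixels])


def rms_contrast(pixels):
    return _std([luma(p) for p in pixels])


def colorfulness(pixels):
    """Hasler-Suesstrunk colorfulness on the opponent channels rg and yb."""
    rg = [r - g for r, g, _ in pixels]
    yb = [0.5 * (r + g) - b for r, g, b in pixels]
    return math.hypot(_std(rg), _std(yb)) + 0.3 * math.hypot(_mean(rg), _mean(yb))


def feature_vector(pixels):
    """Three illustrative features; the paper itself uses 19 features over six factors."""
    return [brightness(pixels), rms_contrast(pixels), colorfulness(pixels)]
```

Feature vectors computed this way for every photo, together with the subjective scores, could then be passed to an off-the-shelf support vector regressor such as scikit-learn's `SVR`; the kernel and hyperparameters would need tuning, and the abstract does not report them.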
Real-time Detection and Tracking Network with Feature Sharing
Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301779
Ente Guo, Z. Chen, Zhenjia Fan, Xiujun Yang
Multiple object tracking (MOT) systems can benefit many applications, such as autonomous driving, action recognition, and surveillance. State-of-the-art methods detect objects in an image and then use a representation model to connect these objects with existing trajectories. However, combining these two components to reduce computation has received little attention. In this study, we propose a single-shot network that simultaneously detects objects and extracts tracking features, yielding a real-time MOT system. We also present a coupled detection–tracking method that uses temporal information to improve object detection accuracy and complete trajectories. Experiments on the KITTI driving dataset indicate that our scheme achieves an accurate and fast MOT system. In particular, the lightweight network runs at 100 FPS.
Citations: 0
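The abstract above does not spell out how detections are linked to existing trajectories. A common approach, assumed here purely for illustration, is greedy matching on a score that mixes box overlap (IoU) with cosine similarity between appearance embeddings; in a feature-sharing design those embeddings would come from the same single-shot network that produces the boxes. The function names, the weight `w`, and the threshold `thresh` are hypothetical.

```python
import math


def iou(a, b):
    """Intersection over union of boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def cosine(u, v):
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return sum(x * y for x, y in zip(u, v)) / (nu * nv) if nu and nv else 0.0


def associate(tracks, detections, w=0.5, thresh=0.6):
    """tracks, detections: lists of (box, embedding).
    Greedy matching on score = w * IoU + (1 - w) * cosine similarity.
    Returns (matches as (track_idx, det_idx), unmatched detection indices)."""
    scored = sorted(
        ((w * iou(tb, db) + (1 - w) * cosine(te, de), ti, di)
         for ti, (tb, te) in enumerate(tracks)
         for di, (db, de) in enumerate(detections)),
        reverse=True,
    )
    used_t, used_d, matches = set(), set(), []
    for score, ti, di in scored:
        if score < thresh:
            break  # all remaining pairs score even lower
        if ti in used_t or di in used_d:
            continue
        matches.append((ti, di))
        used_t.add(ti)
        used_d.add(di)
    unmatched = [di for di in range(len(detections)) if di not in used_d]
    return matches, unmatched
```

Unmatched detections would typically start new trajectories; a Hungarian assignment could replace the greedy loop when globally optimal matching is needed.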
Sparse Spectral Unmixing of Hyperspectral Images using Expectation-Propagation
Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301819
Zeng Li, Y. Altmann, Jie Chen, S. Mclaughlin, S. Rahardja
The aim of spectral unmixing of hyperspectral images is to determine the component materials and their associated abundances from mixed pixels. In this paper, we present sparse linear unmixing via an Expectation-Propagation method based on the classical linear mixing model and a spike-and-slab prior that promotes abundance sparsity. The proposed method, which allows approximate uncertainty quantification (UQ), is compared with existing sparse unmixing methods, including Monte Carlo strategies traditionally considered for UQ. Experimental results on synthetic and real hyperspectral data illustrate the benefits of the proposed algorithm over state-of-the-art linear unmixing methods.
Citations: 1
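The effect of the spike-and-slab prior mentioned in the abstract above is easiest to see for a single coefficient. Assuming the scalar model y = a + n with Gaussian noise (a simplification that ignores the endmember matrix and the non-negativity of abundances), the posterior is available in closed form; Expectation-Propagation is needed in the paper because the full multi-endmember posterior is not. The hypothetical helper below returns the posterior probability that the coefficient is active, plus the posterior mean and variance, i.e. the kind of uncertainty information the abstract refers to.

```python
import math


def normal_pdf(x, var):
    return math.exp(-0.5 * x * x / var) / math.sqrt(2.0 * math.pi * var)


def spike_slab_posterior(y, pi, slab_var, noise_var):
    """Scalar model y = a + n, n ~ N(0, noise_var), with prior
    a ~ (1 - pi) * delta_0 + pi * N(0, slab_var).
    Returns (P(a != 0 | y), posterior mean, posterior variance)."""
    slab = pi * normal_pdf(y, slab_var + noise_var)  # evidence if a is active
    spike = (1.0 - pi) * normal_pdf(y, noise_var)    # evidence if a == 0
    rho = slab / (slab + spike)
    shrink = slab_var / (slab_var + noise_var)
    m, v = shrink * y, shrink * noise_var            # Gaussian posterior under the slab
    mean = rho * m
    var = rho * (v + m * m) - mean * mean            # variance of the two-component mixture
    return rho, mean, var
```

An observation near zero is attributed mostly to the spike, so the coefficient is pushed toward zero, while a large observation is kept with only mild shrinkage.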
A Multi-Model Fusion Framework for NIR-to-RGB Translation
Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301787
Longbin Yan, Xiuheng Wang, Min Zhao, Shumin Liu, Jie Chen
Near-infrared (NIR) images provide spectral information beyond the visible spectrum and are thus useful in many applications. However, single-channel NIR images contain less information per pixel than RGB images and offer poor visibility to human observers. Transforming NIR images into RGB images is therefore necessary for further analysis and computer vision tasks. In this work, we propose a novel NIR-to-RGB translation method consisting of two sub-networks and a fusion operator. Specifically, a U-net-based neural network learns the texture information, while a CycleGAN-based neural network extracts the color information. Finally, a guided-filter-based fusion strategy fuses the outputs of the two networks. Experimental results show that the proposed method achieves superior NIR-to-RGB translation performance.
Citations: 8
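The guided-filter fusion step in the abstract above can be sketched as follows. The filter follows the standard local linear model of He et al.'s guided filter; how the two branch outputs are combined is an assumption, since the abstract does not specify it. Here each color channel from the CycleGAN branch is filtered with the mean intensity of the U-net branch output as the guide, so that color comes from one network and edges from the other. Images are channel-first nested lists; `fuse`, `r`, and `eps` are hypothetical names and defaults.

```python
def box_mean(img, r):
    """Mean over a (2r+1) x (2r+1) window, truncated at the image border."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            vals = [img[y][x]
                    for y in range(max(0, i - r), min(h, i + r + 1))
                    for x in range(max(0, j - r), min(w, j + r + 1))]
            out[i][j] = sum(vals) / len(vals)
    return out


def _mul(a, b):
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def guided_filter(guide, src, r=2, eps=1e-3):
    """Guided filter: fit src ~ a * guide + b per window, then q = mean(a) * guide + mean(b)."""
    h, w = len(guide), len(guide[0])
    mean_i, mean_p = box_mean(guide, r), box_mean(src, r)
    corr_ii, corr_ip = box_mean(_mul(guide, guide), r), box_mean(_mul(guide, src), r)
    a = [[(corr_ip[y][x] - mean_i[y][x] * mean_p[y][x])
          / (corr_ii[y][x] - mean_i[y][x] ** 2 + eps) for x in range(w)] for y in range(h)]
    b = [[mean_p[y][x] - a[y][x] * mean_i[y][x] for x in range(w)] for y in range(h)]
    mean_a, mean_b = box_mean(a, r), box_mean(b, r)
    return [[mean_a[y][x] * guide[y][x] + mean_b[y][x] for x in range(w)] for y in range(h)]


def fuse(texture_rgb, color_rgb, r=2, eps=1e-3):
    """Hypothetical fusion: filter each color channel of the color branch,
    guided by the mean intensity of the texture branch. Inputs are [3][H][W]."""
    h, w = len(texture_rgb[0]), len(texture_rgb[0][0])
    guide = [[sum(ch[y][x] for ch in texture_rgb) / 3.0 for x in range(w)] for y in range(h)]
    return [guided_filter(guide, ch, r, eps) for ch in color_rgb]
```

The box mean is computed naively for readability; a practical implementation would use integral images so that the cost does not grow with the radius `r`.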
FastSCCNet: Fast Mode Decision in VVC Screen Content Coding via Fully Convolutional Network
Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301885
Sik-Ho Tsang, Ngai-Wing Kwong, Yui-Lam Chan
Screen content coding (SCC) has recently been supported in Versatile Video Coding (VVC) to improve the coding efficiency of screen content videos by adopting new coding modes dedicated to screen content compression. Two new coding modes, Intra Block Copy (IBC) and Palette (PLT), are introduced. However, the flexible quad-tree plus multi-type tree (QTMT) coding structure used for coding unit (CU) partitioning in VVC makes designing fast SCC algorithms particularly challenging. To efficiently reduce the computational complexity of SCC in VVC, we propose a deep learning-based fast prediction network, FastSCCNet, built around a fully convolutional network (FCN). CUs are classified into natural content blocks (NCBs) and screen content blocks (SCBs). With the FCN, a single inference pass suffices to classify the block types of the current CU and all corresponding sub-CUs. After block classification, different subsets of coding modes are assigned according to the block type to accelerate encoding. Compared with conventional SCC in VVC, FastSCCNet reduces encoding time by 29.88% on average, with a negligible bitrate increase under the all-intra configuration. To the best of our knowledge, this is the first approach to reducing the computational complexity of SCC in VVC.
Citations: 5
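The single-pass classification idea in the abstract above can be illustrated with a small lookup sketch. It assumes, hypothetically, that one FCN pass yields a map of screen-content probabilities over 4 x 4 luma cells for the whole CTU, so the type of any CU or sub-CU visited by the QTMT search is obtained by averaging the cells it covers rather than by running the network again. The mode subsets in `CANDIDATE_MODES` and all function names are placeholders, not the ones used in FastSCCNet.

```python
from enum import Enum


class BlockType(Enum):
    NCB = "natural content block"
    SCB = "screen content block"


# Hypothetical mode subsets; the abstract does not list the exact ones.
CANDIDATE_MODES = {
    BlockType.NCB: ("regular_intra",),
    BlockType.SCB: ("regular_intra", "IBC", "PLT"),
}


def classify_cu(prob_map, x, y, w, h, cell=4, thresh=0.5):
    """prob_map[row][col] is P(screen content) for each cell x cell luma block,
    produced for the whole CTU by one FCN pass. A CU at (x, y) of size w x h
    (multiples of cell) is labeled by the mean probability of the cells it covers."""
    rows = range(y // cell, (y + h) // cell)
    cols = range(x // cell, (x + w) // cell)
    probs = [prob_map[r][c] for r in rows for c in cols]
    return BlockType.SCB if sum(probs) / len(probs) >= thresh else BlockType.NCB


def candidate_modes(prob_map, x, y, w, h):
    """Modes the encoder would test for this CU under the hypothetical subsets."""
    return CANDIDATE_MODES[classify_cu(prob_map, x, y, w, h)]
```

Because the probability map covers every cell of the CTU, the recursive partition search only performs cheap lookups, which is where the time saving of a one-shot design would come from.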
Point Cloud Geometry Prediction Across Spatial Scale using Deep Learning
Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301804
Anique Akhtar, Wen Gao, Xianguo Zhang, Li Li, Zhu Li, Shan Liu
A point cloud is a 3D data representation that is becoming increasingly popular. Because point clouds are large, transmitting them is not feasible without compression. However, current lossy point cloud compression and processing techniques suffer from quantization loss, which results in a coarser, sub-sampled representation of the point cloud. In this paper, we address the loss of points during voxelization by performing geometry prediction across spatial scales with a deep learning architecture. We perform octree-type upsampling of the point cloud geometry, in which each voxel is divided into 8 sub-voxels whose occupancy is predicted by our network. In this way, we obtain a denser representation of the point cloud while minimizing the loss with respect to the ground truth. We use sparse tensors and sparse convolutions via the Minkowski Engine, with a UNet-like network equipped with inception-residual blocks. Our results show that the geometry prediction scheme can significantly improve the PSNR of a point cloud, making it an essential post-processing step in the compression-transmission pipeline. This solution can also serve as a crucial cross-scale prediction tool for point cloud compression and for display adaptation.
Citations: 11
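The octree-type upsampling step in the abstract above can be written down in a few lines. In this sketch each occupied voxel (x, y, z) is split into its 8 children at the next finer level, and a stand-in callable `occupancy_prob` plays the role of the sparse-convolutional network by returning an occupancy probability per child. The function names and the 0.5 threshold are assumptions.

```python
from itertools import product


def children(voxel):
    """The 8 sub-voxels of voxel (x, y, z) at the next finer octree level."""
    x, y, z = voxel
    return [(2 * x + dx, 2 * y + dy, 2 * z + dz)
            for dx, dy, dz in product((0, 1), repeat=3)]


def upsample(voxels, occupancy_prob, thresh=0.5):
    """occupancy_prob(child) -> float in [0, 1] stands in for the network's
    per-child prediction; children at or above thresh are kept."""
    kept = {c for v in voxels for c in children(v) if occupancy_prob(c) >= thresh}
    return sorted(kept)
```

When the target number of points is known, keeping the top-scoring children instead of applying a fixed threshold is another plausible selection rule.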
Sparse Representation-Based Intra Prediction for Lossless/Near Lossless Video Coding
Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301752
Linwei Zhu, Yun Zhang, N. Li, Jinyong Pi, Xinju Wu
In this paper, a novel intra prediction method, termed Sparse Representation-based Intra Prediction (SRIP), is presented for lossless/near-lossless High Efficiency Video Coding (HEVC). Specifically, the existing Angular Intra Prediction (AIP) modes in HEVC are organized as a mode dictionary, which is used to sparsely represent the visual signal by minimizing the difference with respect to the ground truth. To keep encoding and decoding matched, the sparse coefficients must also be encoded and transmitted to the decoder. To further improve coding performance, an additional binary flag is included in the video codec to indicate which strategy, SRIP or traditional AIP, is finally selected by rate-distortion optimization. Extensive experimental results show that the proposed method achieves an average bit rate saving of 0.36% in the lossless scenario.
Citations: 1
Volumetric End-to-End Optimized Compression for Brain Images
Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301767
Shuo Gao, Yueyi Zhang, Dong Liu, Zhiwei Xiong
The volume of brain image data is growing rapidly and requires vast resources for storage and transmission, so an efficient volumetric compression method is urgently needed. Recent years have seen progress in deep-learning-based compression of two-dimensional (2D) natural images, but learned volumetric image compression remains unexplored. In this paper, we propose the first end-to-end learning framework for volumetric image compression by extending advanced 2D image compression techniques to volumetric images. Specifically, a convolutional autoencoder compresses 3D image cubes, and non-local attention modules are embedded in the autoencoder to jointly capture local and global correlations. Both hyperprior and autoregressive models are used for conditional probability estimation in entropy coding. To reduce model complexity, we introduce a convolutional long short-term memory network into the autoregressive model, based on channel-wise prediction. Experimental results on volumetric mouse brain images show that the proposed method outperforms JPEG2000-3D, HEVC, and state-of-the-art 2D methods.
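The entropy-coding step can be sketched as a rate estimate. Suppose a hyperprior or autoregressive context model predicts a mean mu and scale sigma for each quantized latent y_hat. Each integer symbol then costs -log2 P bits, where P = Phi((y_hat - mu + 0.5) / sigma) - Phi((y_hat - mu - 0.5) / sigma) is its probability mass under a discretized Gaussian. The Gaussian form and all numbers below are assumptions for illustration. The channel-wise ConvLSTM predictor from the abstract is not modeled, and `context_mu` is a hand-picked stand-in for a sharper prediction.

```python
import math

import numpy as np

# Element-wise error function (NumPy itself does not provide erf).
_erf = np.vectorize(math.erf)


def gaussian_cdf(v):
    """Standard normal CDF, Phi(v)."""
    return 0.5 * (1.0 + _erf(np.asarray(v, dtype=float) / math.sqrt(2.0)))


def estimated_bits(y_hat, mu, sigma, eps=1e-9):
    """Total bits for integer symbols y_hat under N(mu, sigma^2) discretized to unit bins."""
    y_hat = np.asarray(y_hat, dtype=float)
    upper = gaussian_cdf((y_hat - mu + 0.5) / sigma)
    lower = gaussian_cdf((y_hat - mu - 0.5) / sigma)
    p = np.maximum(upper - lower, eps)  # probability mass of each integer bin
    return float(-np.log2(p).sum())


y_hat = np.array([3.0, -1.0, 0.0, 2.0])          # quantized latents (toy values)
coarse_bits = estimated_bits(y_hat, 0.0, 2.0)    # coarse prediction: zero mean, wide scale
context_mu = np.array([2.8, -0.9, 0.1, 1.7])     # hypothetical sharper means
context_bits = estimated_bits(y_hat, context_mu, 0.5)
print(f"coarse: {coarse_bits:.2f} bits, sharper: {context_bits:.2f} bits")
```

The sketch shows why better conditional probability estimation lowers the bit rate. Tighter, better-centered predictions put more probability mass on the symbols that actually occur.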
Citations: 3