Pub Date: 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301770
Sumei Li, Ping Zhao, Yongli Chang
In this paper, we propose an optimized model based on the visual attention mechanism (VAM) for no-reference stereoscopic image quality assessment (SIQA). A CNN model is designed based on a dual attention mechanism (DAM), which includes a channel attention mechanism and a spatial attention mechanism. The channel attention mechanism gives high weight to features that contribute strongly to the final quality score and low weight to features that contribute little. The spatial attention mechanism considers the inner regions of a feature, and different areas are assigned different weights according to their importance within the feature. In addition, a data selection strategy is designed for the CNN model. According to VAM, visual saliency is applied to guide data selection, and a certain proportion of salient patches are employed to fine-tune the network. The same operation is performed on the test set, which removes data redundancy and improves algorithm performance. Experimental results on two public databases show that the proposed model is superior to the state-of-the-art SIQA methods. Cross-database validation shows high generalization ability and high effectiveness of our model.
Title: No-Reference Stereoscopic Image Quality Assessment Based On Visual Attention Mechanism
Published in: 2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)
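The saliency-guided data selection described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the use of one mean-saliency score per patch, and the 50% proportion are all assumptions, since the abstract only says that "a certain proportion" of salient patches is kept.

```python
def select_salient_patches(patch_saliency, proportion):
    """Return indices of the most salient patches, in their original order.

    patch_saliency: one mean-saliency score per patch (hypothetical input).
    proportion: fraction of patches to keep, in (0, 1] (value is an assumption).
    """
    if not 0 < proportion <= 1:
        raise ValueError("proportion must be in (0, 1]")
    keep = max(1, round(len(patch_saliency) * proportion))
    # Rank patch indices by descending saliency, then keep the top ones.
    ranked = sorted(range(len(patch_saliency)),
                    key=lambda i: patch_saliency[i], reverse=True)
    return sorted(ranked[:keep])


scores = [0.10, 0.80, 0.35, 0.90, 0.05, 0.60]
selected = select_salient_patches(scores, proportion=0.5)
```

The same function would be applied to training and test patches alike, matching the statement that the selection is also performed on the test set.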
Pub Date: 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301836
Min Zhang, Yunhui Shi, Xiaoyan Sun, N. Ling, Na Qi
Because sparse coding in the redundant sparse dictionary learning model is NP-hard, interest has turned to the non-redundant sparsifying transform, whose sparse coding is computationally cheap. However, natural images typically contain diverse textures that cannot be sparsified well by a non-redundant system. In this paper, we propose a new approach for learning a redundant sparsifying transform based on an equi-angular frame, where the frame and its dual frame correspond to applying the forward and the backward transforms. The equi-angular constraint enforces uniform mutual coherence in the sparsifying transform, which better sparsifies diverse textures. In addition, an efficient algorithm is proposed for learning the redundant transform. Experimental results for image representation illustrate the superiority of our proposed method over non-redundant sparsifying transforms. The image denoising results show that our proposed method achieves superior denoising performance, in terms of subjective and objective quality, compared to K-SVD, the data-driven tight frame method, the learning-based sparsifying transform and the overcomplete transform model with block cosparsity (OCTOBOS).
Title: Learning Redundant Sparsifying Transform based on Equi-Angular Frame
Published in: 2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)
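To make the equi-angular property concrete, the sketch below checks whether every pair of frame vectors has the same absolute inner product, and reports the mutual coherence (the largest such value). It does not reproduce the learning algorithm from the paper; the helper names and the toy frame are assumptions chosen for illustration.

```python
import math
from itertools import combinations


def pairwise_coherences(frame):
    """Absolute inner products between all pairs of unit-normalized frame vectors."""
    unit = []
    for v in frame:
        norm = math.sqrt(sum(x * x for x in v))
        unit.append([x / norm for x in v])
    return [abs(sum(a * b for a, b in zip(u, w)))
            for u, w in combinations(unit, 2)]


def is_equiangular(frame, tol=1e-9):
    """True when every pair of frame vectors has the same absolute inner product."""
    coh = pairwise_coherences(frame)
    return max(coh) - min(coh) <= tol


# Toy redundant frame: three unit vectors in R^2 spaced 120 degrees apart.
mb_frame = [[math.cos(2 * math.pi * k / 3), math.sin(2 * math.pi * k / 3)]
            for k in range(3)]
mutual_coherence = max(pairwise_coherences(mb_frame))
```

For this toy frame every pairwise coherence equals 0.5, so no pair of atoms is more correlated than any other, which is the uniformity the equi-angular constraint aims for.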
Pub Date: 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301894
Huiqing Zhang, Donghao Li, Lifang Wu, Zhifang Xia
Recent years have witnessed the spread of electronic devices, especially mobile phones, which have become almost necessities in people's daily lives. An effective and efficient technique for blindly assessing the quality of display products is greatly helpful for improving user experience, for example by displaying pictures or text in a more comfortable manner. In this paper, we put forward a novel no-reference (NR) quality metric of display products, dubbed NQMDP. First, we established a new subjective photo quality database, in which 50 photos shown on three different types of display products were captured to constitute a total of 150 photos and then scored by more than 40 inexperienced observers. Second, 19 effective image features are extracted according to six factors that influence the quality of display products (complexity, contrast, sharpness, brightness, colorfulness and naturalness) and then learned with a support vector regressor (SVR) to estimate the objective quality score of each photo. Experimental results show that our proposed method obtains better performance than the state-of-the-art algorithms.
Title: No-Reference Objective Quality Assessment Method of Display Products
Published in: 2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)
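The abstract does not list the 19 features, so the sketch below only illustrates the general idea of turning a photo into a feature vector that a regressor could consume. Mean intensity for brightness and the standard deviation of intensities (RMS contrast) for contrast are stand-ins chosen here, not the paper's definitions, and the SVR stage is omitted.

```python
import math


def brightness(gray):
    """Mean intensity of a grayscale image given as a list of rows (values 0..255)."""
    pixels = [p for row in gray for p in row]
    return sum(pixels) / len(pixels)


def rms_contrast(gray):
    """Standard deviation of pixel intensities, a common global contrast measure."""
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    return math.sqrt(sum((p - mean) ** 2 for p in pixels) / len(pixels))


# Tiny checkerboard standing in for a captured photo (illustrative data only).
photo = [[0, 255], [255, 0]]
feature_vector = [brightness(photo), rms_contrast(photo)]
```

In a full pipeline, one such vector per photo, paired with its subjective score, would form the training set for the regressor.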
Pub Date: 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301779
Ente Guo, Z. Chen, Zhenjia Fan, Xiujun Yang
Multiple object tracking (MOT) systems can benefit many applications, such as autonomous driving, action recognition, and surveillance. State-of-the-art methods detect objects in an image and then use a representation model to connect these objects with existing trajectories. However, the combination of these two components to reduce computation has received minimal attention. In this study, we propose a single-shot network for simultaneously detecting objects and extracting tracking features to achieve a real-time MOT system. We also present a detection–tracking coupled method that uses temporal information to improve the accuracy of object detection and make trajectories complete. Experimentation on the KITTI driving dataset indicates that our scheme achieves an accurate and fast MOT system. In particular, the lightweight network reaches a running speed of 100 FPS.
Title: Real-time Detection and Tracking Network with Feature Sharing
Published in: 2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)
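The step of connecting detected objects with existing trajectories can be sketched as a matching problem over tracking features. The abstract does not state the matching rule, so greedy matching by cosine similarity, the function names and the 0.5 threshold are assumptions made for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def associate(track_feats, det_feats, min_sim=0.5):
    """Greedily match detections to tracks by descending cosine similarity.

    Returns {track_index: detection_index}. Detections left unmatched would
    start new trajectories in a full tracker.
    """
    pairs = sorted(((cosine(t, d), ti, di)
                    for ti, t in enumerate(track_feats)
                    for di, d in enumerate(det_feats)), reverse=True)
    matches, used_dets = {}, set()
    for sim, ti, di in pairs:
        if sim < min_sim:
            break  # remaining pairs are even less similar
        if ti not in matches and di not in used_dets:
            matches[ti] = di
            used_dets.add(di)
    return matches


tracks = [[1.0, 0.0], [0.0, 1.0]]
detections = [[0.1, 0.9], [0.9, 0.1], [-1.0, -1.0]]
matches = associate(tracks, detections)
```

Here the third detection matches no track and is left out of the result. In a single-shot design the detection boxes and these feature vectors would come from the same forward pass, which is where the computational saving arises.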
Pub Date: 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301819
Zeng Li, Y. Altmann, Jie Chen, S. Mclaughlin, S. Rahardja
The aim of spectral unmixing of hyperspectral images is to determine the component materials and their associated abundances from mixed pixels. In this paper, we present sparse linear unmixing via an Expectation-Propagation method based on the classical linear mixing model and a spike-and-slab prior promoting abundance sparsity. The proposed method, which allows approximate uncertainty quantification (UQ), is compared to existing sparse unmixing methods, including Monte Carlo strategies traditionally considered for UQ. Experimental results on synthetic data and real hyperspectral data illustrate the benefits of the proposed algorithm over state-of-the-art linear unmixing methods.
Title: Sparse Spectral Unmixing of Hyperspectral Images using Expectation-Propagation
Published in: 2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)
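For readers unfamiliar with the two ingredients named in the abstract, the linear mixing model and a spike-and-slab prior can be written in their generic textbook form as below. The symbols are assumptions introduced here: y is an observed pixel spectrum, M the matrix of endmember spectra, a the abundance vector with R entries, e additive Gaussian noise, and w the prior probability that a material is present. The abstract does not specify the slab density, so it is left generic.

```latex
\begin{align}
  \mathbf{y} &= \mathbf{M}\mathbf{a} + \mathbf{e},
  & \mathbf{e} &\sim \mathcal{N}(\mathbf{0}, \sigma^{2}\mathbf{I}), \\
  p(a_r \mid w) &= (1 - w)\,\delta(a_r) + w\,p_{\mathrm{slab}}(a_r),
  & r &= 1, \dots, R.
\end{align}
```

The point mass at zero (the spike) is what promotes sparse abundances: with probability 1 - w a material is switched off exactly rather than merely shrunk toward zero.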
Pub Date: 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301787
Longbin Yan, Xiuheng Wang, Min Zhao, Shumin Liu, Jie Chen
Near-infrared (NIR) images provide spectral information beyond the visible light spectrum and thus are useful in many applications. However, single-channel NIR images contain less information per pixel than RGB images and lack visibility for human perception. Transforming NIR images to RGB images is necessary for performing further analysis and computer vision tasks. In this work, we propose a novel NIR-to-RGB translation method. It contains two sub-networks and a fusion operator. Specifically, a U-Net-based neural network is used to learn the texture information, while a CycleGAN-based neural network is adopted to extract the color information. Finally, a guided-filter-based fusion strategy is applied to fuse the outputs of these two neural networks. Experimental results show that our proposed method achieves superior NIR-to-RGB translation performance.
Title: A Multi-Model Fusion Framework for NIR-to-RGB Translation
Published in: 2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)
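The abstract does not say how the guided filter is wired into the fusion, so the following is only a sketch of the filter itself, reduced to one dimension for brevity. One plausible reading, assumed here, is that the texture output serves as the guide while the color output is the signal being filtered. The function names, radius and eps values are illustrative.

```python
def box_mean(signal, radius):
    """Mean over a sliding window of half-width `radius`, clipped at the borders."""
    n = len(signal)
    out = []
    for i in range(n):
        window = signal[max(0, i - radius):min(n, i + radius + 1)]
        out.append(sum(window) / len(window))
    return out


def guided_filter_1d(guide, src, radius=2, eps=1e-3):
    """1D guided filter: smooth `src` while following edges present in `guide`."""
    mean_i = box_mean(guide, radius)
    mean_p = box_mean(src, radius)
    corr_ip = box_mean([i * p for i, p in zip(guide, src)], radius)
    corr_ii = box_mean([i * i for i in guide], radius)
    # Per-window linear model: src is approximated by a * guide + b.
    a = [(cip - mi * mp) / (cii - mi * mi + eps)
         for cip, cii, mi, mp in zip(corr_ip, corr_ii, mean_i, mean_p)]
    b = [mp - ak * mi for mp, ak, mi in zip(mean_p, a, mean_i)]
    mean_a = box_mean(a, radius)
    mean_b = box_mean(b, radius)
    return [ma * i + mb for ma, i, mb in zip(mean_a, guide, mean_b)]


texture = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]  # guide with a sharp edge (toy data)
color = [0.1, 0.0, 0.2, 0.9, 1.0, 0.8]    # noisy signal to be filtered (toy data)
fused = guided_filter_1d(texture, color, radius=1)
```

The filtered signal is smoothed within each flat region yet keeps the jump at the guide's edge, which is the property that makes the guided filter attractive for combining a sharp texture estimate with a softer color estimate.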
Pub Date: 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301885
Sik-Ho Tsang, Ngai-Wing Kwong, Yui-Lam Chan
Screen content coding (SCC) has recently been supported in Versatile Video Coding (VVC) to improve the coding efficiency of screen content videos by adopting new coding modes dedicated to screen content video compression. Two new coding modes, Intra Block Copy (IBC) and Palette (PLT), are introduced. However, the flexible quad-tree plus multi-type tree (QTMT) coding structure for coding unit (CU) partitioning in VVC makes the design of fast algorithms for SCC particularly challenging. To efficiently reduce the computational complexity of SCC in VVC, we propose a deep learning-based fast prediction network, namely FastSCCNet, in which a fully convolutional network (FCN) is designed. CUs are classified into natural content blocks (NCBs) and screen content blocks (SCBs). With the FCN, only one-shot inference is needed to classify the block types of the current CU and all corresponding sub-CUs. After block classification, different subsets of coding modes are assigned according to the block type to accelerate the encoding process. Compared with the conventional SCC in VVC, our proposed FastSCCNet reduces the encoding time by 29.88% on average, with a negligible bitrate increase under the all-intra configuration. To the best of our knowledge, it is the first approach to tackle computational complexity reduction for SCC in VVC.
Title: FastSCCNet: Fast Mode Decision in VVC Screen Content Coding via Fully Convolutional Network
Published in: 2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)
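The mode-pruning step after block classification can be sketched as a lookup from block type to a reduced candidate list. The abstract does not state which modes each block type keeps, so the subsets, the 0.5 threshold and the names below are hypothetical and serve only to show the control flow.

```python
# Hypothetical mode subsets; the abstract does not give the actual assignment.
MODE_SUBSETS = {
    "NCB": ("INTRA",),      # natural content: skip the screen-content tools
    "SCB": ("IBC", "PLT"),  # screen content: try only the screen-content tools
}


def candidate_modes(scb_probability, threshold=0.5):
    """Map a classifier output to a block type and the modes left to evaluate."""
    block_type = "SCB" if scb_probability >= threshold else "NCB"
    return block_type, MODE_SUBSETS[block_type]


block_type, modes = candidate_modes(0.9)
```

Encoding time drops because the rate-distortion search only evaluates the returned modes instead of every mode for every CU.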
Pub Date: 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301804
Anique Akhtar, Wen Gao, Xianguo Zhang, Li Li, Zhu Li, Shan Liu
A point cloud is a 3D data representation that is becoming increasingly popular. Due to the large size of a point cloud, transmitting it is not feasible without compression. However, current lossy point cloud compression and processing techniques suffer from quantization loss, which results in a coarser, sub-sampled representation of the point cloud. In this paper, we address the problem of points lost during voxelization by performing geometry prediction across spatial scale using a deep learning architecture. We perform an octree-type upsampling of point cloud geometry in which each voxel point is divided into 8 sub-voxel points and their occupancy is predicted by our network. In this way, we obtain a denser representation of the point cloud while minimizing the losses with respect to the ground truth. We utilize sparse tensors with sparse convolutions by using the Minkowski Engine with a UNet-like network equipped with inception-residual network blocks. Our results show that our geometry prediction scheme can significantly improve the PSNR of a point cloud, making it an essential post-processing scheme for the compression-transmission pipeline. This solution can serve as a crucial prediction tool across scale for point cloud compression, as well as display adaptation.
Title: Point Cloud Geometry Prediction Across Spatial Scale using Deep Learning
Published in: 2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)
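The octree-type upsampling can be sketched without any deep learning library: each voxel yields 8 children at twice the resolution, and a predictor decides which children are occupied. In the sketch below the network is replaced by an arbitrary callable, and the function names and the 0.5 threshold are assumptions.

```python
from itertools import product


def child_voxels(voxel):
    """The 8 sub-voxel coordinates of `voxel` at twice the resolution."""
    x, y, z = voxel
    return [(2 * x + dx, 2 * y + dy, 2 * z + dz)
            for dx, dy, dz in product((0, 1), repeat=3)]


def upsample(voxels, occupancy_prob, threshold=0.5):
    """Keep the children whose predicted occupancy exceeds `threshold`.

    occupancy_prob stands in for the network: any callable mapping a child
    coordinate to a probability in [0, 1].
    """
    return [c for v in voxels for c in child_voxels(v)
            if occupancy_prob(c) > threshold]


# Toy predictor (an assumption for the demo): occupy children with even z.
dense = upsample([(0, 0, 0), (1, 0, 0)],
                 lambda c: 1.0 if c[2] % 2 == 0 else 0.0)
```

Applying `upsample` repeatedly would move one octree level finer each time, which is the sense in which the prediction works across spatial scale.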
Pub Date: 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301752
Linwei Zhu, Yun Zhang, N. Li, Jinyong Pi, Xinju Wu
In this paper, a novel intra prediction method is presented for lossless/near lossless High Efficiency Video Coding (HEVC), termed Sparse Representation based Intra Prediction (SRIP). Specifically, the existing Angular Intra Prediction (AIP) modes in HEVC are organized as a mode dictionary, which is utilized to sparsely represent the visual signal by minimizing the difference with respect to the ground truth. So that the encoder and decoder stay matched, the sparse coefficients also need to be encoded and transmitted to the decoder side. To further improve the coding performance, an additional binary flag is included in the video codec to indicate which strategy, SRIP or traditional AIP, is finally selected by rate-distortion optimization. Extensive experimental results reveal that the proposed method achieves a 0.36% bit rate saving on average in the lossless scenario.
Title: Sparse Representation-Based Intra Prediction for Lossless/Near Lossless Video Coding
Published in: 2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)
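Representing a block sparsely over a dictionary of mode predictions can be sketched with a greedy solver. The abstract does not name the solver the authors use, so plain matching pursuit is swapped in here for illustration. The dictionary atoms below are toy vectors rather than real HEVC angular predictions, and coefficient quantization and coding are left out.

```python
def matching_pursuit(block, dictionary, n_atoms=2):
    """Greedy sparse approximation of `block` by a few dictionary atoms.

    block: flattened original samples; dictionary: flattened predictions,
    one per intra mode. Returns ({atom_index: coefficient}, residual).
    """
    residual = list(block)
    coeffs = {}
    for _ in range(n_atoms):
        best = None
        for k, atom in enumerate(dictionary):
            energy = sum(a * a for a in atom)
            if energy == 0:
                continue
            c = sum(a * r for a, r in zip(atom, residual)) / energy
            gain = c * c * energy  # residual energy removed by this atom
            if best is None or gain > best[0]:
                best = (gain, k, c)
        if best is None or best[0] == 0:
            break  # nothing left to explain
        _, k, c = best
        coeffs[k] = coeffs.get(k, 0.0) + c
        residual = [r - c * a for r, a in zip(residual, dictionary[k])]
    return coeffs, residual


mode_dictionary = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 0, 0]]  # toy atoms
coeffs, residual = matching_pursuit([3, 3, 1, 1], mode_dictionary)
```

The returned coefficients play the role of the sparse coefficients that must be sent to the decoder. An encoder would then compare the cost of this representation against a single conventional mode and signal the winner with the binary flag.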
Pub Date: 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301767
Shuo Gao, Yueyi Zhang, Dong Liu, Zhiwei Xiong
The amount of volumetric brain image data is increasing rapidly and requires vast resources for storage and transmission, so there is an urgent need for an efficient volumetric compression method. Recent years have witnessed the progress of deep learning-based approaches for two-dimensional (2D) natural image compression, but learned volumetric image compression remains unexplored. In this paper, we propose the first end-to-end learning framework for volumetric image compression by extending the advanced techniques of 2D image compression to volumetric images. Specifically, a convolutional autoencoder is used to compress 3D image cubes, and non-local attention models are embedded in the convolutional autoencoder to jointly capture local and global correlations. Both hyperprior and autoregressive models are used to perform the conditional probability estimation in entropy coding. To reduce model complexity, we introduce a convolutional long short-term memory network for the autoregressive model based on channel-wise prediction. Experimental results on volumetric mouse brain images show that the proposed method outperforms JPEG2000-3D, HEVC and state-of-the-art 2D methods.
Title: Volumetric End-to-End Optimized Compression for Brain Images
Published in: 2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)
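The reason probability estimation matters for entropy coding can be shown with a short calculation: an ideal entropy coder spends about -log2(p) bits on a symbol to which the model assigned probability p. The sketch below is generic and not taken from the paper; the function name and the sample probabilities are assumptions.

```python
import math


def ideal_code_length(symbol_probs):
    """Bits an ideal entropy coder spends: the sum of -log2(p) over coded symbols.

    symbol_probs: the probability the model assigned to each symbol that
    actually occurred (hypothetical values in the example below).
    """
    return sum(-math.log2(p) for p in symbol_probs)


bits = ideal_code_length([0.5, 0.25, 0.25])
```

The sharper the conditional probabilities produced by the hyperprior and autoregressive models, the fewer bits the same latent symbols cost, which is how better context modeling translates into a lower bitrate.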