Geodesic Disparity Compensation for Inter-View Prediction in VR180
Pub Date: 2020-12-01, DOI: 10.1109/VCIP49819.2020.9301750
K. Sivakumar, B. Vishwanath, K. Rose
The VR180 format is gaining considerable traction among the various promising immersive multimedia formats that will arguably dominate future multimedia consumption applications. VR180 enables stereo viewing of a hemisphere about the user. The increased field of view and the stereo setting result in extensive volumes of data that strongly motivate the pursuit of novel, efficient compression tools tailored to this format. This paper's focus is on the critical inter-view prediction module that exploits correlations between camera views. Existing approaches mainly consist of projection to a plane where traditional multi-view coders are applied, and disparity compensation employs simple block translation in the plane. However, warping due to the projection renders such compensation highly suboptimal. The proposed approach circumvents this shortcoming by performing geodesic disparity compensation on the sphere. It leverages the observation that, as an observer moves from one viewpoint to the other, all points on surrounding objects are perceived to move along respective geodesics on the sphere, all of which intersect at the two points where the axis connecting the two viewpoints pierces the sphere. Thus, the proposed method performs inter-view prediction on the sphere by moving pixels along their predefined respective geodesics, and accurately captures the perceived deformations. Experimental results show significant bitrate savings and demonstrate the efficacy of the proposed approach.
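To make the geometry concrete, below is a minimal NumPy sketch of the geodesic motion described in the abstract: every pixel direction is decomposed relative to the inter-view axis and slid along its meridian (the great circle through the two points where the axis pierces the sphere) by a disparity angle. The function name, interface and vectorization are our own illustration under these assumptions, not code from the paper; in an actual codec the per-pixel disparity would itself be estimated and signalled.

```python
import numpy as np

def geodesic_shift(points, baseline_axis, disparity):
    """Move unit direction vectors along their geodesics toward/away from
    the two poles where the inter-view axis pierces the sphere.

    points        : (N, 3) array of unit vectors (one per pixel)
    baseline_axis : (3,) vector connecting the two viewpoints
    disparity     : scalar or (N,) geodesic angle in radians
    """
    a = np.asarray(baseline_axis, dtype=float)
    a /= np.linalg.norm(a)

    cos_theta = points @ a                          # cosine of the polar angle w.r.t. the axis
    tangential = points - np.outer(cos_theta, a)    # component orthogonal to the axis
    sin_theta = np.linalg.norm(tangential, axis=1)
    azimuth_dir = tangential / np.maximum(sin_theta, 1e-12)[:, None]

    # Slide along the meridian: the azimuth is preserved, only the polar
    # angle changes by the disparity, so every point stays on its geodesic.
    theta_new = np.arccos(np.clip(cos_theta, -1.0, 1.0)) + disparity
    return np.cos(theta_new)[:, None] * a + np.sin(theta_new)[:, None] * azimuth_dir
```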
{"title":"Geodesic Disparity Compensation for Inter-View Prediction in VR180","authors":"K. Sivakumar, B. Vishwanath, K. Rose","doi":"10.1109/VCIP49819.2020.9301750","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301750","url":null,"abstract":"The VR180 format is gaining considerable traction among the various promising immersive multimedia formats that will arguably dominate future multimedia consumption applications. VR180 enables stereo viewing of a hemisphere about the user. The increased field of view and the stereo setting result in extensive volumes of data that strongly motivate the pursuit of novel efficient compression tools tailored to this format. This paper’s focus is on the critical inter-view prediction module that exploits correlations between camera views. Existing approaches mainly consist of projection to a plane where traditional multi-view coders are applied, and disparity compensation employs simple block translation in the plane. However, warping due to the projection renders such compensation highly suboptimal. The proposed approach circumvents this shortcoming by performing geodesic disparity compensation on the sphere. It leverages the observation that, as an observer moves from one view point to the other, all points on surrounding objects are perceived to move along respective geodesics on the sphere, which all intersect at the two points where the axis connecting the two view points pierces the sphere. Thus, the proposed method performs inter-view prediction on the sphere by moving pixels along their predefined respective geodesics, and accurately captures the perceived deformations. Experimental results show significant bitrate savings and evidence the efficacy of the proposed approach.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125053504","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fast Intra Coding Algorithm for Depth Map with End-to-End Edge Detection Network
Pub Date: 2020-12-01, DOI: 10.1109/VCIP49819.2020.9301859
Chang Liu, Ke-bin Jia, Pengyu Liu
Compared with traditional High Efficiency Video Coding (HEVC), 3D-HEVC introduces multi-view coding and depth map coding, which leads to a significant increase in coding complexity. In this paper, we propose a low-complexity intra coding algorithm for the depth map based on an end-to-end edge detection network. Firstly, we use the Holistically Nested Edge Detection (HED) network to determine the edge locations of the depth map. Secondly, we use the Otsu method to divide the output of the HED into a foreground region and a background region. Finally, the CU size and the candidate list of intra modes are determined according to the region of the coding tree unit (CTU). Experimental results demonstrate that the proposed algorithm reduces the encoding time by 39.56% on average with negligible degradation of coding performance.
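As an illustration of the thresholding and region-classification step, the sketch below binarizes a precomputed HED edge-probability map with Otsu's threshold and flags every CTU that contains edge pixels; the scikit-image dependency and the simple per-CTU decision rule (any foreground pixel) are our assumptions, not the paper's implementation.

```python
import numpy as np
from skimage.filters import threshold_otsu

def classify_ctus(edge_prob, ctu_size=64):
    """Binarize an HED edge-probability map with Otsu's threshold and flag
    every CTU that contains foreground (edge) pixels.

    edge_prob : (H, W) float array in [0, 1], output of the HED network
    returns   : boolean array with one flag per CTU (True = edge CTU)
    """
    fg = edge_prob > threshold_otsu(edge_prob)      # foreground / background split
    h, w = fg.shape
    flags = np.zeros((h // ctu_size, w // ctu_size), dtype=bool)
    for i in range(flags.shape[0]):
        for j in range(flags.shape[1]):
            block = fg[i * ctu_size:(i + 1) * ctu_size,
                       j * ctu_size:(j + 1) * ctu_size]
            # Edge CTUs would keep the full CU-size/intra-mode search;
            # background CTUs would use the reduced candidate list.
            flags[i, j] = block.any()
    return flags
```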
{"title":"Fast Intra Coding Algorithm for Depth Map with End-to-End Edge Detection Network","authors":"Chang Liu, Ke-bin Jia, Pengyu Liu","doi":"10.1109/VCIP49819.2020.9301859","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301859","url":null,"abstract":"Compared with traditional High Efficiency Video Coding (HEVC), 3D-HEVC introduces multi-view coding and depth map coding, which leads to significant increase in coding complexity. In this paper, we propose a low complexity intra coding algorithm for depth map based on end-to-end edge detection network. Firstly, we use Holistically Nested Edge Detection (HED) network to determine the edge location of the depth map. Secondly, we use Ostu method to divide the output of the HED into foreground region and background region. Finally, the CU size and the candidate list of intra mode are determined according to the region of coding tree unit (CTU). Experimental results demonstrate that the proposed algorithm can reduce the encoding time by 39.56% on average under negligible degradation of coding performance.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129103929","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recent Advances in End-to-End Learned Image and Video Compression
Pub Date: 2020-12-01, DOI: 10.1109/VCIP49819.2020.9301753
Wen-Hsiao Peng, H. Hang
The DCT-based transform coding technique has been adopted by international standards (ISO JPEG, ITU H.261/264/265, ISO MPEG-2/4/H, and many others) for nearly 30 years. Although researchers are still trying to improve its efficiency by fine-tuning its components and parameters, the basic structure has not changed in the past two decades. The recently developed deep learning technology may provide a new direction for constructing high-compression image/video coding systems. Recent results, particularly from the Challenge on Learned Image Compression (CLIC) at CVPR, indicate that this new type of scheme (often trained end-to-end) may have good potential for further improving compression efficiency. In the first part of this tutorial, we shall (1) briefly summarize the progress of this topic in the past 3 or so years, including an overview of CLIC results and the JPEG AI Call-for-Evidence Challenge on Learning-based Image Coding (issued in early 2020). Because Deep Neural Network (DNN)-based image compression is a new area, several techniques and structures have been tested. The recently published autoencoder-based schemes can achieve PSNR similar to BPG (Better Portable Graphics, the H.265 still-image standard) and have superior subjective quality (e.g., MS-SSIM), especially at very low bit rates. In the second part, we shall (2) address the detailed design concepts of image compression algorithms using the autoencoder structure. In the third part, we shall switch gears to (3) explore the emerging area of DNN-based video compression. Recent publications in this area have indicated that end-to-end trained video compression can achieve rate-distortion performance comparable or superior to HEVC/H.265. The CLIC at CVPR 2020 also created for the first time a new track dedicated to P-frame coding.
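For readers new to the area, here is a toy PyTorch sketch of the autoencoder plus rate-distortion training objective that the tutorial covers; the layer configuration, the additive-noise quantization surrogate and the crude rate proxy are illustrative assumptions only (practical schemes replace the proxy with a learned entropy model).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyCompressor(nn.Module):
    """Toy convolutional autoencoder for end-to-end learned image compression."""
    def __init__(self, ch=64):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(3, ch, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(ch, ch, 5, stride=2, padding=2))
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(ch, ch, 5, stride=2, padding=2, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(ch, 3, 5, stride=2, padding=2, output_padding=1))

    def forward(self, x):
        y = self.enc(x)
        if self.training:
            # additive uniform noise as a differentiable surrogate for quantization
            y_hat = y + torch.empty_like(y).uniform_(-0.5, 0.5)
        else:
            y_hat = torch.round(y)
        return self.dec(y_hat), y_hat

def rd_loss(x, x_hat, y_hat, lmbda=0.01):
    """Rate-distortion objective R + lambda * D; the rate term here is only a
    stand-in for the learned entropy model used by real schemes."""
    distortion = F.mse_loss(x_hat, x)
    rate_proxy = y_hat.abs().mean()
    return rate_proxy + lmbda * distortion
```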
{"title":"Recent Advances in End-to-End Learned Image and Video Compression","authors":"Wen-Hsiao Peng, H. Hang","doi":"10.1109/VCIP49819.2020.9301753","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301753","url":null,"abstract":"The DCT-based transform coding technique was adopted by the international standards (ISO JPEG, ITU H.261/264/265, ISO MPEG-2/4/H, and many others) for nearly 30 years. Although researchers are still trying to improve its efficiency by fine-tuning its components and parameters, the basic structure has not changed in the past two decades.The deep learning technology recently developed may provide a new direction for constructing a high-compression image/video coding system. Recent results, particularly from the Challenge on Learned Image Compression (CLIC) at CVPR, indicate that this new type of schemes (often trained end-to-end) may have good potential for further improving compression efficiency.In the first part of this tutorial, we shall (1) summarize briefly the progress of this topic in the past 3 or so years, including an overview of CLIC results and JPEG AI Call-for-Evidence Challenge on Learning-based Image Coding (issued in early 2020). Because Deep Neural Network (DNN)-based image compression is a new area, several techniques and structures have been tested. The recently published autoencoder-based schemes can achieve similar PSNR to BPG (Better Portable Graphics, H.265 still image standard) and has superior subject quality (e.g., MSSSIM), especially at the very low bit rates. In the second part, we shall (2) address the detailed design concepts of image compression algorithms using the autoencoder structure. In the third part, we shall switch gears to (3) explore the emerging area of DNN-based video compression. Recent publications in this area have indicated that end-to-end trained video compression can achieve comparable or superior rate-distortion performance to HEVC/H.265. The CLIC at CVPR 2020 also created for the first time a new track dedicated to P-frame coding.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"205 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128720714","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mining Larger Class Activation Map with Common Attribute Labels
Pub Date: 2020-12-01, DOI: 10.1109/VCIP49819.2020.9301872
Runtong Zhang, Fanman Meng, Hongliang Li, Q. Wu, K. Ngan
Class Activation Map (CAM) is a visualization of target regions generated from a classification network. However, a classification network trained with class-level labels only responds strongly to a few object features and thus cannot discriminate the whole target. We argue that the original labels used in classification tasks are not sufficient to describe all features of the objects. If we annotate more detailed labels, such as class-agnostic attribute labels, for each image, the network may be able to mine larger CAMs. Motivated by this idea, we propose and design common attribute labels, which are lower-level labels summarized from the original image-level categories to describe more details of the target. Moreover, it should be emphasized that the proposed labels generalize well to unknown categories, since attributes (such as head, body, etc.) in some categories (such as dog, cat, etc.) are common and class-agnostic. That is why we call them common attribute labels: they are lower-level and more general than traditional labels. We complete the annotation work on the PASCAL VOC2012 dataset and design a new architecture to successfully classify these common attribute labels. After fusing the features of the attribute labels into the original categories, our network can mine larger CAMs of objects. Our method achieves visually better CAM results and higher evaluation scores than traditional methods.
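As a reminder of how a CAM is formed and how attribute CAMs might enlarge it, a small NumPy sketch follows; the pixel-wise maximum fusion is only one plausible rule and is not claimed to be the fusion used in the paper.

```python
import numpy as np

def class_activation_map(features, weights, idx):
    """Plain CAM: weight the final conv feature maps by one classifier row.

    features : (C, H, W) conv features before global average pooling
    weights  : (num_classes, C) weights of the linear classification layer
    """
    cam = np.tensordot(weights[idx], features, axes=([0], [0]))   # (H, W)
    cam = np.maximum(cam, 0.0)
    return cam / (cam.max() + 1e-8)

def fused_cam(features, cat_weights, attr_weights, cat_idx, attr_indices):
    """Enlarge the category CAM with the CAMs of its class-agnostic attributes
    (head, body, ...) via a pixel-wise maximum."""
    cams = [class_activation_map(features, cat_weights, cat_idx)]
    cams += [class_activation_map(features, attr_weights, a) for a in attr_indices]
    return np.maximum.reduce(cams)
```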
{"title":"Mining Larger Class Activation Map with Common Attribute Labels","authors":"Runtong Zhang, Fanman Meng, Hongliang Li, Q. Wu, K. Ngan","doi":"10.1109/VCIP49819.2020.9301872","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301872","url":null,"abstract":"Class Activation Map (CAM) is the visualization of target regions generated from classification networks. However, classification network trained by class-level labels only has high responses to a few features of objects and thus the network cannot discriminate the whole target. We think that original labels used in classification tasks are not enough to describe all features of the objects. If we annotate more detailed labels like class-agnostic attribute labels for each image, the network may be able to mine larger CAM. Motivated by this idea, we propose and design common attribute labels, which are lower-level labels summarized from original image-level categories to describe more details of the target. Moreover, it should be emphasized that our proposed labels have good generalization on unknown categories since attributes (such as head, body, etc.) in some categories (such as dog, cat, etc.) are common and class-agnostic. That is why we call our proposed labels as common attribute labels, which are lower-level and more general compared with traditional labels. We finish the annotation work based on the PASCAL VOC2012 dataset and design a new architecture to successfully classify these common attribute labels. Then after fusing features of attribute labels into original categories, our network can mine larger CAMs of objects. Our method achieves better CAM results in visual and higher evaluation scores compared with traditional methods.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"147 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117080490","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
IBC-Mirror Mode for Screen Content Coding for the Next Generation Video Coding Standards
Pub Date: 2020-12-01, DOI: 10.1109/VCIP49819.2020.9301811
Jian Cao, Zhen Qiu, Zhengren Li, Fan Liang, Jun Wang
This paper proposes an IBC-Mirror mode for Screen Content Coding (SCC) in the next-generation video coding standards, including Versatile Video Coding (VVC) and Audio Video Standard 3 (AVS3) in China. This is the first time that the mirror characteristic is taken into consideration for SCC in VVC/AVS3. Based on the translational motion model of the Intra Block Copy (IBC) mode, a "horizontal and vertical flipping" function is added to further reduce the prediction error and improve coding efficiency. The proposed IBC-Mirror mode is implemented on the latest reference software, VTM-5.0 (VVC) and HPM-5.0 (AVS3). Simulations show that the proposed mode achieves up to 1~2% (VVC) and 4~7% (AVS3) BD-rate savings for SCC test sequences. Drafts describing the mode have been submitted to the AVS meeting and investigated in the SCC Core Experiments (CE).
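The sketch below illustrates the idea of forming an IBC predictor with an optional horizontal or vertical flip and picking the variant by a simple SAD check; it is a conceptual toy, not VTM/HPM code, and a real encoder would use a full rate-distortion cost instead of SAD.

```python
import numpy as np

def ibc_mirror_predict(recon, pos, bv, size, mode):
    """Build an IBC predictor from the already reconstructed picture area,
    optionally mirrored.

    recon : (H, W) reconstructed luma samples
    pos   : (y, x) top-left of the current block
    bv    : (dy, dx) block vector pointing into the reconstructed area
    size  : (h, w) block size
    mode  : 'copy', 'hflip' (left-right mirror) or 'vflip' (top-bottom mirror)
    """
    y, x = pos[0] + bv[0], pos[1] + bv[1]
    h, w = size
    pred = recon[y:y + h, x:x + w]
    if mode == 'hflip':
        pred = pred[:, ::-1]
    elif mode == 'vflip':
        pred = pred[::-1, :]
    return pred

def best_mirror_mode(recon, original, pos, bv, size):
    """Pick the variant with the smallest SAD against the original block
    (a stand-in for the rate-distortion decision of a real encoder)."""
    y, x = pos
    h, w = size
    cur = original[y:y + h, x:x + w].astype(np.int64)
    costs = {m: int(np.abs(ibc_mirror_predict(recon, pos, bv, size, m).astype(np.int64) - cur).sum())
             for m in ('copy', 'hflip', 'vflip')}
    return min(costs, key=costs.get)
```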
{"title":"IBC-Mirror Mode for Screen Content Coding for the Next Generation Video Coding Standards","authors":"Jian Cao, Zhen Qiu, Zhengren Li, Fan Liang, Jun Wang","doi":"10.1109/VCIP49819.2020.9301811","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301811","url":null,"abstract":"This paper proposes an IBC-Mirror mode for Screen Content Coding (SCC) for the next generation video coding standards, including Versatile Video Coding (VVC) and Audio Video Standard-3 in China (AVS3). It is the first time to take mirror characteristic into consideration for SCC in VVC/AVS3. Based on the translational motion model of Intra Block Copy (IBC) mode, the function of \"horizontal and vertical flipping\" is further added to reduce prediction error and improve coding efficiency. The proposed IBC-Mirror mode is implemented on the latest reference software, including VTM5.0 (VVC) and HPM-5.0 (AVS3). The simulations show that the proposed mode can achieve up to 1~2% (VVC) and 4~7% (AVS3) BD-rate saving for SCC test sequences. Drafts about the mode have been submitted to AVS meeting and investigated in SCC Core Experiments (CE).","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"77 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121385043","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DENESTO: A Tool for Video Decoding Energy Estimation and Visualization
Pub Date: 2020-12-01, DOI: 10.1109/VCIP49819.2020.9301877
Matthias Kränzler, Christian Herglotz, A. Kaup
Previous research has shown that the decoding energy demand of several video codecs can be estimated accurately using bit-stream-feature-based models. In this paper, we show that visualization with the Decoding Energy Estimation Tool (DENESTO) can help to improve the understanding of the decoder's energy demand.
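A hypothetical sketch of the general form of such bit-stream-feature-based models (a weighted sum of feature counts) is given below; the feature names and coefficients are made up purely to show the model structure and are not taken from DENESTO.

```python
def estimate_decoding_energy(feature_counts, specific_energies):
    """Bit-stream-feature-based model: decoding energy as a weighted sum of
    feature occurrence counts.

    feature_counts    : dict, feature name -> count extracted from the bit stream
    specific_energies : dict, feature name -> trained per-feature energy (J)
    """
    return sum(count * specific_energies.get(name, 0.0)
               for name, count in feature_counts.items())

# Entirely hypothetical feature names and coefficients, only to show the model form:
counts = {"intra_cu": 1200, "inter_cu": 5400, "sao_ctus": 300}
coeffs = {"intra_cu": 2.1e-6, "inter_cu": 3.4e-6, "sao_ctus": 9.0e-6}
print(f"estimated decoding energy: {estimate_decoding_energy(counts, coeffs):.6f} J")
```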
{"title":"DENESTO: A Tool for Video Decoding Energy Estimation and Visualization","authors":"Matthias Kränzler, Christian Herglotz, A. Kaup","doi":"10.1109/VCIP49819.2020.9301877","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301877","url":null,"abstract":"In previous research, it is shown that the decoding energy demand of several video codecs can be estimated accurately by using bit stream feature-based models. Therefore, we show in this paper that the visualization with the Decoding Energy Estimation Tool (DENESTO) can help to improve the understanding of the energy demand of the decoder.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122240115","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Robust Visual Tracking Via An Imbalance-Elimination Mechanism
Pub Date: 2020-12-01, DOI: 10.1109/VCIP49819.2020.9301862
Jin Feng, Kaili Zhao, Xiaolin Song, Anxin Li, Honggang Zhang
Competitive performance in visual tracking is mostly achieved by tracking-by-detection approaches, whose accuracy relies heavily on a binary classifier that distinguishes targets from distractors in a set of candidates. However, severe class imbalance, with few positives (e.g., targets) relative to negatives (e.g., backgrounds), degrades classification accuracy or increases tracking bias. In this paper, we propose an imbalance-elimination mechanism that adopts a multi-class paradigm and utilizes a novel candidate generation strategy. Specifically, our multi-class model assigns samples to one positive class and four proposed negative classes, naturally alleviating class imbalance. We define the negative classes by introducing the proportion of the target within each sample, whose value explicitly reveals the relative scale between target and background. Furthermore, during candidate generation, we exploit such scale-aware negative patterns to adjust the search areas of candidates to incorporate larger target proportions; thus more accurate target candidates are obtained and, at the same time, more positive samples are included to ease class imbalance. Extensive experiments on standard benchmarks show that our tracker achieves favorable performance against state-of-the-art approaches and offers robust discrimination between positive targets and negative patterns.
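The following sketch shows one way the proportion-based class assignment could look: a candidate is mapped to the positive class or to one of four negative classes according to the fraction of the target it contains. The bin edges are assumptions for illustration; the paper defines its own.

```python
def assign_sample_class(target_fraction):
    """Map a candidate sample to the positive class (0) or one of four
    negative classes (1..4) based on the proportion of target it contains.
    The thresholds below are illustrative assumptions, not the paper's values.
    """
    bins = [0.7, 0.5, 0.3, 0.1]           # assumed bin edges
    if target_fraction >= bins[0]:
        return 0                          # positive: sample is mostly target
    for k, edge in enumerate(bins[1:], start=1):
        if target_fraction >= edge:
            return k                      # negative classes 1..3: partial target
    return 4                              # negative class 4: almost pure background
```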
{"title":"Robust Visual Tracking Via An Imbalance-Elimination Mechanism","authors":"Jin Feng, Kaili Zhao, Xiaolin Song, Anxin Li, Honggang Zhang","doi":"10.1109/VCIP49819.2020.9301862","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301862","url":null,"abstract":"The competitive performances in visual tracking are achieved mostly by tracking-by-detection based approaches, whose accuracy highly relies on a binary classifier that distinguishes targets from distractors in a set of candidates. However, severe class imbalance, with few positives (e.g., targets) relative to negatives (e.g., backgrounds), leads to degrade accuracy of classification or increase bias of tracking. In this paper, we propose an imbalance-elimination mechanism, which adopts a multi-class paradigm and utilizes a novel candidate generation strategy. Specifically, our multi-class model assigns samples into one positive class and four proposed negative classes, naturally alleviating class imbalance. We define negative classes by introducing proportions of targets in samples, which values explicitly reveal relative scales between targets and backgrounds. Further-more, during candidate generation, we exploit such scale-aware negative patterns to help adjust searching areas of candidates to incorporate larger target proportions, thus more accurate target candidates are obtained and more positive samples are included to ease class imbalance simultaneously. Extensive experiments on standard benchmarks show that our tracker achieves favorable performance against the state-of-the-art approaches, and offers robust discrimination of positive targets and negative patterns.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133877199","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Random-access-aware Light Field Video Coding using Tree Pruning Method
Pub Date: 2020-12-01, DOI: 10.1109/VCIP49819.2020.9301800
T. N. Huu, V. V. Duong, B. Jeon
The increasing prevalence of VR/AR, as well as the expected availability of Light Field (LF) displays in the near future, calls for more practical methods to transmit LF images/video for services. In this respect, LF video coding should consider not only compression efficiency but also view random-access capability (especially in a multi-view-based system). A multi-view coding system heavily exploits view dependencies arising from both inter-view and temporal correlation. While such a system greatly improves compression efficiency, its view random-access capability can be much reduced due to the so-called "chain of dependencies." In this paper, we first model the chain of dependencies by a tree, and then a cost function is used to assign an importance value to each tree node. Travelling from top to bottom, nodes of lesser importance are cut off, forming a pruned tree that reduces random-access complexity. Our tree pruning method is shown to reduce random-access complexity by about 40% at the cost of minor compression loss compared to state-of-the-art methods. Furthermore, our method is expected to be very lightweight in its realization and effective in a practical LF video coding system.
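A minimal sketch of the pruning idea is given below, assuming each node already carries an importance value from the paper's cost function; the data structure and threshold rule are our simplification.

```python
class ViewNode:
    """One view in the dependency tree of the multi-view prediction structure."""
    def __init__(self, view_id, importance, children=None):
        self.view_id = view_id
        self.importance = importance      # value assigned by the paper's cost function
        self.children = children or []

def prune_tree(node, threshold):
    """Travel from top to bottom and cut off children of lesser importance,
    yielding a pruned prediction structure with shorter dependency chains."""
    node.children = [c for c in node.children if c.importance >= threshold]
    for child in node.children:
        prune_tree(child, threshold)
    return node

# Minimal usage: a root view with dependent views of different importance.
root = ViewNode(0, 1.0, [ViewNode(1, 0.8, [ViewNode(3, 0.2)]), ViewNode(2, 0.1)])
prune_tree(root, 0.3)
print([c.view_id for c in root.children])   # -> [1]; views 2 and 3 are dropped
```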
{"title":"Random-access-aware Light Field Video Coding using Tree Pruning Method","authors":"T. N. Huu, V. V. Duong, B. Jeon","doi":"10.1109/VCIP49819.2020.9301800","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301800","url":null,"abstract":"The increasing prevalence of VR/AR as well as the expected availability of Light Field (LF) display soon call for more practical methods to transmit LF image/video for services. In that aspect, the LF video coding should not only consider the compression efficiency but also the view random-access capability (especially in the multi-view-based system). The multi-view coding system heavily exploits view dependencies coming from both inter-view and temporal correlation. While such a system greatly improves the compression efficiency, its view random-access capability can be much reduced due to so called \"chain of dependencies.\" In this paper, we first model the chain of dependencies by a tree, then a cost function is used to assign an importance value to each tree node. By travelling from top to bottom, a node of lesser importance is cut-off, forming a pruned tree to achieve reduction of random-access complexity. Our tree pruning method has shown to reduce about 40% of random-access complexity at the cost of minor compression loss compared to the state-of-the-art methods. Furthermore, it is expected that our method is very lightweight in its realization and also effective on a practical LF video coding system.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117140852","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Unified Single Image De-raining Model via Region Adaptive Coupled Network
Pub Date: 2020-12-01, DOI: 10.1109/VCIP49819.2020.9301865
Q. Wu, Li Chen, K. Ngan, Hongliang Li, Fanman Meng, Linfeng Xu
Single-image de-raining is quite challenging due to the diversity of rain types and the inhomogeneous distribution of rainwater. By means of dedicated models and constraints, existing methods perform well for specific rain types; however, their generalization capability is highly limited. In this paper, we propose a unified de-raining model that selectively fuses the clean background of the input rain image with the well-restored regions that were occluded by various kinds of rain. This is achieved by our region adaptive coupled network (RACN), whose two branches integrate each other's features at different layers to jointly generate the spatial-variant weight and the restored image, respectively. On the one hand, the weight branch can lead the restoration branch to focus on the regions that contribute most to de-raining. On the other hand, the restoration branch can guide the weight branch to avoid regions with over-/under-filtering risks. Extensive experiments show that our method outperforms many state-of-the-art de-raining algorithms on diverse rain types, including rain streaks, raindrops and rain mist.
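Conceptually, the two branch outputs can be combined as sketched below: a spatial-variant weight selects the restored content in rain-occluded regions and keeps the clean background elsewhere. This is only the fusion idea; inside RACN the combination is learned.

```python
import numpy as np

def region_adaptive_fusion(rain_image, restored, weight_map):
    """Fuse the input and the restored image with a spatial-variant weight:
    where the weight is high the restored content is used (rain-occluded
    regions), elsewhere the clean background of the input is kept.

    rain_image, restored : (H, W, 3) float arrays in [0, 1]
    weight_map           : (H, W) float array in [0, 1]
    """
    w = np.clip(weight_map, 0.0, 1.0)[..., None]    # broadcast over the color channels
    return w * restored + (1.0 - w) * rain_image
```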
{"title":"A Unified Single Image De-raining Model via Region Adaptive Coupled Network","authors":"Q. Wu, Li Chen, K. Ngan, Hongliang Li, Fanman Meng, Linfeng Xu","doi":"10.1109/VCIP49819.2020.9301865","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301865","url":null,"abstract":"Single image de-raining is quite challenging due to the diversity of rain types and inhomogeneous distributions of rainwater. By means of dedicated models and constraints, existing methods perform well for specific rain type. However, their generalization capability is highly limited as well. In this paper, we propose a unified de-raining model by selectively fusing the clean background of the input rain image and the well restored regions occluded by various rains. This is achieved by our region adaptive coupled network (RACN), whose two branches integrate the features of each other in different layers to jointly generate the spatial-variant weight and restored image respectively. On the one hand, the weight branch could lead the restoration branch to focus on the regions with higher contributions for de-raining. On the other hand, the restoration branch could guide the weight branch to keep off the regions with over-/under-filtering risks. Extensive experiments show that our method outperforms many state-of-the-art de-raining algorithms on diverse rain types including the rain streak, raindrop and rain-mist.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"82 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123490851","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lightweight Color Image Demosaicking with Multi-Core Feature Extraction
Pub Date: 2020-12-01, DOI: 10.1109/VCIP49819.2020.9301841
Yufei Tan, Kan Chang, Hengxin Li, Zhenhua Tang, Tuanfa Qin
Convolutional neural network (CNN)-based color image demosaicking methods have achieved great success recently. However, in many applications where computation resources are highly limited, it is not practical to deploy large-scale networks. This paper proposes a lightweight CNN for color image demosaicking. Firstly, to effectively extract shallow features, a multi-core feature extraction module, which takes the Bayer sampling positions into consideration, is proposed. Secondly, by taking advantage of inter-channel correlation, an attention-aware fusion module is presented to efficiently reconstruct the full color image. Moreover, a feature enhancement module, which contains several cascading attention-aware enhancement blocks, is designed to further refine the initial reconstructed image. To demonstrate the effectiveness of the proposed network, several state-of-the-art demosaicking methods are compared. Experimental results show that, with the smallest number of parameters, the proposed network outperforms the other compared methods in terms of both objective and subjective quality.
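As a concrete example of position-aware shallow feature extraction, the sketch below splits an RGGB Bayer mosaic into its four sampling-position planes so that each position can be handled by its own kernels; the RGGB ordering and this pre-processing step are assumptions for illustration, not the paper's exact module.

```python
import numpy as np

def split_bayer_planes(raw):
    """Separate an RGGB Bayer mosaic into its four sampling-position planes,
    so that each position can be processed by position-specific kernels.

    raw : (H, W) mosaic with even H and W (RGGB pattern assumed)
    returns a (H/2, W/2, 4) array ordered R, G1, G2, B
    """
    r  = raw[0::2, 0::2]
    g1 = raw[0::2, 1::2]
    g2 = raw[1::2, 0::2]
    b  = raw[1::2, 1::2]
    return np.stack([r, g1, g2, b], axis=-1)
```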
{"title":"Lightweight Color Image Demosaicking with Multi-Core Feature Extraction","authors":"Yufei Tan, Kan Chang, Hengxin Li, Zhenhua Tang, Tuanfa Qin","doi":"10.1109/VCIP49819.2020.9301841","DOIUrl":"https://doi.org/10.1109/VCIP49819.2020.9301841","url":null,"abstract":"Convolutional neural network (CNN)-based color image demosaicking methods have achieved great success recently. However, in many applications where the computation resource is highly limited, it is not practical to deploy large-scale networks. This paper proposes a lightweight CNN for color image demosaicking. Firstly, to effectively extract shallow features, a multi-core feature extraction module, which takes the Bayer sampling positions into consideration, is proposed. Secondly, by taking advantage of inter-channel correlation, an attention-aware fusion module is presented to efficiently r econstruct t he full color image. Moreover, a feature enhancement module, which contains several cascading attention-aware enhancement blocks, is designed to further refine t he i nitial reconstructed i mage. To demonstrate the effectiveness of the proposed network, several state-of-the-art demosaicking methods are compared. Experimental results show that with the smallest number of parameters, the proposed network outperforms the other compared methods in terms of both objective and subjective qualities.","PeriodicalId":431880,"journal":{"name":"2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)","volume":"64 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123443993","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}