Copyright and Reprint Permissions
Pub Date: 2021-12-05 | DOI: 10.1109/vcip53242.2021.9675354
{"title":"Copyright and Reprint Permissions","authors":"","doi":"10.1109/vcip53242.2021.9675354","DOIUrl":"https://doi.org/10.1109/vcip53242.2021.9675354","url":null,"abstract":"","PeriodicalId":114062,"journal":{"name":"2021 International Conference on Visual Communications and Image Processing (VCIP)","volume":"3 3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133203582","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Bottom-up Fast CU Partition Scoring Mechanism for AVS3
Pub Date: 2021-12-05 | DOI: 10.1109/VCIP53242.2021.9675407
Shiyi Liu, Zhenyu Wang, Ke Qiu, Jiayu Yang, Ronggang Wang
The third generation of the Audio Video Coding Standard (AVS3) achieves a 22% coding performance improvement over High Efficiency Video Coding (HEVC). However, this gain in encoding efficiency, which comes from a more flexible block partition scheme, is obtained at the cost of much higher encoding complexity. This paper proposes a bottom-up fast algorithm to prune the time-consuming search of the CU partition tree. Specifically, we design a scoring mechanism, based on the splitting patterns traced back from the bottom of the tree, to predict how likely a partition type is to be selected as optimal. The score threshold for skipping the exhaustive Rate-Distortion Optimization (RDO) of a partition type is determined by statistical analysis. Experimental results show that the proposed method achieves a 24.56% time saving with 0.37% BDBR loss under the Random Access configuration and a 12.50% complexity reduction with 0.08% BDBR loss under the All Intra configuration. After evaluation by the AVS working group, the method was adopted by the open-source AVS3 platform.
{"title":"A Bottom-up Fast CU Partition Scoring Mechanism for AVS3","authors":"Shiyi Liu, Zhenyu Wang, Ke Qiu, Jiayu Yang, Ronggang Wang","doi":"10.1109/VCIP53242.2021.9675407","DOIUrl":"https://doi.org/10.1109/VCIP53242.2021.9675407","url":null,"abstract":"The third generation of Audio Video Coding Standard (AVS3) achieves 22% coding performance improvement compared with High Efficiency Video Coding (HEVC). However, the improvement of encoding efficiency comes from a more flexible block partition scheme is at the cost of much higher encoding complexity. This paper proposes a bottom-up fast algorithm to prune the time-consuming search process of the CU partition tree. To be specific, we design a scoring mechanism based on the splitting patterns traced back from the bottom to predict the possibility of a partition type to be selected as optimal. The score threshold to skip the exhaustive Rate-Distortion Optimization (RDO) procedure of the partition type is determined by statistical analysis. The experimental results show that the proposed methods can achieve 24.56% time-saving with 0.37% BDBR loss under Random Access configuration and 12.50% complexity reduction with 0.08% BDBR loss under All Intra configuration. The effectiveness leads to the adoption by the open-source platform of AVS3 after evaluated by the AVS working group.","PeriodicalId":114062,"journal":{"name":"2021 International Conference on Visual Communications and Image Processing (VCIP)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129570534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DWS-BEAM: Decoder-Wise Subpicture Bitstream Extracting and Merging for MPEG Immersive Video
Pub Date: 2021-12-05 | DOI: 10.1109/VCIP53242.2021.9675419
Jong-Beom Jeong, Soonbin Lee, Eun‐Seok Ryu
With the new immersive video coding standard MPEG immersive video (MIV) and versatile video coding (VVC), six-degrees-of-freedom (6DoF) virtual reality (VR) streaming technology is emerging for both computer-generated and natural content. This paper proposes a decoder-wise subpicture bitstream extracting and merging (DWS-BEAM) method for MIV built on two main ideas: (i) a selective-streaming-aware subpicture allocation method using a motion-constrained tile set (MCTS), and (ii) a decoder-wise subpicture extracting and merging method for single-pass decoding. In experiments using the VVC test model (VTM), the proposed method shows a 1.23% BD-rate saving for immersive video PSNR (IV-PSNR) and a 15.78% decoding runtime saving compared to the VTM anchor. Moreover, while the MIV test model requires four decoders, the proposed method requires only one.
{"title":"DWS-BEAM: Decoder-Wise Subpicture Bitstream Extracting and Merging for MPEG Immersive Video","authors":"Jong-Beom Jeong, Soonbin Lee, Eun‐Seok Ryu","doi":"10.1109/VCIP53242.2021.9675419","DOIUrl":"https://doi.org/10.1109/VCIP53242.2021.9675419","url":null,"abstract":"With the new immersive video coding standard MPEG immersive video (MIV) and versatile video coding (VVC), six degrees of freedom (6DoF) virtual reality (VR) streaming technology is emerging for both computer-generated and natural content videos. This paper addresses the decoder-wise subpicture bitstream extracting and merging (DWS-BEAM) method for MIV and proposes two main ideas: (i) a selective streaming-aware subpicture allocation method using a motion-constrained tile set (MCTS), (ii) a decoder-wise subpicture extracting and merging method for single-pass decoding. In the experiments using the VVC test model (VTM), the proposed method shows 1.23% BD-rate saving for immersive video PSNR (IV-PSNR) and 15.78% decoding runtime saving compared to the VTM anchor. Moreover, while the MIV test model requires four decoders, the proposed method only requires one decoder.","PeriodicalId":114062,"journal":{"name":"2021 International Conference on Visual Communications and Image Processing (VCIP)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132486723","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SMRD: A Local Feature Descriptor for Multi-modal Image Registration
Pub Date: 2021-12-05 | DOI: 10.1109/VCIP53242.2021.9675401
Jiayu Xie, Xin Jin, Hongkun Cao
Multi-modal image registration has received increasing attention in computer vision and computational photography. However, non-linear intensity variations hinder accurate feature-point matching between images of different modalities. We therefore propose a robust image descriptor for multi-modal image registration, named the shearlet-based modality-robust descriptor (SMRD). Anisotropic edge and texture information across multiple scales is encoded with the discrete shearlet transform to describe the region around a point of interest. We conducted experiments comparing the proposed SMRD with several state-of-the-art multi-modal/multispectral descriptors on four multi-modal datasets. The results show that SMRD outperforms the other methods in terms of precision, recall, and F1-score.
{"title":"SMRD: A Local Feature Descriptor for Multi-modal Image Registration","authors":"Jiayu Xie, Xin Jin, Hongkun Cao","doi":"10.1109/VCIP53242.2021.9675401","DOIUrl":"https://doi.org/10.1109/VCIP53242.2021.9675401","url":null,"abstract":"Image registration among multimodality has received increasing attention in the scope of computer vision and computational photography nowadays. However, the non-linear intensity variations prohibit the accurate feature points matching between modal-different image pairs. Thus, a robust image descriptor for multi-modal image registration is proposed, named shearlet-based modality robust descriptor(SMRD). The anisotropic feature of edge and texture information in multi-scale is encoded to describe the region around a point of interest based on discrete shearlet transform. We conducted the experiments to verify the proposed SMRD compared with several state-of-the-art multi-modal/multispectral descriptors on four different multi-modal datasets. The experimental results showed that our SMRD achieves superior performance than other methods in terms of precision, recall and F1-score.","PeriodicalId":114062,"journal":{"name":"2021 International Conference on Visual Communications and Image Processing (VCIP)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131391529","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Neural Network based Inter bi-prediction Blending
Pub Date: 2021-12-05 | DOI: 10.1109/VCIP53242.2021.9675422
Franck Galpin, P. Bordes, Thierry Dumas, Pavel Nikitin, F. L. Léannec
This paper presents a learning-based method to improve bi-prediction in video coding. In conventional video coding solutions, motion compensation of blocks from already decoded reference pictures is the principal tool used to predict the current frame. In particular, bi-prediction, in which a block is obtained by averaging two different motion-compensated prediction blocks, significantly improves temporal prediction accuracy. In this context, we introduce a simple neural network that further improves the blending operation. A complexity trade-off, in terms of both network size and encoder mode selection, is analyzed. Extensive tests on top of the recently standardized VVC codec show a BD-rate improvement of −1.4% in the random access configuration for a network of fewer than 10k parameters. We also propose a simple CPU-based implementation and direct network quantization to assess the complexity/gains trade-off in a conventional codec framework.
{"title":"Neural Network based Inter bi-prediction Blending","authors":"Franck Galpin, P. Bordes, Thierry Dumas, Pavel Nikitin, F. L. Léannec","doi":"10.1109/VCIP53242.2021.9675422","DOIUrl":"https://doi.org/10.1109/VCIP53242.2021.9675422","url":null,"abstract":"This paper presents a learning-based method to improve bi-prediction in video coding. In conventional video coding solutions, the motion compensation of blocks from already decoded reference pictures stands out as the principal tool used to predict the current frame. Especially, the bi-prediction, in which a block is obtained by averaging two different motion-compensated prediction blocks, significantly improves the final temporal prediction accuracy. In this context, we introduce a simple neural network that further improves the blending operation. A complexity balance, both in terms of network size and encoder mode selection, is carried out. Extensive tests on top of the recently standardized VVC codec are performed and show a BD-rate improvement of −1.4% in random access configuration for a network size of fewer than 10k parameters. We also propose a simple CPU-based implementation and direct network quantization to assess the complexity/gains tradeoff in a conventional codec framework.","PeriodicalId":114062,"journal":{"name":"2021 International Conference on Visual Communications and Image Processing (VCIP)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123610571","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Large-Scale Crowdsourcing Subjective Quality Evaluation of Learning-Based Image Coding
Pub Date: 2021-12-05 | DOI: 10.1109/VCIP53242.2021.9675314
Evgeniy Upenik, Michela Testolina, J. Ascenso, Fernando Pereira, T. Ebrahimi
Learning-based image codecs produce compression artifacts that differ from the blocking and blurring degradations introduced by conventional image codecs such as JPEG, JPEG 2000, and HEIC. In this paper, a crowdsourcing-based subjective quality evaluation procedure was used to benchmark a representative set of end-to-end deep learning-based image codecs submitted to the MMSP'2020 Grand Challenge on Learning-Based Image Coding and the JPEG AI Call for Evidence. For the first time, a double stimulus methodology with a continuous quality scale was applied to evaluate this type of image codec. The subjective experiment is one of the largest ever reported, including more than 240 pairwise comparisons evaluated by 118 naïve subjects. The results of benchmarking learning-based image coding solutions against conventional codecs are organized in a dataset of differential mean opinion scores, released publicly along with the stimuli.
{"title":"Large-Scale Crowdsourcing Subjective Quality Evaluation of Learning-Based Image Coding","authors":"Evgeniy Upenik, Michela Testolina, J. Ascenso, Fernando Pereira, T. Ebrahimi","doi":"10.1109/VCIP53242.2021.9675314","DOIUrl":"https://doi.org/10.1109/VCIP53242.2021.9675314","url":null,"abstract":"Learning-based image codecs produce different compression artifacts, when compared to the blocking and blurring degradation introduced by conventional image codecs, such as JPEG, JPEG 2000 and HEIC. In this paper, a crowdsourcing based subjective quality evaluation procedure was used to benchmark a representative set of end-to-end deep learning-based image codecs submitted to the MMSP'2020 Grand Challenge on Learning-Based Image Coding and the JPEG AI Call for Evidence. For the first time, a double stimulus methodology with a continuous quality scale was applied to evaluate this type of image codecs. The subjective experiment is one of the largest ever reported including more than 240 pair-comparisons evaluated by 118 naïve subjects. The results of the benchmarking of learning-based image coding solutions against conventional codecs are organized in a dataset of differential mean opinion scores along with the stimuli and made publicly available.","PeriodicalId":114062,"journal":{"name":"2021 International Conference on Visual Communications and Image Processing (VCIP)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114636275","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep Metric Learning for Human Action Recognition with SlowFast Networks
Pub Date: 2021-12-05 | DOI: 10.1109/VCIP53242.2021.9675393
Shan-zhi Shi, Cheolkon Jung
In this paper, we propose deep metric learning for human action recognition with SlowFast networks. We adopt SlowFast networks to extract slow-changing spatial semantic information about a single target entity together with fast-changing motion information in the temporal domain. Since deep metric learning can learn the class differences between human actions, we use it to learn a mapping from the original video to compact features in the embedding space. The proposed network consists of three main parts: 1) two branches operating independently at low and high frame rates to extract spatial and temporal features; 2) feature fusion of the two branches; 3) joint training with a deep metric learning loss and a classification loss. Experimental results on the KTH human action dataset demonstrate that the proposed method achieves faster runtime with a smaller model size than C3D and R3D, while maintaining high accuracy.
{"title":"Deep Metric Learning for Human Action Recognition with SlowFast Networks","authors":"Shan-zhi Shi, Cheolkon Jung","doi":"10.1109/VCIP53242.2021.9675393","DOIUrl":"https://doi.org/10.1109/VCIP53242.2021.9675393","url":null,"abstract":"In this paper, we propose deep metric learning for human action recognition with SlowFast networks. We adopt SlowFast Networks to extract slow-changing spatial semantic information of a single target entity in the spatial domain with fast-changing motion information in the temporal domain. Since deep metric learning is able to learn the class difference between human actions, we utilize deep metric learning to learn a mapping from the original video to the compact features in the embedding space. The proposed network consists of three main parts: 1) two branches independently operating at low and high frame rates to extract spatial and temporal features; 2) feature fusion of the two branches; 3) joint training network of deep metric learning and classification loss. Experimental results on the KTH human action dataset demonstrate that the proposed method achieves faster runtime with less model size than C3D and R3D, while ensuring high accuracy.","PeriodicalId":114062,"journal":{"name":"2021 International Conference on Visual Communications and Image Processing (VCIP)","volume":"136 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114432473","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
AutoDerain: Memory-efficient Neural Architecture Search for Image Deraining
Pub Date: 2021-12-05 | DOI: 10.1109/VCIP53242.2021.9675339
Jun Fu, Chen Hou, Zhibo Chen
Learning-based image deraining methods have achieved remarkable success in the past few decades. Currently, most deraining architectures are designed by human experts, a laborious and error-prone process. In this paper, we present a study on employing neural architecture search (NAS) to automatically design deraining architectures, dubbed AutoDerain. Specifically, we first propose a U-shaped deraining architecture, which mainly consists of residual squeeze-and-excitation blocks (RSEBs). We then define a search space over the convolutional types and the use of the squeeze-and-excitation block. Since differentiable architecture search is memory-intensive, we propose a memory-efficient differentiable architecture search scheme (MDARTS). Inspired by the success of training binary neural networks, MDARTS optimizes architecture parameters through the proximal gradient, consuming only as much GPU memory as training a single deraining model. Experimental results demonstrate that the architecture designed by MDARTS is superior to manually designed derainers.
{"title":"AutoDerain: Memory-efficient Neural Architecture Search for Image Deraining","authors":"Jun Fu, Chen Hou, Zhibo Chen","doi":"10.1109/VCIP53242.2021.9675339","DOIUrl":"https://doi.org/10.1109/VCIP53242.2021.9675339","url":null,"abstract":"Learning-based image deraining methods have achieved remarkable success in the past few decades. Currently, most deraining architectures are developed by human experts, which is a laborious and error-prone process. In this paper, we present a study on employing neural architecture search (NAS) to automatically design deraining architectures, dubbed AutoDerain. Specifically, we first propose an U-shaped deraining architecture, which mainly consists of residual squeeze-and-excitation blocks (RSEBs). Then, we define a search space, where we search for the convolutional types and the use of the squeeze-and-excitation block. Considering that the differentiable architecture search is memory-intensive, we propose a memory-efficient differentiable architecture search scheme (MDARTS). In light of the success of training binary neural networks, MDARTS optimizes architecture parameters through the proximal gradient, which only consumes the same GPU memory as training a single deraining model. Experimental results demonstrate that the architecture designed by MDARTS is superior to manually designed derainers.","PeriodicalId":114062,"journal":{"name":"2021 International Conference on Visual Communications and Image Processing (VCIP)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126701049","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
NeCH: Neural Clothed Human Model
Pub Date: 2021-12-05 | DOI: 10.1109/VCIP53242.2021.9675372
Sheng Liu, Liangchen Song, Yi Xu, Junsong Yuan
Existing human models, e.g., SMPL and STAR, represent the 3D geometry of a human body as a polygon mesh obtained by deforming a template mesh according to a set of shape and pose parameters. The appearance, however, is not directly modeled by most existing human models. We present a novel 3D human model that faithfully models both the 3D geometry and the appearance of a clothed human body with a continuous volumetric representation, i.e., volume densities and emitted colors at continuous 3D locations in the volume encompassing the body. In contrast to a mesh-based representation, whose resolution is limited by the mesh's fixed number of polygons, our volumetric representation does not limit the resolution of our model. Moreover, the volumetric representation can be rendered via differentiable volume rendering, enabling us to train the model using only 2D images (without ground-truth 3D geometries of human bodies) by minimizing a loss function that measures the differences between rendered images and ground-truth images. By contrast, existing human models are trained using ground-truth 3D geometries. Thanks to its ability to jointly model both the geometry and the appearance of clothed people, our model can benefit applications including human image synthesis, gaming, 3D television, and telepresence.
{"title":"NeCH: Neural Clothed Human Model","authors":"Sheng Liu, Liangchen Song, Yi Xu, Junsong Yuan","doi":"10.1109/VCIP53242.2021.9675372","DOIUrl":"https://doi.org/10.1109/VCIP53242.2021.9675372","url":null,"abstract":"Existing human models, e.g., SMPL and STAR, represent 3D geometry of a human body in the form of a polygon mesh obtained by deforming a template mesh according to a set of shape and pose parameters. The appearance, however, is not directly modeled by most existing human models. We present a novel 3D human model that faithfully models both the 3D geometry and the appearance of a clothed human body with a continuous volumetric representation, i.e., volume densities and emitted colors of continuous 3D locations in the volume encompassing the human body. In contrast to the mesh-based representation whose resolution is limited by a mesh's fixed number of polygons, our volumetric representation does not limit the resolution of our model. Moreover, our volumetric represen-tation can be rendered via differentiable volume rendering, thus enabling us to train the model only using 2D images (without using ground truth 3D geometries of human bodies) by minimizing a loss function which measures the differences between rendered images and ground truth images. On the contrary, existing human models are trained using ground truth 3D geometries of human bodies. Thanks to the ability of our model to jointly model both the geometries and the appearances of clothed people, our model can benefit applications including human image synthesis, gaming and 3D television and telepresence.","PeriodicalId":114062,"journal":{"name":"2021 International Conference on Visual Communications and Image Processing (VCIP)","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126959061","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Learned Multi-Field De-Interlacing with Feature Alignment via Deformable Residual Convolution Blocks
Pub Date: 2021-12-05 | DOI: 10.1109/VCIP53242.2021.9675408
Ronglei Ji, A. Tekalp
Deinterlacing continues to be an important problem since many digital TV broadcasts and much catalog content are still in interlaced format. Although deep learning has had a huge impact on all forms of image/video processing, learned deinterlacing has not received much attention in industry or academia. In this paper, we propose a novel multi-field deinterlacing network that aligns features from adjacent fields to a reference field (to be deinterlaced) using deformable residual convolution blocks. To the best of our knowledge, this is the first paper to propose fusion of multi-field features aligned via deformable convolutions for deinterlacing. We demonstrate through extensive experiments that the proposed method provides state-of-the-art deinterlacing results in terms of both PSNR and perceptual quality.
{"title":"Learned Multi-Field De-Interlacing with Feature Alignment via Deformable Residual Convolution Blocks","authors":"Ronglei Ji, A. Tekalp","doi":"10.1109/VCIP53242.2021.9675408","DOIUrl":"https://doi.org/10.1109/VCIP53242.2021.9675408","url":null,"abstract":"Deinterlacing continues to be an important problem of interest since many digital TV broadcasts and catalog content are still in interlaced format. Although deep learning has had huge impact in all forms of image/video processing, learned deinterlacing has not received much attention in the industry or academia. In this paper, we propose a novel multi-field deinterlacing network that aligns features from adjacent fields to a reference field (to be deinterlaced) using deformable residual convolution blocks. To the best of our knowledge, this paper is the first to propose fusion of multi-field features that are aligned via deformable convolutions for deinterlacing. We demonstrate through extensive experimental results that the proposed method provides state-of-the-art deinterlacing results in terms of both PSNR and perceptual quality.","PeriodicalId":114062,"journal":{"name":"2021 International Conference on Visual Communications and Image Processing (VCIP)","volume":"113 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124978839","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}