
Latest Publications from the 2021 International Conference on Visual Communications and Image Processing (VCIP)

Copyright and Reprint Permissions
Pub Date : 2021-12-05 DOI: 10.1109/vcip53242.2021.9675354
{"title":"Copyright and Reprint Permissions","authors":"","doi":"10.1109/vcip53242.2021.9675354","DOIUrl":"https://doi.org/10.1109/vcip53242.2021.9675354","url":null,"abstract":"","PeriodicalId":114062,"journal":{"name":"2021 International Conference on Visual Communications and Image Processing (VCIP)","volume":"3 3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133203582","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A Bottom-up Fast CU Partition Scoring Mechanism for AVS3
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675407
Shiyi Liu, Zhenyu Wang, Ke Qiu, Jiayu Yang, Ronggang Wang
The third generation of the Audio Video Coding Standard (AVS3) achieves a 22% coding performance improvement over High Efficiency Video Coding (HEVC). However, the improvement in encoding efficiency comes from a more flexible block partition scheme and is obtained at the cost of much higher encoding complexity. This paper proposes a bottom-up fast algorithm to prune the time-consuming search of the CU partition tree. Specifically, we design a scoring mechanism based on the splitting patterns traced back from the bottom to predict how likely a partition type is to be selected as optimal. The score threshold for skipping the exhaustive Rate-Distortion Optimization (RDO) procedure of a partition type is determined by statistical analysis. The experimental results show that the proposed method achieves a 24.56% time saving with 0.37% BDBR loss under the Random Access configuration and a 12.50% complexity reduction with 0.08% BDBR loss under the All Intra configuration. Owing to its effectiveness, the method was adopted by the open-source AVS3 platform after evaluation by the AVS working group.
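To make the scoring idea concrete, here is a minimal Python sketch of a bottom-up filter that keeps only partition types whose score reaches a threshold and lets the remaining types skip the RDO search; the scoring heuristic, the threshold value, and the partition-type names are illustrative assumptions rather than the paper's AVS3 implementation.

```python
# Hypothetical partition-type names; AVS3 also uses extended quad-tree splits.
PARTITION_TYPES = ["NO_SPLIT", "QT", "BT_HOR", "BT_VER", "EQT_HOR", "EQT_VER"]

def score_partition_types(sub_cu_decisions):
    """Score each candidate split of the current CU from the splitting
    patterns already chosen bottom-up by its sub-CUs (toy heuristic)."""
    return {p: sum(1 for d in sub_cu_decisions if d == p) for p in PARTITION_TYPES}

def candidates_for_rdo(sub_cu_decisions, threshold=2):
    """Keep only partition types whose score reaches the (statistically
    derived) threshold; the rest skip the exhaustive RDO procedure."""
    scores = score_partition_types(sub_cu_decisions)
    kept = [p for p, s in scores.items() if s >= threshold]
    return kept or ["NO_SPLIT"]  # always keep a fallback candidate

# Example: two sub-CUs chose horizontal binary splits, so only BT_HOR survives.
print(candidates_for_rdo(["BT_HOR", "BT_HOR", "QT", "NO_SPLIT"]))
```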
Citations: 0
DWS-BEAM: Decoder-Wise Subpicture Bitstream Extracting and Merging for MPEG Immersive Video
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675419
Jong-Beom Jeong, Soonbin Lee, Eun‐Seok Ryu
With the new immersive video coding standard MPEG immersive video (MIV) and versatile video coding (VVC), six degrees of freedom (6DoF) virtual reality (VR) streaming technology is emerging for both computer-generated and natural content videos. This paper addresses the decoder-wise subpicture bitstream extracting and merging (DWS-BEAM) method for MIV and proposes two main ideas: (i) a selective streaming-aware subpicture allocation method using a motion-constrained tile set (MCTS), and (ii) a decoder-wise subpicture extracting and merging method for single-pass decoding. In experiments using the VVC test model (VTM), the proposed method shows a 1.23% BD-rate saving for immersive video PSNR (IV-PSNR) and a 15.78% decoding runtime saving compared to the VTM anchor. Moreover, while the MIV test model requires four decoders, the proposed method requires only one.
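A rough sketch of the extract-and-merge flow, under heavy simplification: the data structures below and the byte-level concatenation stand in for real MIV/VVC subpicture syntax, which the actual method rewrites so that a single decoder can handle the merged bitstream.

```python
from dataclasses import dataclass

@dataclass
class Subpicture:
    view_id: int
    payload: bytes  # pre-encoded, MCTS-constrained subpicture bitstream

def select_subpictures(subpics, visible_views):
    """Selective streaming: keep only the subpictures needed for the viewport."""
    return [s for s in subpics if s.view_id in visible_views]

def merge_for_single_decoder(selected):
    """Toy merge: real merging rewrites parameter sets and slice addresses so
    that one decoder can decode the result in a single pass."""
    return b"".join(s.payload for s in selected)

subpics = [Subpicture(i, bytes([i]) * 8) for i in range(4)]
merged = merge_for_single_decoder(select_subpictures(subpics, {0, 2}))
print(len(merged))  # 16 bytes from the two selected views
```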
Citations: 7
SMRD: A Local Feature Descriptor for Multi-modal Image Registration
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675401
Jiayu Xie, Xin Jin, Hongkun Cao
Image registration across modalities has received increasing attention in computer vision and computational photography. However, non-linear intensity variations prevent accurate feature-point matching between image pairs of different modalities. Thus, a robust image descriptor for multi-modal image registration is proposed, named the shearlet-based modality robust descriptor (SMRD). Anisotropic edge and texture information at multiple scales is encoded, based on the discrete shearlet transform, to describe the region around a point of interest. We conducted experiments comparing the proposed SMRD with several state-of-the-art multi-modal/multispectral descriptors on four different multi-modal datasets. The experimental results show that SMRD achieves superior performance over the other methods in terms of precision, recall and F1-score.
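As a loose illustration of a multi-orientation patch descriptor in the spirit of SMRD, the sketch below pools gradient energy into orientation bands over a cell grid and L2-normalizes the result; the filter bank (plain gradients instead of a discrete shearlet transform), the grid size, and the normalization are assumptions, not the paper's construction.

```python
import numpy as np

def orientation_bands(img, n_orient=4):
    """Split gradient energy into orientation bands (a crude stand-in for
    multi-scale directional shearlet responses)."""
    gy, gx = np.gradient(img)
    mag, ang = np.hypot(gx, gy), np.arctan2(gy, gx)
    return [mag * np.maximum(np.cos(2 * (ang - np.pi * k / n_orient)), 0.0)
            for k in range(n_orient)]

def describe(img, keypoint, radius=8, grid=4):
    """Pool each band over a grid of cells around the keypoint and normalize."""
    y, x = keypoint
    desc = []
    for band in orientation_bands(img):
        patch = band[y - radius:y + radius, x - radius:x + radius]
        cell = 2 * radius // grid
        cells = patch.reshape(grid, cell, grid, cell).sum(axis=(1, 3))
        desc.append(cells.ravel())
    desc = np.concatenate(desc)
    return desc / (np.linalg.norm(desc) + 1e-8)  # robust to intensity scaling

img = np.random.rand(64, 64)
print(describe(img, (32, 32)).shape)  # 4 bands x 16 cells = 64-dim descriptor
```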
Citations: 2
Neural Network based Inter bi-prediction Blending
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675422
Franck Galpin, P. Bordes, Thierry Dumas, Pavel Nikitin, F. L. Léannec
This paper presents a learning-based method to improve bi-prediction in video coding. In conventional video coding solutions, the motion compensation of blocks from already decoded reference pictures is the principal tool used to predict the current frame. In particular, bi-prediction, in which a block is obtained by averaging two different motion-compensated prediction blocks, significantly improves the final temporal prediction accuracy. In this context, we introduce a simple neural network that further improves the blending operation. A complexity balance, both in terms of network size and encoder mode selection, is carried out. Extensive tests on top of the recently standardized VVC codec show a BD-rate improvement of −1.4% in the random access configuration for a network of fewer than 10k parameters. We also propose a simple CPU-based implementation and direct network quantization to assess the complexity/gains trade-off in a conventional codec framework.
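The core blending step can be pictured with a small PyTorch sketch: a tiny CNN takes the two motion-compensated prediction blocks and outputs a correction on top of their simple average. The layer sizes and the residual formulation are assumptions for illustration, not the network described in the paper.

```python
import torch
import torch.nn as nn

class BlendNet(nn.Module):
    """Toy learned bi-prediction blending network."""
    def __init__(self, ch=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 1, 3, padding=1),
        )

    def forward(self, pred0, pred1):
        # Conventional bi-prediction is the plain average; the network
        # learns a correction on top of it from both predictions.
        avg = 0.5 * (pred0 + pred1)
        return avg + self.net(torch.cat([pred0, pred1], dim=1))

pred0 = torch.rand(1, 1, 32, 32)  # motion-compensated block from reference list 0
pred1 = torch.rand(1, 1, 32, 32)  # motion-compensated block from reference list 1
print(BlendNet()(pred0, pred1).shape)
```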
Citations: 0
Large-Scale Crowdsourcing Subjective Quality Evaluation of Learning-Based Image Coding
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675314
Evgeniy Upenik, Michela Testolina, J. Ascenso, Fernando Pereira, T. Ebrahimi
Learning-based image codecs produce different compression artifacts compared to the blocking and blurring degradations introduced by conventional image codecs such as JPEG, JPEG 2000 and HEIC. In this paper, a crowdsourcing-based subjective quality evaluation procedure was used to benchmark a representative set of end-to-end deep learning-based image codecs submitted to the MMSP'2020 Grand Challenge on Learning-Based Image Coding and the JPEG AI Call for Evidence. For the first time, a double stimulus methodology with a continuous quality scale was applied to evaluate this type of image codec. The subjective experiment is one of the largest ever reported, including more than 240 pair comparisons evaluated by 118 naïve subjects. The results of benchmarking learning-based image coding solutions against conventional codecs are organized in a dataset of differential mean opinion scores, along with the stimuli, and made publicly available.
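For readers unfamiliar with the metric, a differential mean opinion score in a double-stimulus test is simply the per-stimulus difference between reference and processed ratings, averaged over subjects; the toy numbers and the standard-error estimate below are illustrative, not data from the study.

```python
import numpy as np

def dmos(ref_scores, test_scores):
    """Differential mean opinion score and its standard error.

    ref_scores, test_scores: ratings of the reference and the processed
    stimulus by the same subjects, on the same continuous quality scale.
    """
    diffs = np.asarray(ref_scores, float) - np.asarray(test_scores, float)
    return diffs.mean(), diffs.std(ddof=1) / np.sqrt(len(diffs))

ref = [92, 88, 95, 90, 85]   # hypothetical reference ratings
test = [70, 75, 68, 72, 74]  # hypothetical codec-output ratings
print(dmos(ref, test))       # larger DMOS = larger perceived quality loss
```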
Citations: 7
Deep Metric Learning for Human Action Recognition with SlowFast Networks
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675393
Shan-zhi Shi, Cheolkon Jung
In this paper, we propose deep metric learning for human action recognition with SlowFast networks. We adopt SlowFast networks to extract slowly changing spatial semantic information about a single target entity in the spatial domain together with fast-changing motion information in the temporal domain. Since deep metric learning is able to learn the class differences between human actions, we use it to learn a mapping from the original video to compact features in the embedding space. The proposed network consists of three main parts: 1) two branches operating independently at low and high frame rates to extract spatial and temporal features; 2) feature fusion of the two branches; 3) a joint training network combining the deep metric learning and classification losses. Experimental results on the KTH human action dataset demonstrate that the proposed method achieves faster runtime with a smaller model size than C3D and R3D, while ensuring high accuracy.
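A minimal sketch of the metric-learning ingredient: clip embeddings are trained with a triplet margin loss so that clips of the same action class lie closer in the embedding space than clips of different classes. The placeholder encoder below stands in for the SlowFast backbone; the joint training with the classification loss is not reproduced.

```python
import torch
import torch.nn as nn

# Placeholder encoder standing in for a SlowFast backbone.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 16 * 32 * 32, 128))
triplet = nn.TripletMarginLoss(margin=0.5)

def embed(clip):  # clip: (batch, channels, frames, height, width)
    return nn.functional.normalize(encoder(clip), dim=1)

anchor = torch.rand(4, 3, 16, 32, 32)    # clips of action class A
positive = torch.rand(4, 3, 16, 32, 32)  # other clips of class A
negative = torch.rand(4, 3, 16, 32, 32)  # clips of a different class
loss = triplet(embed(anchor), embed(positive), embed(negative))
print(loss.item())
```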
Citations: 1
AutoDerain: Memory-efficient Neural Architecture Search for Image Deraining
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675339
Jun Fu, Chen Hou, Zhibo Chen
Learning-based image deraining methods have achieved remarkable success in the past few decades. Currently, most deraining architectures are developed by human experts, which is a laborious and error-prone process. In this paper, we present a study on employing neural architecture search (NAS) to automatically design deraining architectures, dubbed AutoDerain. Specifically, we first propose a U-shaped deraining architecture, which mainly consists of residual squeeze-and-excitation blocks (RSEBs). Then, we define a search space in which we search over the convolution types and the use of the squeeze-and-excitation block. Considering that differentiable architecture search is memory-intensive, we propose a memory-efficient differentiable architecture search scheme (MDARTS). In light of the success of training binary neural networks, MDARTS optimizes the architecture parameters through the proximal gradient, consuming only as much GPU memory as training a single deraining model. Experimental results demonstrate that the architecture designed by MDARTS is superior to manually designed derainers.
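To give a feel for what "searching over convolution types" means, here is a minimal sketch of a DARTS-style mixed operation with learnable architecture weights over candidate convolutions; it shows only the plain softmax relaxation, and the candidate set, channel count, and the memory-efficient single-path/proximal update of MDARTS are assumptions outside this sketch.

```python
import torch
import torch.nn as nn

class MixedOp(nn.Module):
    """Softmax-weighted mixture over candidate convolution types."""
    def __init__(self, ch):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv2d(ch, ch, 3, padding=1),              # plain 3x3
            nn.Conv2d(ch, ch, 3, padding=2, dilation=2),  # dilated 3x3
            nn.Conv2d(ch, ch, 5, padding=2),              # plain 5x5
        ])
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))  # architecture params

    def forward(self, x):
        w = torch.softmax(self.alpha, dim=0)
        return sum(wi * op(x) for wi, op in zip(w, self.ops))

x = torch.rand(1, 8, 32, 32)
print(MixedOp(8)(x).shape)  # after search, the op with the largest alpha is kept
```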
Citations: 0
NeCH: Neural Clothed Human Model
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675372
Sheng Liu, Liangchen Song, Yi Xu, Junsong Yuan
Existing human models, e.g., SMPL and STAR, represent the 3D geometry of a human body as a polygon mesh obtained by deforming a template mesh according to a set of shape and pose parameters. The appearance, however, is not directly modeled by most existing human models. We present a novel 3D human model that faithfully models both the 3D geometry and the appearance of a clothed human body with a continuous volumetric representation, i.e., volume densities and emitted colors at continuous 3D locations in the volume encompassing the human body. In contrast to a mesh-based representation, whose resolution is limited by the mesh's fixed number of polygons, our volumetric representation does not limit the resolution of the model. Moreover, our volumetric representation can be rendered via differentiable volume rendering, enabling us to train the model using only 2D images (without ground-truth 3D geometries of human bodies) by minimizing a loss function that measures the differences between rendered images and ground-truth images. In contrast, existing human models are trained using ground-truth 3D geometries of human bodies. Thanks to the ability of our model to jointly model both the geometry and the appearance of clothed people, it can benefit applications including human image synthesis, gaming, 3D television, and telepresence.
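The training signal rests on differentiable volume rendering, sketched below for a single ray: sampled densities and colors are alpha-composited into a pixel color, and a photometric loss against a 2D image backpropagates to the underlying field. The toy random field and sample spacing are assumptions; the actual model predicts densities and colors with a learned network.

```python
import torch

def render_ray(densities, colors, deltas):
    """densities: (N,), colors: (N, 3), deltas: (N,) spacing between samples."""
    alphas = 1.0 - torch.exp(-densities * deltas)
    # Transmittance: probability the ray reaches each sample unoccluded.
    trans = torch.cumprod(torch.cat([torch.ones(1), 1.0 - alphas + 1e-10])[:-1], dim=0)
    weights = alphas * trans
    return (weights[:, None] * colors).sum(dim=0)  # composited RGB

n = 64
densities = torch.rand(n, requires_grad=True)  # stand-in for the learned field
colors = torch.rand(n, 3)
deltas = torch.full((n,), 0.05)
pixel = render_ray(densities, colors, deltas)
loss = ((pixel - torch.tensor([0.5, 0.4, 0.3])) ** 2).mean()  # 2D photometric loss
loss.backward()  # gradients flow back to the volumetric representation
print(pixel.detach())
```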
Citations: 1
Learned Multi-Field De-Interlacing with Feature Alignment via Deformable Residual Convolution Blocks
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675408
Ronglei Ji, A. Tekalp
Deinterlacing continues to be an important problem since many digital TV broadcasts and much catalog content are still in interlaced format. Although deep learning has had a huge impact on all forms of image/video processing, learned deinterlacing has not received much attention in industry or academia. In this paper, we propose a novel multi-field deinterlacing network that aligns features from adjacent fields to a reference field (to be deinterlaced) using deformable residual convolution blocks. To the best of our knowledge, this paper is the first to propose the fusion of multi-field features aligned via deformable convolutions for deinterlacing. We demonstrate through extensive experimental results that the proposed method provides state-of-the-art deinterlacing results in terms of both PSNR and perceptual quality.
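The alignment step can be sketched with torchvision's deformable convolution: per-tap offsets are predicted from the reference and neighboring field features and used to warp the neighbor's features toward the reference. Channel counts and the single-layer offset predictor are illustrative assumptions, not the paper's deformable residual block.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

ch, k = 16, 3
offset_pred = nn.Conv2d(2 * ch, 2 * k * k, 3, padding=1)  # (dy, dx) per kernel tap
align = DeformConv2d(ch, ch, k, padding=1)

ref_feat = torch.rand(1, ch, 32, 32)  # features of the field being de-interlaced
nbr_feat = torch.rand(1, ch, 32, 32)  # features of an adjacent field
offsets = offset_pred(torch.cat([ref_feat, nbr_feat], dim=1))
aligned = align(nbr_feat, offsets)    # neighbor features aligned to the reference
print(aligned.shape)
```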
Citations: 1