
Latest publications from the 2021 International Conference on Visual Communications and Image Processing (VCIP)

Copyright and Reprint Permissions
Pub Date : 2021-12-05 DOI: 10.1109/vcip53242.2021.9675354
{"title":"Copyright and Reprint Permissions","authors":"","doi":"10.1109/vcip53242.2021.9675354","DOIUrl":"https://doi.org/10.1109/vcip53242.2021.9675354","url":null,"abstract":"","PeriodicalId":114062,"journal":{"name":"2021 International Conference on Visual Communications and Image Processing (VCIP)","volume":"3 3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133203582","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Plug-and-Play Deblurring for Robust Object Detection
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675437
Gerald Xie, Zhu Li, S. Bhattacharyya, A. Mehmood
Object detection is a classic computer vision task that learns the mapping from an image to object bounding boxes and class labels. Many applications of object detection involve images that are prone to degradation at capture time, notably motion blur from a moving camera (e.g., on a UAV) or from the object itself. One approach to handling this blur is to apply common deblurring methods to recover clean pixel images before running the vision task; however, deblurring is typically ill-posed. On top of this, applying these methods adds to the inference time of the vision network, which can hinder performance on video inputs. To address these issues, we propose a novel plug-and-play (PnP) solution that inserts deblurring features into the target vision task network without the need to retrain the task network. The deblur features are learned from a classification loss network on blur strength and direction, and the PnP scheme works well with the object detection network with minimal added inference time, compared with the state-of-the-art deblur-then-detect solution.
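A minimal PyTorch sketch of how such a plug-and-play insertion could look, assuming a hypothetical frozen detector stem and a small blur-feature branch pretrained elsewhere; the module names and the additive fusion point are illustrative, not the authors' exact design.

```python
import torch
import torch.nn as nn

class DeblurFeatureExtractor(nn.Module):
    """Small encoder assumed to be pretrained on blur strength/direction classification."""
    def __init__(self, out_ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, out_ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
    def forward(self, x):
        return self.net(x)

class PnPDetector(nn.Module):
    """Wraps a frozen detection stem and injects deblur features by addition."""
    def __init__(self, backbone_stem, deblur):
        super().__init__()
        self.stem = backbone_stem          # first stage of the (frozen) task network
        self.deblur = deblur               # plug-in branch, trained separately
        for p in self.stem.parameters():   # the task network itself is not retrained
            p.requires_grad_(False)
    def forward(self, img):
        feat = self.stem(img)
        return feat + self.deblur(img)     # fuse deblur features into the task features

# toy usage: a stand-in stem with the same 64-channel, stride-4 output
stem = nn.Sequential(nn.Conv2d(3, 64, 7, stride=4, padding=3), nn.ReLU(inplace=True))
model = PnPDetector(stem, DeblurFeatureExtractor(64))
print(model(torch.randn(1, 3, 256, 256)).shape)  # torch.Size([1, 64, 64, 64])
```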
Citations: 3
Learn to overfit better: finding the important parameters for learned image compression
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675360
Honglei Zhang, Francesco Cricri, H. R. Tavakoli, M. Santamaría, Y. Lam, M. Hannuksela
For most machine learning systems, overfitting is an undesired behavior. However, overfitting a model to a test image or video at inference time is a favorable and effective technique to improve the coding efficiency of learning-based image and video codecs. At the encoding stage, one or more neural networks that are part of the codec are finetuned on the input image or video to achieve better coding performance. The encoder encodes the input content into a content bitstream. If the finetuned neural network is also part of the decoder, the encoder signals the weight update of the finetuned model to the decoder along with the content bitstream. At the decoding stage, the decoder first updates its neural network model according to the received weight update, and then proceeds with decoding the content bitstream. Since a neural network contains a large number of parameters, compressing the weight update is critical to reducing the bitrate overhead. In this paper, we propose learning-based methods to find the important parameters to be overfitted, in terms of rate-distortion performance. Based on simple distribution models for the variables in the weight update, we derive two objective functions. By optimizing the proposed objective functions, importance scores of the parameters can be calculated and the important parameters determined. Our experiments on a lossless image compression codec show that the proposed method significantly outperforms a prior-art method in which overfitted parameters were selected based on heuristics. Furthermore, our technique improves the compression performance of the state-of-the-art lossless image compression codec by 0.1 bit per pixel.
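A minimal PyTorch sketch of encode-time overfitting restricted to important parameters; the importance scores are taken as given (computing them is the paper's contribution), the keep ratio is an arbitrary assumption, and the simple MSE loss stands in for the codec's actual rate-distortion objective.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

def overfit_important_params(model, image, importance, keep_ratio=0.01, steps=50, lr=1e-3):
    """Finetune only the highest-scoring parameters on one input image and
    return the sparse weight update that would be signalled to the decoder."""
    finetuned = copy.deepcopy(model)
    # binary masks keeping the top `keep_ratio` fraction of each parameter tensor
    masks = {}
    for name, score in importance.items():
        k = max(1, int(keep_ratio * score.numel()))
        thr = score.flatten().topk(k).values.min()
        masks[name] = (score >= thr).float()

    opt = torch.optim.Adam(finetuned.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = F.mse_loss(finetuned(image), image)   # stand-in for the codec's R-D loss
        loss.backward()
        # zero the gradients of unimportant parameters so they stay untouched
        for name, p in finetuned.named_parameters():
            if p.grad is not None:
                p.grad.mul_(masks[name])
        opt.step()

    orig = dict(model.named_parameters())
    tuned = dict(finetuned.named_parameters())
    return {n: (tuned[n].detach() - orig[n].detach()) * masks[n] for n in masks}

# toy usage with a one-layer "codec" and random importance scores
net = nn.Conv2d(3, 3, 3, padding=1)
img = torch.rand(1, 3, 32, 32)
scores = {n: torch.rand_like(p) for n, p in net.named_parameters()}
delta = overfit_important_params(net, img, scores)
print({n: float(v.abs().sum()) for n, v in delta.items()})
```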
Citations: 5
A Bottom-up Fast CU Partition Scoring Mechanism for AVS3
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675407
Shiyi Liu, Zhenyu Wang, Ke Qiu, Jiayu Yang, Ronggang Wang
The third generation of the Audio Video Coding Standard (AVS3) achieves a 22% coding performance improvement over High Efficiency Video Coding (HEVC). However, the improved encoding efficiency, which comes from a more flexible block partition scheme, is obtained at the cost of much higher encoding complexity. This paper proposes a bottom-up fast algorithm to prune the time-consuming search of the CU partition tree. Specifically, we design a scoring mechanism based on the splitting patterns traced back from the bottom to predict how likely each partition type is to be selected as optimal. The score threshold for skipping the exhaustive rate-distortion optimization (RDO) procedure of a partition type is determined by statistical analysis. Experimental results show that the proposed method achieves 24.56% time saving with 0.37% BDBR loss under the Random Access configuration, and 12.50% complexity reduction with 0.08% BDBR loss under the All Intra configuration. Owing to its effectiveness, the method was adopted by the AVS3 open-source platform after evaluation by the AVS working group.
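A toy Python sketch of the pruning step: each partition type gets a bottom-up score, and only types whose score passes an offline-determined threshold go through full RDO. The scores, type names, and thresholds here are made-up placeholders, not the paper's statistics.

```python
def prune_partition_types(scores, thresholds):
    """Keep a partition type for full RDO only if its bottom-up score passes its threshold."""
    return [pt for pt, s in scores.items() if s >= thresholds.get(pt, 0.0)]

# hypothetical scores traced back from already-coded sub-blocks
scores = {"NO_SPLIT": 0.62, "QT": 0.35, "BT_H": 0.08, "BT_V": 0.11, "EQT_H": 0.02, "EQT_V": 0.03}
# per-type thresholds that would be fixed offline by statistical analysis
thresholds = {"BT_H": 0.10, "BT_V": 0.10, "EQT_H": 0.10, "EQT_V": 0.10}
print(prune_partition_types(scores, thresholds))  # ['NO_SPLIT', 'QT', 'BT_V']
```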
Citations: 0
Neural Network based Inter bi-prediction Blending
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675422
Franck Galpin, P. Bordes, Thierry Dumas, Pavel Nikitin, F. L. Léannec
This paper presents a learning-based method to improve bi-prediction in video coding. In conventional video coding solutions, motion compensation of blocks from already decoded reference pictures is the principal tool used to predict the current frame. In particular, bi-prediction, in which a block is obtained by averaging two different motion-compensated prediction blocks, significantly improves the final temporal prediction accuracy. In this context, we introduce a simple neural network that further improves the blending operation. A complexity balance, both in terms of network size and encoder mode selection, is carried out. Extensive tests on top of the recently standardized VVC codec show a BD-rate improvement of −1.4% in the random access configuration for a network of fewer than 10k parameters. We also propose a simple CPU-based implementation and direct network quantization to assess the complexity/gains trade-off in a conventional codec framework.
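A minimal PyTorch sketch of a blending network in the spirit described above: it takes two motion-compensated prediction blocks, computes the conventional average, and adds a learned correction. The layer sizes are illustrative assumptions, chosen only so the parameter count stays in the few-thousand range mentioned in the abstract.

```python
import torch
import torch.nn as nn

class BlendNet(nn.Module):
    """Tiny network that refines the average of two luma prediction blocks."""
    def __init__(self, ch=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 1, 3, padding=1),
        )
    def forward(self, pred0, pred1):
        avg = 0.5 * (pred0 + pred1)                        # conventional bi-prediction
        residual = self.net(torch.cat([pred0, pred1], dim=1))
        return avg + residual                              # learned correction of the blend

blend = BlendNet()
print(sum(p.numel() for p in blend.parameters()))          # a few thousand parameters
p0, p1 = torch.rand(1, 1, 16, 16), torch.rand(1, 1, 16, 16)
print(blend(p0, p1).shape)                                 # torch.Size([1, 1, 16, 16])
```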
Citations: 0
People Detection and Tracking Using a Fisheye Camera Network
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675451
T. Wang, Chih-Hao Liao, Li-Hsuan Hsieh, A. W. Tsui, Hsin-Chien Huang
In this paper, we study techniques for accurate detection, localization, and tracking of multiple people in an indoor scene covered by multiple top-view fisheye cameras. This is a rarely studied setting within the topic of multi-camera object tracking. Experimental results on test videos show performance good enough for practical use. We also propose methods to account for occlusion by scene objects at different stages of the algorithm, which lead to improved results.
Citations: 2
AutoDerain: Memory-efficient Neural Architecture Search for Image Deraining
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675339
Jun Fu, Chen Hou, Zhibo Chen
Learning-based image deraining methods have achieved remarkable success in the past few decades. Currently, most deraining architectures are designed by human experts, which is a laborious and error-prone process. In this paper, we present a study on employing neural architecture search (NAS) to automatically design deraining architectures, dubbed AutoDerain. Specifically, we first propose a U-shaped deraining architecture, which mainly consists of residual squeeze-and-excitation blocks (RSEBs). Then, we define a search space in which we search over convolution types and the use of the squeeze-and-excitation block. Considering that differentiable architecture search is memory-intensive, we propose a memory-efficient differentiable architecture search scheme (MDARTS). In light of the success of training binary neural networks, MDARTS optimizes the architecture parameters through the proximal gradient, consuming only as much GPU memory as training a single deraining model. Experimental results demonstrate that the architecture designed by MDARTS is superior to manually designed derainers.
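A toy PyTorch sketch of binarized architecture gating with a proximal-style clipped update on an architecture parameter; it only illustrates the idea of hard, binarized selections trained end-to-end and does not reproduce MDARTS' actual search space, objective, or memory savings.

```python
import torch
import torch.nn as nn

class BinaryGate(torch.autograd.Function):
    """Straight-through binarization: hard 0/1 forward, identity gradient backward."""
    @staticmethod
    def forward(ctx, alpha):
        return (alpha >= 0.5).float()
    @staticmethod
    def backward(ctx, grad_out):
        return grad_out

class GatedChoice(nn.Module):
    """Selects between two candidate ops with a binarized architecture parameter."""
    def __init__(self, op_a, op_b):
        super().__init__()
        self.op_a, self.op_b = op_a, op_b
        self.alpha = nn.Parameter(torch.tensor(0.5))
    def forward(self, x):
        g = BinaryGate.apply(self.alpha)
        return g * self.op_a(x) + (1.0 - g) * self.op_b(x)

choice = GatedChoice(nn.Conv2d(8, 8, 3, padding=1), nn.Conv2d(8, 8, 5, padding=2))
x = torch.rand(1, 8, 16, 16)
loss = choice(x).pow(2).mean()        # stand-in for the deraining training loss
loss.backward()
with torch.no_grad():                 # proximal-style update: gradient step, then clip to [0, 1]
    choice.alpha -= 0.1 * choice.alpha.grad
    choice.alpha.clamp_(0.0, 1.0)
print(float(choice.alpha))
```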
Citations: 0
NeCH: Neural Clothed Human Model
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675372
Sheng Liu, Liangchen Song, Yi Xu, Junsong Yuan
Existing human models, e.g., SMPL and STAR, represent the 3D geometry of a human body as a polygon mesh obtained by deforming a template mesh according to a set of shape and pose parameters. The appearance, however, is not directly modeled by most existing human models. We present a novel 3D human model that faithfully models both the 3D geometry and the appearance of a clothed human body with a continuous volumetric representation, i.e., volume densities and emitted colors at continuous 3D locations in the volume encompassing the human body. In contrast to mesh-based representations, whose resolution is limited by a mesh's fixed number of polygons, our volumetric representation does not limit the resolution of the model. Moreover, the volumetric representation can be rendered via differentiable volume rendering, enabling us to train the model using only 2D images (without ground-truth 3D geometries of human bodies) by minimizing a loss function that measures the differences between rendered images and ground-truth images. In contrast, existing human models are trained using ground-truth 3D geometries of human bodies. Thanks to its ability to jointly model both the geometry and the appearance of clothed people, our model can benefit applications including human image synthesis, gaming, 3D television, and telepresence.
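A minimal PyTorch sketch of differentiable volume rendering for one ray, the operation that lets such a volumetric model be trained from 2D images by comparing rendered pixels with photographs; the sample count and inputs are arbitrary placeholders rather than the paper's setup.

```python
import torch

def render_ray(densities, colors, deltas):
    """Differentiable volume rendering of one ray.

    densities: (N,) non-negative volume densities at N samples along the ray
    colors:    (N, 3) emitted RGB at those samples
    deltas:    (N,) distances between consecutive samples
    """
    alphas = 1.0 - torch.exp(-densities * deltas)          # opacity of each segment
    trans = torch.cumprod(1.0 - alphas + 1e-10, dim=0)     # transmittance up to each sample
    trans = torch.cat([torch.ones(1), trans[:-1]])         # light reaches sample i through the first i-1 segments
    weights = alphas * trans
    return (weights[:, None] * colors).sum(dim=0)          # composited pixel colour

# toy ray with 64 samples; in training the rendered pixel would be compared to the 2D image
n = 64
pixel = render_ray(torch.rand(n), torch.rand(n, 3), torch.full((n,), 0.05))
print(pixel)
```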
Citations: 1
Deep Metric Learning for Human Action Recognition with SlowFast Networks
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675393
Shan-zhi Shi, Cheolkon Jung
In this paper, we propose deep metric learning for human action recognition with SlowFast networks. We adopt SlowFast networks to extract slowly changing spatial semantic information about a single target entity in the spatial domain together with rapidly changing motion information in the temporal domain. Since deep metric learning can learn class differences between human actions, we use it to learn a mapping from the original video to compact features in the embedding space. The proposed network consists of three main parts: 1) two branches operating independently at low and high frame rates to extract spatial and temporal features; 2) feature fusion of the two branches; 3) a joint training network combining the deep metric learning and classification losses. Experimental results on the KTH human action dataset demonstrate that the proposed method achieves faster runtime with a smaller model size than C3D and R3D, while ensuring high accuracy.
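A minimal PyTorch sketch of a joint objective in this spirit: a batch-hard triplet term on fused embeddings plus a classification cross-entropy. The margin, weighting, and batch construction are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

def joint_loss(embeddings, logits, labels, margin=0.5, alpha=1.0):
    """Joint objective: triplet-style metric loss plus classification cross-entropy.

    embeddings: (B, D) fused SlowFast features projected into the embedding space
    logits:     (B, C) class scores; labels: (B,)
    """
    dist = torch.cdist(embeddings, embeddings)              # (B, B) pairwise distances
    same = labels[:, None] == labels[None, :]
    pos = (dist * same.float()).max(dim=1).values            # farthest same-class sample
    neg = (dist + same.float() * 1e9).min(dim=1).values      # closest different-class sample
    metric = torch.clamp(pos - neg + margin, min=0.0).mean()
    ce = nn.functional.cross_entropy(logits, labels)
    return metric + alpha * ce

emb = torch.randn(8, 128)
logits = torch.randn(8, 6)                                   # 6 KTH action classes
labels = torch.randint(0, 6, (8,))
print(joint_loss(emb, logits, labels))
```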
Citations: 1
Large-Scale Crowdsourcing Subjective Quality Evaluation of Learning-Based Image Coding
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675314
Evgeniy Upenik, Michela Testolina, J. Ascenso, Fernando Pereira, T. Ebrahimi
Learning-based image codecs produce different compression artifacts from the blocking and blurring degradations introduced by conventional image codecs such as JPEG, JPEG 2000, and HEIC. In this paper, a crowdsourcing-based subjective quality evaluation procedure was used to benchmark a representative set of end-to-end deep learning-based image codecs submitted to the MMSP'2020 Grand Challenge on Learning-Based Image Coding and the JPEG AI Call for Evidence. For the first time, a double-stimulus methodology with a continuous quality scale was applied to evaluate this type of image codec. The subjective experiment is one of the largest ever reported, with more than 240 pair-comparisons evaluated by 118 naïve subjects. The results of benchmarking learning-based image coding solutions against conventional codecs are organized into a dataset of differential mean opinion scores, which is made publicly available along with the stimuli.
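A small NumPy sketch of how a differential mean opinion score could be computed for one stimulus from per-subject ratings of the reference and the processed image in a double-stimulus trial; the ratings below are invented toy numbers, not data from the study.

```python
import numpy as np

def dmos(ref_scores, test_scores):
    """Differential mean opinion score for one stimulus.

    ref_scores, test_scores: per-subject quality ratings (e.g. on a 0-100 continuous scale)
    for the reference and the processed image in a double-stimulus trial.
    Returns the DMOS and its standard error over subjects.
    """
    diff = np.asarray(ref_scores, dtype=float) - np.asarray(test_scores, dtype=float)
    return diff.mean(), diff.std(ddof=1) / np.sqrt(len(diff))

# toy ratings from a handful of subjects (the real study used 118 naive subjects)
ref = [92, 88, 95, 90, 85]
test = [70, 72, 80, 66, 75]
print(dmos(ref, test))   # a smaller DMOS means the codec is closer to the reference
```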
Citations: 7