
Latest publications from the 2020 IEEE International Conference on Visual Communications and Image Processing (VCIP)

Learning to encode user-generated short videos with lower bitrate and the same perceptual quality
Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301835
Shengbin Meng, Yang Li, Yiting Liao, Junlin Li, Shiqi Wang
On a platform of user-generated content (UGC), the uploaded videos need to be re-encoded before distribution. For this specific encoding scenario, we propose a novel dataset and a corresponding learning-based scheme that achieves significant bitrate savings without decreasing perceptual quality. In the dataset, each video's label indicates whether it can be encoded at a much lower bitrate while still keeping the same perceptual quality. Models trained on this dataset can then be used to classify the input video and adjust its final encoding parameters accordingly. With sufficient classification accuracy, the proposed scheme obtains more than 20% average bitrate saving. The dataset will be further expanded to facilitate the study of this problem.
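As a rough illustration of the classify-then-encode idea described in the abstract, the sketch below trains a binary classifier on per-clip features and lowers the target bitrate for clips predicted to keep the same perceptual quality. The feature set, the model choice and the reduction factor are assumptions for illustration, not the authors' configuration.

```python
# Minimal sketch (not the authors' code): classify a clip, then lower its bitrate budget.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Toy training data: [spatial_complexity, temporal_complexity, source_bitrate_kbps]
X = np.array([[0.2, 0.1, 2000], [0.8, 0.9, 6000], [0.3, 0.2, 2500], [0.9, 0.7, 5500]])
y = np.array([1, 0, 1, 0])  # 1 = can take a much lower bitrate, 0 = cannot
clf = GradientBoostingClassifier().fit(X, y)

def target_bitrate(features, default_kbps, reduction=0.75):
    """Pick the bitrate for re-encoding one uploaded clip."""
    if clf.predict([features])[0] == 1:
        return int(default_kbps * reduction)  # spend fewer bits on "easy" clips
    return default_kbps                       # keep the default ladder otherwise

print(target_bitrate([0.25, 0.15, 2200], default_kbps=3000))
```

In a real pipeline the predicted label would feed the encoder's rate-control settings for the second-pass encode.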
Citations: 2
Automatic Sheep Counting by Multi-object Tracking
Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301868
Jingsong Xu, Litao Yu, Jian Zhang, Qiang Wu
Animal counting is a highly skilled yet tedious task in livestock transportation and trading. To effectively free up human labour and provide accurate counts for sheep loading/unloading, we develop an automatic sheep counting system based on multi-object detection, tracking and extrapolation techniques. Our system has demonstrated more than 99.9% accuracy with sheep moving freely in a race under optimal visual conditions.
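A minimal sketch of how counting by multi-object tracking can work, assuming tracks are already available from a detector and tracker: each track ID is counted once when its centroid crosses a virtual gate line. The gate position and the track format are illustrative assumptions, not the authors' system.

```python
# Line-crossing counter over per-frame tracks {track_id: y_center}.
def count_crossings(frames, gate_y=100):
    last_y, counted = {}, set()
    for tracks in frames:                      # one dict of tracks per frame
        for tid, y in tracks.items():
            prev = last_y.get(tid)
            if prev is not None and prev < gate_y <= y and tid not in counted:
                counted.add(tid)               # crossed the gate: count this id once
            last_y[tid] = y
    return len(counted)

# Two sheep (ids 1 and 2) pass through the gate; id 3 turns back.
frames = [{1: 90, 2: 80}, {1: 105, 2: 95, 3: 120}, {2: 110, 3: 90}]
print(count_crossings(frames))  # -> 2
```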
Citations: 2
Content-aware Hybrid Equi-angular Cubemap Projection for Omnidirectional Video Coding
Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301893
Jinyong Pi, Yun Zhang, Linwei Zhu, Xinju Wu, Xuemei Zhou
Due to its spherical characteristics, omnidirectional video must be projected from the Three-Dimensional (3D) sphere to a Two-Dimensional (2D) plane before compression. Therefore, various projection formats have been proposed in recent years. However, these existing projection methods suffer from either oversampling or boundary discontinuities, which penalizes coding performance. Among them, Hybrid Equiangular Cubemap (HEC) projection has achieved significant coding gains over Equi-Angular Cubemap (EAC) projection by preserving boundary continuity. However, the parameters of its mapping function are fixed and cannot adapt to the video content, which results in non-uniform sampling in certain regions. To address this limitation, a projection method named Content-aware HEC (CHEC) is presented in this paper. In particular, the parameters of the mapping function are obtained adaptively by minimizing the projection conversion distortion. Additionally, an omnidirectional video coding framework with adaptive mapping-function parameters is proposed to effectively improve coding performance. Experimental results show that the proposed scheme achieves 8.57% and 0.11% average bit rate reduction in terms of End-to-End Weighted-to-Spherically-uniform Peak Signal-to-Noise Ratio (E2E WS-PSNR) when compared with Equirectangular Projection (ERP) and HEC projection, respectively.
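For context, the sketch below shows the standard equi-angular face mapping together with a content-dependent blend parameter, in the spirit of adapting the mapping function to the video; the parameter `a` and the blending form are illustrative assumptions, not the actual HEC/CHEC mapping formula.

```python
# Equi-angular remapping of a cube-face coordinate, plus a toy adaptive blend.
import math

def eac(u):
    """Equi-angular remapping of a face coordinate u in [-1, 1]."""
    return math.atan(u) / (math.pi / 4.0)

def hybrid_map(u, a=1.0):
    """Blend between linear cubemap sampling (a=0) and equi-angular sampling (a=1)."""
    return a * eac(u) + (1.0 - a) * u

print(hybrid_map(0.5, a=1.0), hybrid_map(0.5, a=0.7))
```

In a content-aware scheme, a parameter like `a` would be chosen per sequence to minimize the projection conversion distortion.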
Citations: 2
HDR Deghosting Using Motion-Registration-Free Fusion in the Luminance Gradient Domain
Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301844
Cheng-Yeh Liou, Cheng-Yen Chuang, Chia-Han Huang, Yi-Chang Lu
Most existing high dynamic range (HDR) deghosting flows require a time-consuming motion registration step to generate ghost-free HDR results. Since the motion registration step usually becomes the bottleneck of the entire flow, in this paper we propose a novel HDR deghosting flow that does not require any motion registration process. By taking channel properties into account, the luminance and chrominance channels are fused differently in the proposed flow. Our motion-registration-free fusion can swiftly generate high-quality HDR results even if the original Low Dynamic Range (LDR) images contain objects with large foreground motions.
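A minimal registration-free fusion sketch, assuming simple well-exposedness weights on the luminance channel; the paper's method additionally operates in the luminance gradient domain and treats chrominance separately.

```python
# Per-pixel weighted fusion of LDR luminance planes, with no motion registration.
import numpy as np

def fuse_luminance(lum_stack, sigma=0.2):
    """lum_stack: (N, H, W) luminance of N LDR exposures, values in [0, 1]."""
    w = np.exp(-((lum_stack - 0.5) ** 2) / (2 * sigma ** 2))  # favor well-exposed pixels
    w = w / (w.sum(axis=0) + 1e-8)                            # normalize weights per pixel
    return (w * lum_stack).sum(axis=0)                        # per-pixel weighted average

stack = np.random.rand(3, 4, 4)      # three toy exposures
print(fuse_luminance(stack).shape)   # (4, 4)
```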
Citations: 0
Text-to-Image Generation via Semi-Supervised Training
Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301888
Zhongyi Ji, Wenmin Wang, Baoyang Chen, Xiao Han
Synthesizing images from text is an important problem with various applications. Most existing studies of text-to-image generation utilize supervised methods and rely on a fully-labeled dataset, but detailed and accurate descriptions of images are onerous to obtain. In this paper, we introduce a simple but effective semi-supervised approach that treats the feature of an unlabeled image as a "Pseudo Text Feature", so that unlabeled data can participate in the subsequent training process. To achieve this, we design a Modality-invariant Semantic-consistent Module that aims to make the image feature and the text feature indistinguishable while maintaining their semantic information. Extensive qualitative and quantitative experiments on the MNIST and Oxford-102 flower datasets demonstrate the effectiveness of our semi-supervised method in comparison to supervised ones. We also show that the proposed method can be easily plugged into other visual generation models, such as image translation, and performs well.
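A toy sketch of the "Pseudo Text Feature" idea, under the assumption that a learned projection maps image features into the text feature space: when a caption is missing, the projected image feature stands in as the conditioning vector for generator training. The projection and the feature dimensions here are stand-ins, not the paper's modules.

```python
# Use the image feature as a pseudo text feature when no caption is available.
import numpy as np

def training_condition(image_feat, text_feat=None, proj=None):
    """Return the conditioning vector for one training sample."""
    if text_feat is not None:
        return text_feat             # labeled sample: use the real caption feature
    return proj @ image_feat         # unlabeled sample: pseudo text feature

rng = np.random.default_rng(0)
proj = rng.normal(size=(128, 512))   # image-to-text-space projection (assumed, learned in practice)
img_f = rng.normal(size=512)
print(training_condition(img_f, proj=proj).shape)  # (128,)
```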
Citations: 4
An Empirical Study of Emotion Recognition from Thermal Video Based on Deep Neural Networks
Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301883
Herman Prawiro, Tse-Yu Pan, Min-Chun Hu
Emotion recognition is a crucial problem in affective computing. Most previous works utilized facial expressions from visible-spectrum data to solve the emotion recognition task. Thermal videos provide temperature measurements of the human body over time, which can be used to recognize affective states by learning their temporal patterns. In this paper, we conduct comparative experiments to study the effectiveness of existing deep neural networks when applied to the emotion recognition task on thermal video. We analyze the effect of various approaches to frame sampling in video, temporal aggregation between frames, and different convolutional neural network architectures. To the best of our knowledge, this is the first work to conduct a study on emotion recognition from thermal video based on deep neural networks. Our work can serve as a preliminary study for designing new emotion recognition methods in the thermal domain.
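The sketch below illustrates two of the design choices compared in the study, uniform frame sampling and mean-pooling temporal aggregation; the per-frame CNN is replaced by a stand-in feature extractor, which is an assumption for illustration.

```python
# Uniform frame sampling and average-pooling temporal aggregation for a thermal clip.
import numpy as np

def sample_frames(video, n=8):
    """Uniformly pick n frames from a (T, H, W) thermal clip."""
    idx = np.linspace(0, len(video) - 1, n).astype(int)
    return video[idx]

def aggregate(per_frame_features):
    """Temporal aggregation by average pooling over the frame axis."""
    return per_frame_features.mean(axis=0)

clip = np.random.rand(120, 64, 64)   # 120 thermal frames
frames = sample_frames(clip, n=8)
feats = frames.reshape(8, -1)        # stand-in for per-frame CNN features
print(aggregate(feats).shape)        # (4096,)
```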
Citations: 0
3D-CNN Autoencoder for Plenoptic Image Compression
Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301793
Tingting Zhong, Xin Jin, Kedeng Tong
Recently, plenoptic images have attracted great attention because of their applications in various scenarios. However, high resolution and a special pixel distribution structure bring huge challenges to their storage and transmission. In order to adapt compression to the structural characteristics of plenoptic images, in this paper we propose a Data Structure Adaptive 3D-Convolutional (DSA-3D) autoencoder. The DSA-3D autoencoder enables up-sampling and down-sampling of the sub-aperture sequence along the angular or spatial resolution, thereby avoiding the artifacts caused by directly compressing the plenoptic image and achieving better compression efficiency. In addition, we propose a special and efficient Square rearrangement to generate the sub-aperture sequence. We compare the Square and Zigzag sub-aperture sequence rearrangements, and analyze the compression efficiency of block image compression and whole image compression. Compared with the traditional hybrid encoders HEVC, JPEG2000 and JPEG PLENO (WaSP), the proposed DSA-3D (Square) autoencoder achieves superior performance in terms of PSNR metrics.
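A minimal PyTorch skeleton of a 3D-convolutional autoencoder over a sub-aperture view sequence, with strided convolutions for down-sampling and transposed convolutions for up-sampling; the layer sizes and view counts are illustrative assumptions, not the DSA-3D configuration.

```python
# 3D conv autoencoder over a (batch, channels, angular, height, width) light-field tensor.
import torch
import torch.nn as nn

class Plenoptic3DAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        self.dec = nn.Sequential(
            nn.ConvTranspose3d(64, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose3d(32, 3, kernel_size=4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):            # x: (batch, 3, angular, height, width)
        return self.dec(self.enc(x))

x = torch.rand(1, 3, 8, 64, 64)      # 8 sub-aperture views of a toy light field
print(Plenoptic3DAE()(x).shape)      # torch.Size([1, 3, 8, 64, 64])
```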
Citations: 4
A Theory of Occlusion for Improving Rendering Quality of Views
Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301887
Yijun Zeng, Weiyan Chen, Mengqin Bai, Yangdong Zeng, Changjian Zhu
Occlusion lack compensation (OLC) is a multiplexing-gain-optimized data acquisition and novel-view rendering strategy for light field rendering (LFR). While the achievable OLC is much higher than previously thought possible, the improvement comes at the cost of requiring more scene information; learning and training methods can capture more detailed scene information, including geometric, texture and depth information. In this paper, we develop an occlusion compensation (OCC) model based on the restricted Boltzmann machine (RBM) to compensate for the lack of scene information caused by occlusion. We show that occlusion causes a loss of captured scene information, which leads to a decline in view rendering quality. The OCC model can estimate and compensate for the missing information at occlusion edges through learning. We present experimental results to demonstrate the performance of the OCC model with analog training, verify our theoretical analysis, and extend our conclusions on the optimal rendering quality of light fields.
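A toy RBM reconstruction sketch showing how a trained RBM can fill in masked entries near an occlusion edge with one Gibbs step while keeping known entries fixed; the weights here are random stand-ins rather than a trained OCC model, and the binary-unit formulation is an assumption for illustration.

```python
# One RBM Gibbs step v -> h -> v', replacing only the occluded (unknown) entries.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(16, 8))   # visible x hidden weights (stand-in)
b_v, b_h = np.zeros(16), np.zeros(8)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def reconstruct(v, known_mask):
    """Mean-field reconstruction that keeps known pixels fixed."""
    h = sigmoid(v @ W + b_h)                 # hidden activation probabilities
    v_new = sigmoid(h @ W.T + b_v)           # reconstructed visible probabilities
    return np.where(known_mask, v, v_new)    # only occluded entries are replaced

v = rng.integers(0, 2, size=16).astype(float)
mask = np.ones(16, dtype=bool); mask[5:8] = False   # pretend these entries are occluded
print(reconstruct(v, mask))
```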
Citations: 0
CSCNet: A Shallow Single Column Network for Crowd Counting
Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301855
Zhida Zhou, Li Su, Guorong Li, Yifan Yang, Qingming Huang
Crowd counting in complex scenes is an important but challenging task. The scale variation of crowds makes it hard for shallow networks to extract effective features. In this paper, we propose a shallow single-column network named CSCNet for crowd counting. Its key component is the complementary scale context block (CSCB), designed to capture complementary scale context and obtain high accuracy with limited network depth. As far as we know, CSCNet is the shallowest single-column network among existing works. We demonstrate our method on three challenging benchmarks. Compared to state-of-the-art methods, CSCNet achieves comparable accuracy with much less complexity, providing an alternative that reaches comparable or even better performance with about 30% less depth and 50% less width. Besides, CSCNet performs more stably on both sparse and congested crowd scenes.
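An assumption-level sketch of a multi-scale context block built from parallel dilated convolutions fused by concatenation, illustrating how a shallow network can still aggregate context at several receptive-field sizes; the actual CSCB design may differ.

```python
# Parallel dilated 3x3 convolutions at increasing dilation, fused by a 1x1 convolution.
import torch
import torch.nn as nn

class ScaleContextBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(ch, ch, kernel_size=3, padding=d, dilation=d) for d in (1, 2, 4)
        ])
        self.fuse = nn.Conv2d(3 * ch, ch, kernel_size=1)

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

x = torch.rand(1, 32, 48, 48)
print(ScaleContextBlock(32)(x).shape)  # torch.Size([1, 32, 48, 48])
```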
Citations: 1
Geometric-visual descriptor for improved image based localization
Pub Date : 2020-12-01 DOI: 10.1109/VCIP49819.2020.9301831
Achref Ouni, E. Royer, Marc Chevaldonné, M. Dhome
This paper addresses the problem of image-based localization. The goal is to quickly and accurately find the relative pose between a query taken with a stereo camera and a map obtained using visual SLAM, which contains poses and 3D points associated with descriptors. In this paper we introduce a new method that leverages stereo vision by adding geometric information to visual descriptors. This method can be used when the vertical direction of the camera is known (for example on a wheeled robot). The new geometric-visual descriptor can be used with several image-based localization algorithms based on visual words. We test the approach on different datasets (indoor, outdoor) and show experimentally that the new geometric-visual descriptor improves standard image-based localization approaches.
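A sketch of the general idea, under the assumption that the geometric cue is the angle between a keypoint's viewing ray and the known gravity direction, appended to the visual descriptor before matching; the exact geometric quantity and weighting used in the paper may differ.

```python
# Append a gravity-referenced angle to a visual descriptor before matching.
import numpy as np

def augment_descriptor(desc, ray_dir, gravity=np.array([0.0, -1.0, 0.0]), weight=0.5):
    """Return the descriptor extended with a geometric component."""
    ray = ray_dir / np.linalg.norm(ray_dir)
    angle = np.arccos(np.clip(ray @ gravity, -1.0, 1.0))   # invariant to rotation about gravity
    return np.concatenate([desc, [weight * angle]])

d = np.random.rand(128)                                     # e.g. a SIFT-like descriptor
print(augment_descriptor(d, np.array([0.1, -0.9, 0.4])).shape)  # (129,)
```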
Citations: 0