
2021 International Conference on Visual Communications and Image Processing (VCIP): Latest Publications

Polynomial Image-Based Rendering for non-Lambertian Objects
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675371
Sarah Fachada, Daniele Bonatto, Mehrdad Teratani, G. Lafruit
Non-Lambertian objects present an appearance that depends on the viewer's position relative to the surrounding scene. Unlike diffuse objects, their features move non-linearly with the camera, which prevents rendering them with existing Depth Image-Based Rendering (DIBR) approaches or triangulating their surface with Structure-from-Motion (SfM). In this paper, we propose an extension of the DIBR paradigm that describes these non-linearities by replacing the depth maps with more complete multi-channel “non-Lambertian maps”, without attempting a 3D reconstruction of the scene. We provide a study of the importance of each coefficient of the proposed map, measuring the trade-off between visual quality and data volume to optimally render non-Lambertian objects. We compare our method to other state-of-the-art image-based rendering methods and outperform them, with promising subjective and objective results on a challenging dataset.
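A minimal sketch of the underlying idea, assuming a hypothetical channel layout: instead of a single depth value, each pixel stores polynomial coefficients that map the target camera offset to a pixel displacement, which is then used for DIBR-style forward warping. The basis (constant/linear/quadratic terms) and the nearest-neighbour splatting below are illustrative choices, not the paper's exact formulation.

```python
import numpy as np

def warp_with_polynomial_map(src, coeff_map, cam_offset):
    """Forward-warp a source view using a per-pixel polynomial map.

    coeff_map has shape (h, w, 2, 5): five polynomial coefficients per
    axis, mapping the target camera offset (tx, ty) to a displacement
    (dx, dy).  This channel layout is an assumption for illustration.
    """
    h, w, _ = src.shape
    tx, ty = cam_offset
    # Monomial basis evaluated at the target camera offset.
    basis = np.array([tx, ty, tx * tx, ty * ty, tx * ty])
    ys, xs = np.mgrid[0:h, 0:w]
    dx = (coeff_map[..., 0, :] * basis).sum(axis=-1)
    dy = (coeff_map[..., 1, :] * basis).sum(axis=-1)
    tgt_x = np.clip(np.round(xs + dx).astype(int), 0, w - 1)
    tgt_y = np.clip(np.round(ys + dy).astype(int), 0, h - 1)
    out = np.zeros_like(src)
    out[tgt_y, tgt_x] = src[ys, xs]  # nearest-neighbour splatting
    return out
```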
Citations: 5
An Intra String Copy Approach for SCC in AVS3
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675427
Liping Zhao, Kailun Zhou, Qingyang Zhou, Huihui Wang, Tao Lin
An efficient SCC tool named Intra String Copy (ISC) has recently been proposed and adopted in AVS3. ISC has two CU-level sub-modes: FPSP (fully-matching-string and partially-matching-string based string prediction) and EUSP (equal-value-string, unit-basis-vector-string and unmatched-pixel-string based string prediction). Compared with the latest AVS3 reference software HPM with SCC tools disabled, using the AVS3 SCC Common Test Condition and YUV test sequences in the text and graphics with motion (TGM) and mixed content (MC) categories, the proposed tool achieves average Y BD-rate reductions of 57.7%/39.5% and 77.2%/57.9% for TGM and MC in the All Intra (AI)/Low Delay B (LDB) configurations, respectively, with low additional encoding complexity and almost the same decoding complexity.
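A simplified sketch of the equal-value-string idea behind the EUSP sub-mode, assuming a plain raster scan: runs of identical pixel values become strings, short runs become unmatched pixels. The real AVS3 syntax, minimum string lengths and the unit-basis-vector-string mode are not modelled here.

```python
import numpy as np

def segment_equal_value_strings(cu, min_len=2):
    """Split a CU (2D array) scanned in raster order into
    equal-value strings and unmatched pixels (illustrative only)."""
    flat = cu.flatten()
    strings, i = [], 0
    while i < len(flat):
        j = i
        while j + 1 < len(flat) and flat[j + 1] == flat[i]:
            j += 1
        run = j - i + 1
        if run >= min_len:
            strings.append(("equal_value_string", i, run, int(flat[i])))
        else:
            for k in range(i, j + 1):
                strings.append(("unmatched_pixel", k, 1, int(flat[k])))
        i = j + 1
    return strings

# Example: a small screen-content block with flat regions.
block = np.array([[10, 10, 10, 10], [10, 10, 7, 7], [7, 7, 7, 7], [3, 9, 9, 9]])
print(segment_equal_value_strings(block))
```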
Citations: 0
Encoder-Decoder Joint Enhancement for Video Chat
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675448
Zhenghao Zhang, Zhao Wang, Yan Ye, Shiqi Wang, Changwen Zheng
Video chat is becoming more and more popular in our daily life. However, providing high-quality video chat over limited bandwidth remains a key challenge. In this paper, going beyond the state-of-the-art video compression system, we propose an encoder-decoder joint enhancement algorithm for video chat. In particular, the sparse map of the original frame is extracted at the encoder side and signaled to the decoder, where it is combined with the sparse map of the decoded frame to obtain a boundary transformation map. In this manner, the boundary transformation map captures the key differences between the original frame and the decoded frame and can therefore be used to enhance the decoded frame. Experimental results show that the proposed algorithm brings clear subjective and objective quality improvements. At the same quality, the proposed algorithm achieves 35% bitrate savings compared to VVC.
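A rough sketch of the signalling idea, assuming a thresholded gradient magnitude as a stand-in for the paper's sparse-map extractor (which is not specified here): the encoder sends the sparse map of the original frame, and the decoder differences it against the sparse map of the decoded frame to obtain a boundary transformation map.

```python
import numpy as np
from scipy import ndimage

def sparse_map(frame, thresh=0.1):
    """Stand-in sparse map: thresholded gradient magnitude (assumption)."""
    gx = ndimage.sobel(frame, axis=1, mode="reflect")
    gy = ndimage.sobel(frame, axis=0, mode="reflect")
    mag = np.hypot(gx, gy)
    return (mag > thresh * mag.max()).astype(np.float32)

def boundary_transformation_map(original, decoded):
    """Encoder side: sparse map of the original frame (signalled).
    Decoder side: combine it with the sparse map of the decoded frame."""
    enc_side = sparse_map(original)   # signalled to the decoder
    dec_side = sparse_map(decoded)    # computed at the decoder
    return enc_side - dec_side        # highlights boundary differences
```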
Citations: 0
Stereoscopic Video Quality Assessment with Multi-level Binocular Fusion Network Considering Disparity and Multi-scale Information
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675404
Yingjie Feng, Sumei Li
Stereoscopic video quality assessment (SVQA) is of great importance for promoting the development of the stereoscopic video industry. In this paper, we propose a three-branch multi-level binocular fusion convolutional neural network (MBFNet) that is highly consistent with human visual perception. Our network includes three innovative structures. First, we construct a multi-scale cross-dimension attention module (MSCAM) on the left and right branches to capture more critical semantic information. Then, we design a multi-level binocular fusion unit (MBFU) to adaptively fuse the features from the left and right branches. In addition, a disparity compensation branch (DCB) containing an enhancement unit (EU) is added to provide disparity features. Experimental results show that the proposed method outperforms existing SVQA methods with state-of-the-art performance.
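A minimal PyTorch sketch of adaptive binocular fusion, in the spirit of an MBFU: a learned gate weights the left and right feature maps before they are merged. The gating structure below is an illustrative simplification, not the module from the paper.

```python
import torch
import torch.nn as nn

class BinocularFusionUnit(nn.Module):
    """Adaptively fuse left/right feature maps with a learned gate
    (a simplified reading of a multi-level binocular fusion unit)."""
    def __init__(self, channels):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, feat_left, feat_right):
        w = self.gate(torch.cat([feat_left, feat_right], dim=1))
        return w * feat_left + (1.0 - w) * feat_right

# Usage: fuse two 64-channel feature maps from the left/right branches.
fusion = BinocularFusionUnit(64)
fused = fusion(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
```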
Citations: 3
Deep Inter Prediction via Reference Frame Interpolation for Blurry Video Coding
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675429
Zezhi Zhu, Lili Zhao, Xuhu Lin, Xuezhou Guo, Jianwen Chen
In High Efficiency Video Coding (HEVC), inter prediction is an important module for removing temporal redundancy. The accuracy of inter prediction is greatly affected by the similarity between the current and reference frames. However, for blurry videos, the performance of inter coding is degraded by varying motion blur caused by camera shake or the acceleration of objects in the scene. To address this problem, we propose to synthesize an additional reference frame via a frame interpolation network. The synthesized reference frame is added to the reference picture lists to supply a more credible reference candidate, and the searching mechanism for motion candidates is changed accordingly. In addition, to make our interpolation network more robust to inputs with different compression artifacts, we build a new blurry video database to train the network. With the well-trained frame interpolation network, compared with the reference software HM-16.9, the proposed method achieves an average 1.55% BD-rate reduction under the random access (RA) configuration for blurry videos, and also obtains an average 0.75% BD-rate reduction for common test sequences.
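A tiny sketch of where the synthesized frame would slot in, assuming a hypothetical `interp_net` standing in for the trained frame interpolation network: the interpolated picture is simply appended to the reference list as an extra candidate before motion search.

```python
def build_reference_list(ref_list, interp_net):
    """Append a synthesized reference frame to the reference picture list.

    `interp_net` is a hypothetical stand-in for the trained interpolation
    network, assumed to take the two nearest reconstructed references and
    return an intermediate frame aligned with the current picture.
    """
    if len(ref_list) >= 2:
        synthesized = interp_net(ref_list[0], ref_list[1])
        return ref_list + [synthesized]  # extra, more credible candidate
    return ref_list
```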
Citations: 1
A Study on 4D Light Field Compression Using Multi-focus Images and Reference Views
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675378
Shuho Umebayashi, K. Kodama, T. Hamamoto
We propose a novel method of light field compression using multi-focus images and reference views. Light fields enable us to observe scenes from various viewpoints. However, a light field generally consists of an enormous amount of 4D data, which is not suitable for storage or transmission without effective compression at relatively low bit-rates. On the other hand, 4D light fields are essentially redundant because they contain just 3D scene information. While robust 3D scene estimation such as depth recovery from light fields is not easy, a method of reconstructing light fields directly from 3D information composed of multi-focus images, without any scene estimation, has been successfully derived. Based on that method, we previously proposed light field compression via multi-focus images as an effective representation of 3D scenes. However, its high performance is seen only at very low bit-rates, because light fields predicted from multi-focus images suffer some degradation of low-frequency components and occluded regions. In this paper, we study higher-quality light field compression by using reference views to improve the quality of the prediction from multi-focus images. Our contribution is twofold: first, our improved method maintains good 4D light field compression performance over a wider range of low bit-rates than the previous one, which works effectively only at very low bit-rates; second, we clarify how the proposed method can continue to improve by introducing recent video codecs such as HEVC and VVC into our compression framework, which does not depend on the 3D-SPIHT coder previously adopted for the corresponding component. We show experimental results on synthetic and real images, where the quality of reconstructed light fields is evaluated with PSNR and SSIM to analyze the characteristics of our method. We observe that it is much superior to compressing the light field directly with HEVC at low bit-rates, regardless of the light field scan order.
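A small sketch of the PSNR/SSIM evaluation mentioned in the abstract, averaged over all angular views of a 4D light field. The (U, V, H, W) array layout, the [0, 1] value range, and the use of scikit-image are illustrative assumptions.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_light_field(reference_lf, reconstructed_lf):
    """Average PSNR/SSIM over all (u, v) views of a 4D light field.

    Arrays are assumed to be shaped (U, V, H, W) with values in [0, 1].
    """
    psnrs, ssims = [], []
    for u in range(reference_lf.shape[0]):
        for v in range(reference_lf.shape[1]):
            ref, rec = reference_lf[u, v], reconstructed_lf[u, v]
            psnrs.append(peak_signal_noise_ratio(ref, rec, data_range=1.0))
            ssims.append(structural_similarity(ref, rec, data_range=1.0))
    return float(np.mean(psnrs)), float(np.mean(ssims))
```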
Citations: 1
Binocular Visual Mechanism Guided No-Reference Stereoscopic Image Quality Assessment Considering Spatial Saliency
Pub Date : 2021-12-05 DOI: 10.1109/vcip53242.2021.9675338
Jinhui Feng, Sumei Li, Yongli Chang
In recent years, with the popularization of 3D technology, stereoscopic image quality assessment (SIQA) has attracted extensive attention. In this paper, we propose a two-stage binocular fusion network for SIQA, which takes binocular fusion, binocular rivalry and binocular suppression into account to imitate the complex binocular visual mechanism of the human brain. Besides, to extract spatial saliency features of the left view, the right view, and the fusion view, saliency generating layers (SGLs) are applied in the network. The SGLs apply multi-scale dilated convolutions to emphasize essential spatial information of the input features. Experimental results on four public stereoscopic image databases demonstrate that the proposed method outperforms state-of-the-art SIQA methods on both symmetrically and asymmetrically distorted stereoscopic images.
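A minimal PyTorch sketch of a saliency generating layer built from parallel multi-scale dilated convolutions, merged into a single-channel saliency map. Dilation rates and channel widths are illustrative assumptions; the paper's exact SGL configuration is not reproduced here.

```python
import torch
import torch.nn as nn

class SaliencyGeneratingLayer(nn.Module):
    """Parallel dilated convolutions merged into a single-channel saliency map."""
    def __init__(self, in_channels, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_channels, in_channels, kernel_size=3,
                      padding=d, dilation=d)
            for d in dilations
        ])
        self.merge = nn.Conv2d(len(dilations) * in_channels, 1, kernel_size=1)

    def forward(self, x):
        # Each branch sees a different receptive field via its dilation rate.
        multi_scale = torch.cat([b(x) for b in self.branches], dim=1)
        return torch.sigmoid(self.merge(multi_scale))
```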
Citations: 0
Stereo Image Super-Resolution Based on Pixel-Wise Knowledge Distillation Strategy
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675446
Li Ma, Sumei Li
In stereo image super-resolution (SR), it is equally important to utilize intra-view and cross-view information. However, most existing methods focus only on the exploration of cross-view information and neglect the full mining of intra-view information, which limits their reconstruction performance. Since single image SR (SISR) methods are powerful at exploiting intra-view information, we propose to introduce a knowledge distillation strategy that transfers the knowledge of a SISR network (teacher network) to a stereo image SR network (student network). With the help of the teacher network, the student network can easily learn more intra-view information. Specifically, we propose pixel-wise distillation as the implementation method, which not only improves the intra-view information extraction ability of the student network but also ensures effective learning of cross-view information. Moreover, we propose a lightweight student network named Adaptive Residual Feature Aggregation network (ARFAnet). Its main unit, the ARFA module, aggregates informative residual features and produces more representative features for image reconstruction. Experimental results demonstrate that our teacher-student network achieves state-of-the-art performance on all benchmark datasets.
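A minimal sketch of what a pixel-wise distillation objective could look like: the student's SR output imitates the frozen teacher's output at every pixel while still fitting the ground truth. The L1 distances and the weighting factor are illustrative assumptions, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def pixelwise_distillation_loss(student_sr, teacher_sr, ground_truth, alpha=0.5):
    """Pixel-wise distillation combined with the usual reconstruction loss.

    `alpha` and the choice of L1 are illustrative; the teacher output is
    detached so no gradients flow into the (frozen) teacher network.
    """
    reconstruction = F.l1_loss(student_sr, ground_truth)
    distillation = F.l1_loss(student_sr, teacher_sr.detach())
    return reconstruction + alpha * distillation
```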
Citations: 0
DensER: Density-imbalance-Eased Representation for LiDAR-based Whole Scene Upsampling
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675334
Tso-Yuan Chen, Ching-Chun Hsiao, Wen-Huang Cheng, Hong-Han Shuai, Peter Chen, Ching-Chun Huang
With the development of depth sensors, 3D point cloud upsampling, which generates a high-resolution point cloud from a sparse input, has become an emerging topic. However, many previous works focus on reconstructing and refining a single 3D object. Although a few recent works have begun to discuss 3D structure refinement for more complex scenes, they do not target LiDAR-based point clouds, whose density is imbalanced from near to far. This paper proposes DensER, a Density-imbalance-Eased regional Representation. Notably, to learn robust representations and model local geometry under imbalanced point density, we design density-aware multiple receptive fields to extract regional features. Moreover, building on the patch-recurrence property of natural scenes, we propose a density-aided attentive module that enriches the extracted features of point-sparse areas by referring to other non-local regions. Finally, coupled with novel manifold-based upsamplers, DensER is able to super-resolve LiDAR-based whole-scene point clouds. Experimental results show that DensER outperforms related works in both qualitative and quantitative evaluation. We also demonstrate that the enhanced point clouds can improve downstream tasks such as 3D object detection and depth completion.
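A small sketch of the kind of per-point density score that density-aware receptive fields could condition on: the inverse mean distance to the k nearest neighbours. This density definition is an assumption for illustration; the paper's exact measure and how it drives receptive-field selection are not specified here.

```python
import torch

def knn_density(points, k=16):
    """Estimate a per-point density score from k-NN distances.

    `points` is an (N, 3) tensor of LiDAR coordinates.  Points far from
    the sensor tend to get low scores, which is exactly the near-to-far
    imbalance a density-aware receptive field would react to.
    """
    dists = torch.cdist(points, points)           # (N, N) pairwise distances
    knn_d, _ = dists.topk(k + 1, largest=False)   # includes distance 0 to self
    mean_d = knn_d[:, 1:].mean(dim=1)             # drop the self-distance
    return 1.0 / (mean_d + 1e-8)
```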
Citations: 2
Revisiting Flipping Strategy for Learning-based Stereo Depth Estimation
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675450
Yue Li, Yueyi Zhang, Zhiwei Xiong
Deep neural networks (DNNs) have been widely used for stereo depth estimation and have achieved great success. In this paper, we introduce a novel flipping strategy for DNNs on the stereo depth estimation task. Specifically, based on a common DNN for stereo matching, we apply a flipping operation to both input stereo images, which are then fed to the original DNN. A flipping loss function is proposed to train the network jointly with the initial loss. We apply our strategy to many representative networks in both supervised and self-supervised manners. Extensive experimental results demonstrate that our proposed strategy improves the performance of these networks.
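One common reading of such a flipping strategy, sketched below as a consistency term: horizontally flipping a rectified stereo pair and swapping the views yields another valid pair whose disparity map is the horizontal flip of the original, so the two predictions can be compared. The exact joint loss used in the paper may differ from this sketch.

```python
import torch
import torch.nn.functional as F

def flipping_loss(model, left, right):
    """Consistency between the normal prediction and the prediction made
    on horizontally flipped, swapped stereo inputs (illustrative form).

    `left`/`right` are NCHW tensors; `model` returns a disparity map.
    """
    disp = model(left, right)
    # Flip each image left-right and swap the roles of the two views.
    left_f, right_f = torch.flip(right, dims=[-1]), torch.flip(left, dims=[-1])
    disp_f = model(left_f, right_f)
    # Flipping the second prediction back should recover the first one.
    return F.l1_loss(disp, torch.flip(disp_f, dims=[-1]))
```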
Citations: 2