
Latest publications from the 2021 International Conference on Visual Communications and Image Processing (VCIP)

Parallelized Context Modeling for Faster Image Coding
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675377
A. B. Koyuncu, Kai Cui, A. Boev, E. Steinbach
Learning-based image compression has reached the performance of classical methods such as BPG. One common approach is to use an autoencoder network to map the pixel information to a latent space and then approximate the symbol probabilities in that space with a context model. During inference, the learned context model provides symbol probabilities, which are used by the entropy encoder to obtain the bitstream. Currently, the most effective context models use autoregression, but autoregression results in very high decoding complexity due to the serialized data processing. In this work, we propose a method to parallelize the autoregressive process used for image compression. In our experiments, we achieve a decoding speed that is over 8 times faster than the standard autoregressive context model, with almost no loss in compression performance.
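The serialized decoding bottleneck described above can be made concrete with a toy comparison. The checkerboard split below is one common parallelization strategy and is an assumed illustration only, not necessarily the authors' exact scheme:

```python
import numpy as np

H, W = 4, 6  # toy latent-tensor spatial size

# Serial autoregressive decoding: every symbol's context model waits on
# all previously decoded symbols, so decoding needs H*W sequential steps.
serial_steps = H * W

# Checkerboard-style parallelization (an assumed illustration): decode all
# "anchor" positions in one batched pass, then decode the remaining
# positions in a second pass, using the anchors as spatial context.
anchor = (np.indices((H, W)).sum(axis=0) % 2) == 0
parallel_steps = 2  # two batched passes, independent of H and W

print(serial_steps, parallel_steps)  # 24 sequential steps vs 2 passes
```

Halving the sequential dependency chain this way trades a slightly smaller causal context for batched, GPU-friendly decoding.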
Citations: 2
A Video Dataset for Learning-based Visual Data Compression and Analysis
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675343
Xiaozhong Xu, Shan Liu, Zeqi Li
Learning-based visual data compression and analysis have recently attracted great interest from both academia and industry. More training and testing datasets, especially good-quality video datasets, are highly desirable for related research and standardization activities. A UHD video dataset, referred to as the Tencent Video Dataset (TVD), is established to serve various purposes, such as training neural-network-based coding tools and testing machine vision tasks including object detection and segmentation. This dataset contains 86 video sequences with a wide variety of content. Each video sequence consists of 65 frames at 4K (3840x2160) spatial resolution. In this paper, the details of this dataset, as well as its performance when compressed by the VVC and HEVC video codecs, are introduced.
Citations: 9
Video Coding Pre-Processing Based on Rate-Distortion Optimized Weighted Guided Filter
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675444
Xi Huang, Luheng Jia, Han Wang, Ke-bin Jia
In video coding, it has always been an intractable problem to compress high-frequency components, including noise and visually imperceptible content, that consume a large amount of bandwidth while providing limited quality improvement. Directly applying denoising methods degrades coding performance and is hence not suitable for the video coding scenario. In this work, we propose a video pre-processing approach that leverages an edge-preserving filter specifically designed for video coding, whose filter parameters are optimized in the rate-distortion (R-D) sense. The proposed pre-processing method removes components with low R-D cost-effectiveness for the video encoder while keeping important structural components, leading to higher coding efficiency and better subjective quality. Compared with conventional denoising filters, our proposed pre-processing method using the R-D optimized edge-preserving filter improves coding efficiency by up to −5.2% BD-rate with low computational complexity.
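The weighted guided filter in the title builds on the plain guided image filter; the R-D-optimized weighting is the paper's contribution and is not reproduced here. A minimal numpy sketch of the baseline filter, used in self-guided mode (guide = input) as in denoising:

```python
import numpy as np

def box_mean(x, r):
    """Mean over a (2r+1)x(2r+1) window via an integral image, edge-padded."""
    k = 2 * r + 1
    p = np.pad(x, r, mode='edge')
    s = np.pad(p.cumsum(0).cumsum(1), ((1, 0), (1, 0)))
    return (s[k:, k:] - s[:-k, k:] - s[k:, :-k] + s[:-k, :-k]) / (k * k)

def guided_filter(I, p, r=2, eps=1e-3):
    """Plain guided filter: q = a*I + b, with a, b chosen per window so the
    output is a locally linear function of the guide I (edge-preserving)."""
    mI, mp = box_mean(I, r), box_mean(p, r)
    a = (box_mean(I * p, r) - mI * mp) / (box_mean(I * I, r) - mI * mI + eps)
    b = mp - a * mI
    return box_mean(a, r) * I + box_mean(b, r)

# Self-guided smoothing of a noisy flat patch: output stays near the mean.
img = np.full((16, 16), 0.5) + 0.01 * np.random.default_rng(0).standard_normal((16, 16))
out = guided_filter(img, img)
```

Because `a` shrinks towards 0 in low-variance (flat) windows and stays near 1 across strong edges, low-amplitude noise is removed while structural components survive, which is the property the paper's R-D weighting exploits.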
Citations: 0
A Multi-dimensional Aesthetic Quality Assessment Model for Mobile Game Images
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675430
Tao Wang, Wei Sun, Xiongkuo Min, Wei Lu, Zicheng Zhang, Guangtao Zhai
With the development of the game industry and the popularization of mobile devices, mobile games have come to play an important role in people's entertainment life. The aesthetic quality of mobile game images determines users' Quality of Experience (QoE) to a certain extent. In this paper, we propose a multi-task deep-learning-based method to evaluate the aesthetic quality of mobile game images in multiple dimensions (i.e., fineness, color harmony, colorfulness, and overall quality). Specifically, we first extract a quality-aware feature representation by integrating the features from all intermediate layers of a convolutional neural network (CNN) and then map these quality-aware features into the quality score space of each dimension via a quality regressor module, which consists of three fully connected (FC) layers. The proposed model is trained in a multi-task learning manner, where the quality-aware features are shared by the different quality-dimension prediction tasks, and the multi-dimensional quality scores of each image are regressed by multiple quality regression modules respectively. We further introduce an uncertainty principle to balance the loss of each task during training. The experimental results show that our proposed model achieves the best performance on the Multi-dimensional Aesthetic assessment for Mobile Game image database (MAMG) among state-of-the-art image quality assessment (IQA) and aesthetic quality assessment (AQA) algorithms.
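The "uncertainty principle" used to balance the four task losses is plausibly a homoscedastic-uncertainty weighting in the style of Kendall et al. (CVPR 2018); the paper's exact formulation is an assumption here. A minimal sketch:

```python
import math

def uncertainty_weighted_loss(task_losses, log_vars):
    """Balance per-task losses with learnable log-variances s_i:
    total = sum_i exp(-s_i) * L_i + s_i.  (Kendall et al.-style weighting;
    the paper's exact form is assumed, not reproduced.)"""
    return sum(math.exp(-s) * L + s for L, s in zip(task_losses, log_vars))

# Four tasks: fineness, color harmony, colorfulness, overall quality.
losses = [0.8, 0.5, 0.3, 0.6]
total = uncertainty_weighted_loss(losses, [0.0, 0.0, 0.0, 0.0])
# With all log-variances at 0, the weighting reduces to a plain sum.
```

During training the `log_vars` would be learnable parameters: a task with a noisy, hard-to-fit target drives its log-variance up, automatically down-weighting that task's loss.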
Citations: 11
VCIP 2021 Organizing Committee
Pub Date : 2021-12-05 DOI: 10.1109/vcip53242.2021.9675374
Citations: 0
No-Reference Stereoscopic Image Quality Assessment Based on The Visual Pathway of Human Visual System
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675346
F. Meng, Sumei Li
With the development of stereoscopic imaging technology, stereoscopic image quality assessment (SIQA) has gradually become more and more important, and designing a method in line with human visual perception is challenging due to the complex relationship between binocular views. In this article, firstly, a convolutional neural network (CNN) based on the visual pathway of the human visual system (HVS) is built, which simulates different parts of the visual pathway such as the optic chiasm, the lateral geniculate nucleus (LGN), and the visual cortex. Secondly, the two pathways of our method simulate the 'what' and 'where' visual pathways respectively, and are endowed with different feature extraction capabilities. Finally, we find a different way to apply 3D convolution, employing it to fuse the information from the left and right views rather than just extracting temporal features from video. The experimental results show that our proposed method is more in line with subjective scores and generalizes well.
Citations: 0
Two-stage Parallax Correction and Multi-stage Cross-view Fusion Network Based Stereo Image Super-Resolution
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675418
Yijian Zheng, Sumei Li
Stereo image super-resolution (SR) has achieved great progress in recent years. However, two major problems of existing methods are that parallax correction is insufficient and that cross-view information fusion occurs only at the beginning of the network. To address these problems, we propose a two-stage parallax correction and multi-stage cross-view fusion network for better stereo image SR results. Specifically, the two-stage parallax correction module consists of horizontal parallax correction and refined parallax correction. The first stage corrects horizontal parallax via parallax attention. The second stage is based on deformable convolution, refining horizontal parallax and correcting vertical parallax simultaneously. Then, multiple cascaded enhanced residual spatial feature transform blocks are developed to fuse cross-view information at multiple stages. Extensive experiments show that our method achieves state-of-the-art performance on the KITTI2012, KITTI2015, Middlebury, and Flickr1024 datasets.
Citations: 0
Multicomponent Secondary Transform
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675447
M. Krishnan, Xin Zhao, Shanchun Liu
The Alliance for Open Media has recently initiated coding-tool exploration activities towards the next-generation video coding beyond AV1. In this regard, a frequency-domain coding tool designed to leverage the cross-component correlation between collocated chroma blocks is explored in this paper. The tool, henceforth known as the multi-component secondary transform (MCST), is implemented as a low-complexity secondary transform that takes the primary transform coefficients of multiple color components as input. The proposed tool is implemented and tested on top of libaom. Experimental results show that, compared to libaom, the proposed method achieves an average overall coding gain of 0.34% to 0.44% for the All Intra (AI) coding configuration over a wide range of video content.
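A toy sketch of the idea: gather the low-frequency primary transform coefficients of two collocated chroma blocks and apply one small non-separable transform jointly. The 2x2 region size and the random orthonormal kernel below are illustrative assumptions; the actual MCST regions and kernels are defined by the proposal:

```python
import numpy as np

def mcst_like(cb, cr, T):
    """Joint non-separable secondary transform over the top-left
    (low-frequency) 2x2 primary coefficients of both chroma components."""
    x = np.concatenate([cb[:2, :2].ravel(), cr[:2, :2].ravel()])  # 8-vector
    return T @ x

rng = np.random.default_rng(0)
# A random orthonormal matrix stands in for a trained MCST kernel.
T, _ = np.linalg.qr(rng.standard_normal((8, 8)))
cb = rng.standard_normal((4, 4))  # primary transform coefficients, Cb
cr = rng.standard_normal((4, 4))  # primary transform coefficients, Cr
y = mcst_like(cb, cr, T)
x = np.concatenate([cb[:2, :2].ravel(), cr[:2, :2].ravel()])
```

Because the joint transform sees both components at once, correlated Cb/Cr energy can be compacted into fewer coefficients than two independent per-component transforms would achieve; orthonormality keeps the step invertible and energy-preserving.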
Citations: 0
Deformable Convolution Based No-Reference Stereoscopic Image Quality Assessment Considering Visual Feedback Mechanism
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675324
Mingyue Zhou, Sumei Li
Simulating the human visual system (HVS) is crucial for fitting human perception and improving assessment performance in stereoscopic image quality assessment (SIQA). In this paper, a no-reference SIQA method considering the feedback mechanism and orientation selectivity of the HVS is proposed. In the HVS, feedback connections are indispensable in the process of human perception, which has not been studied in existing SIQA models. Therefore, we design a new feedback module (FBM) to realize the guidance of higher-level regions of the visual cortex over lower-level regions. In addition, given the orientation selectivity of primary visual cortex cells, a deformable feature extraction block is explored to simulate it, and the block can adaptively select regions of interest. Meanwhile, retinal ganglion cells (RGCs) with different receptive fields have different sensitivities to objects of different sizes in the image, so a new multi-receptive-field information extraction and fusion scheme is realized in the network structure. Experimental results show that the proposed model is superior to state-of-the-art no-reference SIQA methods and has excellent generalization ability.
Citations: 2
Generative DNA: Representation Learning for DNA-based Approximate Image Storage
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675366
Giulio Franzese, Yiqing Yan, G. Serra, Ivan D'Onofrio, Raja Appuswamy, P. Michiardi
Synthetic DNA has recently received much attention as a long-term archival medium alternative due to its high density and durability. However, most current work has primarily focused on using DNA as a precise storage medium. In this work, we take an alternate view of DNA. Using neural-network-based compression techniques, we transform images into a latent-space representation, which we then store on DNA. By doing so, we transform DNA into an approximate image storage medium, as images generated back from DNA are only approximate representations of the original images. Using several datasets, we investigate the storage benefits of approximation and study the impact of DNA storage errors (substitutions, indels, bias) on the quality of approximation. In doing so, we demonstrate the feasibility and potential of viewing DNA as an approximate storage medium.
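The final step of any DNA storage pipeline, mapping a byte stream (here, the compressed latent representation) to nucleotides at two bits per base, can be sketched as follows. Real encoders additionally enforce biochemical constraints (GC content, homopolymer runs) and add error correction against substitutions and indels, all of which this sketch omits:

```python
BASES = "ACGT"  # one of the standard 2-bits-per-nucleotide alphabets

def bytes_to_dna(data: bytes) -> str:
    """Pack each byte into 4 nucleotides, most-significant bit pair first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)

def dna_to_bytes(seq: str) -> bytes:
    """Inverse mapping: 4 nucleotides back to one byte."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        b = 0
        for ch in seq[i:i + 4]:
            b = (b << 2) | BASES.index(ch)
        out.append(b)
    return bytes(out)
```

For example, the byte `0x1b` (bit pairs 00 01 10 11) maps to the strand `ACGT` and back; at this density a compact latent vector of a few hundred bytes fits in a handful of synthesizable oligos.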
Citations: 3