
Latest publications from the 2021 International Conference on Visual Communications and Image Processing (VCIP)

On the Impact of Viewing Distance on Perceived Video Quality
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675431
Hadi Amirpour, R. Schatz, C. Timmerer, M. Ghanbari
As video streaming services strive to optimize delivery quality and efficiency, accurate assessment of user-perceived video quality becomes increasingly important. However, because of the wide range of viewing distances encountered in real-world settings, perceived video quality can vary significantly in everyday viewing situations. In this paper, we investigate and quantify the influence of viewing distance on perceived video quality. A subjective experiment was conducted with full HD sequences at three different fixed viewing distances, with each video sequence encoded at three different quality levels. Our results confirm that viewing distance has a significant influence on quality assessment. In particular, they show that an increased viewing distance generally leads to increased perceived video quality, especially at low encoding quality levels. In this context, we also estimate the potential bitrate savings that knowledge of the actual viewing distance would enable in practice. Since current objective video quality metrics do not systematically take viewing distance into account, we also analyze and quantify its influence on the correlation between objective and subjective metrics. Our results confirm the need for distance-aware objective metrics when accurate prediction of perceived video quality in real-world environments is required.
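The distance effect can be made concrete with a back-of-the-envelope angular-resolution calculation (illustrative only; the function name and display geometry below are assumptions, not values from the paper):

```python
import math

def pixels_per_degree(h_pixels, screen_width_m, distance_m):
    """Pixels spanning one degree of visual angle at a given viewing distance."""
    fov_deg = 2.0 * math.degrees(math.atan(screen_width_m / (2.0 * distance_m)))
    return h_pixels / fov_deg

# Doubling the viewing distance roughly doubles the angular pixel density,
# which is why coding artifacts become harder to see from farther away.
near = pixels_per_degree(1920, 1.0, 2.0)   # a 1 m wide full HD display at 2 m
far = pixels_per_degree(1920, 1.0, 4.0)    # the same display viewed from 4 m
```

The higher the pixels-per-degree figure, the less visible low-bitrate coding artifacts become, which is the mechanism behind the bitrate savings the paper estimates.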
Citations: 7
A Distortion Propagation Oriented CU-tree Algorithm for x265
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675426
Xinye Jiang, Zhenyu Liu, Yongbing Zhang, Xiangyang Ji
Rate-distortion optimization (RDO) is widely used in video coding to improve coding efficiency. Conventionally, RDO is applied to each block independently to avoid high computational complexity. However, various prediction techniques introduce spatio-temporal dependencies between blocks, so independent RDO is not optimal. Specifically, because of motion compensation, the distortion of reference blocks affects the quality of subsequent prediction blocks, and accounting for this temporal dependency in RDO can improve the global rate-distortion (R-D) performance. x265 leverages a lookahead module to analyze the temporal dependency between blocks and weights the quality of each block based on its reference strength. However, the original algorithm in x265 ignores the impact of quantization, and this shortcoming degrades the R-D performance of x265. In this paper, we propose a new linear distortion propagation model that estimates the temporal dependency while taking the impact of quantization into account. From the perspective of global RDO, a corresponding adaptive quantization formula is also presented. The proposed algorithm was implemented in x265 version 3.2. Experiments revealed that the proposed algorithm achieved average PSNR-based and SSIM-based BD-rate reductions of 15.43% and 23.81%, respectively, outperforming the original algorithm in x265 by 4.14% and 9.68%.
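The propagation idea behind the CU-tree can be sketched as follows; the cost model, variable names, and offset formula are simplified illustrations of the lookahead mechanism, not the exact model from x265 or the paper:

```python
import math

def cutree_qp_offsets(intra_costs, inter_costs, ref_of, strength=2.0):
    """Propagate future-block costs backward to their reference blocks,
    then convert the accumulated 'propagate' cost into a QP offset.
    ref_of[i] is the index of block i's reference block (or None)."""
    n = len(intra_costs)
    propagate = [0.0] * n
    # Walk blocks from last to first so each block's downstream cost is known.
    for i in range(n - 1, -1, -1):
        # Fraction of this block's information inherited from its reference.
        fraction = max(0.0, 1.0 - inter_costs[i] / intra_costs[i])
        r = ref_of[i]
        if r is not None:
            propagate[r] += (intra_costs[i] + propagate[i]) * fraction
    # Heavily referenced blocks get a negative offset (finer quantization).
    return [-strength * math.log2(1.0 + propagate[i] / intra_costs[i])
            for i in range(n)]
```

Blocks that many future blocks depend on accumulate a large propagate cost and are quantized more finely, which is the behavior the paper's quantization-aware model refines.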
Citations: 2
Dictionary Learning-based Reference Picture Resampling in VVC
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675361
J. Schneider, Christian Rohlfing
Versatile Video Coding (VVC) introduces the concept of Reference Picture Resampling (RPR), which allows the resolution of the video to change during decoding without introducing an additional Intra Random Access Point (IRAP) into the bitstream. When the resolution is increased, an upsampling operation on the reference picture is required in order to apply motion-compensated prediction. Conceptually, upsampling by linear interpolation filters fails to recover frequencies that were lost during downsampling. Yet the quality of the upsampled reference picture is crucial to the prediction performance. In recent years, machine learning-based Super-Resolution (SR) has been shown to far outperform conventional interpolation filters at super-resolving a previously downsampled image. In particular, Dictionary Learning-based Super-Resolution (DLSR) was shown to improve the inter-layer prediction in SHVC [1]. Thus, this paper introduces DLSR to the prediction process in RPR. Further, the approach is experimentally evaluated with an implementation based on the VTM-9.3 reference software. The simulation results show an average reduction of the instantaneous bitrate of 0.98% at the same objective quality in terms of PSNR. Moreover, a peak bitrate reduction of 4.74% is measured for the “Johnny” sequence of the JVET test set.
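A minimal sketch of coupled-dictionary SR, the family DLSR belongs to: code the LR patch over an LR dictionary, then reuse the coefficients with the paired HR dictionary. Ridge-regularized least squares stands in for sparse coding here, and all names are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def dlsr_patch(lr_patch, D_l, D_h, lam=0.1):
    """Code the LR patch over the LR dictionary D_l (ridge regression as a
    stand-in for sparse coding), then synthesize the HR patch by applying
    the same coefficients to the coupled HR dictionary D_h."""
    A = D_l.T @ D_l + lam * np.eye(D_l.shape[1])
    alpha = np.linalg.solve(A, D_l.T @ lr_patch)   # coding coefficients
    return D_h @ alpha                              # HR reconstruction
```

Because D_h is trained on high-resolution patches, the synthesized patch can contain frequencies that a linear interpolation filter cannot recover, which is the motivation for using DLSR inside RPR.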
Citations: 0
An Error Self-learning Semi-supervised Method for No-reference Image Quality Assessment
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675352
Yingjie Feng, Sumei Li, Sihan Hao
In recent years, deep learning has achieved significant progress in many respects. However, unlike other research fields such as image recognition, where millions of labeled examples are available, only a few thousand labeled images are available for deep learning in the image quality assessment (IQA) field, which heavily hinders the development and application of deep learning for IQA. To tackle this problem, we propose an error self-learning semi-supervised method for no-reference (NR) IQA (ESSIQA), which is based on deep learning. We employ an advanced full-reference (FR) IQA method to expand the databases and supervise the training of the network. In addition, the network outputs for the expanded images are used as proxy labels, replacing the errors between subjective and objective scores, to achieve error self-learning. Two error back-propagation weights are designed to reduce the impact of inaccurate outputs. The experimental results show that the proposed method yields competitive performance.
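The proxy-label idea can be sketched with a toy linear model; the blending weights, update rule, and names below are hypothetical stand-ins for the paper's deep network and its two back-propagation weights:

```python
import numpy as np

def proxy_label_step(theta, x, fr_score, prev_pred,
                     w_fr=0.7, w_self=0.3, lr=1e-2):
    """One self-learning update on an unlabeled image: the proxy target
    blends the FR-metric score with the model's own previous prediction,
    each weighted to damp the influence of inaccurate outputs."""
    target = w_fr * fr_score + w_self * prev_pred
    pred = float(theta @ x)
    grad = 2.0 * (pred - target) * x   # gradient of the squared error
    return theta - lr * grad
```

On each unlabeled image the model is pulled toward a target it never saw a human rate, which is how the FR metric substitutes for scarce subjective scores.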
Citations: 0
Deep Learning-Based Blind Image Super-Resolution using Iterative Networks
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675367
Asfand Yaar, H. Ateş, B. Gunturk
Deep learning-based single image super-resolution (SR) consistently shows superior performance compared to traditional SR methods. However, most of these methods assume that the blur kernel used to generate the low-resolution (LR) image is known and fixed (e.g., bicubic). Since the blur kernels involved in real-life scenarios are complex and unknown, the performance of these SR methods is greatly reduced on real blurry images. Reconstruction of high-resolution (HR) images from randomly blurred and noisy LR images remains a challenging task. Typical blind SR approaches involve two sequential stages: i) kernel estimation; ii) SR image reconstruction based on the estimated kernel. However, due to the ill-posed nature of this problem, iterative refinement can benefit both the kernel and the SR image estimates. With this observation, in this paper we propose an image SR method based on deep learning with iterative kernel estimation and image reconstruction. Simulation results show that the proposed method outperforms the state of the art in blind image SR and produces visually superior results as well.
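The two-stage alternation can be sketched on a 1-D toy signal (illustrative only; the paper iterates deep networks, whereas this sketch uses least-squares kernel fitting and gradient-descent reconstruction):

```python
import numpy as np

def degrade(x, h, s):
    """LR observation model: blur the HR signal with kernel h, subsample by s."""
    return np.convolve(x, h, mode="valid")[::s]

def estimate_kernel(x, y, klen, s):
    """Stage (i): least-squares kernel estimate given the current HR guess."""
    A = np.array([x[i * s:i * s + klen][::-1] for i in range(len(y))])
    h, *_ = np.linalg.lstsq(A, y, rcond=None)
    return h

def reconstruct(x, y, h, s, lr=0.1, steps=200):
    """Stage (ii): gradient descent on ||degrade(x) - y||^2 with h fixed."""
    x = x.copy()
    for _ in range(steps):
        r = degrade(x, h, s) - y
        g = np.zeros_like(x)
        for i, ri in enumerate(r):          # adjoint of blur-then-subsample
            g[i * s:i * s + len(h)] += ri * h[::-1]
        x -= lr * g
    return x

def blind_sr(y, s=2, klen=3, iters=5):
    """Alternate kernel estimation and HR reconstruction (the two-stage loop)."""
    x = np.pad(np.repeat(y, s), (0, klen - 1))   # naive initial HR estimate
    h = None
    for _ in range(iters):
        h = estimate_kernel(x, y, klen, s)
        x = reconstruct(x, y, h, s)
    return x, h
```

The toy loop only enforces consistency with the LR observation; the paper's networks additionally inject learned image and kernel priors, which is what resolves the ill-posedness.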
Citations: 2
Analysis of VVC Intra Prediction Block Partitioning Structure
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675347
Mário Saldanha, G. Sanchez, C. Marcon, L. Agostini
This paper presents an encoding time and coding efficiency analysis of the Quadtree with nested Multi-type Tree (QTMT) structure in Versatile Video Coding (VVC) intra-frame prediction. The QTMT structure enables VVC to improve compression performance over its predecessor standard at the cost of higher encoding complexity. The intra-frame prediction time increased about 26-fold compared to the HEVC reference software, and most of this increase is attributable to the new block partitioning structure. Thus, this paper provides a detailed description of the VVC block partitioning structure and an in-depth analysis of the QTMT structure regarding coding time and coding efficiency. The presented analyses can guide future work focusing on the block partitioning of VVC intra-frame prediction.
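The size of the QTMT search space can be illustrated with a toy RD-cost recursion over the split types (no-split, quadtree, binary, and 1:2:1 ternary in each direction); real VVC imposes further constraints that this sketch omits, such as disallowing quadtree splits below a multi-type split:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def best_partition(w, h, rd_cost, min_size=4):
    """Minimum RD cost for a w x h block over no-split, quadtree, binary
    and ternary splits -- the recursive search that inflates encoding time."""
    best = rd_cost(w, h)                  # cost of coding the block unsplit
    if w > min_size and h > min_size:     # quadtree split
        best = min(best, 4 * best_partition(w // 2, h // 2, rd_cost, min_size))
    if h > min_size:                      # binary horizontal split
        best = min(best, 2 * best_partition(w, h // 2, rd_cost, min_size))
    if w > min_size:                      # binary vertical split
        best = min(best, 2 * best_partition(w // 2, h, rd_cost, min_size))
    if h >= 4 * min_size:                 # ternary horizontal split (1:2:1)
        best = min(best, 2 * best_partition(w, h // 4, rd_cost, min_size)
                   + best_partition(w, h // 2, rd_cost, min_size))
    if w >= 4 * min_size:                 # ternary vertical split (1:2:1)
        best = min(best, 2 * best_partition(w // 4, h, rd_cost, min_size)
                   + best_partition(w // 2, h, rd_cost, min_size))
    return best
```

Even with memoization the recursion touches every reachable block shape; a real encoder evaluates actual prediction residuals at each node, which is where the roughly 26-fold time increase comes from.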
Citations: 1
MAPS: Joint Multimodal Attention and POS Sequence Generation for Video Captioning
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675348
Cong Zou, Xuchen Wang, Yaosi Hu, Zhenzhong Chen, Shan Liu
Video captioning is considered challenging because it combines video understanding with text generation. Recent progress in video captioning has been made mainly using methods of visual feature extraction and sequential learning. However, the syntax structure and semantic consistency of the generated captions are not fully explored. Thus, in our work, we propose a novel multimodal attention-based framework with Part-of-Speech (POS) sequence guidance to generate more accurate video captions. In this framework, word sequence generation and POS sequence prediction are jointly modeled in a hierarchical manner. Specifically, different modalities, including visual, motion, object and syntactic features, are adaptively weighted and fused with the POS-guided attention mechanism when computing the probability distributions of predicted words. Experimental results on two benchmark datasets, MSVD and MSR-VTT, demonstrate that the proposed method can not only fully exploit the information from video and text content, but also focus on the decisive feature modality when generating a word with a certain POS type. Thus, our approach boosts video captioning performance while generating idiomatic captions.
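The adaptive weighting step can be sketched as attention-based fusion of per-modality feature vectors; the scoring function below (a dot product with the decoder state) is a generic stand-in for the paper's POS-guided attention, and all names are illustrative:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def fuse_modalities(features, decoder_state):
    """Attention-weighted fusion: score each modality's feature vector
    (e.g. visual, motion, object, syntactic) against the current decoder
    state, normalize the scores, and take the weighted sum as context."""
    scores = np.array([f @ decoder_state for f in features])
    weights = softmax(scores)
    context = sum(w * f for w, f in zip(weights, features))
    return context, weights
```

Because the weights are recomputed at every decoding step, the fused context can emphasize, say, motion features when predicting a verb and object features when predicting a noun.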
Citations: 1
Cross-Component Sample Offset for Image and Video Coding
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675355
Yixin Du, Xin Zhao, Shanchun Liu
Existing cross-component video coding technologies have shown great potential for improving coding efficiency. The fundamental insight of cross-component coding is to exploit the statistical correlations among different color components. In this paper, a Cross-Component Sample Offset (CCSO) approach for image and video coding is proposed, inspired by the observation that the luma component tends to contain more texture, while the chroma components are relatively smooth. The key component of CCSO is a non-linear offset mapping mechanism implemented as a look-up table (LUT). The input of the mapping is the co-located reconstructed samples of the luma component, and the output is offset values applied to the chroma components. The proposed method has been implemented on top of a recent version of libaom. Experimental results show that the proposed approach brings a 1.16% Random Access (RA) BD-rate saving on top of AV1 with a marginal increase in encoding/decoding time.
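A simplified sketch of the LUT mapping: the band-based luma classifier below is an assumption for illustration (the classifier actually signaled in libaom is more elaborate), but the flow of deriving a chroma offset from the co-located reconstructed luma sample matches the description above:

```python
import numpy as np

def ccso_apply(luma, chroma, lut, n_bands=8, bitdepth=8):
    """Classify each co-located reconstructed luma sample into a band
    (a simplifying assumption), look up that band's signaled offset,
    and add it to the chroma sample, clipping to the valid range."""
    band = (luma.astype(np.int32) * n_bands) >> bitdepth
    out = chroma.astype(np.int32) + lut[band]
    return np.clip(out, 0, (1 << bitdepth) - 1).astype(np.uint8)
```

The encoder's job is then to choose the LUT entries that minimize chroma distortion, and only the small table needs to be transmitted, which is why the coding-time overhead stays marginal.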
Citations: 1
Portable Congenital Glaucoma Detection System
Pub Date : 2021-12-05 DOI: 10.1109/VCIP53242.2021.9675423
Chunjun Hua, Menghan Hu, Yue Wu
Congenital glaucoma is an eye disease caused by embryonic developmental disorders, which damages the optic nerve. In this demo paper, we propose a portable non-contact congenital glaucoma detection system that evaluates the condition of children's eyes by measuring cornea size with the developed mobile application. The system consists of two modules: a cornea identification module and a diagnosis module. Because it can be used by anyone with a smartphone, the system has wide applicability and can serve as a convenient at-home self-examination tool for large-scale screening of congenital glaucoma in children. The demo video of the proposed detection system is available at: https://doi.org/10.6084/m9.figshare.14728854.v1.
Citations: 0
Urban Planter: A Web App for Automatic Classification of Urban Plants
Pub Date: 2021-12-05 · DOI: 10.1109/VCIP53242.2021.9675318
Sarit Divekar, Irina Rabaev, Marina Litvak
Plant classification requires an expert because subtle differences in leaf or petal forms can distinguish species; conversely, some species show high variability in appearance. This paper introduces a web app that helps people identify plants and discover the best growing methods. The uploaded picture is submitted to the back-end server, where a pre-trained neural network classifies it into one of the predefined classes; the classification label and confidence are then displayed to the end user on the front-end page. The application focuses on house and garden plant species that grow mainly in a desert climate and are not covered by existing datasets. To train a model, we collected the Urban Planter dataset. The installation code of the alpha version and the demo video of the app can be found at https://github.com/UrbanPlanter/urbanplanterapp.
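The abstract describes a pipeline in which a pre-trained network classifies the uploaded picture and the front end shows the top label with its confidence. The sketch below illustrates just that last step, turning per-class logits into a label and a softmax confidence; the class names and logit values are invented for illustration and are not from the Urban Planter dataset or app.

```python
import math

# Hypothetical sketch of the back-end classification step: a pre-trained
# network produces one logit per predefined class, and the front end
# displays the top label with its softmax confidence.

CLASSES = ["aloe vera", "cactus", "jade plant"]  # illustrative class names

def classify(logits: list) -> tuple:
    """Return the predicted class label and its softmax confidence."""
    m = max(logits)                              # stabilise the exponent
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    best = max(range(len(logits)), key=lambda i: logits[i])
    return CLASSES[best], exps[best] / total

label, confidence = classify([0.2, 3.1, 0.7])
print(f"{label}: {confidence:.2f}")
```

In the deployed app this would sit behind an upload endpoint; here only the logits-to-prediction conversion is shown.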
Citations: 0
Journal: 2021 International Conference on Visual Communications and Image Processing (VCIP)