
Proceedings of the 2nd Workshop on Quality of Experience in Visual Multimedia Applications — Latest Publications

Point Cloud Quality Assessment Using Cross-correlation of Deep Features
M. Tliba, A. Chetouani, G. Valenzise, F. Dufaux
3D point clouds have emerged as a preferred format for recent immersive communication systems, due to the six degrees of freedom they offer. The huge data size of point clouds, which consist of both geometry and color information, has recently motivated the development of efficient compression schemes. To support the optimization of these algorithms, adequate and efficient perceptual quality metrics are needed. In this paper, we propose a novel end-to-end deep full-reference framework for 3D point cloud quality assessment, considering both geometry and color information. We use two identical neural networks, based on a residual permutation-invariant architecture, to extract local features from a sparse set of patches sampled from the point cloud. Afterwards, we measure the cross-correlation between the embeddings of the pristine and distorted point clouds to quantify the global shift in the features due to visual distortion. The proposed scheme achieves results comparable to state-of-the-art metrics even when a small number of centroids is used, reducing the computational complexity.
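As a rough illustration of the cross-correlation step, the numpy sketch below correlates reference and distorted patch embeddings and pools them into one scalar. The embeddings are random stand-ins for the network's features, and the scalar pooling is an assumption for illustration, not the paper's exact formulation.

```python
import numpy as np

def cross_correlation_score(ref_feats, dist_feats):
    """Normalized cross-correlation between reference and distorted
    patch embeddings; values near 1 indicate little perceptual shift."""
    # Zero-center each feature dimension before correlating.
    ref = ref_feats - ref_feats.mean(axis=0, keepdims=True)
    dist = dist_feats - dist_feats.mean(axis=0, keepdims=True)
    num = (ref * dist).sum()
    den = np.sqrt((ref ** 2).sum() * (dist ** 2).sum()) + 1e-12
    return num / den

rng = np.random.default_rng(0)
ref = rng.normal(size=(32, 128))               # 32 patch embeddings, 128-D
dist = ref + 0.1 * rng.normal(size=ref.shape)  # mildly distorted copy
score = cross_correlation_score(ref, dist)     # close to 1 for mild distortion
```

Stronger distortions push the score further from 1, which is the global feature shift the metric quantifies.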
DOI: 10.1145/3552469.3555710 · Published: 2022-10-14
Citations: 6
No-reference Point Clouds Quality Assessment using Transformer and Visual Saliency
Salima Bourbia, Ayoub Karine, A. Chetouani, M. El Hassouni, M. Jridi
Quality estimation of 3D objects/scenes represented by point clouds is a crucial and challenging task in computer vision. In real-world applications, reference data is not always available, which motivates the development of new point cloud quality assessment (PCQA) metrics that do not require the original 3D point cloud (3DPC). This family of methods is called no-reference or blind PCQA. In this context, we propose a deep-learning-based approach that leverages the self-attention mechanism in transformers to accurately predict the perceptual quality score of each degraded 3DPC. Additionally, we introduce saliency maps to reflect the behavior of the human visual system, which is drawn to some regions more than others during evaluation. To this end, we first render 2D projections (i.e., views) of a 3DPC from different viewpoints. Then, we weight the obtained projected images with their corresponding saliency maps. After that, we discard most of the background information by extracting sub-salient images. The latter are fed as a sequential input to the vision transformer in order to extract global contextual information and predict the quality scores of the sub-images. Finally, we average the scores of all the salient sub-images to obtain the perceptual 3DPC quality score. We evaluate the performance of our model on the ICIP2020 and SJTU point cloud quality assessment benchmarks. Experimental results show that our model achieves promising performance compared to state-of-the-art point cloud quality assessment metrics.
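The saliency-weighted pooling idea can be sketched in a few lines. Note the paper averages scores over salient sub-images; the saliency-mass weighting below is a simplified, hypothetical pooling rule, and the scores and maps are invented.

```python
import numpy as np

def saliency_weighted_quality(view_scores, saliency_maps):
    """Pool per-view quality predictions, weighting each rendered view by
    the total saliency mass of its projection (illustrative pooling rule)."""
    weights = np.array([s.sum() for s in saliency_maps], dtype=float)
    weights /= weights.sum()
    return float(np.dot(weights, view_scores))

# Three rendered views with per-view quality predictions.
scores = np.array([4.0, 3.0, 2.0])
# Saliency maps: the first view attracts most of the visual attention.
maps = [np.full((4, 4), 0.9), np.full((4, 4), 0.05), np.full((4, 4), 0.05)]
mos = saliency_weighted_quality(scores, maps)  # pulled toward the salient view's score
```

The pooled score lands near the high-saliency view's prediction, mimicking the attention-driven weighting described above.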
DOI: 10.1145/3552469.3555713 · Published: 2022-10-14
Citations: 1
Adversarial Attacks Against Blind Image Quality Assessment Models
J. Korhonen, Junyong You
Several deep models for blind image quality assessment (BIQA) have been proposed during the past few years, with promising results on standard image quality datasets. However, generalization of BIQA models beyond standard content remains a challenge. In this paper, we study basic adversarial attack techniques to assess the robustness of representative deep BIQA models. Our results show that adversarial images created for a simple substitute BIQA model (i.e., the white-box scenario) are directly transferable and able to deceive several other, more complex BIQA models (i.e., the black-box scenario). We also investigated some basic defense mechanisms. Our results indicate that re-training BIQA models on a dataset augmented with adversarial images improves the robustness of several models, but at the cost of decreased quality prediction accuracy on genuine images.
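A basic attack of the kind studied here is the Fast Gradient Sign Method (FGSM). The sketch below applies it to a toy linear "quality model" standing in for a deep BIQA network; the perturbation inflates the predicted score while staying within an imperceptible per-pixel budget.

```python
import numpy as np

def fgsm_attack(image, grad, epsilon):
    """FGSM: move each pixel by ±epsilon in the direction that
    increases the predicted quality score, then re-clip to [0, 1]."""
    adv = image + epsilon * np.sign(grad)
    return np.clip(adv, 0.0, 1.0)

# Toy linear 'BIQA model': score = w . x (stand-in for a deep network).
rng = np.random.default_rng(1)
w = rng.normal(size=64)
x = rng.uniform(size=64)     # flattened 'image' in [0, 1]
grad = w                     # d(score)/dx for the linear model
x_adv = fgsm_attack(x, grad, epsilon=0.03)
shift = w @ x_adv - w @ x    # positive: the attack inflated the score
```

In the white-box scenario the gradient comes from a substitute model; the paper's finding is that such `x_adv` images transfer and also fool unrelated black-box BIQA models.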
DOI: 10.1145/3552469.3555715 · Published: 2022-10-14
Citations: 3
Impact of Content on Subjective Quality of Experience Assessment for 3D Video
Dawid Juszka, Z. Papir
Ongoing improvements in the field of visual entertainment may incline users to display 3D video content on various terminal devices, provided the Quality of Experience is satisfying. This study aims to determine whether the cognitive features of appealing yet uncommon 3D content may obfuscate subjective QoE measurements performed at different bitrates. To test the hypothesis, two 3D video databases are compared in terms of perceived QoE under an innovative scenario. The reference database is GroTruQoE-3D (VQEG), which includes short artificial clips. The authorial 3D video database DJ3D contains longer clips from feature films with a proven, substantial level of cognitive features. The 3D video content features are operationalised by three cognitive attributes (attractiveness, interestingness, 3D effect experience). Gradation of video quality is introduced by streaming at four bitrate levels. The collected subjects' scores are statistically analysed with a stochastic dominance test adjusted to a 5-point Likert scale. The obtained results show that quality assessment scores depend on the intensity of the cognitive attributes of the content. Sequences commonly used in subjective QoE experiments are more vulnerable to the intensity of subjective content attributes (visual attractiveness, interestingness, and 3D effect experience) than sequences from commercial feature films and documentaries. Moreover, it is shown that the test material commonly used in research is assessed higher at lower bitrates. In view of the key results, QoE researchers should consider using test material originating from commercially available content to minimize content impact on QoE assessment scores collected during subjective experiments. The research contributes to QoE best practices by paying attention to 3D cognitive attributes that may obfuscate subjective scores. An innovative scenario for comparing video databases with a stochastic dominance test adjusted to the ordinal scale is proposed. The approach may be useful in a broader context when an emerging service operator wants to ascertain whether subjective QoE tests are substantially biased by the service novelty.
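The stochastic-dominance analysis can be illustrated with a minimal first-order dominance check on 5-point Likert votes. The vote data below is hypothetical, and the check omits the significance testing and ordinal-scale adjustments of the actual study.

```python
import numpy as np

def stochastically_dominates(scores_a, scores_b, levels=5):
    """First-order stochastic dominance on ordinal (Likert) data:
    A dominates B if A's empirical CDF lies at or below B's at every
    score level, and strictly below at least one level."""
    cdf = lambda s: np.array([(np.asarray(s) <= k).mean()
                              for k in range(1, levels + 1)])
    ca, cb = cdf(scores_a), cdf(scores_b)
    return bool(np.all(ca <= cb) and np.any(ca < cb))

high = [4, 5, 4, 5, 3, 4, 5]   # hypothetical votes for a higher-bitrate clip
low  = [2, 3, 2, 4, 3, 2, 3]   # hypothetical votes for a lower-bitrate clip
dom = stochastically_dominates(high, low)   # True: 'high' dominates 'low'
```

Dominance established this way orders conditions without assuming the Likert scale is interval-valued, which is why it suits ordinal MOS data.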
DOI: 10.1145/3552469.3555717 · Published: 2022-10-14
Citations: 0
No-Reference Quality Assessment of Stereoscopic Video Based on Deep Frequency Perception
Shuai Xiao, Jiabao Wen, Jiachen Yang, Yan Zhou
The purpose of stereo video quality assessment (SVQA) is to measure the quality of stereo video easily and quickly, striving to reach a consensus with human visual perception. Stereo video contains more perceptual information and involves more visual perception theory than 2D images/video, making SVQA more challenging. Aiming at the effect of distortion on the frequency-domain characteristics of stereo video, an SVQA method based on frequency-domain depth perception is proposed. Specifically, the frequency domain is utilized while minimizing changes to the existing network structure, enabling in-depth exploration of frequency-domain characteristics without changing the original frame size of the stereo video. Experiments are carried out on three public stereo video databases, namely the NAMA3DS1-COSPAD1 database, the WaterlooIVC 3D Video Phase I database, and the QI-SVQA database. The experimental results show that the proposed method has good quality prediction ability, especially on asymmetric compressed stereo video databases.
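To make the frequency-domain intuition concrete: distortions such as blur redistribute energy across the spectrum. The sketch below pools a frame's FFT magnitude into radial bands; this hand-crafted descriptor is an assumption for illustration, not the paper's learned network.

```python
import numpy as np

def frequency_features(frame, bands=4):
    """Split a frame's 2-D FFT magnitude spectrum into radial bands and
    return the mean log-energy per band (low to high frequency)."""
    spec = np.abs(np.fft.fftshift(np.fft.fft2(frame)))
    h, w = spec.shape
    yy, xx = np.mgrid[0:h, 0:w]
    r = np.hypot(yy - h / 2, xx - w / 2)   # radial frequency of each bin
    r_max = r.max()
    feats = []
    for b in range(bands):
        mask = (r >= r_max * b / bands) & (r < r_max * (b + 1) / bands)
        feats.append(np.log1p(spec[mask]).mean())
    return np.array(feats)

rng = np.random.default_rng(2)
sharp = rng.uniform(size=(64, 64))
# 2x2 box blur via shifted copies: attenuates high frequencies.
blurred = 0.25 * (sharp + np.roll(sharp, 1, 0) + np.roll(sharp, 1, 1)
                  + np.roll(np.roll(sharp, 1, 0), 1, 1))
f_sharp, f_blur = frequency_features(sharp), frequency_features(blurred)
```

The blurred frame's highest band loses energy relative to the sharp one, which is the kind of frequency-domain signature a distortion-aware SVQA model can exploit.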
DOI: 10.1145/3552469.3555711 · Published: 2022-10-14
Citations: 2
From Just Noticeable Differences to Image Quality
Ali Ak, Andréas Pastor, P. Callet
Distortions can occur due to several processing steps in the imaging chain of a wide range of multimedia content. The visibility of distortions is highly correlated with the overall perceived quality of a certain multimedia content. Subjective quality evaluation of images relies mainly on mean opinion scores (MOS) to provide ground-truth for measuring image quality on a continuous scale. Alternatively, just noticeable difference (JND) defines the visibility of distortions as a binary measurement based on an anchor point. By using the pristine reference as the anchor, the first JND point can be determined. This first JND point provides an intrinsic quantification of the visible distortions within the multimedia content. Therefore, it is intuitively appealing to develop a quality assessment model by utilizing the JND information as the fundamental cornerstone. In this work, we use the first JND point information to train a Siamese Convolutional Neural Network to predict image quality scores on a continuous scale. To ensure generalization, we incorporated a white-box optical retinal pathway model to acquire achromatic responses. The proposed model, D-JNDQ, displays a competitive performance on cross dataset evaluation conducted on TID2013 dataset, proving the generalization of the model on unseen distortion types and supra-threshold distortion levels.
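The Siamese idea above can be sketched in a few lines of numpy. The linear-plus-ReLU branch is a stand-in for the convolutional encoder, and using the negative embedding distance as a quality proxy is an assumption for illustration, not the paper's trained D-JNDQ regressor.

```python
import numpy as np

def shared_features(patch, w):
    """Shared Siamese 'branch': one linear map plus ReLU
    (a stand-in for the convolutional encoder)."""
    return np.maximum(w @ patch.ravel(), 0.0)

def siamese_quality(ref, dist, w):
    """Quality proxy: negative distance between the two branch embeddings;
    identical inputs score 0, stronger distortions score lower."""
    d = np.linalg.norm(shared_features(ref, w) - shared_features(dist, w))
    return -d

rng = np.random.default_rng(3)
w = rng.normal(size=(16, 64))          # shared weights for both branches
ref = rng.uniform(size=(8, 8))
mild = ref + 0.01 * rng.normal(size=ref.shape)
severe = ref + 0.5 * rng.normal(size=ref.shape)
q_mild = siamese_quality(ref, mild, w)
q_severe = siamese_quality(ref, severe, w)   # lower than q_mild
```

Because both branches share weights, the comparison is symmetric by construction, which is the property that makes the Siamese setup natural for reference-vs-distorted pairs.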
DOI: 10.1145/3552469.3555712 · Published: 2022-10-14
Citations: 3
Simulating Visual Mechanisms by Sequential Spatial-Channel Attention for Image Quality Assessment
Junyong You, J. Korhonen
As a subjective concept, image quality assessment (IQA) is significantly affected by perceptual mechanisms. Two mutually influenced mechanisms, namely spatial attention and contrast sensitivity, are particularly important for IQA. This paper explores a transformer-based deep learning approach for the two mechanisms. By converting contrast sensitivity to an attention representation, a unified multi-head attention module is applied to spatial and channel features in the transformer encoder to simulate the two mechanisms in IQA. Sequential spatial-channel self-attention is proposed to avoid the expensive computation of the classical Transformer model. In addition, as image rescaling can potentially affect perceived quality, zero-padding and masking with special attention weights are performed to handle arbitrary image resolutions without requiring image rescaling. Evaluation results on publicly available large-scale IQA databases demonstrate the outstanding performance and generalization of the proposed IQA model.
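The sequential factorization can be sketched as follows: attend over spatial positions first, then over channels, instead of one joint attention over all position-channel pairs. The single-head, identity-projection module below is a simplified illustration of that ordering, not the paper's multi-head architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    """Single-head self-attention over the first axis of x (tokens x dim),
    with identity Q/K/V projections for brevity."""
    scores = x @ x.T / np.sqrt(x.shape[1])
    return softmax(scores, axis=-1) @ x

def sequential_spatial_channel_attention(feat):
    """Attend over spatial positions, then over channels; cheaper than
    joint attention over every position-channel pair."""
    spatial = self_attention(feat)           # tokens = spatial positions
    channel = self_attention(spatial.T).T    # tokens = channels
    return channel

rng = np.random.default_rng(4)
feat = rng.normal(size=(49, 32))             # 7x7 spatial grid, 32 channels
out = sequential_spatial_channel_attention(feat)
```

For an HxW grid with C channels, joint attention over HW·C tokens costs O((HW·C)^2), while the sequential form costs O((HW)^2·C + C^2·HW), which is the saving the abstract refers to.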
DOI: 10.1145/3552469.3555714 · Published: 2022-10-14
Citations: 0
Estimating the Quality of Experience of Immersive Contents
Mylène C. Q. Farias
Recent technology advancements have driven the production of plenoptic devices that capture and display visual contents, not only as texture information (as in 2D images) but also as 3D texture-geometric information. These devices represent the visual information using an approximation of the plenoptic illumination function that can describe visible objects from any point in the 3D space. Depending on the capturing device, this approximation can correspond to holograms, light fields, or point clouds imaging formats. Naturally, the success of immersive applications depends on the acceptability of these formats by the final user, which ultimately depends on the perceived quality of experience. Several subjective experiments have been performed with the goal of understanding how humans perceive immersive media in 6 Degree-of-Freedom (6DoF) environments and what are the impacts of different rendering and compression techniques on the perceived visual quality. In this context, an open area of research is the design of objective methods that estimate the quality of this type of content. In this talk, I describe a set of objective methods designed to estimate the quality of immersive visual contents -- an important aspect of the overall user quality of experience. The methods use different techniques, from texture operators to convolution neural networks, to estimate quality by also taking into consideration the specificities of the different formats. Finally, I discuss some of the exciting research challenges in the area of realistic immersive multimedia applications.
DOI: 10.1145/3552469.3557784 · Published: 2022-10-14
Cited by: 1
On Objective and Subjective Quality of 6DoF Synthesized Live Immersive Videos 六自由度合成现场沉浸式视频的主客观质量研究
Yuan-Chun Sun, Shengkun Tang, Ching-Ting Wang, Cheng-Hsin Hsu
We address the problem of quantifying the perceived quality in 6DoF (Degree-of-Freedom) live immersive video in two steps. First, we develop a set of tools to generate (or collect) datasets in a photorealistic simulator, AirSim. Using these tools, we get to change diverse settings of live immersive videos, such as scenes, trajectories, camera placements, and encoding parameters. Second, we develop objective and subjective evaluation procedures, and carry out evaluations on a sample immersive video codec, MPEG MIV, using our own dataset. Several insights were found through our experiments: (1) the two synthesizers in TMIV produce comparable target view quality, but RVS runs 2 times faster; (2) Quantization Parameter (QP) is a good control knob to exercise target view quality and bitrate, but camera placements (or trajectories) also impose significant impacts; and (3) overall subjective quality has strong linear/rank correlation with subjective similarity, sharpness, and color. These findings shed some light on the future research problems for the development of emerging applications relying on immersive interactions.
{"title":"On Objective and Subjective Quality of 6DoF Synthesized Live Immersive Videos","authors":"Yuan-Chun Sun, Shengkun Tang, Ching-Ting Wang, Cheng-Hsin Hsu","doi":"10.1145/3552469.3555709","DOIUrl":"https://doi.org/10.1145/3552469.3555709","url":null,"abstract":"We address the problem of quantifying the perceived quality in 6DoF (Degree-of-Freedom) live immersive video in two steps. First, we develop a set of tools to generate (or collect) datasets in a photorealistic simulator, AirSim. Using these tools, we get to change diverse settings of live immersive videos, such as scenes, trajectories, camera placements, and encoding parameters. Second, we develop objective and subjective evaluation procedures, and carry out evaluations on a sample immersive video codec, MPEG MIV, using our own dataset. Several insights were found through our experiments: (1) the two synthesizers in TMIV produce comparable target view quality, but RVS runs 2 times faster; (2) Quantization Parameter (QP) is a good control knob to exercise target view quality and bitrate, but camera placements (or trajectories) also impose significant impacts; and (3) overall subjective quality has strong linear/rank correlation with subjective similarity, sharpness, and color. 
These findings shed some light on the future research problems for the development of emerging applications relying on immersive interactions.","PeriodicalId":296389,"journal":{"name":"Proceedings of the 2nd Workshop on Quality of Experience in Visual Multimedia Applications","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128195222","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cited by: 1
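The correlation analysis mentioned in the abstract above — linear and rank correlation between overall subjective quality and per-attribute scores — can be sketched as follows. This is an illustrative example, not data or code from the paper: the `mos` and `similarity` values are made up, and the metric names (PLCC/SROCC) are the standard choices for linear and rank correlation in quality assessment.

```python
# Sketch: linear (Pearson/PLCC) and rank (Spearman/SROCC) correlation
# between overall subjective quality (MOS) and one attribute score.
# All score values below are hypothetical, for illustration only.
from scipy.stats import pearsonr, spearmanr

mos        = [1.2, 2.4, 3.1, 3.9, 4.6]   # overall quality ratings (hypothetical)
similarity = [1.0, 2.1, 3.3, 4.0, 4.8]   # subjective similarity scores (hypothetical)

plcc, _p1 = pearsonr(mos, similarity)     # linear correlation coefficient
srocc, _p2 = spearmanr(mos, similarity)   # rank-order correlation coefficient

print(f"PLCC={plcc:.3f}, SROCC={srocc:.3f}")  # prints "PLCC=0.995, SROCC=1.000"
```

PLCC measures linear agreement between the two score sets, while SROCC depends only on the monotonic ordering, which makes it robust to nonlinear but order-preserving differences between rating scales — which is why quality-assessment studies typically report both.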