Transformer-Based Light Field Geometry Learning for No-Reference Light Field Image Quality Assessment

IF 3.2 | Zone 1 (Computer Science) | JCR Q2 (Engineering, Electrical & Electronic) | IEEE Transactions on Broadcasting | Pub Date: 2024-01-31 | DOI: 10.1109/TBC.2024.3353579

Lili Lin;Siyu Bai;Mengjia Qu;Xuehui Wei;Luyao Wang;Feifan Wu;Biao Liu;Wenhui Zhou;Ercan Engin Kuruoglu

IEEE Transactions on Broadcasting, vol. 70, no. 2, pp. 597-606. https://ieeexplore.ieee.org/document/10418048/

Citations: 0

Abstract

Elevating traditional 2-dimensional (2D) planar displays to 4-dimensional (4D) light field displays can significantly enhance users’ sense of immersion and realism, because a light field image (LFI) provides multiple visual cues: multi-view disparity, motion disparity, and selective focus. It is therefore crucial to establish a light field image quality assessment (LF-IQA) model that aligns with the characteristics of human visual perception. However, evaluating the perceptual quality of multiple light field visual cues simultaneously and consistently remains a challenge. To this end, this paper proposes a Transformer-based method that explicitly learns light field geometry for no-reference light field image quality assessment. Specifically, to explicitly learn the light field epipolar geometry, we stack light field sub-aperture images (SAIs) into four SAI stacks along four specific light field angular directions, and use a sub-grouping strategy to hierarchically learn local and global light field geometric features. Then, a Transformer encoder with a spatial-shift tokenization strategy learns a structure-aware representation of light field geometric distortion, which is used to regress the final quality score. Evaluation experiments are carried out on three commonly used LF-IQA datasets: Win5-LID, NBU-LF1.0, and MPI-LFA. Experimental results demonstrate that our model outperforms state-of-the-art methods and correlates highly with human perception. The source code is publicly available at https://github.com/windyz77/GeoNRLFIQA.
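
The SAI stacking and sub-grouping steps described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not name the four angular directions, so this sketch assumes the horizontal, vertical, and two diagonal lines through the central view. The function names `directional_sai_stacks` and `sub_groups`, and the group size and stride, are hypothetical and are not taken from the GeoNRLFIQA repository.

```python
import numpy as np


def directional_sai_stacks(lf):
    """Build four SAI stacks from a light field of shape (U, V, H, W).

    Assumption: the four angular directions are the central row, the
    central column, and the two diagonals of a square angular grid.
    """
    U, V, H, W = lf.shape
    if U != V:
        raise ValueError("this sketch assumes a square angular grid")
    c = U // 2                 # index of the central view
    idx = np.arange(U)
    return {
        "horizontal": lf[c, :],              # views along the central row
        "vertical": lf[:, c],                # views along the central column
        "main_diagonal": lf[idx, idx],       # views (0,0), (1,1), ...
        "anti_diagonal": lf[idx, idx[::-1]], # views (0,U-1), (1,U-2), ...
    }


def sub_groups(stack, size, stride):
    """Split a stack of N views into sub-groups of `size` consecutive views.

    Hypothetical stand-in for the sub-grouping strategy: short groups
    expose local geometry, the full stack exposes global geometry.
    """
    n = stack.shape[0]
    return [stack[s:s + size] for s in range(0, n - size + 1, stride)]


# Toy 5x5 angular grid of 2x3-pixel grayscale views.
lf = np.arange(5 * 5 * 2 * 3, dtype=np.float32).reshape(5, 5, 2, 3)
stacks = directional_sai_stacks(lf)
groups = sub_groups(stacks["horizontal"], size=3, stride=2)
```

Each stack has shape (U, H, W), so slicing it along a spatial axis yields an epipolar-plane-like image for that direction. A feature extractor and Transformer encoder would then consume these stacks, which is outside the scope of this sketch.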


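Source journal: IEEE Transactions on Broadcasting (Engineering & Technology: Telecommunications)

CiteScore: 9.40

Self-citation rate: 31.10%

Articles published: not listed

Review time: 6-12 weeks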

Journal description: The Society’s Field of Interest is “Devices, equipment, techniques and systems related to broadcast technology, including the production, distribution, transmission, and propagation aspects.” In addition to this formal FOI statement, which is used to provide guidance to the Publications Committee in the selection of content, the AdCom has further resolved that “broadcast systems includes all aspects of transmission, propagation, and reception.”
