Transformer-Based Light Field Salient Object Detection and Its Application to Autofocus

Yao Jiang;Xin Li;Keren Fu;Qijun Zhao
{"title":"Transformer-Based Light Field Salient Object Detection and Its Application to Autofocus","authors":"Yao Jiang;Xin Li;Keren Fu;Qijun Zhao","doi":"10.1109/TIP.2024.3498331","DOIUrl":null,"url":null,"abstract":"Existing light field salient object detection (LFSOD) models predominantly rely on convolutional neural networks or local attention to process light field data, consequently encountering difficulties in modeling intra-slice and cross-slice long-range dependencies within focal stacks. In this paper, we ponder the feasibility of relying solely on the pure Transformer architecture to address this dilemma and propose a novel quasi-pure Transformer-based framework for LFSOD, termed TLFNet. TLFNet incorporates innovative Transformer-based fusion modules (PGFormer) along with an edge enhancement module. The PGFormer employs a perpendicular self-attention (PSA) mechanism to capture long-range dependencies along both cross-slice and intra-slice axes within the focal stack, and integrates multi-modal features using a guided feature fusion (GFF) module. To address the issue of blurry edges arising from the Transformer-based encoder-decoder architecture, the edge enhancement module combines detailed texture and body information and employs focal loss to improve the edge precision of salient objects. TLFNet is a nearly pure Transformer-based approach (with approximately 99.01% of its parameters belonging to the Transformer), while the edge enhancement module significantly boosts accuracy with only around 0.99% of parameters. Comprehensive benchmarks demonstrate that TLFNet outperforms 14 light field models and achieves new state-of-the-art performance. Last but not least, we show in this paper a new application scheme of TLFNet, by cooperating with the deep autofocus technique proposed by Herrmann et al. (2020), leading to light field salient object autofocus (LFSOA). LFSOA aims to identify and output the focal slice with a salient object in focus while keeping other irrelevant background blurred (out-of-focus), yielding an autonomous bokeh effect in photography. The code for the model and application will be publicly available at \n<uri>https://github.com/jiangyao-scu/TLFNet</uri>\n.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"33 ","pages":"6647-6659"},"PeriodicalIF":13.7000,"publicationDate":"2024-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10759590/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Existing light field salient object detection (LFSOD) models predominantly rely on convolutional neural networks or local attention to process light field data, and consequently struggle to model intra-slice and cross-slice long-range dependencies within focal stacks. In this paper, we explore the feasibility of relying solely on the pure Transformer architecture to address this dilemma and propose a novel quasi-pure Transformer-based framework for LFSOD, termed TLFNet. TLFNet incorporates innovative Transformer-based fusion modules (PGFormer) along with an edge enhancement module. PGFormer employs a perpendicular self-attention (PSA) mechanism to capture long-range dependencies along both the cross-slice and intra-slice axes of the focal stack, and integrates multi-modal features using a guided feature fusion (GFF) module. To address the blurry edges produced by the Transformer-based encoder-decoder architecture, the edge enhancement module combines detailed texture and body information and employs a focal loss to improve the edge precision of salient objects. TLFNet is a nearly pure Transformer-based approach (approximately 99.01% of its parameters belong to the Transformer), while the edge enhancement module significantly boosts accuracy with only around 0.99% of the parameters. Comprehensive benchmarks demonstrate that TLFNet outperforms 14 light field models and achieves new state-of-the-art performance. Finally, we present a new application of TLFNet: combined with the deep autofocus technique proposed by Herrmann et al. (2020), it enables light field salient object autofocus (LFSOA). LFSOA identifies and outputs the focal slice in which the salient object is in focus while the irrelevant background remains blurred (out of focus), yielding an automatic bokeh effect in photography. The code for the model and application will be publicly available at https://github.com/jiangyao-scu/TLFNet.
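As a rough illustration of the perpendicular self-attention idea described in the abstract, the sketch below applies standard multi-head attention along two perpendicular axes of a focal-stack feature tensor: across slices at each spatial position (cross-slice) and across spatial positions within each slice (intra-slice). The module name, tensor shapes, and the residual combination of the two branches are illustrative assumptions, not the actual TLFNet/PGFormer implementation.

```python
# A minimal, illustrative sketch of perpendicular self-attention (PSA) over a
# focal-stack feature tensor, written with PyTorch. Names, shapes, and the way
# the two attention branches are combined are assumptions for illustration,
# not details taken from the TLFNet implementation.
import torch
import torch.nn as nn


class PerpendicularSelfAttention(nn.Module):
    """Attend along the cross-slice axis and the intra-slice (spatial) axis."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        # One attention block per axis; batch_first=True expects (B, L, C).
        self.cross_slice_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.intra_slice_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, C, H, W) -- N focal slices of a single light field sample.
        n, c, h, w = x.shape
        tokens = x.flatten(2).permute(0, 2, 1)                     # (N, H*W, C)

        # Intra-slice branch: tokens within each slice attend to each other.
        intra, _ = self.intra_slice_attn(tokens, tokens, tokens)   # (N, H*W, C)

        # Cross-slice branch: at each spatial position, the N slices attend to
        # each other, so sequences have length N and the batch is H*W.
        cross_in = tokens.permute(1, 0, 2)                         # (H*W, N, C)
        cross, _ = self.cross_slice_attn(cross_in, cross_in, cross_in)
        cross = cross.permute(1, 0, 2)                             # (N, H*W, C)

        # Residual combination of the two perpendicular branches.
        out = self.norm(tokens + intra + cross)
        return out.permute(0, 2, 1).reshape(n, c, h, w)


if __name__ == "__main__":
    feats = torch.randn(12, 64, 16, 16)          # 12 focal slices, 64 channels
    psa = PerpendicularSelfAttention(dim=64)
    print(psa(feats).shape)                      # torch.Size([12, 64, 16, 16])
```

In this toy formulation the intra-slice branch handles spatial context while the cross-slice branch compares focus cues at the same location across the stack, which is the kind of long-range dependency the abstract attributes to PSA.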