{"title":"Spatial Attention-Guided Light Field Salient Object Detection Network With Implicit Neural Representation","authors":"Xin Zheng;Zhengqu Li;Deyang Liu;Xiaofei Zhou;Caifeng Shan","doi":"10.1109/TCSVT.2024.3437685","DOIUrl":null,"url":null,"abstract":"Recently, many Light Field Salient Object Detection (LF SOD) methods have been proposed. However, guaranteeing the integrality and recovering more high-frequency details of the generated salient object map still remain challenging. To this end, we propose a spatial attention-guided LF SOD network with implicit neural representation to further improve LF SOD performance. We adopt an encoder-decoder structure for model construction. In order to ensure the completeness of the generated salient object map, a multi-modal and multi-scale feature fusion module is designed in the encoder part to refine the salient regions within all-in-focus image and aggregate the focal stack and all-in-focus image in spatial attention-guided manner. In order to recover more high-frequency details of the obtained salient object map, an implicit detail restoration module is proposed in the decoder part. In virtue of implicit neural representation, we convert the detail restoration problem into a functional mapping problem. By further integrating the self-attention mechanism, the derived saliency map can be depicted at a more refined level. Comprehensive experimental results demonstrate the superiority of the proposed method. Ablation studies and visual comparisons further validate that the proposed method can guarantee the integrality and recover more high-frequency detail information of the obtained saliency map. The code is publicly available at \n<uri>https://github.com/ldyorchid/LFSOD-Net</uri>\n.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"34 12","pages":"12437-12449"},"PeriodicalIF":11.1000,"publicationDate":"2024-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Circuits and Systems for Video Technology","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10621671/","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Abstract
Recently, many Light Field Salient Object Detection (LF SOD) methods have been proposed. However, guaranteeing the integrity of the generated salient object map and recovering more of its high-frequency details remain challenging. To this end, we propose a spatial attention-guided LF SOD network with implicit neural representation to further improve LF SOD performance. The model adopts an encoder-decoder structure. To ensure the completeness of the generated salient object map, a multi-modal and multi-scale feature fusion module in the encoder refines the salient regions within the all-in-focus image and aggregates the focal stack and the all-in-focus image in a spatial attention-guided manner. To recover more high-frequency details of the resulting salient object map, an implicit detail restoration module is introduced in the decoder. By virtue of implicit neural representation, we convert the detail restoration problem into a functional mapping problem. By further integrating a self-attention mechanism, the derived saliency map can be depicted at a finer level. Comprehensive experimental results demonstrate the superiority of the proposed method. Ablation studies and visual comparisons further validate that the proposed method preserves the integrity of the saliency map and recovers more of its high-frequency detail. The code is publicly available at https://github.com/ldyorchid/LFSOD-Net.
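To make the two core ideas concrete, the sketch below illustrates (a) spatial attention-guided fusion of focal-stack and all-in-focus features, and (b) an implicit-neural-representation decoder that treats detail restoration as a functional mapping from pixel coordinates to saliency values. This is a minimal illustration, not the authors' implementation (see the linked repository for that): all module names, channel sizes, the pooled-statistics attention recipe, and the overall wiring are assumptions.

```python
# Minimal PyTorch sketch; every design detail here is an illustrative
# assumption, not the published LFSOD-Net architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpatialAttentionFusion(nn.Module):
    """Fuse focal-stack features with all-in-focus (AiF) features under a
    spatial attention map derived from the AiF branch."""

    def __init__(self, channels: int):
        super().__init__()
        # A 7x7 conv over pooled channel statistics is a common
        # spatial-attention recipe (assumed here, not taken from the paper).
        self.attn_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, focal_feat: torch.Tensor, aif_feat: torch.Tensor) -> torch.Tensor:
        # Spatial attention from channel-wise mean and max of the AiF features.
        stats = torch.cat(
            [aif_feat.mean(dim=1, keepdim=True), aif_feat.amax(dim=1, keepdim=True)],
            dim=1,
        )
        attn = torch.sigmoid(self.attn_conv(stats))          # (B, 1, H, W)
        # Attend to the focal-stack features, then fuse both modalities.
        return self.fuse(torch.cat([focal_feat * attn, aif_feat], dim=1))


class ImplicitDetailDecoder(nn.Module):
    """Coordinate-based MLP: maps a pixel coordinate plus an interpolated
    feature vector to a saliency value, so the saliency map can be queried
    at arbitrary sub-pixel locations (the 'functional mapping' view)."""

    def __init__(self, feat_dim: int, hidden: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 2, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, 1),
        )

    def forward(self, feat: torch.Tensor, coords: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W) fused features; coords: (B, N, 2) in [-1, 1].
        sampled = F.grid_sample(
            feat, coords.unsqueeze(1), mode="bilinear", align_corners=False
        )                                                    # (B, C, 1, N)
        sampled = sampled.squeeze(2).permute(0, 2, 1)        # (B, N, C)
        return torch.sigmoid(self.mlp(torch.cat([sampled, coords], dim=-1)))


if __name__ == "__main__":
    fusion = SpatialAttentionFusion(channels=64)
    decoder = ImplicitDetailDecoder(feat_dim=64)
    focal = torch.randn(2, 64, 32, 32)    # aggregated focal-stack features
    aif = torch.randn(2, 64, 32, 32)      # all-in-focus features
    fused = fusion(focal, aif)
    # Query the saliency field at 1024 random sub-pixel locations.
    coords = torch.rand(2, 1024, 2) * 2 - 1
    saliency = decoder(fused, coords)     # (2, 1024, 1)
    print(fused.shape, saliency.shape)
```

Because the decoder is queried per coordinate rather than per pixel grid, it can in principle render the saliency map at a higher resolution than the encoder features, which is what makes the implicit formulation attractive for recovering fine detail.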
About the Journal
The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.