Line-of-Sight Depth Attention for Panoptic Parsing of Distant Small-Faint Instances

IEEE Transactions on Image Processing: a publication of the IEEE Signal Processing Society (IF 13.7); Pub Date: 2025-02-14; DOI: 10.1109/TIP.2025.3540265

Zhongqi Lin;Xudong Jiang;Zengwei Zheng

IEEE Transactions on Image Processing, vol. 34, pp. 1354-1366, 2025 (Journal Article). https://ieeexplore.ieee.org/document/10890921/

Citations: 0

Abstract

Current scene parsers effectively distill abstract relationships among refined instances but overlook the discrepancies that arise from variations in scene depth, which limits their ability to imitate the intrinsic 3D perception of humans. Following the principle of perspective, we advocate first grading the scene depth into several slices and then mining semantic correlations within a slice or across multiple slices. Our framework, the Line-of-Sight Depth Network (LoSDN), comprises two attention-based components: the Scene Depth Grading Module (SDGM) and the Edge-oriented Correlation Refining Module (EoCRM). SDGM grades a scene into several slices by calculating depth attention tendencies from parameters with explicit physical meanings, e.g., albedo, occlusion, and specular embeddings. This process allocates the multi-scale instances to scene slices according to their line-of-sight extension distance, laying the groundwork for ordered association mining in EoCRM. Because the primary step in distinguishing distant faint targets is boundary delineation, EoCRM performs edge-wise saliency quantification and association mining. Quantitative and diagnostic experiments on the Cityscapes, ADE20K, and PASCAL Context datasets demonstrate the competitiveness of LoSDN and the individual contribution of each proposed component. Visualizations show that our strategy offers clear benefits in detecting distant, faint targets.
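The two-stage idea in the abstract (grade depth into slices, then relate elements within a slice) can be illustrated with a small NumPy sketch. This is not the authors' SDGM or EoCRM: the abstract says SDGM derives its grading from learned depth attention over albedo, occlusion, and specular embeddings, whereas the sketch below assumes a depth map is already available and simply bins it by quantiles. The function names `grade_depth` and `slice_attention`, the quantile binning, and the plain dot-product attention are all assumptions made for illustration.

```python
import numpy as np

def grade_depth(depth, num_slices):
    """Assign each pixel to a depth slice (hypothetical helper).

    Uses interior quantiles as bin edges so that slices hold roughly
    equal pixel counts. Returns integer ids in [0, num_slices - 1].
    """
    qs = np.linspace(0.0, 1.0, num_slices + 1)[1:-1]
    edges = np.quantile(depth, qs)
    return np.digitize(depth, edges)

def slice_attention(features, slice_ids):
    """Self-attention restricted to pairs in the same depth slice.

    features: (n, d) array, one row per pixel or instance.
    slice_ids: (n,) integer slice id for each row.
    Returns the attended features and the (n, n) attention weights.
    """
    n, d = features.shape
    scores = features @ features.T / np.sqrt(d)
    # Mask out pairs that lie in different slices (assumption: within-slice only).
    same_slice = slice_ids[:, None] == slice_ids[None, :]
    scores = np.where(same_slice, scores, -np.inf)
    # Numerically stable softmax; each row keeps its diagonal, so the max is finite.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ features, weights

# Toy usage on random data: an 8x8 depth map and 16-dimensional features.
rng = np.random.default_rng(0)
depth = rng.uniform(1.0, 100.0, size=(8, 8))
slice_map = grade_depth(depth, num_slices=4)
feats = rng.normal(size=(depth.size, 16))
attended, weights = slice_attention(feats, slice_map.ravel())
```

Correlations between slices, which the abstract also mentions, would need a different mask (for example, allowing adjacent slice ids); that choice is left open here because the abstract does not specify it.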



Source journal

IEEE transactions on image processing : a publication of the IEEE Signal Processing Society

Self-citation rate

0.00%

Number of publications