A Diverse Knowledge Perception and Fusion network for detecting targets and key parts in UAV images

IF 5.5 | Zone 2 (Computer Science) | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Neurocomputing | Pub Date: 2024-10-13 | DOI: 10.1016/j.neucom.2024.128748


Citations: 0

Abstract

Detecting targets and their key parts in UAV images is crucial for both military and civilian applications, including optimizing damage assessment, evaluating infrastructure, and facilitating disaster response efforts. Traditional top-down approaches impose excessive constraints that struggle to address challenges such as variable definitions and quantities of key parts, potential target occlusion, and model redundancy. Conversely, end-to-end approaches often overlook the relationships between targets and key parts, resulting in low detection accuracy. Inspired by the remarkable human reasoning process, we propose the Diverse Knowledge Perception and Fusion (DKPF) network, which skillfully balances the trade-offs between stringent constraints and unconstrained methods while ensuring both detection precision and real-time performance. Specifically, our model integrates reasoning guided by three distinct forms of knowledge: contextual knowledge at the image level in an unsupervised manner; explicit semantic knowledge regarding the interactions between targets and key parts at the instance level; and implicit comprehensive knowledge about the relationships among different types of targets or key parts, such as shape similarity. These specific knowledge forms are extracted through a novel adaptive fusion strategy for multi-scale features, a binary region-to-region semantic knowledge graph, and a data-driven self-attention architecture, respectively. Experiments conducted on both simulated and real-world datasets reveal that our method significantly outperforms state-of-the-art techniques, regardless of the number of key parts in the target. Furthermore, extensive ablation studies and visualization analyses validate both the efficacy of our approach and the interpretability of the generated features.
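The three knowledge forms named in the abstract can be illustrated with a minimal NumPy sketch. This is not the DKPF implementation, which the abstract does not specify; every function name, shape, and the toy graph below is a hypothetical stand-in. It assumes per-region feature vectors already pooled at three scales to a common shape, fuses them with softmax weights (a generic form of adaptive multi-scale fusion), averages features over a binary target-to-part graph, and applies plain scaled dot-product self-attention with identity projections.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def adaptive_fuse(features, logits):
    # Weighted sum of same-shape feature arrays; weights = softmax(logits).
    # In a trained model the logits would be learned; here they are given.
    weights = softmax(np.asarray(logits, dtype=float))
    return sum(w * f for w, f in zip(weights, features))

def graph_propagate(region_feats, adjacency):
    # Average each region's features with its neighbours in a binary graph.
    a = adjacency + np.eye(adjacency.shape[0])  # add self-loops
    a = a / a.sum(axis=1, keepdims=True)        # row-normalise
    return a @ region_feats

def self_attention(region_feats):
    # Scaled dot-product self-attention with identity Q/K/V projections.
    d = region_feats.shape[1]
    scores = region_feats @ region_feats.T / np.sqrt(d)
    return softmax(scores, axis=1) @ region_feats

# Hypothetical toy data: 4 regions, 8-dim features, 3 scales.
rng = np.random.default_rng(0)
scales = [rng.normal(size=(4, 8)) for _ in range(3)]
fused = adaptive_fuse(scales, logits=[0.0, 0.0, 0.0])

# Regions 0-1 are targets, 2-3 are key parts; an edge links a part to its target.
adjacency = np.array([[0, 0, 1, 0],
                      [0, 0, 0, 1],
                      [1, 0, 0, 0],
                      [0, 1, 0, 0]], dtype=float)
explicit = graph_propagate(fused, adjacency)  # instance-level, graph-guided
implicit = self_attention(fused)              # data-driven, all-pairs
```

With equal logits the fusion reduces to a plain mean over scales, and each row of `explicit` is the average of a region and its linked target or part, which makes the toy example easy to check by hand.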


Source journal

Neurocomputing (Engineering & Technology, Computer Science: Artificial Intelligence)

CiteScore: 13.10

Self-citation rate: 10.00%

Articles published: 1382

Review time: 70 days

Journal description: Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics being covered.