A Diverse Knowledge Perception and Fusion network for detecting targets and key parts in UAV images

Hanyu Wang; Qiang Shen; Zilong Deng

doi:10.1016/j.neucom.2024.128748

Hanyu Wang, Qiang Shen*, Zilong Deng

*Corresponding author for this work

School of Mechatronical Engineering

Beijing Institute of Technology

Research output: Contribution to journal › Article › peer-review

Abstract

Detecting targets and their key parts in UAV images is crucial for both military and civilian applications, including optimizing damage assessment, evaluating infrastructure, and facilitating disaster response efforts. Traditional top-down approaches impose excessive constraints that struggle to address challenges such as variable definitions and quantities of key parts, potential target occlusion, and model redundancy. Conversely, end-to-end approaches often overlook the relationships between targets and key parts, resulting in low detection accuracy. Inspired by the remarkable human reasoning process, we propose the Diverse Knowledge Perception and Fusion (DKPF) network, which skillfully balances the trade-offs between stringent constraints and unconstrained methods while ensuring both detection precision and real-time performance. Specifically, our model integrates reasoning guided by three distinct forms of knowledge: contextual knowledge at the image level in an unsupervised manner; explicit semantic knowledge regarding the interactions between targets and key parts at the instance level; and implicit comprehensive knowledge about the relationships among different types of targets or key parts, such as shape similarity. These specific knowledge forms are extracted through a novel adaptive fusion strategy for multi-scale features, a binary region-to-region semantic knowledge graph, and a data-driven self-attention architecture, respectively. Experiments conducted on both simulated and real-world datasets reveal that our method significantly outperforms state-of-the-art techniques, regardless of the number of key parts in the target. Furthermore, extensive ablation studies and visualization analyses validate both the efficacy of our approach and the interpretability of the generated features.

Original language	English
Article number	128748
Journal	Neurocomputing
Volume	612
DOIs	https://doi.org/10.1016/j.neucom.2024.128748
Publication status	Published - 7 Jan 2025

Keywords

Deep learning
Key parts
Knowledge graph
Military targets
Object detection

Cite this

@article{6a1bfdd558994888abdd5ad83764c38c,
  title = "A Diverse Knowledge Perception and Fusion network for detecting targets and key parts in UAV images",
  abstract = "Detecting targets and their key parts in UAV images is crucial for both military and civilian applications, including optimizing damage assessment, evaluating infrastructure, and facilitating disaster response efforts. Traditional top-down approaches impose excessive constraints that struggle to address challenges such as variable definitions and quantities of key parts, potential target occlusion, and model redundancy. Conversely, end-to-end approaches often overlook the relationships between targets and key parts, resulting in low detection accuracy. Inspired by the remarkable human reasoning process, we propose the Diverse Knowledge Perception and Fusion (DKPF) network, which skillfully balances the trade-offs between stringent constraints and unconstrained methods while ensuring both detection precision and real-time performance. Specifically, our model integrates reasoning guided by three distinct forms of knowledge: contextual knowledge at the image level in an unsupervised manner; explicit semantic knowledge regarding the interactions between targets and key parts at the instance level; and implicit comprehensive knowledge about the relationships among different types of targets or key parts, such as shape similarity. These specific knowledge forms are extracted through a novel adaptive fusion strategy for multi-scale features, a binary region-to-region semantic knowledge graph, and a data-driven self-attention architecture, respectively. Experiments conducted on both simulated and real-world datasets reveal that our method significantly outperforms state-of-the-art techniques, regardless of the number of key parts in the target. Furthermore, extensive ablation studies and visualization analyses validate both the efficacy of our approach and the interpretability of the generated features.",
  keywords = "Deep learning, Key parts, Knowledge graph, Military targets, Object detection",
  author = "Hanyu Wang and Qiang Shen and Zilong Deng",
  note = "Publisher Copyright: {\textcopyright} 2024 Elsevier B.V.",
  year = "2025",
  month = jan,
  day = "7",
  doi = "10.1016/j.neucom.2024.128748",
  language = "English",
  volume = "612",
  journal = "Neurocomputing",
  issn = "0925-2312",
  publisher = "Elsevier B.V.",
}

TY - JOUR
T1 - A Diverse Knowledge Perception and Fusion network for detecting targets and key parts in UAV images
AU - Wang, Hanyu
AU - Shen, Qiang
AU - Deng, Zilong
PY - 2025/1/7
Y1 - 2025/1/7
AB - Detecting targets and their key parts in UAV images is crucial for both military and civilian applications, including optimizing damage assessment, evaluating infrastructure, and facilitating disaster response efforts. Traditional top-down approaches impose excessive constraints that struggle to address challenges such as variable definitions and quantities of key parts, potential target occlusion, and model redundancy. Conversely, end-to-end approaches often overlook the relationships between targets and key parts, resulting in low detection accuracy. Inspired by the remarkable human reasoning process, we propose the Diverse Knowledge Perception and Fusion (DKPF) network, which skillfully balances the trade-offs between stringent constraints and unconstrained methods while ensuring both detection precision and real-time performance. Specifically, our model integrates reasoning guided by three distinct forms of knowledge: contextual knowledge at the image level in an unsupervised manner; explicit semantic knowledge regarding the interactions between targets and key parts at the instance level; and implicit comprehensive knowledge about the relationships among different types of targets or key parts, such as shape similarity. These specific knowledge forms are extracted through a novel adaptive fusion strategy for multi-scale features, a binary region-to-region semantic knowledge graph, and a data-driven self-attention architecture, respectively. Experiments conducted on both simulated and real-world datasets reveal that our method significantly outperforms state-of-the-art techniques, regardless of the number of key parts in the target. Furthermore, extensive ablation studies and visualization analyses validate both the efficacy of our approach and the interpretability of the generated features.
KW - Deep learning
KW - Key parts
KW - Knowledge graph
KW - Military targets
KW - Object detection
UR - http://www.scopus.com/inward/record.url?scp=85206681062&partnerID=8YFLogxK
U2 - 10.1016/j.neucom.2024.128748
DO - 10.1016/j.neucom.2024.128748
M3 - Article
AN - SCOPUS:85206681062
SN - 0925-2312
VL - 612
JO - Neurocomputing
JF - Neurocomputing
M1 - 128748
ER -
