Radar spectrum-image fusion using dual 2D-3D convolutional neural network to transformer inspired multi-headed self-attention bi-long short-term memory network for vehicle recognition

IF 1.0 · JCR Q4 (Engineering, Electrical & Electronic) · CAS Zone 4 (Computer Science) · Journal of Electronic Imaging · Pub Date: 2024-07-01 · DOI: 10.1117/1.jei.33.4.043010
Ferris I. Arnous, Ram M. Narayanan
Citations: 0

Abstract

Radar imaging techniques such as synthetic aperture radar are widely explored in automatic vehicle recognition algorithms for remote sensing tasks. A large body of literature covers machine learning methodologies for military vehicle detection, including vision transformers, self-attention, convolutional neural networks (CNNs), long short-term memory (LSTM) networks, and hybrid CNN-LSTM, CNN-attention-LSTM, and CNN Bi-LSTM models, with high performance attributed to combinations of these approaches. Tradeoffs among the number of poses, single versus multiple feature-extraction streams, the use of signals and/or images, and the specific mechanisms used to combine them have been widely debated. We propose adapting several of these models into a unique biologically inspired architecture that exploits both multi-pose and multi-contextual image and signal radar sensor information to assess vehicles over time. A compact multi-pose 3D CNN stream processes and fuses multi-temporal images, while a sister 2D CNN stream processes the same information in a lower-dimensional power-spectral domain, mimicking the way multi-sequence visual imagery is combined with auditory feedback for enhanced situational awareness. The two streams are then fused across data domains using transformer-modified encoder blocks feeding Bi-LSTM segments. Classification on a carefully controlled simulated dataset yielded accuracies of up to 98% and 99%, in line with the literature. This performance was then evaluated for robustness, not previously explored, under three simultaneous parameterizations of incidence angle, object orientation, and lowered signal-to-noise ratio, and recognition was found to improve in all three cases for low- to moderate-noise environments.
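The transformer-inspired fusion described in the abstract rests on multi-head self-attention, in which projections of a feature sequence are split into parallel heads, attended over independently, and concatenated. A minimal NumPy sketch of that mechanism follows; the function name, dimensions, and random weights are illustrative assumptions, not the authors' implementation, which additionally couples the attention output to Bi-LSTM segments.

```python
import numpy as np

def multi_head_self_attention(x, w_q, w_k, w_v, w_o, n_heads):
    """Scaled dot-product self-attention split across n_heads.
    x: (seq_len, d_model); each weight matrix: (d_model, d_model)."""
    seq_len, d_model = x.shape
    d_head = d_model // n_heads
    # Project to queries/keys/values, then split the model dim into heads.
    q = (x @ w_q).reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)
    k = (x @ w_k).reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)
    v = (x @ w_v).reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)
    # Per-head attention scores: (n_heads, seq_len, seq_len).
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    # Numerically stable softmax over the last axis.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Weighted sum of values; concatenate heads back to d_model.
    out = (weights @ v).transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ w_o

rng = np.random.default_rng(0)
T, D, H = 6, 16, 4   # e.g., 6 pose/time steps, 16-dim fused features, 4 heads
x = rng.standard_normal((T, D))
ws = [rng.standard_normal((D, D)) * 0.1 for _ in range(4)]
y = multi_head_self_attention(x, *ws, n_heads=H)
print(y.shape)
```

Each of the `H` heads attends over the full pose/time sequence with its own `d_model / n_heads`-dimensional subspace, which is what lets a fused image-plus-spectrum feature sequence be weighted differently per head before downstream recurrent processing.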
Source journal: Journal of Electronic Imaging (Engineering & Technology — Imaging Science & Photographic Technology)
CiteScore: 1.70
Self-citation rate: 27.30%
Articles per year: 341
Review time: 4.0 months
Journal description: The Journal of Electronic Imaging publishes peer-reviewed papers in all technology areas that make up the field of electronic imaging and are normally considered in the design, engineering, and applications of electronic imaging systems.
Latest articles in this journal:
DTSIDNet: a discrete wavelet and transformer based network for single image denoising
Multi-head attention with reinforcement learning for supervised video summarization
End-to-end multitasking network for smart container product positioning and segmentation
Generative object separation in X-ray images
Toward effective local dimming-driven liquid crystal displays: a deep curve estimation–based adaptive compensation solution