Radar spectrum-image fusion using dual 2D-3D convolutional neural network to transformer inspired multi-headed self-attention bi-long short-term memory network for vehicle recognition

IF 1.0 · JCR Q4 (Engineering, Electrical & Electronic) · CAS Zone 4 (Computer Science) · Journal of Electronic Imaging · Pub Date: 2024-07-01 · DOI: 10.1117/1.jei.33.4.043010
Ferris I. Arnous, Ram M. Narayanan
Citations: 0

Abstract

Radar imaging techniques such as synthetic aperture radar are widely explored in automatic vehicle recognition algorithms for remote sensing tasks. A large body of literature covers machine learning methodologies for military vehicle detection, including vision transformers, self-attention, convolutional neural networks (CNNs), long short-term memory (LSTM) networks, and hybrid CNN-LSTM, CNN-attention-LSTM, and CNN Bi-LSTM models, with high performance attributed to combinations of these approaches. Tradeoffs among the number of poses, single versus multiple feature-extraction streams, the use of signals and/or images, and the specific mechanisms used to combine them have been widely debated. We propose adapting several of these models into a unique biologically inspired architecture that exploits both multi-pose and multi-contextual image and signal radar sensor information to assess vehicles over time. A compact multi-pose 3D CNN stream processes and fuses multi-temporal images, while a sister 2D CNN stream processes the same information in a lower-dimensional power-spectral domain, mimicking the way multi-sequence visual imagery is combined with auditory feedback for enhanced situational awareness. The two streams are then fused across data domains using transformer-modified encoder blocks feeding Bi-LSTM segments. Classification on a carefully controlled simulated dataset yielded accuracies of up to 98% and 99%, in line with the literature. This performance was then evaluated for robustness, not previously explored, under three simultaneous parameterizations of incidence angle, object orientation, and lowered signal-to-noise ratio, and recognition was found to improve in all three cases for low- to moderate-noise environments.
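The transformer-inspired fusion described in the abstract rests on multi-head self-attention, in which projections of a feature sequence are split into parallel heads, attended over independently, and concatenated. A minimal NumPy sketch of that mechanism follows; the function name, dimensions, and random weights are illustrative assumptions, not the authors' implementation, which additionally couples the attention output to Bi-LSTM segments.

```python
import numpy as np

def multi_head_self_attention(x, w_q, w_k, w_v, w_o, n_heads):
    """Scaled dot-product self-attention split across n_heads.
    x: (seq_len, d_model); each weight matrix: (d_model, d_model)."""
    seq_len, d_model = x.shape
    d_head = d_model // n_heads
    # Project to queries/keys/values, then split the model dim into heads.
    q = (x @ w_q).reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)
    k = (x @ w_k).reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)
    v = (x @ w_v).reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)
    # Per-head attention scores: (n_heads, seq_len, seq_len).
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    # Numerically stable softmax over the last axis.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Weighted sum of values; concatenate heads back to d_model.
    out = (weights @ v).transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ w_o

rng = np.random.default_rng(0)
T, D, H = 6, 16, 4   # e.g., 6 pose/time steps, 16-dim fused features, 4 heads
x = rng.standard_normal((T, D))
ws = [rng.standard_normal((D, D)) * 0.1 for _ in range(4)]
y = multi_head_self_attention(x, *ws, n_heads=H)
print(y.shape)
```

Each of the `H` heads attends over the full pose/time sequence with its own `d_model / n_heads`-dimensional subspace, which is what lets a fused image-plus-spectrum feature sequence be weighted differently per head before downstream recurrent processing.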
Source journal: Journal of Electronic Imaging (Engineering & Technology — Imaging Science & Photographic Technology)
CiteScore: 1.70
Self-citation rate: 27.30%
Articles per year: 341
Review time: 4.0 months
Journal description: The Journal of Electronic Imaging publishes peer-reviewed papers in all technology areas that make up the field of electronic imaging and are normally considered in the design, engineering, and applications of electronic imaging systems.
Latest articles in this journal:
DTSIDNet: a discrete wavelet and transformer based network for single image denoising
Multi-head attention with reinforcement learning for supervised video summarization
End-to-end multitasking network for smart container product positioning and segmentation
Generative object separation in X-ray images
Toward effective local dimming-driven liquid crystal displays: a deep curve estimation–based adaptive compensation solution