IEEE Journal of Selected Topics in Signal Processing: Latest Articles, Page 2

DDL: Empowering Delivery Drones With Large-Scale Urban Sensing Capability

IF 8.7 Zone 1 (Engineering Technology) Q1 ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE Journal of Selected Topics in Signal Processing

Pub Date : 2024-07-19 DOI: 10.1109/JSTSP.2024.3427371

Xuecheng Chen;Haoyang Wang;Yuhan Cheng;Haohao Fu;Yuxuan Liu;Fan Dang;Yunhao Liu;Jinqiang Cui;Xinlei Chen

Delivery drones provide a promising sensing platform for smart cities thanks to their city-wide infrastructure and large-scale deployment. However, due to limited battery lifetime and available resources, it is challenging to schedule delivery drones to achieve both high sensing and high delivery performance: the scheduling task is a highly complicated optimization problem with several coupled decision variables. In this paper, we first propose a delivery drone-based sensing system and formulate a mixed-integer nonlinear programming (MINLP) problem that jointly optimizes the sensing utility and delivery time, considering practical factors including energy capacity and available delivery drones. Then we provide an efficient solution that combines the strengths of deep reinforcement learning (DRL) and heuristics, which decouples the highly complicated optimization search process and replaces the heavy computation with a rapid approximation. Evaluation against state-of-the-art baselines shows that DDL improves the scheduling quality by at least 46% on average. More importantly, the proposed method improves computational efficiency, which is up to 98 times higher than that of the best baseline.

Volume 18, Issue 3, pp. 502-515

Citations: 0

A Two-Stage Audio-Visual Speech Separation Method Without Visual Signals for Testing and Tuples Loss With Dynamic Margin

IF 8.7 Zone 1 (Engineering Technology) Q1 ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE Journal of Selected Topics in Signal Processing

Pub Date : 2024-07-12 DOI: 10.1109/JSTSP.2024.3427424

Yinggang Liu;Yuanjie Deng;Ying Wei

Speech separation, a fundamental task in signal processing, can be used in many types of intelligent robots, and audio-visual (AV) speech separation has been proven to be superior to audio-only speech separation. In current AV speech separation methods, visual information plays a pivotal role not only during network training but also during testing. However, due to various factors in real environments, sensors cannot always obtain high-quality visual signals. In this paper, we propose an effective two-stage AV speech separation model that introduces a new approach to visual feature embedding, where visual information is used to optimize the separation network during training, but no visual input is required during testing. Unlike current methods, which fuse visual and audio features together as the input of the separation network, this model embeds visual features into an AV matching block to calculate the cross-modal consistency loss, which is used as part of the loss function for network optimization. A novel tuples loss function with a learnable dynamic margin is proposed for better AV matching, and two margin change strategies are given. The proposed two-stage AV speech separation method is evaluated on the widely used GRID and VoxCeleb2 datasets. Experimental results show that the proposed method outperforms current AV speech separation methods.

Volume 18, Issue 3, pp. 459-472

Citations: 0

An Adaptive Image Thresholding Algorithm Using Fuzzy Logic for Autonomous Underwater Vehicle Navigation

IF 8.7 Zone 1 (Engineering Technology) Q1 ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE Journal of Selected Topics in Signal Processing

Pub Date : 2024-07-11 DOI: 10.1109/JSTSP.2024.3426484

I-Chen Sang;William R. Norris

Breakthroughs in autonomous vehicle technology have ignited diverse topics within engineering research. Among these, the focus on conducting inspections through autonomous underwater vehicles (AUVs) stands out as particularly influential, owing to the substantial investments directed towards offshore infrastructures. Leveraging the capabilities of onboard sensors, AUVs hold the potential to adeptly trace and examine pipelines with high levels of accuracy. However, the complicated and varying underwater environment presents a formidable challenge to ensuring the robustness of the localization and navigation framework. In response to these challenges, this study introduces a novel GPS-denied, adaptive, vision-based navigation framework tailored specifically for AUV inspection tasks. Different from conventional approaches involving manual parameter tuning, this framework dynamically adjusts contrast enhancement and edge detection functions based on incoming frame data. Fuzzy inference systems (FIS) have been harnessed within both image processing and the navigation algorithm, strengthening the overall robustness of the system. The verification of the proposed framework took place within a simulation environment. Through the implemented algorithm, the AUV adeptly identified, approached, and traversed the pipeline. Additionally, the framework distinctly showcased its capacity to dynamically adjust parameters, reduce processing time, and uphold consistency amid diverse illuminations and levels of noise.

Volume 18, Issue 3, pp. 358-367. Open-access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10596073

Citations: 0

Multi-Frequency Spherical Near-Field Antenna Measurements Using Compressive Sensing

IF 8.7 Zone 1 (Engineering Technology) Q1 ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE Journal of Selected Topics in Signal Processing

Pub Date : 2024-07-10 DOI: 10.1109/JSTSP.2024.3424310

Marc Andrew Valdez;Jacob D. Rezac;Michael B. Wakin;Joshua A. Gordon

We propose compressive sensing approaches for broadband spherical near-field measurements that reduce measurement demands beyond what is achievable using conventional single-frequency compressive sensing. Our approaches use two different compressive signal models (sparsity-based and low-rank-based), whose viability we establish using a simulated standard gain horn antenna. Under mild assumptions on the device being tested, we prove that sparsity-based broadband compressive sensing provides significant reductions in the number of measurements over single-frequency compressive sensing. Using numerical experiments, we find that our proposed low-rank model also provides an effective means of achieving broadband compressive sensing, with performance on par with the best broadband sparsity-based method. Exemplifying these best-case results, even in the presence of measurement noise, the methods we propose can achieve relative errors of −40 dB using about 1/4 of the measurements required for conventional sampling. This is equivalent to about 1/2 sample per unknown, whereas traditional spherical near-field measurements require a minimum of roughly 2 measurements per unknown.
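To make the "1/2 sample per unknown" figure concrete, the following is a minimal sketch of generic sparsity-based compressive sensing, not the paper's spherical near-field model. It assumes a random Gaussian measurement matrix, a noise-free 3-sparse signal, and orthogonal matching pursuit as the solver; the function name `omp` and all dimensions are illustrative choices.

```python
import numpy as np

def omp(A, y, k):
    """Orthogonal matching pursuit: greedily recover a k-sparse x from y = A @ x."""
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        # Select the column most correlated with the current residual.
        idx = int(np.argmax(np.abs(A.T @ residual)))
        support.append(idx)
        # Least-squares fit on the selected columns, then update the residual.
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x_hat = np.zeros(A.shape[1])
    x_hat[support] = coef
    return x_hat

rng = np.random.default_rng(0)
n_unknowns, n_measurements, sparsity = 256, 128, 3  # 1/2 sample per unknown

# Hypothetical measurement operator: i.i.d. Gaussian rows (an assumption).
A = rng.standard_normal((n_measurements, n_unknowns)) / np.sqrt(n_measurements)

# Synthetic sparse coefficient vector with unit-magnitude entries.
x_true = np.zeros(n_unknowns)
support_true = rng.choice(n_unknowns, sparsity, replace=False)
x_true[support_true] = rng.choice([-1.0, 1.0], sparsity)

y = A @ x_true
x_hat = omp(A, y, sparsity)
rel_err = np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true)
print(f"relative error: {rel_err:.2e}")
```

The ratio works out as 2 measurements per unknown × 1/4 = 1/2 measurement per unknown, which is the regime this toy problem is set up in. Recovery from fewer equations than unknowns is possible here only because the signal is sparse.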

Volume 18, Issue 4, pp. 572-586

Citations: 0

Standoff Target Tracking for Networked UAVs With Specified Performance via Deep Reinforcement Learning

IF 8.7 Zone 1 (Engineering Technology) Q1 ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE Journal of Selected Topics in Signal Processing

Pub Date : 2024-07-09 DOI: 10.1109/JSTSP.2024.3425052

Yi Xia;Jun Du;Zekai Zhang;Ziyuan Wang;Jingzehua Xu;Weishi Mi

Maintaining rapid and prolonged standoff target tracking for networked unmanned aerial vehicles (UAVs) is challenging, as existing methods fail to improve tracking performance while simultaneously reducing energy consumption. This paper proposes a deep reinforcement learning (DRL)-based tracking scheme for UAVs to approximate an escape target, effectively addressing time constraints and guaranteeing low energy expenditure. In the first phase, a coordinated target tracking protocol and a target position estimator are developed using only bearing measurements, which enable the deployment of UAVs along a standoff circle centered at the target with an expected angular spacing. Additionally, an unknown system dynamics estimator (USDE) is devised based on concise filtering operations to mitigate adverse disturbances. In the second phase, multi-agent deep deterministic policy gradient (MADDPG) is employed to strike an optimal balance between tracking accuracy and energy consumption by encoding time limitations as skilled barrier functions. Simulation results demonstrate that the proposed method outperforms benchmarks in terms of tracking accuracy and control cost.
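The standoff geometry described above can be sketched in a few lines. This is only an illustration of the target formation, assuming a planar problem and equal angular spacing of 2π/N between neighbouring UAVs; the function name `standoff_positions` and its parameters are hypothetical, and the sketch does not include the bearing-only estimator, the USDE, or the MADDPG controller.

```python
import math

def standoff_positions(target_xy, radius, n_uavs, phase=0.0):
    """Evenly spaced points on a circle of the given radius around the target."""
    spacing = 2 * math.pi / n_uavs  # assumed desired angular spacing
    tx, ty = target_xy
    return [
        (tx + radius * math.cos(phase + i * spacing),
         ty + radius * math.sin(phase + i * spacing))
        for i in range(n_uavs)
    ]

# Four UAVs on a 50 m standoff circle around a target at (10, -5).
positions = standoff_positions(target_xy=(10.0, -5.0), radius=50.0, n_uavs=4)
for i, (x, y) in enumerate(positions):
    print(f"UAV {i}: ({x:.1f}, {y:.1f})")
```

A tracking controller would then drive each UAV toward its slot while the circle's centre follows the estimated target position.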

Volume 18, Issue 3, pp. 516-528

Citations: 0

Incongruity-Aware Cross-Modal Attention for Audio-Visual Fusion in Dimensional Emotion Recognition

IF 8.7 Zone 1 (Engineering Technology) Q1 ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE Journal of Selected Topics in Signal Processing

Pub Date : 2024-07-03 DOI: 10.1109/JSTSP.2024.3422823

R. Gnana Praveen;Jahangir Alam

Multimodal emotion recognition has immense potential for the comprehensive assessment of human emotions, utilizing multiple modalities that often exhibit complementary relationships. In video-based emotion recognition, audio and visual modalities have emerged as prominent contact-free channels, widely explored in existing literature. Current approaches typically employ cross-modal attention mechanisms between audio and visual modalities, assuming a constant state of complementarity. However, this assumption may not always hold true, as non-complementary relationships can also manifest, undermining the efficacy of cross-modal feature integration and thereby diminishing the quality of audio-visual feature representations. To tackle this problem, we introduce a novel Incongruity-Aware Cross-Attention (IACA) model, capable of harnessing the benefits of robust complementary relationships while efficiently managing non-complementary scenarios. Specifically, our approach incorporates a two-stage gating mechanism designed to adaptively select semantic features, thereby effectively capturing the inter-modal associations. Additionally, the proposed model demonstrates an ability to mitigate the adverse effects of severely corrupted or missing modalities. We rigorously evaluate the performance of the proposed model through extensive experiments conducted on the challenging RECOLA and Aff-Wild2 datasets. The results underscore the efficacy of our approach, as it outperforms state-of-the-art methods by adeptly capturing inter-modal relationships and minimizing the influence of missing or heavily corrupted modalities. Furthermore, we show that the proposed model is compatible with various cross-modal attention variants, consistently improving performance on both datasets.

Volume 18, Issue 3, pp. 444-458

Citations: 0

Topology-Preserving Motion Coordination for Multi-Robot Systems in Adversarial Environments

IF 8.7 Zone 1 (Engineering Technology) Q1 ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE Journal of Selected Topics in Signal Processing

Pub Date : 2024-07-02 DOI: 10.1109/JSTSP.2024.3421898

Zitong Wang;Yushan Li;Xiaoming Duan;Jianping He

The interaction topology plays a significant role in the distributed motion coordination of multi-robot systems (MRSs) for its noticeable impact on the information flow between robots. However, recent research has revealed that in adversarial environments, the topology can be inferred by external adversaries equipped with advanced sensors, posing severe security risks to MRSs. Therefore, it is of utmost importance to preserve the interaction topology from inference attacks while ensuring the coordination performance. To this end, we propose a topology-preserving motion coordination (TPMC) algorithm that strategically introduces perturbation signals during the coordination process with a compensation design. The major novelty is threefold: i) We focus on the second-order motion coordination model and tackle the coupling issue of the perturbation signals with the unstable state updating process; ii) We develop a general framework for distributed compensation of perturbation signals, strategically addressing the challenge of perturbation accumulation while ensuring precise motion coordination; iii) We derive the convergence conditions and rate characterization to achieve the motion coordination under the TPMC algorithm. Extensive simulations and real-world experiments are conducted to verify the performance of the proposed method.
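The idea of injecting perturbations and then compensating for them can be illustrated with a much simpler setting than the one in the paper. The sketch below is not the TPMC algorithm: it assumes first-order average consensus (the paper treats a second-order model), a fixed ring topology, and a well-known self-cancelling noise construction in which each step adds fresh decayed noise and subtracts the previously injected term. All names and parameters are hypothetical.

```python
import numpy as np

def perturbed_consensus(x0, W, steps=400, phi=0.9, noise_scale=1.0, seed=0):
    """Average consensus with perturbations whose running sum decays to zero."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    prev_noise = np.zeros_like(x)
    for k in range(steps):
        noise = noise_scale * rng.standard_normal(x.shape)
        if k == 0:
            theta = noise
        else:
            # Inject new decayed noise and compensate for the previous injection,
            # so the cumulative perturbation after step k is phi**k * noise.
            theta = (phi ** k) * noise - (phi ** (k - 1)) * prev_noise
        x = W @ (x + theta)  # robots exchange perturbed states, then average
        prev_noise = noise
    return x

# Ring of five robots; W = I - 0.3 * Laplacian is symmetric and doubly stochastic.
n = 5
laplacian = 2 * np.eye(n) - np.roll(np.eye(n), 1, axis=1) - np.roll(np.eye(n), -1, axis=1)
W = np.eye(n) - 0.3 * laplacian

x0 = np.array([1.0, 4.0, -2.0, 7.0, 0.0])
final = perturbed_consensus(x0, W)
print(final, "target average:", x0.mean())
```

Because the perturbations telescope, the states still converge to the exact initial average, while the exchanged values along the way are masked by noise. Handling the accumulation of such perturbations in a second-order model is the harder problem the abstract refers to.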

Volume 18, Issue 3, pp. 473-486

Citations: 0

A Robust Robot Perception Framework for Complex Environments Using Multiple mmWave Radars

IF 8.7 Zone 1 (Engineering Technology) Q1 ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE Journal of Selected Topics in Signal Processing

Pub Date : 2024-06-28 DOI: 10.1109/JSTSP.2024.3420234

Hongyu Chen;Yimin Liu;Yuwei Cheng

The robust perception of environments is crucial for mobile robots to operate autonomously in complex environments. Over the years, mobile robots have mainly relied on optical sensors for perception, which degrade severely in adverse weather conditions. Recently, single-chip millimeter-wave (mmWave) radars have been widely used for mobile perception, owing to their all-weather robustness, lightweight design, and low cost. However, existing research based on mmWave radars primarily focuses on a single radar and a single task. Due to the limited field of view and sparse observations, perception based on a single radar may not ensure the required robustness in complex environments. To address this challenge, we propose a novel robust perception framework for robots in complex environments based on multiple mmWave radars, named MMR-PFR. The framework integrates three critical tasks for robots: ego-motion estimation, multi-radar fusion mapping, and dynamic target state estimation. The tasks collaborate and facilitate each other to improve overall performance. In the framework, we propose a new multi-radar point cloud fusion method to generate a more accurate environmental map. In addition, we propose a new online calibration algorithm for multiple radars to ensure the long-term reliability of the system. To evaluate MMR-PFR, we build a prototype and carry out experiments in real-world scenarios. The evaluation results show the effectiveness and superiority of the proposed framework.

Volume 18, Issue 3, Pages 380-395

Citations: 0

Holographic Imaging With XL-MIMO and RIS: Illumination and Reflection Design

IF 8.7 Zone 1 (Engineering Technology) Q1 ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE Journal of Selected Topics in Signal Processing

Pub Date : 2024-06-20 DOI: 10.1109/JSTSP.2024.3417356

G. Torcolacci;A. Guerra;H. Zhang;F. Guidi;Q. Yang;Y. C. Eldar;D. Dardari

This paper addresses a near-field imaging problem utilizing extremely large-scale multiple-input multiple-output (XL-MIMO) antennas and reconfigurable intelligent surfaces (RISs) already in place for wireless communications. To this end, we consider a system with a fixed transmitting antenna array illuminating a region of interest (ROI) and a fixed receiving antenna array inferring the ROI's scattering coefficients. Leveraging XL-MIMO and high frequencies, the ROI is situated in the radiative near-field region of both antenna arrays, thus enhancing the degrees of freedom (DoF) (i.e., the channel matrix rank) of the illuminating and sensing channels available for imaging, here referred to as holographic imaging. To further boost the imaging performance, we optimize the illuminating waveform by solving a min-max optimization problem having the upper bound of the mean squared error (MSE) of the image estimate as the objective function. Additionally, we address the challenge of non-line-of-sight (NLOS) scenarios by considering the presence of a RIS and deriving its optimal reflection coefficients. Numerical results investigate the interplay between illumination optimization, geometric configuration (monostatic and bistatic), the DoF of the illuminating and sensing channels, image estimation accuracy, and image complexity.

Volume 18, Issue 4, Pages 587-602

Citations: 0

When Vision Meets Touch: A Contemporary Review for Visuotactile Sensors From the Signal Processing Perspective

IF 8.7 Zone 1 (Engineering Technology) Q1 ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE Journal of Selected Topics in Signal Processing

Pub Date : 2024-06-19 DOI: 10.1109/JSTSP.2024.3416841

Shoujie Li;Zihan Wang;Changsheng Wu;Xiang Li;Shan Luo;Bin Fang;Fuchun Sun;Xiao-Ping Zhang;Wenbo Ding

Tactile sensors, which provide information about the physical properties of objects, are an essential component of robotic systems. Visuotactile sensing technology, with its high resolution and low cost, has facilitated the development of robotics from environment exploration to dexterous operation. Over the years, several reviews on visuotactile sensors for robots have been presented, but few of them have discussed the significance of signal processing methods for visuotactile sensors. Beyond ingenious hardware design, the full potential of the sensory system for designated tasks can only be realized with appropriate signal processing methods. Therefore, this paper provides a comprehensive review of visuotactile sensors from the perspective of signal processing methods and outlines possible future research directions for visuotactile sensors.

Citations: 0