Neuropsychiatric disorders, such as Alzheimer's disease (AD), depression, and autism spectrum disorder (ASD), are characterized by linguistic and acoustic abnormalities, offering potential biomarkers for early detection. Despite the promise of multi-modal approaches, challenges like multi-lingual generalization and the absence of a unified evaluation framework persist. To address these gaps, we propose FEND (Foundation model-based Evaluation of Neuropsychiatric Disorders), a comprehensive multi-modal framework integrating speech and text modalities for detecting AD, depression, and ASD across the lifespan. Leveraging 13 multi-lingual datasets spanning English, Chinese, Greek, French, and Dutch, we systematically evaluate multi-modal fusion performance. Our results show that multi-modal fusion excels in AD and depression detection but underperforms in ASD due to dataset heterogeneity. We also identify modality imbalance as a prevalent issue, where multi-modal fusion fails to surpass the best mono-modal models. Cross-corpus experiments reveal robust performance in task- and language-consistent scenarios but noticeable degradation in multi-lingual and task-heterogeneous settings. By providing extensive benchmarks and a detailed analysis of performance-influencing factors, FEND advances the field of automated, lifespan-inclusive, and multi-lingual neuropsychiatric disorder assessment. We encourage researchers to adopt the FEND framework for fair comparisons and reproducible research.
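The abstract does not detail FEND's fusion strategy, so as a minimal illustration of the speech-text fusion idea it evaluates, the sketch below late-fuses per-modality embeddings by z-normalising and concatenating them (all names, dimensions, and data here are hypothetical, not FEND's actual pipeline):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical utterance-level embeddings from speech and text foundation
# models (dimensions are illustrative only).
n, d_speech, d_text = 8, 4, 3
speech_emb = rng.normal(size=(n, d_speech))
text_emb = rng.normal(size=(n, d_text))

def late_fuse(speech, text):
    """Concatenate per-modality embeddings after z-normalisation,
    so neither modality dominates the fused representation."""
    zs = (speech - speech.mean(axis=0)) / (speech.std(axis=0) + 1e-8)
    zt = (text - text.mean(axis=0)) / (text.std(axis=0) + 1e-8)
    return np.concatenate([zs, zt], axis=1)

fused = late_fuse(speech_emb, text_emb)
print(fused.shape)  # (8, 7)
```

A downstream classifier trained on `fused` can then be compared against classifiers trained on either modality alone, which is the comparison behind the modality-imbalance finding above.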
"Foundation Model-Based Evaluation of Neuropsychiatric Disorders: A Lifespan-Inclusive, Multi-Modal, and Multi-Lingual Study," by Zhongren Dong, Haotian Guo, Weixiang Xu, Huan Zhao, and Zixing Zhang. IEEE Journal of Selected Topics in Signal Processing, vol. 19, no. 5, pp. 796–809. Published 2025-10-24. DOI: 10.1109/JSTSP.2025.3622051.
Pub Date: 2025-10-17. DOI: 10.1109/JSTSP.2025.3620716
Kara M. Smith;James R. Williamson;Thomas F. Quatieri
Background: Speech biomarkers have been used to assess motor dysfunction in people with Parkinson’s disease (PD), but speech biomarkers for mild cognitive impairment in PD (PD-MCI) have not been well studied. Objective: To identify speech acoustic features associated with PD-MCI and evaluate the performance of a model that discriminates PD-MCI from participants with normal cognitive status (PD-NC). Methods: We analyzed speech samples from 42 participants with PD, diagnosed as either PD-MCI or PD-NC using the Movement Disorder Society Task Force Tier II criteria as a gold-standard classification of MCI. A reading passage and a picture description task were analyzed for acoustic features, which were used to generate individual Gaussian mixture models (GMMs) and then a final fused model to discriminate PD-MCI from PD-NC participants. Results: The picture description task yielded a larger number of acoustic features highly associated with PD-MCI status than the reading task. Fusing the model outputs from the picture description task resulted in an AUC of 0.82 for discriminating PD-MCI from PD-NC participants. The acoustic features associated with PD-MCI stemmed from multiple speech subsystems. Conclusion: PD-MCI has a distinct speech acoustic signature that may be harnessed to develop better tools to detect and monitor this complication.
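The AUC of 0.82 above comes from fusing the outputs of per-feature-set models. The snippet below shows the generic pattern in miniature: averaging two subsystems' scores and computing a rank-based AUC. The scores and labels are made up for illustration; the paper's actual models are GMMs trained on acoustic features.

```python
import numpy as np

def auc(scores, labels):
    """Rank-based AUC: probability that a positive outranks a negative
    (ties count half)."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    wins = (pos[:, None] > neg[None, :]).sum() \
        + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))

# Hypothetical per-subsystem scores (e.g. from two acoustic feature sets).
labels = np.array([1, 1, 1, 0, 0, 0])
s1 = np.array([0.9, 0.7, 0.4, 0.6, 0.3, 0.2])
s2 = np.array([0.8, 0.6, 0.7, 0.5, 0.4, 0.1])

fused = 0.5 * (s1 + s2)  # simple score-level fusion
print(round(auc(fused, labels), 3))
```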
"Speech Acoustic Markers Can Detect Mild Cognitive Impairment in Parkinson’s Disease," IEEE Journal of Selected Topics in Signal Processing, vol. 19, no. 5, pp. 731–740. DOI: 10.1109/JSTSP.2025.3620716. Open-access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11206405
Pub Date: 2025-10-15. DOI: 10.1109/JSTSP.2025.3622049
Jinchao Li;Yuejiao Wang;Junan Li;Jiawen Kang;Bo Zheng;Ka Ho Wong;Brian Kan-Wing Mak;Helene H. Fung;Jean Woo;Man-Wai Mak;Timothy Kwok;Vincent Mok;Xianmin Gong;Xixin Wu;Xunying Liu;Patrick C. M. Wong;Helen Meng
Early detection of neurocognitive disorders (NCDs) is crucial for timely intervention and disease management. Given that language impairments manifest early in NCD progression, visual-stimulated narrative (VSN)-based analysis offers a promising avenue for NCD detection. Current VSN-based NCD detection methods primarily focus on linguistic microstructures (e.g., lexical diversity) that are closely tied to bottom-up, stimulus-driven cognitive processes. While these features illuminate basic language abilities, the higher-order linguistic macrostructures (e.g., topic development) that may reflect top-down, concept-driven cognitive abilities remain underexplored. These macrostructural patterns are crucial for NCD detection, yet challenging to quantify due to their abstract and complex nature. To bridge this gap, we propose two novel macrostructural approaches: (1) a Dynamic Topic Model (DTM) to track topic evolution over time, and (2) a Text-Image Temporal Alignment Network (TITAN) to measure cross-modal consistency between narrative and visual stimuli. Experimental results show the effectiveness of the proposed approaches in NCD detection, with TITAN achieving superior performance across three corpora: ADReSS (F1 = 0.8889), ADReSSo (F1 = 0.8504), and CU-MARVEL-RABBIT (F1 = 0.7238). Feature contribution analysis reveals that macrostructural features (e.g., topic variability, topic change rate, and topic consistency) constitute the most significant contributors to the model's decision pathways, outperforming the investigated microstructural features. These findings underscore the value of macrostructural analysis for understanding linguistic-cognitive interactions associated with NCDs.
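To make the macrostructural feature families concrete, the toy function below computes a topic change rate and a topic variability score over a per-sentence topic sequence. The feature names mirror those listed in the abstract, but the formulas are illustrative sketches, not the DTM/TITAN definitions from the paper.

```python
import numpy as np

def topic_macro_features(topic_seq):
    """Toy macrostructural features over a narrative's topic sequence
    (one topic ID per sentence)."""
    t = np.asarray(topic_seq)
    changes = np.count_nonzero(t[1:] != t[:-1])
    change_rate = changes / (len(t) - 1)        # how often the topic shifts
    variability = len(set(topic_seq)) / len(t)  # distinct topics per sentence
    return change_rate, variability

# Hypothetical sequence: 8 sentences, 3 distinct topics, 3 shifts.
rate, var = topic_macro_features([0, 0, 1, 1, 2, 0, 0, 0])
print(rate, var)
```

In a real system the topic IDs would come from a topic model over the transcript; low variability or erratic change rates could then feed a downstream NCD classifier.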
"Detecting Neurocognitive Disorders Through Analyses of Topic Evolution and Cross-Modal Consistency in Visual-Stimulated Narratives," IEEE Journal of Selected Topics in Signal Processing, vol. 19, no. 5, pp. 741–756. DOI: 10.1109/JSTSP.2025.3622049.
Pub Date: 2025-10-10. DOI: 10.1109/JSTSP.2025.3615540
Shijun Liang;Van Hoang Minh Nguyen;Jinghan Jia;Ismail R. Alkhouri;Sijia Liu;Saiprasad Ravishankar
As the popularity of deep learning (DL) in the field of magnetic resonance imaging (MRI) continues to rise, recent research has indicated that DL-based MRI reconstruction models might be excessively sensitive to minor input disturbances, including worst-case or random additive perturbations. This sensitivity often leads to unstable aliased images. This raises the question of how to devise DL techniques for MRI reconstruction that can be robust to these variations. To address this problem, we propose a novel image reconstruction framework, termed Smoothed Unrolling (SMUG), which advances a deep unrolling-based MRI reconstruction model using a randomized smoothing (RS)-based robust learning approach. RS, which improves the tolerance of a model against input noise, has been widely used in the design of adversarial defense approaches for image classification tasks. Yet, we find that the conventional design that applies RS to the entire DL-based MRI model is ineffective. In this paper, we show that SMUG and its variants address the above issue by customizing the RS process based on the unrolling architecture of DL-based MRI reconstruction models. We theoretically analyze the robustness of our method in the presence of perturbations. Compared to vanilla RS and other recent approaches, we show that SMUG improves the robustness of MRI reconstruction with respect to a diverse set of instability sources, including worst-case and random noise perturbations to input measurements, varying measurement sampling rates, and different numbers of unrolling steps.
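SMUG's contribution is to customise where smoothing is applied inside the unrolled architecture; the sketch below shows only the vanilla randomized-smoothing baseline the paper starts from, i.e. averaging the reconstructions of noise-perturbed copies of the measurements. The reconstructor here is an identity stand-in, not an unrolled MRI network.

```python
import numpy as np

rng = np.random.default_rng(1)

def recon(y):
    """Stand-in reconstructor (identity); in SMUG this would be the
    unrolled reconstruction network (or individual unrolled steps)."""
    return y

def smoothed_recon(y, sigma=0.1, n_samples=32):
    """Vanilla randomized smoothing: average reconstructions of
    Gaussian-perturbed copies of the measurements y."""
    outs = [recon(y + sigma * rng.normal(size=y.shape))
            for _ in range(n_samples)]
    return np.mean(outs, axis=0)

y = np.ones(16)          # toy "measurements"
x_hat = smoothed_recon(y)
print(x_hat.shape)       # (16,)
```

The averaging makes the output less sensitive to small input perturbations at the cost of extra forward passes; SMUG's per-step variant aims to keep that robustness without smoothing away reconstruction detail.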
"Robust MRI Reconstruction by Smoothed Unrolling (SMUG)," IEEE Journal of Selected Topics in Signal Processing, vol. 19, no. 7, pp. 1558–1573. DOI: 10.1109/JSTSP.2025.3615540.
Pub Date: 2025-10-03. DOI: 10.1109/JSTSP.2025.3617859
Bence Mark Halpern;Thomas B. Tienkamp;Teja Rebernik;Rob J.J.H. van Son;Sebastiaan A.H.J. de Visscher;Max J.H. Witjes;Defne Abur;Tomoki Toda
Reliably evaluating the severity of a speech pathology is crucial in healthcare. However, the current reliance on expert evaluations by speech-language pathologists presents several challenges: while their assessments are highly skilled, they are also subjective, time-consuming, and costly, which can limit the reproducibility of clinical studies and place a strain on healthcare resources. While automated methods exist, they have significant drawbacks. Reference-based approaches require transcriptions or healthy speech samples, restricting them to read speech and limiting their applicability. Existing reference-free methods are also flawed; supervised models often learn spurious shortcuts from data, while handcrafted features are often unreliable and restricted to specific speech tasks. This paper introduces XPPG-PCA (x-vector phonetic posteriorgram principal component analysis), a novel, unsupervised, reference-free method for speech severity evaluation. Using three Dutch oral cancer datasets, we demonstrate that XPPG-PCA performs comparably to, or exceeds, established reference-based methods. Our experiments confirm its robustness against data shortcuts and noise, showing its potential for real-world clinical use. Taken together, our results show that XPPG-PCA provides a robust, generalizable solution for the objective assessment of speech pathology, with the potential to significantly improve the efficiency and reliability of clinical evaluations across a range of disorders. An open-source implementation is available.
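The core mechanism named in the title, PCA over speech representations used as an unsupervised severity proxy, can be sketched in a few lines. Everything below is illustrative: random vectors stand in for the x-vector/PPG-derived embeddings, and without a clinical anchor the sign and scale of the resulting score are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical speaker embeddings: rows are speakers, columns embedding
# dimensions; a subgroup is shifted to mimic atypical speech.
X = rng.normal(size=(20, 6))
X[:5] += 3.0

def pca_severity(X):
    """Unsupervised severity proxy: project mean-centred embeddings onto
    the first principal component (computed via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[0]

scores = pca_severity(X)
print(scores.shape)  # one scalar severity proxy per speaker
```

Because no labels are used, the direction of increasing severity must still be fixed afterwards, e.g. by orienting the axis so that known healthy speakers sit at one end.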
"XPPG-PCA: Reference-Free Automatic Speech Severity Evaluation With Principal Components," IEEE Journal of Selected Topics in Signal Processing, vol. 19, no. 5, pp. 783–795. DOI: 10.1109/JSTSP.2025.3617859.
Traditionally, dysarthric speech intelligibility assessment systems have focused on speech as the primary input, utilizing methods such as extraction of relevant speech features, classification models, alignment of Automatic Speech Recognition (ASR) outputs, and comparisons between speech representations of dysarthric and healthy speakers. However, to achieve an automated intelligibility assessment that closely mirrors the auditory-perceptual evaluations conducted by clinicians, a model that captures both the acoustic characteristics of dysarthric speech and the linguistic structure related to word pronunciation is needed. Inspired by the practices of clinicians, this study introduces a novel text-guided dysarthric speech intelligibility assessment framework that leverages custom keyword spotting (DySIA-CKWS). The model evaluates intelligibility by detecting specific keywords and is extensively tested on the UA-Speech database for speaker-wise analysis and across word groups of varying complexity. To ensure robustness, the system’s performance is further validated on the TORGO database, demonstrating its adaptability in cross-database settings. Statistical analysis demonstrates strong alignment between predicted and subjective intelligibility scores, with a Pearson Correlation Coefficient (PCC) of 0.9588 and a Spearman’s Correlation Coefficient (SCC) of 0.9141, achieved using the proposed system on the UA-Speech database. The findings emphasize the importance of word selection and showcase the model’s effectiveness in diagnosing dysarthric speech, offering a significant advancement in intelligibility assessment methodologies.
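The PCC and SCC reported above are the standard metrics for validating predicted against subjective intelligibility scores. The sketch below computes both with plain NumPy (Spearman as Pearson on ranks, ignoring tie correction); the score values are invented for illustration, not the paper's results.

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    a = np.asarray(a, float) - np.mean(a)
    b = np.asarray(b, float) - np.mean(b)
    return (a @ b) / np.sqrt((a @ a) * (b @ b))

def spearman(a, b):
    """Spearman correlation = Pearson on ranks (no tie correction here)."""
    rank = lambda v: np.argsort(np.argsort(v)).astype(float)
    return pearson(rank(np.asarray(a)), rank(np.asarray(b)))

# Hypothetical predicted vs. listener intelligibility scores (percent).
pred = [92, 75, 60, 41, 30]
subj = [95, 70, 65, 38, 25]
print(round(pearson(pred, subj), 3), round(spearman(pred, subj), 3))
```

Reporting both is informative: PCC measures linear agreement in the score values, while SCC only checks that speakers are ordered consistently.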
"Dysarthric Speech Intelligibility Assessment by Custom Keyword Spotting," by Anuprabha M, Krishna Gurugubelli, and Anil Kumar Vuppala. IEEE Journal of Selected Topics in Signal Processing, vol. 19, no. 5, pp. 757–766. Published 2025-09-01. DOI: 10.1109/JSTSP.2025.3604709.
Pub Date: 2025-08-01. DOI: 10.1109/JSTSP.2025.3590607
Lina Liu;Dongning Guo
Massive multiple-input multiple-output (MIMO) systems are vital for achieving high spectral efficiencies at mid-band and millimeter wave frequencies. Conventional hybrid MIMO architectures, which use fewer digital chains than antennas, offer a balance between performance, cost, and energy consumption but often prolong channel estimation. This paper proposes a novel architecture that integrates a set of full-dimension digital chains with one-bit analog-to-digital converters (ADCs) to overcome these limitations and provide an alternative trade-off. By assigning one digital chain to each receive antenna, the proposed approach captures energy from all receive antennas and accelerates angle-of-arrival (AoA) estimation and beam computation. Likelihood-based AoA estimation methods are developed to optimize analog beamforming in narrowband and wideband channels, in both single-user and multiuser scenarios. Numerical results, including the equivalent signal-to-noise ratio per bit post-equalization, demonstrate that full-dimension one-bit digital chains significantly improve the efficiency of beamforming.
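The paper's point that one-bit full-dimension observations still carry usable angle information can be demonstrated with a toy uniform linear array. Note the estimator below is a simple matched-filter grid search over sign-quantized I/Q samples, a stand-in for the likelihood-based AoA estimators the paper actually develops; all parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
M = 32                 # ULA elements, half-wavelength spacing
theta_true = 20.0      # degrees

def steering(theta_deg, M):
    """Array manifold vector for a half-wavelength-spaced ULA."""
    k = np.pi * np.sin(np.deg2rad(theta_deg))
    return np.exp(1j * k * np.arange(M))

# One received snapshot, then one-bit ADCs on the I and Q branches.
x = steering(theta_true, M) + 0.1 * (rng.normal(size=M) + 1j * rng.normal(size=M))
x_1bit = np.sign(x.real) + 1j * np.sign(x.imag)

# Coarse AoA estimate: correlate the one-bit snapshot against the array
# manifold on an angle grid.
grid = np.arange(-90.0, 90.5, 0.5)
power = [abs(np.vdot(steering(t, M), x_1bit)) for t in grid]
theta_hat = grid[int(np.argmax(power))]
print(theta_hat)
```

Even with all amplitude information destroyed by the one-bit ADCs, the sign pattern across the full aperture preserves the spatial phase progression, which is what the proposed architecture exploits to accelerate beam computation.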
"Accelerating Multiuser Beamforming With Full-Dimension One-Bit Chains," IEEE Journal of Selected Topics in Signal Processing, vol. 19, no. 6, pp. 1203–1217. DOI: 10.1109/JSTSP.2025.3590607.
Pub Date: 2025-07-21. DOI: 10.1109/JSTSP.2025.3591062
Shakeel A. Sheikh;Md. Sahidullah;Ina Kodrasi
Advancements in spoken language technologies for neurodegenerative speech disorders are crucial for meeting both clinical and technological needs. This overview paper is vital for advancing the field, as it presents a comprehensive review of state-of-the-art methods in pathological speech detection, automatic speech recognition, pathological speech intelligibility enhancement, intelligibility and severity assessment, and data augmentation approaches for pathological speech. It also highlights key challenges, such as ensuring robustness, privacy, and interpretability. The paper concludes by exploring promising future directions, including the adoption of multimodal approaches and the integration of large language models to further advance speech technologies for neurodegenerative speech disorders.
"Overview of Automatic Speech Analysis and Technologies for Neurodegenerative Disorders: Diagnosis and Assistive Applications," IEEE Journal of Selected Topics in Signal Processing, vol. 19, no. 5, pp. 700–716. DOI: 10.1109/JSTSP.2025.3591062.
Pub Date: 2025-07-16. DOI: 10.1109/JSTSP.2025.3589745
Yiming Fang;Li Chen;Yunfei Chen;Weidong Wang;Changsheng You
Mixed-precision quantization offers superior performance to fixed-precision quantization. It has been widely used in signal processing, communication systems, and machine learning. In mixed-precision quantization, bit allocation is essential. Hence, in this paper, we propose a new bit allocation framework for mixed-precision quantization from a search perspective. First, we formulate a general bit allocation problem for mixed-precision quantization. Then we introduce the penalized particle swarm optimization (PPSO) algorithm to address the integer consumption constraint. To improve efficiency and avoid iterations on infeasible solutions within the PPSO algorithm, a greedy criterion particle swarm optimization (GC-PSO) algorithm is proposed. The corresponding convergence analysis is derived based on dynamical system theory. Furthermore, we apply the above framework to some specific classic fields, i.e., finite impulse response (FIR) filters, receivers, and gradient descent. Numerical examples in each application underscore the superiority of the proposed framework to the existing algorithms.
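A minimal version of the penalized-PSO idea can be shown on a toy bit-allocation problem: minimise a weighted quantization-error sum subject to an integer bit budget, with the budget violation folded into the objective as a penalty (the PPSO mechanism described above). The problem weights, penalty constant, and PSO hyperparameters below are all illustrative, and this sketch omits the paper's greedy-criterion refinement.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy problem: K signals with sensitivities w; error of b_k-bit
# quantization modelled as w_k * 2^(-2 b_k); total budget B bits.
w = np.array([8.0, 4.0, 2.0, 1.0])
B = 12
K = len(w)

def objective(b):
    return float(np.sum(w * 2.0 ** (-2 * b)))

def penalized(b, mu=10.0):
    """Objective plus a linear penalty on budget violation."""
    return objective(b) + mu * max(0, int(b.sum()) - B)

def pso(n_particles=40, iters=200):
    pos = rng.uniform(1, 8, size=(n_particles, K))   # continuous positions
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_val = np.array([penalized(np.round(p).astype(int)) for p in pos])
    g = pbest[np.argmin(pbest_val)].copy()           # global best
    for _ in range(iters):
        r1, r2 = rng.random((2, n_particles, K))
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (g - pos)
        pos = np.clip(pos + vel, 1, 8)
        vals = np.array([penalized(np.round(p).astype(int)) for p in pos])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        g = pbest[np.argmin(pbest_val)].copy()
    return np.round(g).astype(int)

b = pso()
print(b, int(b.sum()), round(objective(b), 5))
```

Rounding particle positions before evaluation handles the integer constraint crudely; the greedy criterion in GC-PSO exists precisely to avoid wasting iterations on infeasible allocations like this.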
{"title":"Mixed-Precision Quantization: Make the Best Use of Bits Where They Matter Most","authors":"Yiming Fang;Li Chen;Yunfei Chen;Weidong Wang;Changsheng You","doi":"10.1109/JSTSP.2025.3589745","DOIUrl":"https://doi.org/10.1109/JSTSP.2025.3589745","url":null,"abstract":"Mixed-precision quantization offers superior performance to fixed-precision quantization. It has been widely used in signal processing, communication systems, and machine learning. In mixed-precision quantization, bit allocation is essential. Hence, in this paper, we propose a new bit allocation framework for mixed-precision quantization from a search perspective. First, we formulate a general bit allocation problem for mixed-precision quantization. Then we introduce the penalized particle swarm optimization (PPSO) algorithm to address the integer consumption constraint. To improve efficiency and avoid iterations on infeasible solutions within the PPSO algorithm, a greedy criterion particle swarm optimization (GC-PSO) algorithm is proposed. The corresponding convergence analysis is derived based on dynamical system theory. Furthermore, we apply the above framework to some specific classic fields, i.e., finite impulse response (FIR) filters, receivers, and gradient descent. 
Numerical examples in each application underscore the superiority of the proposed framework to the existing algorithms.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"19 6","pages":"1218-1233"},"PeriodicalIF":13.7,"publicationDate":"2025-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145852508","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
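The penalty-based search this entry describes can be illustrated with a minimal sketch: a particle swarm minimizes a quantization-distortion objective over integer bit vectors, with the bit-budget constraint handled by an additive penalty. This is only a toy illustration of the general PPSO idea, not the paper's implementation; the objective, penalty weight, and all hyperparameters below are placeholder assumptions.

```python
import numpy as np

def penalized_pso(objective, n_vars, bit_budget, n_particles=30, n_iters=200,
                  b_min=1, b_max=8, penalty=1e3, seed=0):
    """Minimize objective(b) over integer bit vectors b, with the budget
    constraint sum(b) <= bit_budget enforced by an additive penalty term."""
    rng = np.random.default_rng(seed)
    pos = rng.uniform(b_min, b_max, (n_particles, n_vars))  # continuous particles
    vel = np.zeros_like(pos)

    def penalized(p):
        b = np.round(p).astype(int)  # evaluate on the rounded (integer) allocation
        return objective(b) + penalty * max(0, b.sum() - bit_budget)

    pbest = pos.copy()
    pbest_val = np.array([penalized(p) for p in pos])
    gbest = pbest[pbest_val.argmin()].copy()

    w, c1, c2 = 0.7, 1.5, 1.5  # inertia and acceleration coefficients
    for _ in range(n_iters):
        r1, r2 = rng.random((2, n_particles, n_vars))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, b_min, b_max)
        vals = np.array([penalized(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        gbest = pbest[pbest_val.argmin()].copy()

    return np.round(gbest).astype(int)

# Toy objective: per-coefficient distortion ~ sensitivity * 2^(-2b), so more
# sensitive coefficients should receive more bits under a fixed budget.
sens = np.array([4.0, 2.0, 1.0, 0.5])
alloc = penalized_pso(lambda b: float((sens * 2.0 ** (-2 * b)).sum()),
                      n_vars=4, bit_budget=16)
print(alloc, alloc.sum())
```

The penalty term makes infeasible allocations (budget overruns) strictly worse than any feasible one, so the swarm's best-known position settles on an integer allocation within the budget.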
Pub Date : 2025-06-30, DOI: 10.1109/JSTSP.2025.3584195
Resolving Domain Mismatches in Electrolaryngeal Speech Enhancement With Linguistic Intermediates
Lester Phillip Violeta;Wen-Chin Huang;Ding Ma;Ryuichi Yamamoto;Kazuhiro Kobayashi;Tomoki Toda
IEEE Journal of Selected Topics in Signal Processing, vol. 19, no. 5, pp. 827-839. Open-access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11059307
Abstract: We investigate the use of linguistic intermediates to resolve domain mismatches in the electrolaryngeal (EL) speech enhancement task. We first propose using linguistic encoders to produce bottleneck-feature intermediates within a recognition, alignment, and synthesis framework, which improves performance by removing the timbre mismatch between the pretraining (typical) and fine-tuning (EL) data. We then improve this further by introducing discrete text intermediates, which alleviate temporal mismatches between the source (EL) and target (typical) data and thereby improve prosody modeling. Our findings show that bottleneck-feature intermediates alone already yield more intelligible and natural-sounding synthesized speech, as shown by a significant 16% improvement in character error rate and a 0.83 improvement in naturalness score over the baseline. Moreover, using discrete phoneme-level intermediates further improves the modeling of the temporal structure of typical speech, giving an additional absolute improvement of 1.4% in character error rate and 0.2 in naturalness over the initially proposed system. Finally, we verify these findings on a larger pseudo-EL dataset of 14 speakers and another set of 3 real-world EL speakers, which consistently show that phoneme-level intermediates are the most effective approach in terms of phoneme error rate. We conclude by summarizing the advantages and disadvantages of each proposed technique.
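Character error rate, the primary intelligibility metric reported in this entry, is conventionally computed as the Levenshtein edit distance between hypothesis and reference transcripts, normalized by reference length. A minimal self-contained sketch of that standard metric (not the authors' evaluation code):

```python
def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance (insertions + deletions + substitutions) between
    hypothesis and reference characters, divided by the reference length."""
    ref, hyp = list(reference), list(hypothesis)
    # Row-by-row dynamic-programming edit distance over characters.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution (free on match)
        prev = curr
    return prev[-1] / max(len(ref), 1)

print(character_error_rate("kitten", "sitting"))  # 3 edits / 6 chars = 0.5
```

The same distance computed over phoneme sequences instead of characters gives the phoneme error rate used in the paper's final comparison.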