Brain-Inspired Visual Attention Modeling Based on EEG for Intelligent Robotics
Pub Date: 2024-03-31 | DOI: 10.1109/JSTSP.2024.3408100
Shuzhan Hu;Yiping Duan;Xiaoming Tao;Jian Chu;Jianhua Lu
Vision, as the primary perceptual mode for intelligent robots, plays a crucial role in various human-robot interaction (HRI) scenarios. In certain situations, visual sensors must capture videos for humans, assisting them in tasks such as exploration missions. However, the growing volume of video data poses great challenges for transmission and storage, so more efficient video compression strategies are urgently needed. When perceiving a video, humans tend to pay more attention to specific clips, which may occupy only a small part of the whole video yet largely determine its perceptual quality. This human visual attention (VA) mechanism offers valuable inspiration for optimizing video compression in HRI scenarios. We therefore combine psychophysiological paradigms and machine learning methods to model human VA and introduce it into bitrate allocation to make full use of limited resources. Specifically, we collect electroencephalographic (EEG) data while humans watch videos, constructing an EEG dataset that reflects VA. Based on this dataset, we propose a VA measurement model that determines humans' VA states from their underlying brain responses. A brain-inspired VA prediction model is then established to obtain VA metrics directly from the videos. Finally, guided by the VA metric, more bits are allocated to the clips that humans attend to most. Experimental results show that the proposed methods accurately determine humans' VA states and predict the VA metrics evoked by different video clips, and that the VA-guided bitrate allocation achieves better perceptual quality at low bitrates.
{"title":"Brain-Inspired Visual Attention Modeling Based on EEG for Intelligent Robotics","authors":"Shuzhan Hu;Yiping Duan;Xiaoming Tao;Jian Chu;Jianhua Lu","doi":"10.1109/JSTSP.2024.3408100","DOIUrl":"https://doi.org/10.1109/JSTSP.2024.3408100","url":null,"abstract":"Vision, as the primary perceptual mode for intelligent robots, plays a crucial role in various human-robot interaction (HRI) scenarios. In certain situations, it is essential to utilize the visual sensors to capture videos for humans, assisting them in tasks like exploration missions. However, the increasing amount of video information brings great challenges for data transmission and storage. Therefore, there is an urgent need to develop more efficient video compression strategies to address this challenge. When perceiving a video, humans tend to pay more attention to some specific clips, which may occupy a small part of the whole video content, but largely affect the perceptual quality. This human visual attention (VA) mechanism provides valuable inspiration for optimizing video compression methods for HRI scenarios. Therefore, we combine psychophysiological paradigms and machine learning methods to model human VA and introduce it into the bitrate allocation to fully utilize the limited resources. Specifically, we collect electroencephalographic (EEG) data when humans watch videos, constructing an EEG dataset reflecting VA. Based on the dataset, we propose a VA measurement model to determine the VA states of humans in their underlying brain responses. Then, a brain-inspired VA prediction model is established to obtain VA metrics directly from the videos. Finally, based on the VA metric, more bitrates are allocated to the clips that humans pay more attention to. The experimental results show that our proposed methods can accurately determine the humans' VA states and predict the VA metrics evoked by different video clips. Furthermore, the bitrate allocation method based on the VA metric can achieve better perceptual quality at low bitrates.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":null,"pages":null},"PeriodicalIF":8.7,"publicationDate":"2024-03-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142137557","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Transferability of coVariance Neural Networks
Saurabh Sihag;Gonzalo Mateos;Corey McMillan;Alejandro Ribeiro
Pub Date: 2024-03-28 | DOI: 10.1109/JSTSP.2024.3378887
Graph convolutional networks (GCNs) leverage topology-driven graph convolutional operations to combine information across a graph for inference tasks. In our recent work, we studied GCNs whose graphs are covariance matrices, in the form of coVariance neural networks (VNNs), and showed that VNNs draw similarities with traditional principal component analysis (PCA) while overcoming its instability. In this paper, we focus on characterizing the transferability of VNNs. The notion of transferability is motivated by the intuitive expectation that learning models should generalize to “compatible” datasets (i.e., datasets of different dimensionalities describing the same domain) with minimal effort. VNNs inherit the scale-free data-processing architecture of GCNs, and here we show that VNNs transfer performance (without re-training) across datasets whose covariance matrices converge to a limit object. Multi-scale neuroimaging datasets enable the study of the brain at multiple scales and hence provide an ideal scenario for validating this transferability. We first demonstrate the quantitative transferability of VNNs on a regression task: predicting chronological age from a multi-scale dataset of cortical thickness features. Further, to elucidate the advantages VNNs offer in neuroimaging data analysis, we deploy VNNs as regression models in a pipeline for “brain age” prediction from cortical thickness features. The discordance between brain age and chronological age (the “brain age gap”) can reflect increased vulnerability or resilience toward neurological disease or cognitive impairment. The VNN architecture lets us go beyond the coarse brain age gap metric and attach anatomical interpretability to elevated brain age gaps in Alzheimer's disease (AD). We leverage the transferability of VNNs to cross-validate this anatomical interpretability across datasets of different dimensionalities.
{"title":"Transferability of coVariance Neural Networks","authors":"Saurabh Sihag;Gonzalo Mateos;Corey McMillan;Alejandro Ribeiro","doi":"10.1109/JSTSP.2024.3378887","DOIUrl":"https://doi.org/10.1109/JSTSP.2024.3378887","url":null,"abstract":"Graph convolutional networks (GCN) leverage topology-driven graph convolutional operations to combine information across the graph for inference tasks. In our recent work, we have studied GCNs with covariance matrices as graphs in the form of coVariance neural networks (VNNs) and shown that VNNs draw similarities with traditional principal component analysis (PCA) while overcoming its limitations regarding instability. In this paper, we focus on characterizing the transferability of VNNs. The notion of transferability is motivated from the intuitive expectation that learning models could generalize to “compatible” datasets (i.e., datasets of different dimensionalities describing the same domain) with minimal effort. VNNs inherit the scale-free data processing architecture from GCNs and here, we show that VNNs exhibit transferability of performance (without re-training) over datasets whose covariance matrices converge to a limit object. Multi-scale neuroimaging datasets enable the study of the brain at multiple scales and hence, provide an ideal scenario to validate the transferability of VNNs. We first demonstrate the quantitative transferability of VNNs over a regression task of predicting chronological age from a multi-scale dataset of cortical thickness features. Further, to elucidate the advantages offered by VNNs in neuroimaging data analysis, we also deploy VNNs as regression models in a pipeline for “brain age” prediction from cortical thickness features. The discordance between brain age and chronological age (“brain age gap”) can reflect increased vulnerability or resilience toward neurological disease or cognitive impairments. The architecture of VNNs allows us to extend beyond the coarse metric of brain age gap and associate anatomical interpretability to elevated brain age gap in Alzheimer's disease (AD). We leverage the transferability of VNNs to cross validate the anatomical interpretability offered by VNNs to brain age gap across datasets of different dimensionalities.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":null,"pages":null},"PeriodicalIF":8.7,"publicationDate":"2024-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141500179","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Compression Ratio Learning and Semantic Communications for Video Imaging
Pub Date: 2024-03-27 | DOI: 10.1109/JSTSP.2024.3405853
Bowen Zhang;Zhijin Qin;Geoffrey Ye Li
It is crucial to improve data acquisition and transmission efficiency for mobile robots with limited power, memory, and bandwidth. For efficient data acquisition, a novel video compressed-sensing system with spatially variant compression ratios is designed, offering high imaging quality at low sampling rates. To improve transmission efficiency, semantic communication is leveraged to reduce the bandwidth requirement, providing high image-recovery quality at low transmission rates. In particular, we focus on the trade-off between rate and quality and use neural networks to decide the optimal rate-allocation policy for given quality requirements. Because the rate is non-differentiable, we train the networks with policy-gradient-based reinforcement learning. Numerical results show the superiority of the proposed methods over existing baselines.
{"title":"Compression Ratio Learning and Semantic Communications for Video Imaging","authors":"Bowen Zhang;Zhijin Qin;Geoffrey Ye Li","doi":"10.1109/JSTSP.2024.3405853","DOIUrl":"https://doi.org/10.1109/JSTSP.2024.3405853","url":null,"abstract":"It is crucial to improve data acquisition and transmission efficiency for mobile robots with limited power, memory, and bandwidth resources. For efficient data acquisition, a novel video compressed-sensing system with spatially-variant compression ratios is designed, which offers high imaging quality with low sampling rates; To improve data transmission efficiency, semantic communication is leveraged to reduce bandwidth requirement, which provides high image recovery quality with low transmission rates. In particular, we focus on the trade-off between rate and quality. To address the challenge, we use neural networks to decide the optimal rate allocation policy for given quality requirements. Due to the non-differentiable issue of rate, we train the networks by policy-gradient-based reinforcement learning. Numerical results show the superiority of the proposed methods over the existing baselines.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":null,"pages":null},"PeriodicalIF":8.7,"publicationDate":"2024-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10539255","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142137543","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
NPR: Nocturnal Place Recognition Using Nighttime Translation in Large-Scale Training Procedures
Bingxi Liu;Yujie Fu;Feng Lu;Jinqiang Cui;Yihong Wu;Hong Zhang
Pub Date: 2024-03-20 | DOI: 10.1109/JSTSP.2024.3403247
Visual Place Recognition (VPR) is a critical task in intelligent robotics and computer vision: given a query photo, it retrieves similar database images from an extensive collection of known images. In real-world applications, the task is challenged by the extreme illumination changes of nighttime query images, yet a large-scale training set with day-night correspondence for VPR remains absent. To address this, we propose a novel pipeline that divides general VPR into distinct day and night domains and then conquers Nocturnal Place Recognition (NPR). Specifically, we first establish a day-night street-scene dataset, named NightStreet, and use it to train an unpaired image-to-image translation model. We then apply this model to existing large-scale VPR datasets to generate night versions and demonstrate how to combine them with two popular VPR pipelines. Finally, we introduce a divide-and-conquer VPR framework that addresses the degradation of NPR models under daytime conditions, with explanations at the theoretical, experimental, and application levels. Under our framework, the performance of previous methods, including the top-ranked one, improves significantly on two public datasets.
{"title":"NPR: Nocturnal Place Recognition Using Nighttime Translation in Large-Scale Training Procedures","authors":"Bingxi Liu;Yujie Fu;Feng Lu;Jinqiang Cui;Yihong Wu;Hong Zhang","doi":"10.1109/JSTSP.2024.3403247","DOIUrl":"https://doi.org/10.1109/JSTSP.2024.3403247","url":null,"abstract":"Visual Place Recognition (VPR) is a critical task within the fields of intelligent robotics and computer vision. It involves retrieving similar database images based on a query photo from an extensive collection of known images. In real-world applications, this task encounters challenges when dealing with extreme illumination changes caused by nighttime query images. However, a large-scale training set with day-night correspondence for VPR remains absent. To address this challenge, we propose a novel pipeline that divides the general VPR into distinct domains of day and night, subsequently conquering Nocturnal Place Recognition (NPR). Specifically, we first establish a daynight street scene dataset, named NightStreet, and use it to train an unpaired image-to-image translation model. Then, we utilize this model to process existing large-scale VPR datasets, generating the night version of VPR datasets and demonstrating how to combine them with two popular VPR pipelines. Finally, we introduce a divide-and-conquer VPR framework designed to solve the degradation of NPR during daytime conditions. We provide comprehensive explanations at theoretical, experimental, and application levels. Under our framework, the performance of previous methods can be significantly improved on two public datasets, including the top-ranked method.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":null,"pages":null},"PeriodicalIF":8.7,"publicationDate":"2024-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142137591","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DoDo: Double DOE Optical System for Multishot Spectral Imaging
Sergio Urrea;Roman Jacome;M. Salman Asif;Henry Arguello;Hans Garcia
Pub Date: 2024-03-17 | DOI: 10.1109/JSTSP.2024.3402320
Snapshot compressive spectral imaging (SCSI) systems compress a scene by capturing 2D projections of encoded underlying signals; a decoder, trained on pre-acquired datasets, reconstructs the spectral images. SCSI systems based on diffractive optical elements (DOEs) provide a small form factor, and a single DOE can be optimized end to end. However, because the spectral image is highly compressed in a single-DOE SCSI system, the reconstruction quality can be insufficient for diverse spectral imaging applications. This work proposes a multishot spectral imaging system employing double-phase encoding with a double-DOE architecture (DoDo) to improve spectral reconstruction. The first DOE is fixed and provides the benefits of diffractive optical systems; the second provides the variable encoding of multishot architectures. The work presents a differentiable mathematical model for the multishot DoDo system and optimizes the parameters of the DoDo architecture end to end. The proposed system was tested in simulation and with a hardware prototype that, to keep costs low, implements the variable DOE with a deformable mirror. Compared with a single-DOE system, DoDo improves the PSNR of reconstructed spectral images by up to 4 dB.
{"title":"DoDo: Double DOE Optical System for Multishot Spectral Imaging","authors":"Sergio Urrea;Roman Jacome;M. Salman Asif;Henry Arguello;Hans Garcia","doi":"10.1109/JSTSP.2024.3402320","DOIUrl":"https://doi.org/10.1109/JSTSP.2024.3402320","url":null,"abstract":"Snapshot Compressive Spectral Imaging Systems (SCSI) compress the scenes by capturing 2D projections of the encoded underlying signals. A decoder, trained with pre-acquired datasets, reconstructs the spectral images. SCSI systems based on diffractive optical elements (DOE) provide a small form factor and the single DOE can be optimized in an end-to-end manner. Since the spectral image is highly compressed in a SCSI system based on a single DOE, the quality of image reconstruction can be insufficient for diverse spectral imaging applications. This work proposes a multishot spectral imaging system employing a double-phase encoding with a double DOE architecture (DoDo), to improve the spectral reconstruction performance. The first DOE is fixed and provides the benefits of the diffractive optical systems. The second DOE provides the variable encoding of the multishot architectures. The work presents a differentiable mathematical model for the multishot DoDo system and optimizes the parameters of the DoDo architecture in an end-to-end manner. The proposed system was tested using simulations and a hardware prototype. To obtain a low-cost system, the implementation uses a deformable mirror for the variable DOE. The proposed DoDo system shows an improvement of up to 4 dB in PSNR in the reconstructed spectral images compared with the single DOE system.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":null,"pages":null},"PeriodicalIF":8.7,"publicationDate":"2024-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142587537","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ViT-MDHGR: Cross-Day Reliability and Agility in Dynamic Hand Gesture Prediction via HD-sEMG Signal Decoding
Qin Hu;Golara Ahmadi Azar;Alyson Fletcher;Sundeep Rangan;S. Farokh Atashzar
Pub Date: 2024-03-17 | DOI: 10.1109/JSTSP.2024.3402340
Surface electromyography (sEMG) and high-density sEMG (HD-sEMG) biosignals have been extensively investigated for myoelectric control of prosthetic devices, neurorobotics, and, more recently, human-computer interfaces, because they enable wearable, non-invasive hand gesture recognition and prediction. High intraday (same-day) performance has been reported. However, interday performance (with training and testing on separate days) degrades substantially because conventional approaches generalize poorly over time, hindering real-life deployment. The few recent studies on multi-day hand gesture recognition face a major challenge: they require long sEMG epochs, and the induced delay makes the corresponding neural interfaces impractical for myoelectric control. This paper proposes a compact ViT-based network for multi-day dynamic hand gesture prediction that tackles this challenge by relying on very short HD-sEMG signal windows (50 ms, only one-sixth of the convention for real-time myoelectric implementation), boosting agility and responsiveness. The proposed model predicts 11 dynamic gestures for 20 subjects with an average accuracy above 71% on the testing day, 3-25 days after training. Moreover, when calibrated on just a small portion of data from the testing day, it achieves over 92% accuracy while retraining less than 10% of the parameters, preserving computational efficiency.
{"title":"ViT-MDHGR: Cross-Day Reliability and Agility in Dynamic Hand Gesture Prediction via HD-sEMG Signal Decoding","authors":"Qin Hu;Golara Ahmadi Azar;Alyson Fletcher;Sundeep Rangan;S. Farokh Atashzar","doi":"10.1109/JSTSP.2024.3402340","DOIUrl":"https://doi.org/10.1109/JSTSP.2024.3402340","url":null,"abstract":"Surface electromyography (sEMG) and high-density sEMG (HD-sEMG) biosignals have been extensively investigated for myoelectric control of prosthetic devices, neurorobotics, and more recently human-computer interfaces because of their capability for hand gesture recognition/prediction in a wearable and non-invasive manner. High intraday (same-day) performance has been reported. However, the interday performance (separating training and testing days) is substantially degraded due to the poor generalizability of conventional approaches over time, hindering the application of such techniques in real-life practices. There are limited recent studies on the feasibility of multi-day hand gesture recognition. The existing studies face a major challenge: the need for long sEMG epochs makes the corresponding neural interfaces impractical due to the induced delay in myoelectric control. This paper proposes a compact ViT-based network for multi-day dynamic hand gesture prediction. We tackle the main challenge as the proposed model only relies on very short HD-sEMG signal windows (i.e., 50 ms, accounting for only one-sixth of the convention for real-time myoelectric implementation), boosting agility and responsiveness. Our proposed model can predict 11 dynamic gestures for 20 subjects with an average accuracy of over 71% on the testing day, 3-25 days after training. Moreover, when calibrated on just a small portion of data from the testing day, the proposed model can achieve over 92% accuracy by retraining less than 10% of the parameters for computational efficiency.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":null,"pages":null},"PeriodicalIF":8.7,"publicationDate":"2024-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142137559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Single-shot 3D Reconstruction by Fusion of Fourier Transform Profilometry and Line Clustering
Pub Date: 2024-03-13 | DOI: 10.1109/JSTSP.2024.3400010
ZhenZhou Wang
Owing to its better accuracy and resolution, Fourier transform profilometry (FTP) is more widely used than the line-clustering (LC) based structured-light (SL) 3D reconstruction technique. However, FTP suffers from a bottleneck: unavoidable phase-unwrapping errors at occlusions and large discontinuities. In this paper, we propose a composite pattern based on the red, green, and blue (RGB) channels of a color image to fuse FTP and LC for more robust single-shot reconstruction. The red channel carries the sinusoidal pattern for FTP, and the remaining channels carry the line patterns for LC. The intervals between adjacent lines can therefore be chosen as large as needed for robust clustering without affecting the accuracy of FTP. Based on the clustered lines, the phase-wrap boundary errors caused by occlusions and large discontinuities are corrected. Finally, a one-dimensional phase-wrap-boundary-guided phase-unwrapping approach is proposed to resolve the bottleneck of spatial phase unwrapping in FTP. Experimental results show that the proposed fusion method reconstructs complex shapes with occlusions and large discontinuities more robustly than FTP or LC-based SL alone.
{"title":"Single-shot 3D Reconstruction by Fusion of Fourier Transform Profilometry and Line Clustering","authors":"ZhenZhou Wang","doi":"10.1109/JSTSP.2024.3400010","DOIUrl":"https://doi.org/10.1109/JSTSP.2024.3400010","url":null,"abstract":"Due to its better accuracy and resolution, Fourier transform profilometry (FTP) is more widely used than the line clustering (LC) based structured light (SL) 3D reconstruction technique. However, it has the bottleneck problem of the unavoidable phase unwrapping errors at places of occlusions and large discontinuities. In this paper, we propose a composite pattern based on the red, green and blue (RGB) channels of the color image to fuse FTP and LC for more robust single-shot reconstruction. The red channel contains the sinusoidal pattern for FTP and the rest of the channels contain the line patterns for LC. Therefore, the intervals between the adjacent lines in the line pattern could be selected as large as possible for robust clustering while the accuracy of FTP will not be affected by the large intervals of the lines. Based on the clustered lines, the phase wrap boundary errors caused by occlusions and large discontinuities are corrected. At last, a one-dimensional phase wrap boundary guided phase unwrapping approach is proposed to solve the bottleneck problem of spatial phase unwrapping for FTP. Experimental results showed that the proposed fusion method could reconstruct the complex shapes with occlusions and large discontinuities more robust than FTP or LC based SL alone.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":null,"pages":null},"PeriodicalIF":8.7,"publicationDate":"2024-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142137575","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An Efficient RGB-D Indoor Scene-Parsing Solution via Lightweight Multiflow Intersection and Knowledge Distillation
Pub Date: 2024-03-13 | DOI: 10.1109/JSTSP.2024.3400030
Wujie Zhou;Yuming Zhang;Weiqing Yan;Lv Ye
The rapid progress of convolutional neural networks (CNNs) has significantly improved indoor scene parsing, transforming robotics, autonomous navigation, augmented reality, and surveillance. Societal demand is now propelling these technologies toward mobile smart-device applications. However, the processing capabilities of mobile devices cannot support the full system requirements of CNNs, which poses a challenge for many deep-learning applications. One promising solution is to deploy lightweight student networks that learn from their robust, cloud-based counterparts (teacher networks) through knowledge distillation (KD), reducing the parameter count while optimizing student classification. To this end, a lightweight multiflow intersection network (LMINet) is proposed for red–green–blue–depth (RGB-D) indoor scene parsing, relying on two KD methods: frequency KD (FKD) and compression KD (CKD). A multiflow intersection module is introduced to efficiently integrate feature information from disparate layers. To maximize the performance of the lightweight LMINet student (LMINet-S) network, the FKD module employs a discrete cosine transform to capture feature information at different frequencies, whereas the CKD module compresses the features of diverse layers and distills their corresponding dimensions. Experiments on the NYUDv2 and SUN-RGBD datasets demonstrate that our LMINet teacher (LMINet-T) model, LMINet-S (without KD), and LMINet-S* (LMINet-S with KD) outperform state-of-the-art scene-parsing tools without increasing the parameter count (26.2M). Consequently, the technology is now closer to integration into mobile devices.
{"title":"An Efficient RGB-D Indoor Scene-Parsing Solution via Lightweight Multiflow Intersection and Knowledge Distillation","authors":"Wujie Zhou;Yuming Zhang;Weiqing Yan;Lv Ye","doi":"10.1109/JSTSP.2024.3400030","DOIUrl":"https://doi.org/10.1109/JSTSP.2024.3400030","url":null,"abstract":"The rapid progression of convolutional neural networks (CNNs) has significantly improved indoor scene parsing, transforming the fields of robotics, autonomous navigation, augmented reality, and surveillance. Currently, societal demand is propelling these technologies toward integration into mobile smart device applications. However, the processing capabilities of mobile devices cannot support the comprehensive system requirements of CNNs, which poses a challenge for several deep-learning applications. One promising solution to this predicament is the deployment of lightweight student networks. These streamlined networks learn from their robust, cloud-based counterparts—that is, teacher networks—through knowledge distillation (KD). This facilitates a reduction in parameter count and optimizes student classification. Furthermore, a lightweight multiflow intersection network (LMINet) is proposed and developed for red–green–blue–depth (RGB-D) indoor scene parsing. The proposed method relies on dual-frequency KD (FKD) and compression KD (CKD) methods. A multiflow intersection module is introduced to efficiently integrate feature information from disparate layers. To maximize the performance of lightweight LMINet student (LMINet-S) networks, the FKD module employs a discrete cosine transform to capture feature information from different frequencies, whereas the CKD module compresses the features of diverse layers and distills their corresponding dimensions. Experiments using the NYUDv2 and SUN-RGBD datasets demonstrate that our LMINet teacher (LMINet-T) model, LMINet-S (without KD), and LMINet-S* (LMINet-S with KD) outperform state-of-the-art scene-parsing tools without increasing the parameter count (26.2M). Consequently, the technology is now closer to integration into mobile devices.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":null,"pages":null},"PeriodicalIF":8.7,"publicationDate":"2024-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142137510","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Spatial-Temporal-Based Underdetermined Near-Field 3-D Localization Employing a Nonuniform Cross Array
Pub Date: 2024-03-13 | DOI: 10.1109/JSTSP.2024.3400046
Hua Chen;Zelong Yi;Zhiwei Jiang;Wei Liu;Ye Tian;Qing Wang;Gang Wang
In this paper, an underdetermined three-dimensional (3-D) near-field source localization method is proposed, based on a two-dimensional (2-D) symmetric nonuniform cross array. First, using the symmetric coprime array along the x-axis, a fourth-order cumulant (FOC) based matrix is constructed and vectorized into a single virtual snapshot. This snapshot is equivalent to the data a virtual array would receive from virtual far-field sources, yielding more degrees of freedom (DOFs) than the original physical array; multiple delay lags, termed pseudo snapshots, are introduced to address the single-snapshot issue. The received data of the uniform linear array along the y-axis is processed in the same way to form another virtual array, followed by a cross-correlation operation with the virtual-array observations constructed from the coprime array. The 2-D angles of the near-field sources are then jointly estimated with the recently proposed sparse and parametric approach (SPA) and the Vandermonde decomposition technique, eliminating the need for parameter discretization. To estimate the range term, the conjugate symmetry of the signal's autocorrelation function is used to construct second-order statistics of the received data over all array elements, to which the one-dimensional (1-D) MUSIC algorithm is applied. Some properties of the proposed array are also analyzed. Compared with existing algorithms, the proposed one achieves better estimation performance with the same number of sensor elements and works in underdetermined and mixed-source situations, as shown by simulation results in which the 3-D parameters are automatically paired.
{"title":"Spatial-Temporal-Based Underdetermined Near-Field 3-D Localization Employing a Nonuniform Cross Array","authors":"Hua Chen;Zelong Yi;Zhiwei Jiang;Wei Liu;Ye Tian;Qing Wang;Gang Wang","doi":"10.1109/JSTSP.2024.3400046","DOIUrl":"https://doi.org/10.1109/JSTSP.2024.3400046","url":null,"abstract":"In this paper, an underdetermined three-dimensional (3-D) near-field source localization method is proposed, based on a two-dimensional (2-D) symmetric nonuniform cross array. Firstly, by utilizing the symmetric coprime array along the x-axis, a fourth-order cumulant (FOC) based matrix is constructed, followed by vectorization operation to form a single virtual snapshot, which is equivalent to the received data of a virtual array observing from virtual far-field sources, generating an increased number of degrees of freedom (DOFs) compared to the original physical array. Meanwhile, multiple delay lags, named as pseudo snapshots, are introduced to address the single snapshot issue. Then, the received data of the uniform linear array along the y-axis is similarly processed to form another virtual array, followed by a cross-correlation operation on the virtual array observations constructed from the coprime array. Finally, the 2-D angles of the near-field sources are jointly estimated by employing the recently proposed sparse and parametric approach (SPA) and the Vandermonde decomposition technique, eliminating the need for parameter discretization. To estimate the range term, the conjugate symmetry property of the signal's autocorrelation function is used to construct the second-order statistics based received data with the whole array elements, and subsequently, the one-dimensional (1-D) MUSIC algorithm is applied. Moreover, some properties of the proposed array are analyzed. Compared with existing algorithms, the proposed one has better estimation performance given the same number of sensor elements, which can work in an underdetermined and mixed sources situation, as shown by simulation results with 3-D parameters automatically paired.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":null,"pages":null},"PeriodicalIF":8.7,"publicationDate":"2024-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142587507","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-03-11 | DOI: 10.1109/JSTSP.2024.3374591
Rohit Parasnis;Seyyedali Hosseinalipour;Yun-Wei Chu;Mung Chiang;Christopher G. Brinton
Semi-decentralized federated learning blends the conventional device-to-server (D2S) interaction structure of federated model training with localized device-to-device (D2D) communications. We study this architecture over edge networks with multiple D2D clusters modeled as time-varying and directed communication graphs. Our investigation results in two algorithms: (a) a connectivity-aware