Pub Date: 2024-10-15 | DOI: 10.1109/LSP.2024.3481151
Jinghao Cao;Sheng Liu;Xiong Yang;Yang Li;Sidan Du
The large-scale generation of real-world scenario datasets is a pivotal task in the field of autonomous driving. Existing methods focus solely on single-frame rendering and require complex inputs for continuous scenario rendering. In this letter, ARES, a text-driven automatic realistic simulator, is proposed, which can generate extensive realistic datasets from just a single text input. Its core idea is to generate vehicle trajectories from the textual description and then render the scenario using vehicle attributes associated with these trajectories. To learn trajectory generation, signal temporal logic supervision is proposed to assist the conditional diffusion model, incorporating prior physical information. We annotate textual descriptions for the KITTI-MOT dataset and establish an objective quantitative evaluation system. The superiority of our method is demonstrated by its high performance: a matching score of 3.54 and an FID of 8.93 in the trajectory reconstruction task, along with a speed accuracy of 0.99 and a direction accuracy of 0.93 in the trajectory editing task. The scenarios rendered by the proposed method exhibit high quality and realism, indicating great potential for testing autonomous driving algorithms with vehicle-in-the-loop simulations.
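The signal-temporal-logic supervision described above can be illustrated with a toy example: a specification such as "speed always stays below a limit" has a differentiable soft robustness score that can penalize the denoising loss when the specification is violated. A minimal numpy sketch (the function names, weights, and the specific combination are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

def stl_always_below(speeds, v_max, beta=10.0):
    """Soft robustness of the STL formula G(speed <= v_max).

    Positive robustness means the trajectory satisfies the spec;
    a soft minimum (log-sum-exp) keeps the score differentiable.
    """
    margins = v_max - speeds                     # per-step satisfaction margin
    return -np.log(np.sum(np.exp(-beta * margins))) / beta

def diffusion_loss_with_stl(pred_noise, true_noise, speeds, v_max, lam=0.1):
    """Denoising MSE plus a hinge on STL robustness (hypothetical combination)."""
    mse = np.mean((pred_noise - true_noise) ** 2)
    robustness = stl_always_below(speeds, v_max)
    return mse + lam * max(0.0, -robustness)     # penalize only violations

speeds = np.array([8.0, 9.5, 11.2, 10.1])        # one trajectory's speeds (m/s)
rho = stl_always_below(speeds, v_max=12.0)
```

The log-sum-exp soft minimum lower-bounds the hard minimum margin, so a positive score certifies that the speed constraint holds over the whole trajectory.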
{"title":"ARES: Text-Driven Automatic Realistic Simulator for Autonomous Traffic","authors":"Jinghao Cao;Sheng Liu;Xiong Yang;Yang Li;Sidan Du","doi":"10.1109/LSP.2024.3481151","journal":"IEEE Signal Processing Letters","volume":"31","pages":"3049-3053","publicationDate":"2024-10-15"}
Pub Date: 2024-10-15 | DOI: 10.1109/LSP.2024.3480833
Hongmei Wang;Sheng Xing;Zhiwei Wang;Minghui Min;Shiyin Li
Ultra-wideband (UWB) positioning systems offer high-precision location capabilities but introduce positive biases in complex environments. The Pedestrian Dead Reckoning (PDR) algorithm based on an Inertial Measurement Unit (IMU) can maintain robust tracking even through abrupt changes in pedestrian trajectories, but it suffers from cumulative errors. In this study, the strengths of both systems are therefore combined: a factor graph model is established to enhance multi-system fusion localization. Experimental verification on both straight-line trajectories and scenarios involving state mutations demonstrates an integrated average positioning accuracy within 0.1 m. Compared to traditional system fusion localization methods, the accuracy is enhanced by more than 50%.
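The fusion idea can be sketched as a small linear factor graph: unary UWB factors anchor absolute position, binary PDR factors constrain the increment between consecutive poses, and the MAP estimate is a weighted least-squares solve. A toy 1-D numpy illustration (measurements and weights are invented for the example; the letter's model is richer):

```python
import numpy as np

# Toy 1-D trajectory with 3 poses x0, x1, x2.
# UWB gives noisy absolute positions; PDR gives relative step lengths.
uwb = np.array([0.1, 1.05, 2.2])      # absolute measurements (biased/noisy)
pdr = np.array([1.0, 1.0])            # step increments x1-x0, x2-x1
w_uwb, w_pdr = 1.0, 4.0               # information weights (PDR trusted more locally)

# Stack all factors into one linear system A x = b and solve weighted LSQ.
A = np.zeros((5, 3)); b = np.zeros(5); w = np.zeros(5)
for i in range(3):                    # unary UWB factors
    A[i, i] = 1.0; b[i] = uwb[i]; w[i] = w_uwb
for i in range(2):                    # binary PDR (odometry) factors
    A[3 + i, i] = -1.0; A[3 + i, i + 1] = 1.0; b[3 + i] = pdr[i]; w[3 + i] = w_pdr

W_sqrt = np.diag(np.sqrt(w))
x_map, *_ = np.linalg.lstsq(W_sqrt @ A, W_sqrt @ b, rcond=None)
```

The solved trajectory follows the smooth PDR increments locally while staying anchored to the absolute UWB fixes, which is the qualitative behavior the fusion aims for.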
{"title":"Multi-System Fusion Positioning Method Based on Factor Graph","authors":"Hongmei Wang;Sheng Xing;Zhiwei Wang;Minghui Min;Shiyin Li","doi":"10.1109/LSP.2024.3480833","journal":"IEEE Signal Processing Letters","volume":"31","pages":"3025-3029","publicationDate":"2024-10-15"}
The datasets of most image quality assessment studies contain ratings on a categorical scale with five levels, from bad (1) to excellent (5). For each stimulus, the number of ratings from 1 to 5 is summarized and given in the form of the mean opinion score. In this study, we investigate families of multinomial probability distributions parameterized by mean and variance that are used to fit the empirical rating distributions. To this end, we consider quantized metric models based on continuous distributions that model perceived stimulus quality on a latent scale. The probabilities for the rating categories are determined by quantizing the corresponding random variables using threshold values. Furthermore, we introduce a novel discrete maximum entropy distribution for a given mean and variance. We compare the performance of these models and the state of the art given by the generalized score distribution for two large data sets, KonIQ-10k and VQEG HDTV. Given an input distribution of ratings, our fitted two-parameter models predict unseen ratings better than the empirical distribution. In contrast to empirical distributions of absolute category ratings and their discrete models, our continuous models can provide fine-grained estimates of quantiles of quality of experience that are relevant to service providers to satisfy a certain fraction of the user population.
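The quantized metric idea admits a compact illustration: model latent quality as a continuous random variable (a Gaussian here, one of the families one might choose) and integrate it between the fixed thresholds 1.5, 2.5, 3.5 and 4.5 to obtain the five category probabilities. A minimal stdlib sketch:

```python
import math

def norm_cdf(x, mu, sigma):
    """Gaussian CDF via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def quantized_gaussian_probs(mu, sigma):
    """Probabilities of ACR categories 1..5 from a latent Gaussian,
    quantized at thresholds 1.5, 2.5, 3.5, 4.5 (open-ended tails)."""
    cuts = [-math.inf, 1.5, 2.5, 3.5, 4.5, math.inf]
    return [norm_cdf(cuts[k + 1], mu, sigma) - norm_cdf(cuts[k], mu, sigma)
            for k in range(5)]
```

Given a fitted (mu, sigma), the predicted mean opinion score is then the expectation sum(k * p_k for k = 1..5) over these category probabilities.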
{"title":"Maximum Entropy and Quantized Metric Models for Absolute Category Ratings","authors":"Dietmar Saupe;Krzysztof Rusek;David Hägele;Daniel Weiskopf;Lucjan Janowski","doi":"10.1109/LSP.2024.3480832","journal":"IEEE Signal Processing Letters","volume":"31","pages":"2970-2974","publicationDate":"2024-10-15"}
Pub Date: 2024-10-14 | DOI: 10.1109/LSP.2024.3480033
Yahui Liu;Jian Wang;Yuntai Yang;Renlong Wang;Simiao Wang
Due to the detrimental impact of noisy labels on the generalization of deep neural networks, learning with noisy labels has become an important task in modern deep learning applications. Many previous efforts have mitigated this problem by either removing noisy samples or correcting labels. In this letter, we address this issue from a new perspective and empirically find that models trained with both clean and mislabeled samples exhibit distinguishable activation feature distributions. Building on this observation, we propose a novel meta-learning approach called the Hierarchical Noise-tolerant Meta-Learning (HNML) method, which involves a bi-level optimization comprising meta-training and meta-testing. In the meta-training stage, we incorporate consistency loss at the output prediction hierarchy to facilitate model adaptation to dynamically changing label noise. In the meta-testing stage, we extract activation feature distributions using class activation maps and propose a new mask-guided self-learning method to correct biases in the foreground regions. Through the bi-level optimization of HNML, we ensure that the model generates discriminative feature representations that are insensitive to noisy labels. When evaluated on both synthetic and real-world noisy datasets, our HNML method achieves significant improvements over previous state-of-the-art methods.
{"title":"Hierarchical Noise-Tolerant Meta-Learning With Noisy Labels","authors":"Yahui Liu;Jian Wang;Yuntai Yang;Renlong Wang;Simiao Wang","doi":"10.1109/LSP.2024.3480033","journal":"IEEE Signal Processing Letters","volume":"31","pages":"3020-3024","publicationDate":"2024-10-14"}
Pub Date: 2024-10-14 | DOI: 10.1109/LSP.2024.3480046
Qilang Ye;Zitong Yu
Poses are effective in interpreting fine-grained human activities, especially when encountering complex visual information. Unimodal action recognition methods perform unsatisfactorily on daily activities because they lack a comprehensive perspective. Multimodal methods that combine pose and visual cues still do not fully mine complementary information. Therefore, we propose a Pose-promote (Ppromo) framework that utilizes a priori knowledge of pose joints to perceive visual information progressively. We first introduce a temporal promote module that activates each video segment using temporally synchronized joint weights. A spatial promote module is then proposed to capture key regions in the visuals using the learned pose attentions. To further refine the bimodal associations, a global inter-promote module is proposed to align global pose-visual semantics at the feature granularity. Finally, a learnable late fusion strategy between visual and pose features is applied for accurate inference. Ppromo achieves state-of-the-art performance on three publicly available datasets.
{"title":"Pose-Promote: Progressive Visual Perception for Activities of Daily Living","authors":"Qilang Ye;Zitong Yu","doi":"10.1109/LSP.2024.3480046","journal":"IEEE Signal Processing Letters","volume":"31","pages":"2950-2954","publicationDate":"2024-10-14"}
Pub Date: 2024-10-14 | DOI: 10.1109/LSP.2024.3480831
Qi Gao;Mingfeng Yin;Yuanzhi Ni;Yuming Bo;Shaoyi Bei
The recent development of advanced trackers that use nighttime image enhancement technology has led to marked advances in the performance of visual tracking at night. However, the images recovered by currently available enhancement methods still have weaknesses, such as blurred target details and obvious image noise. To this end, we propose a novel method that learns multidimensional spatial attention for robust nighttime visual tracking, built on a spatial-channel-transformer-based low-light enhancer (SCT) and named MSA-SCT. First, a novel multidimensional spatial attention (MSA) is designed: additional reliable feature responses are generated by aggregating channel and multi-scale spatial information, making the model more adaptable to the illumination conditions and noise levels in different regions of the image. Second, with optimized skip connections, the effects of redundant information and noise are limited, which aids the propagation of fine detail features in nighttime images from low-level to high-level features and improves the enhancement effect. Finally, the tracker with enhancers was tested on multiple tracking benchmarks to fully demonstrate the effectiveness and superiority of MSA-SCT.
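The interplay of channel and multi-scale spatial cues can be sketched schematically: channel attention from global pooling and spatial attention from box filters at several scales, both gating the feature map. The following numpy toy (shapes, pooling choices, and gating are invented for illustration; this is not the MSA-SCT architecture) shows the data flow:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def multidim_spatial_attention(feat, scales=(1, 3)):
    """Toy attention combining channel statistics with multi-scale spatial
    averages (a schematic stand-in for the letter's MSA). feat: (C, H, W)."""
    C, H, W = feat.shape
    # Channel attention from global average pooling.
    chan = sigmoid(feat.mean(axis=(1, 2)))[:, None, None]        # (C, 1, 1)
    # Multi-scale spatial attention: average box filters of several sizes.
    maps = []
    for k in scales:
        pad = k // 2
        padded = np.pad(feat.mean(axis=0), pad, mode="edge")
        pooled = np.array([[padded[i:i + k, j:j + k].mean()
                            for j in range(W)] for i in range(H)])
        maps.append(pooled)
    spat = sigmoid(np.mean(maps, axis=0))[None, :, :]            # (1, H, W)
    return feat * chan * spat                                    # gated features
```

Because both gates lie in (0, 1), the module can only attenuate responses, re-weighting regions by their aggregated channel and spatial statistics.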
{"title":"Learning Multidimensional Spatial Attention for Robust Nighttime Visual Tracking","authors":"Qi Gao;Mingfeng Yin;Yuanzhi Ni;Yuming Bo;Shaoyi Bei","doi":"10.1109/LSP.2024.3480831","journal":"IEEE Signal Processing Letters","volume":"31","pages":"2910-2914","publicationDate":"2024-10-14"}
With the advancement of technology, the field of multi-channel time series forecasting has emerged as a focal point of research. In this context, spatio-temporal graph neural networks have attracted significant interest due to their outstanding performance. An established approach involves integrating graph convolutional networks into recurrent neural networks. However, this approach faces difficulties in capturing dynamic spatial correlations and discerning the correlation of multi-channel time series signals. Another major problem is that the discrete time interval of recurrent neural networks limits the accuracy of spatio-temporal prediction. To address these challenges, we propose a continuous spatio-temporal framework, termed Recurrent Spatio-Temporal Graph Neural Network based on Latent Time Graph (RST-LTG). RST-LTG incorporates adaptive graph convolution networks with a time embedding generator to construct a latent time graph, which subtly captures evolving spatial characteristics by aggregating spatial information across multiple time steps. Additionally, to improve the accuracy of continuous time modeling, we introduce a gate enhanced neural ordinary differential equation that effectively integrates information across multiple scales. Empirical results on four publicly available datasets demonstrate that the RST-LTG model outperforms 19 competing methods in terms of accuracy.
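The gate-enhanced neural ODE component can be illustrated with a toy Euler integrator in which a sigmoid gate modulates continuous dynamics between observations. The specific form dh/dt = gate(h) * tanh(W h) and all weights are invented for illustration, not the RST-LTG equations:

```python
import numpy as np

def gated_ode_step(h, W_f, W_g, dt=0.1):
    """One Euler step of a gate-enhanced ODE: dh/dt = sigmoid(W_g h) * tanh(W_f h).
    The gate decides how much of the candidate dynamics flows into the state."""
    gate = 1.0 / (1.0 + np.exp(-(W_g @ h)))   # sigmoid gate in (0, 1)
    return h + dt * gate * np.tanh(W_f @ h)

def integrate(h0, W_f, W_g, steps=10, dt=0.1):
    """Evolve the hidden state in continuous time between two observations."""
    h = h0
    for _ in range(steps):
        h = gated_ode_step(h, W_f, W_g, dt)
    return h
```

Because |gate * tanh| < 1, each Euler step moves every state component by less than dt, giving smooth trajectories rather than the discrete jumps of a plain RNN update.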
{"title":"A Recurrent Spatio-Temporal Graph Neural Network Based on Latent Time Graph for Multi-Channel Time Series Forecasting","authors":"Linzhi Li;Xiaofeng Zhou;Guoliang Hu;Shuai Li;Dongni Jia","doi":"10.1109/LSP.2024.3479917","journal":"IEEE Signal Processing Letters","volume":"31","pages":"2875-2879","publicationDate":"2024-10-14"}
Pub Date: 2024-10-14 | DOI: 10.1109/LSP.2024.3480043
Muhammad Bilal Akram Dastagir;Dongsoo Han
Quantum computing, combined with deep learning, leverages principles such as superposition and entanglement to enhance complex data-driven tasks. The Noisy Intermediate-Scale Quantum (NISQ) era presents opportunities for hybrid quantum-classical architectures to address such tasks. Despite significant progress, practical applications of these hybrid models remain limited. This letter proposes a novel hybrid quantum-classical deep learning architecture integrating Quantum Convolutional Neural Networks (QCNNs) and Long Short-Term Memory (LSTM) networks, enhanced by cluster state signal processing. The letter addresses indoor-outdoor detection using high-dimensional signal data, utilizing the Cirq platform, a Python framework for developing and simulating NISQ circuits on quantum computers and simulators. The approach addresses noise and decoherence issues. Preliminary results show that the QCNN-LSTM model outperforms pure quantum and other hybrid models in accuracy and efficiency. This validates the practical benefits of hybrid architectures, paving the way for advancements in complex data classification tasks such as indoor-outdoor detection.
{"title":"Towards Hybrid Quantum-Classical Deep Learning Architecture for Indoor-Outdoor Detection Using QCNN-LSTM and Cluster State Signal Processing","authors":"Muhammad Bilal Akram Dastagir;Dongsoo Han","doi":"10.1109/LSP.2024.3480043","journal":"IEEE Signal Processing Letters","volume":"31","pages":"2945-2949","publicationDate":"2024-10-14"}
Pub Date: 2024-10-14 | DOI: 10.1109/LSP.2024.3479923
Haitao Zhao;Chunxi Zhao;Tianyu Zhang;Bo Xu;Jinlong Sun
Integrated sensing and communication in 6G, particularly for air-ground surveillance using automatic dependent surveillance-broadcast (ADS-B) and multi-lateration (MLAT) systems, is gaining significant research interest. This letter investigates the problem of optimal anchor station selection for tracking aerial vehicles and proposes a novel heuristic learning scheme termed fishing net-like optimization (FNO). Specifically, we perform constrained random walk steps on a two-dimensional surface to optimize the initial anchor stations' parameters. FNO also incorporates new evaluation strategies and acceleration techniques to speed up convergence. Experimental results demonstrate that FNO achieves better selection of anchor stations, and the accuracy of the chosen MLAT configuration can be improved tenfold or more through anchor optimization.
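The constrained random walk at the heart of FNO can be sketched as hill climbing over anchor positions under a geometric dilution-of-precision proxy. The cost function, bounds, and step parameters below are illustrative assumptions, not the letter's exact criteria:

```python
import numpy as np

def dop_proxy(anchors, target):
    """Dilution-of-precision proxy: trace of (H^T H)^-1, where H holds the
    unit line-of-sight vectors from the target to each anchor station."""
    diff = anchors - target
    H = diff / np.linalg.norm(diff, axis=1, keepdims=True)
    return np.trace(np.linalg.inv(H.T @ H))

def random_walk_optimize(anchors, target, bounds, steps=200, sigma=0.5, seed=0):
    """Constrained random walk: propose a Gaussian step for all anchors,
    clip to the allowed region, keep the move only if the cost improves."""
    rng = np.random.default_rng(seed)
    best, best_cost = anchors.copy(), dop_proxy(anchors, target)
    for _ in range(steps):
        cand = np.clip(best + rng.normal(0.0, sigma, best.shape),
                       bounds[0], bounds[1])
        cost = dop_proxy(cand, target)
        if cost < best_cost:
            best, best_cost = cand, cost
    return best, best_cost
```

Starting from nearly collinear stations (poor geometry), the walk spreads the anchors around the target, which is exactly the behavior the letter exploits to improve MLAT accuracy.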
{"title":"Fishing Net Optimization: A Learning Scheme of Optimizing Multi-Lateration Stations in Air-Ground Vehicle Networks","authors":"Haitao Zhao;Chunxi Zhao;Tianyu Zhang;Bo Xu;Jinlong Sun","doi":"10.1109/LSP.2024.3479923","journal":"IEEE Signal Processing Letters","volume":"31","pages":"2965-2969","publicationDate":"2024-10-14"}
Pub Date: 2024-10-14 | DOI: 10.1109/LSP.2024.3480032
Yanbo Gao;Xianye Wu;Shuai Li;Xun Cai;Chuankun Li
In recent years, self-supervised monocular depth estimation has become popular due to its ability to estimate depth without ground-truth depth labels. Instead, it adopts inter-frame supervision, using depth-based view synthesis to reconstruct temporally adjacent frames and thereby supervise the generated depth indirectly. However, such supervision weakens depth estimation in temporally incoherent regions, which contain small changes between consecutive frames. To overcome this problem, we propose a color and geometric contrastive learning based intra-frame supervision framework to enhance self-supervised monocular depth estimation. Color-contrastive learning is proposed to guide the network to learn color-invariant features, since color information is irrelevant to depth. To improve the local details of the learned features, pixel-level contrastive learning is further used to optimize the learning. Given that depth estimation, as a pixel-level task, is sensitive to geometric transformations, geometric-contrastive learning is developed using an inverse geometric transformation to learn features that are equivariant to the geometric data augmentation. A local plane guidance layer (LPG) with contrastive learning is further used to decompose the geometric information and enhance the geometric contrastive learning.
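The geometric-contrastive idea (features should be equivariant to a geometric augmentation) can be shown with a horizontal flip: features of the flipped image, flipped back by the inverse transform, should agree with features of the original image. A toy numpy check, where feat_fn stands in for the depth network and flipping is just one simple augmentation, not the paper's full set:

```python
import numpy as np

def geometric_contrastive_loss(feat_fn, image):
    """Equivariance-style consistency loss: apply a flip to the input,
    undo it on the features, and compare with the unflipped features."""
    f_orig = feat_fn(image)
    f_aug = feat_fn(image[:, ::-1])    # horizontal flip augmentation
    f_aligned = f_aug[:, ::-1]         # inverse geometric transform on features
    return np.mean((f_orig - f_aligned) ** 2)
```

A perfectly equivariant feature extractor (e.g., any per-pixel map) scores zero, while any operation that depends on absolute horizontal position is penalized, which is the training signal the framework uses.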
{"title":"Color and Geometric Contrastive Learning Based Intra-Frame Supervision for Self-Supervised Monocular Depth Estimation","authors":"Yanbo Gao;Xianye Wu;Shuai Li;Xun Cai;Chuankun Li","doi":"10.1109/LSP.2024.3480032","journal":"IEEE Signal Processing Letters","volume":"31","pages":"2940-2944","publicationDate":"2024-10-14"}