Pub Date: 2024-11-28 | DOI: 10.1109/LSP.2024.3509341
Ziming Hu;Quan Kong;Qing Liao
Infrared and visible image fusion integrates complementary or critical information extracted from different source images into a single image. Because features differ significantly both between the two modalities and across scales, traditional fusion strategies such as addition or concatenation often cause information redundancy or degrade crucial information. This letter proposes a multi-level adaptive attention fusion network that adaptively fuses features extracted from different sources. Specifically, we introduce an Adaptive Scale Attention Fusion (ASAF) module that uses a soft selection mechanism to assess the relative importance of different modality features at the same scale and assign corresponding fusion weights. Additionally, a guided upsampling layer integrates shallow and deep feature information at different scales in the multi-scale structure. Qualitative and quantitative results on public datasets validate the superior performance of our approach in both visual effects and quantitative metrics.
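As a rough illustration of the soft selection mechanism, the sketch below predicts per-pixel softmax weights over two same-scale modality features and blends them. This is a minimal PyTorch sketch; the module and variable names are our own assumptions, not details from the paper.

```python
import torch
import torch.nn as nn

class SoftSelectionFusion(nn.Module):
    """Per-pixel soft selection between two same-scale modality features.
    Layer choices and names are illustrative assumptions, not the paper's."""
    def __init__(self, channels):
        super().__init__()
        # Predict one weight map per modality from the concatenated features.
        self.weight_net = nn.Conv2d(2 * channels, 2, kernel_size=3, padding=1)

    def forward(self, feat_ir, feat_vis):
        w = torch.softmax(self.weight_net(torch.cat([feat_ir, feat_vis], dim=1)), dim=1)
        # Soft selection: the two weight maps sum to 1 at every spatial location.
        return w[:, 0:1] * feat_ir + w[:, 1:2] * feat_vis

fused = SoftSelectionFusion(64)(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
```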
{"title":"Multi-Level Adaptive Attention Fusion Network for Infrared and Visible Image Fusion","authors":"Ziming Hu;Quan Kong;Qing Liao","doi":"10.1109/LSP.2024.3509341","DOIUrl":"https://doi.org/10.1109/LSP.2024.3509341","url":null,"abstract":"Infrared and visible image fusion involves integrating complementary or critical information extracted from different source images into one image. Due to the significant differences between the two modality features and those across different scales, traditional fusion strategies, such as addition or concatenation, often result in information redundancy or the degradation of crucial information. This letter proposes a multi-level adaptive attention fusion network to adaptively fuse features extracted from different sources. Specifically, we introduced an Adaptive Scale Attention Fusion (ASAF) module that uses a soft selection mechanism to assess the relative importance of different modality features at the same scale and assign corresponding fusion weights. Additionally, a guided upsampling layer is utilized to integrate shallow and deep feature information at different scales in the multi-scale structure. Qualitative and quantitative results on public datasets validate the superior performance of our approach in both visual effects and quantitative metrics.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"366-370"},"PeriodicalIF":3.2,"publicationDate":"2024-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142912465","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-11-28 | DOI: 10.1109/LSP.2024.3509333
Mingyang Liu;Zuyuan Yang;Wei Han;Shengli Xie
Hashing is essential for approximate nearest neighbor search, mapping high-dimensional data to compact binary codes. Balancing similarity preservation against code diversity is a key challenge. Existing projection-based methods often struggle to fit binary codes to the continuous space due to the heterogeneity between the two spaces. To address this, we propose a novel Cluster Guided Truncated Hashing (CGTH) method that uses latent cluster information to guide the binary learning process. By leveraging data clusters as anchor points and applying a truncated coding strategy, our method effectively maintains local similarity and code diversity. Experiments on benchmark datasets demonstrate that CGTH outperforms existing methods, achieving superior search performance.
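For intuition, here is a minimal sketch of the anchor-plus-truncation idea, assuming k-means clusters as anchors and a clipped, signed affinity as the truncated code; the actual CGTH objective and optimization are not reproduced.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_guided_codes(X, n_bits=16, trunc=2.0, seed=0):
    """Illustrative sketch only: cluster centers act as anchor points, each
    sample is described by truncated signed affinities to the anchors, and
    the result is binarized into compact codes."""
    km = KMeans(n_clusters=n_bits, n_init=10, random_state=seed).fit(X)
    # Distance of each sample to each anchor, then centered so that
    # "closer than average" becomes a positive affinity.
    D = np.linalg.norm(X[:, None, :] - km.cluster_centers_[None, :, :], axis=2)
    Z = D.mean(axis=1, keepdims=True) - D
    Z = np.clip(Z, -trunc, trunc)          # truncated coding strategy
    return (Z > 0).astype(np.uint8)        # one bit per anchor

codes = cluster_guided_codes(np.random.randn(200, 32))
```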
{"title":"Cluster Guided Truncated Hashing for Enhanced Approximate Nearest Neighbor Search","authors":"Mingyang Liu;Zuyuan Yang;Wei Han;Shengli Xie","doi":"10.1109/LSP.2024.3509333","DOIUrl":"https://doi.org/10.1109/LSP.2024.3509333","url":null,"abstract":"Hashing is essential for approximate nearest neighbor search by mapping high-dimensional data to compact binary codes. The balance between similarity preservation and code diversity is a key challenge. Existing projection-based methods often struggle with fitting binary codes to continuous space due to space heterogeneity. To address this, we propose a novel Cluster Guided Truncated Hashing (CGTH) method that uses latent cluster information to guide the binary learning process. By leveraging data clusters as anchor points and applying a truncated coding strategy, our method effectively maintains local similarity and code diversity. Experiments on benchmark datasets demonstrate that CGTH outperforms existing methods, achieving superior search performance.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"181-185"},"PeriodicalIF":3.2,"publicationDate":"2024-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142844357","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Generative adversarial networks (GANs) have achieved remarkable progress in generating realistic images from low-dimensional latent vectors, which essentially establishes a latent generating space with rich semantics. GAN inversion thus aims at mapping real-world images back into the latent space, allowing access to the semantics of images. However, existing GAN inversion methods can only invert images with fixed resolutions, which significantly restricts their representation capability in real-world scenarios. To address this issue, we propose to invert images by patches, hence the name patch inverter, which is the first attempt at block-wise inversion for arbitrary resolutions. More specifically, we develop a padding-free operation to ensure continuity across patches, and analyse the intrinsic mismatch within the inversion procedure. To relieve the mismatch, we propose a shifted convolution operation, which retains continuity across image patches while enlarging the receptive field of each convolution layer. We further propose a reciprocal loss to regularize the inverted latent codes to reside on the original latent generating space, such that the rich semantics are maximally preserved. Experimental results demonstrate that our patch inverter accurately inverts images with arbitrary resolutions, whilst representing precise and rich image semantics in real-world scenarios.
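The cross-patch continuity idea can be illustrated with a small experiment: valid (no-pad) convolution over patches extended by a one-pixel halo reproduces a full-image convolution exactly. This is only a sketch of the continuity property, not the paper's inverter; the patch size and kernel are arbitrary choices here.

```python
import torch
import torch.nn.functional as F

def conv_by_patches(image, weight, patch=64):
    """Convolve an image patch by patch with *valid* convolution over tiles
    extended by the kernel's halo, so that patch outputs match a single
    full-image convolution exactly (no per-patch zero padding)."""
    k = weight.shape[-1]
    h = k // 2                                  # halo borrowed from neighbors
    _, _, H, W = image.shape
    out = torch.zeros(image.shape[0], weight.shape[0], H, W)
    padded = F.pad(image, (h, h, h, h))         # only the image border is padded once
    for y in range(0, H, patch):
        for x in range(0, W, patch):
            tile = padded[:, :, y:y + patch + 2 * h, x:x + patch + 2 * h]
            out[:, :, y:y + patch, x:x + patch] = F.conv2d(tile, weight)  # valid conv
    return out

img, w = torch.randn(1, 3, 128, 128), torch.randn(8, 3, 3, 3)
assert torch.allclose(conv_by_patches(img, w), F.conv2d(img, w, padding=1), atol=1e-5)
```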
{"title":"Patch Inverter: A Novel Block-Wise GAN Inversion Method for Arbitrary Image Resolutions","authors":"Yifei Li;Mai Xu;Shengxi Li;Jialu Zhang;Zhenyu Guan","doi":"10.1109/LSP.2024.3506859","DOIUrl":"https://doi.org/10.1109/LSP.2024.3506859","url":null,"abstract":"Generative adversarial networks (GANs) have achieved remarkable progress in generating realistic images from merely small dimensions, which essentially establishes the latent generating space by rich semantics. GAN inversion thus aims at mapping real-world images back into the latent space, allowing for the access of semantics from images. However, existing GAN inversion methods can only invert images with fixed resolutions; this significantly restricts the representation capability in real-world scenarios. To address this issue, we propose to invert images by patches, thus named as patch inverter, which is the first attempt in terms of block-wise inversion for arbitrary resolutions. More specifically, we develop the padding-free operation to ensure the continuity across patches, and analyse the intrinsic mismatch within the inversion procedure. To relieve the mismatch, we propose a shifted convolution operation, which retains the continuity across image patches and simultaneously enlarges the receptive field for each convolution layer. We further propose the reciprocal loss to regularize the inverted latent codes to reside on the original latent generating space, such that the rich semantics can be maximally preserved. Experimental results have demonstrated that our patch inverter is able to accurately invert images with arbitrary resolutions, whilst representing precise and rich image semantics in real-world scenarios.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"171-175"},"PeriodicalIF":3.2,"publicationDate":"2024-11-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142844510","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-11-26 | DOI: 10.1109/LSP.2024.3506863
Kang Yi;Yumeng Li;Haoran Tang;Jing Xu
RGB-D Salient Object Detection (SOD) aims to identify and highlight the most visually prominent objects in complex backgrounds by leveraging both RGB and depth information. However, depth maps often suffer from noise and inconsistencies due to imaging modalities and sensor limitations. Additionally, reconciling low-level spatial details with high-level semantic information across multiple levels adds another layer of complexity. These issues result in depth maps that may not align well with the corresponding RGB images, causing incorrect foreground and background segmentation. To address these issues, we propose a novel adaptive depth enhancement network (ADENet), which adopts a Depth Feature Refinement (DFR) module to mitigate the negative impact of low-quality depth data and improve the synergy between multi-modal features. We also design a simple yet effective Cross Modality Fusion (CMF) module that combines spatial and channel attention mechanisms to calibrate single-modality features and boost the fusion. A Progressive Multiscale Aggregation (PMA) decoder is also introduced to integrate multiscale features, promoting more globally retained information. Extensive experiments illustrate that our proposed ADENet is superior to 10 state-of-the-art methods on four benchmark datasets.
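As a hedged sketch of combining channel and spatial attention to calibrate one modality before fusion (the CMF design details are not specified in the abstract, so the layers below are assumptions):

```python
import torch
import torch.nn as nn

class CrossModalityFusion(nn.Module):
    """Minimal sketch: calibrate the depth feature with channel attention,
    then spatial attention, before adding it to the RGB feature."""
    def __init__(self, c):
        super().__init__()
        self.channel = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                     nn.Conv2d(c, c, 1), nn.Sigmoid())
        self.spatial = nn.Sequential(nn.Conv2d(2, 1, 7, padding=3), nn.Sigmoid())

    def forward(self, rgb, depth):
        depth = depth * self.channel(depth)            # channel-wise calibration
        s = torch.cat([depth.mean(1, keepdim=True),
                       depth.amax(1, keepdim=True)], dim=1)
        depth = depth * self.spatial(s)                # spatial calibration
        return rgb + depth                             # fuse calibrated modalities

out = CrossModalityFusion(64)(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
```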
{"title":"Adaptive Depth Enhancement Network for RGB-D Salient Object Detection","authors":"Kang Yi;Yumeng Li;Haoran Tang;Jing Xu","doi":"10.1109/LSP.2024.3506863","DOIUrl":"https://doi.org/10.1109/LSP.2024.3506863","url":null,"abstract":"RGB-D Salient Object Detection (SOD) aims to identify and highlight the most visually prominent objects from complex backgrounds by leveraging both RGB and depth information. However, depth maps often suffer from noise and inconsistencies due to the imaging modalities and sensor limitations. Additionally, the low-level spatial details and high-level semantic information from multiple levels pose another complexity layer. These issues result in depth maps that may not align well with the corresponding RGB images, causing incorrect foreground and background segmentation. To address these issues, we propose a novel adaptive depth enhancement network (ADENet), which adopts the Depth Feature Refinement (DFR) module to mitigate the negative impact of low-quality depth data and improve the synergy between multi-modal features. We also design a simple yet effective Cross Modality Fusion (CMF) module that combines the spatial and channel attention mechanisms to calibrate single modality features and boost the fusion. The Progressive Multiscale Aggregation (PMA) decoder has also been introduced to integrate multiscale features, promoting more globally retained information. Extensive experiments illustrate that our proposed ADENet is superior to the other 10 state-of-the-art methods on four benchmark datasets.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"176-180"},"PeriodicalIF":3.2,"publicationDate":"2024-11-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142844405","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-11-25 | DOI: 10.1109/LSP.2024.3506858
Yu-Teng Hsu;Jun-You Wang;Jyh-Shing Roger Jang
Singing voice conversion (SVC) aims to convert the singer identity of a singing voice to that of another singer. However, most existing SVC systems only convert timbre information, leaving other information unchanged. This approach neglects other aspects of singer identity, particularly a singer's performance style, which is reflected in the pitch (F0) and energy (volume dynamics) contours of singing. To address this issue, this paper proposes a many-to-many singing performance style transfer system that converts the pitch and energy contours from one singer's style to another's. To achieve this, we utilize two AutoVC-like autoencoders with an information bottleneck to automatically disentangle performance style from other musical content: one for the pitch contour and another for the energy contour. Experimental results suggest that the proposed model can perform singing performance style transfer in a many-to-many conversion scenario, resulting in improved singer identity similarity to the target singer.
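A minimal sketch of an AutoVC-style contour autoencoder is given below: a narrow bottleneck squeezes style out of the content path, and a target-singer embedding conditions the decoder. All dimensions and names are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class ContourStyleAE(nn.Module):
    """Sketch of an AutoVC-like autoencoder for a 1-D pitch (or energy)
    contour with an information bottleneck and a singer embedding."""
    def __init__(self, n_singers, bottleneck=4, hidden=64):
        super().__init__()
        self.enc = nn.GRU(1, hidden, batch_first=True)
        self.squeeze = nn.Linear(hidden, bottleneck)   # information bottleneck
        self.spk = nn.Embedding(n_singers, 16)
        self.dec = nn.GRU(bottleneck + 16, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, contour, singer_id):
        h, _ = self.enc(contour)                       # (B, T, hidden)
        content = self.squeeze(h)                      # style squeezed out
        spk = self.spk(singer_id)[:, None].expand(-1, contour.size(1), -1)
        y, _ = self.dec(torch.cat([content, spk], -1))
        return self.out(y)                             # converted contour

f0 = torch.randn(2, 100, 1)                            # two 100-frame F0 contours
converted = ContourStyleAE(n_singers=8)(f0, torch.tensor([3, 5]))
```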
{"title":"Many-to-Many Singing Performance Style Transfer on Pitch and Energy Contours","authors":"Yu-Teng Hsu;Jun-You Wang;Jyh-Shing Roger Jang","doi":"10.1109/LSP.2024.3506858","DOIUrl":"https://doi.org/10.1109/LSP.2024.3506858","url":null,"abstract":"Singing voice conversion (SVC) aims to convert the singer identity of a singing voice to that of another singer. However, most existing SVC systems only perform the conversion of timbre information, while leaving other information unchanged. This approach does not consider other aspects of singer identity, particularly a singer's performance style, which is reflected in the pitch (F0) and the energy (volume dynamics) contours of singing. To address this issue, this paper proposes a many-to-many singing performance style transfer system that converts the pitch and energy contours of one singer's style to another singer's. To achieve this target, we utilize two AutoVC-like autoencoders with an information bottleneck to automatically disentangle performance style from other musical contents, one for the pitch contour while another for the energy contour. Experiment results suggested that the proposed model can perform singing performance style transfer in a many-to-many conversion scenario, resulting in improved singer identity similarity to the target singer.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"166-170"},"PeriodicalIF":3.2,"publicationDate":"2024-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142844603","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multimodal apparent personality traits analysis is a challenging problem due to the asynchrony among modalities. To address this issue, this paper proposes a Progressive Adaptive Crossmodal Reinforcement (PACMR) approach for multimodal apparent personality traits analysis. PACMR adopts a progressive reinforcement strategy to provide multi-level information exchange among different modalities for crossmodal interactions, thereby reinforcing the source and target modalities simultaneously. Specifically, PACMR introduces an Adaptive Modality Reinforcement Unit (AMRU) to adaptively adjust the weights of self-attention and crossmodal attention for capturing reliable contextual dependencies of multimodal sequence data. Experimental results on the public First Impressions dataset demonstrate the effectiveness of the proposed method.
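For concreteness, a minimal sketch of a unit that adaptively gates between self-attention and crossmodal attention is shown below; the gating design and dimensions are our assumptions, not the published AMRU.

```python
import torch
import torch.nn as nn

class AMRU(nn.Module):
    """Sketch of an adaptive modality reinforcement unit: a learned gate
    mixes self-attention and crossmodal attention outputs per position."""
    def __init__(self, d):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d, 4, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d, 4, batch_first=True)
        self.gate = nn.Sequential(nn.Linear(2 * d, d), nn.Sigmoid())

    def forward(self, target, source):
        s, _ = self.self_attn(target, target, target)
        c, _ = self.cross_attn(target, source, source)
        g = self.gate(torch.cat([s, c], dim=-1))   # adaptive attention weights
        return g * s + (1 - g) * c                 # reinforced target modality

audio, video = torch.randn(2, 50, 128), torch.randn(2, 40, 128)
reinforced = AMRU(128)(audio, video)               # asynchronous lengths are fine
```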
{"title":"PACMR: Progressive Adaptive Crossmodal Reinforcement for Multimodal Apparent Personality Traits Analysis","authors":"Peng Shen;Dandan Wang;Yingying Xu;Shiqing Zhang;Xiaoming Zhao","doi":"10.1109/LSP.2024.3505799","DOIUrl":"https://doi.org/10.1109/LSP.2024.3505799","url":null,"abstract":"Multimodal apparent personality traits analysis is a challenging issue due to the asynchrony among modalities. To address this issue, this paper proposes a Progressive Adaptive Crossmodal Reinforcement (PACMR) approach for multimodal apparent personality traits analysis. PACMR adopts a progressive reinforcement strategy to provide a multi-level information exchange among different modalities for crossmodal interactions, resulting in reinforcing the source and target modalities simultaneously. Specifically, PACMR introduces an Adaptive Modality Reinforcement Unit (AMRU) to adaptively adjust the weights of self-attention and crossmodal attention for capturing reliable contextual dependencies of multimodal sequence data. Experiment results on the public First Impressions dataset demonstrate the effectiveness of the proposed method.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"161-165"},"PeriodicalIF":3.2,"publicationDate":"2024-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142844341","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cross-domain classification of hyperspectral remote sensing images has been a research hotspot in recent years, and its main problem is insufficient training samples. To address this issue, few-shot learning (FSL) has emerged as a promising paradigm for cross-domain classification tasks. However, a notable limitation of most existing FSL methods is that they focus only on local information and neglect the critical role of global information. Motivated by this, this paper proposes a new feature processing method with adaptive band selection, which takes into account the global nature of image features. Firstly, adaptive band analysis is performed in the target domain, and threshold analysis is used to determine the number of selected bands. Secondly, a band selection method is employed to select representative bands from the spectral bands of the high-dimensional data according to the determined band count. Finally, the weights of the selected bands are analyzed, fully considering the importance of pixel weights, and the results are used as inputs to the classification model. Experimental results on various datasets show that this method effectively improves classification accuracy and generalization ability; the objective accuracy of the proposed method on the three databases improved by 3.9%, 4.7%, and 5.4%, respectively.
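A hedged sketch of such a band-selection pipeline follows, with per-band variance standing in for the paper's learned attention weights; the thresholding step mirrors the described band-count analysis only loosely.

```python
import numpy as np

def select_bands(cube, n_bands=None, threshold=0.5):
    """Sketch of band selection on a hyperspectral cube (H, W, B): score each
    band, derive the band count by thresholding when it is not given, and
    return the indices and weights of the most informative bands."""
    H, W, B = cube.shape
    scores = cube.reshape(-1, B).var(axis=0)            # stand-in band importance
    scores = (scores - scores.min()) / (scores.max() - scores.min() + 1e-12)
    if n_bands is None:                                 # threshold analysis
        n_bands = max(1, int((scores > threshold).sum()))
    idx = np.argsort(scores)[::-1][:n_bands]            # representative bands
    return np.sort(idx), scores[np.sort(idx)]

bands, weights = select_bands(np.random.rand(64, 64, 200))
```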
{"title":"An Attention-Based Feature Processing Method for Cross-Domain Hyperspectral Image Classification","authors":"Yazhen Wang;Guojun Liu;Lixia Yang;Junmin Liu;Lili Wei","doi":"10.1109/LSP.2024.3505793","DOIUrl":"https://doi.org/10.1109/LSP.2024.3505793","url":null,"abstract":"Cross-domain classification of hyperspectral remote sensing images is one of the hotspots of research in recent years, and its main problem is insufficient training samples. To address this issue, few-shot learning (FSL) has emerged as a promising paradigm in cross-domain classification tasks. However, a notable limitation of most existing FSL methods is that they focus only on local information and less on the critical role of global information. Based on this, this paper proposes a new feature processing method with adaptive band selection, which takes into account the global nature of image features. Firstly, adaptive band analysis is performed in the target domain, and threshold analysis is used to determine the number of selected bands. Secondly, a band selection method is employed to select representative bands from the spectral bands of the high-dimensional data according to the determined band count. Finally, the weights of the selected bands are analyzed, fully considering the importance of pixel weight, and then the results are used as inputs for the classification model. The experimental results on various datasets show that this method can effectively improve the classification accuracy and generalization ability. Meanwhile, the results of the objective accuracy index of the proposed method in different databases improved by 3.9%, 4.7% and 5.4%.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"196-200"},"PeriodicalIF":3.2,"publicationDate":"2024-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142844605","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-11-25 | DOI: 10.1109/LSP.2024.3505794
Zilu Guo;Jun Du;Sabato Marco Siniscalchi;Jia Pan;Qingfeng Liu
We propose a novel approach to speech enhancement, termed Controllable ConforMer for Speech Enhancement (CCMSE), which leverages a Conformer-based architecture integrated with a control factor embedding module. Our method is designed to optimize speech quality for both human auditory perception and automatic speech recognition (ASR). We observe that while mild denoising typically preserves speech naturalness, stronger denoising can improve perceptual quality for human listeners, though often at the cost of ASR accuracy due to increased distortion. To address this, we introduce an algorithm that balances these trade-offs. By utilizing differential equations to interpolate between outputs at varying levels of denoising intensity, our method effectively combines the robustness of mild denoising with the clarity of stronger denoising, resulting in enhanced speech that is well-suited for both human and machine listeners. Experimental results on the CHiME-4 dataset validate the effectiveness of our approach.
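The trade-off can be caricatured by simple interpolation between a mildly and a strongly denoised output of the same utterance; this sketch ignores the Conformer backbone and the control-factor embedding entirely.

```python
import numpy as np

def blend_enhanced(mild, strong, alpha=0.5):
    """Interpolate between two denoised versions of the same utterance.
    alpha near 0 favors low-distortion, ASR-friendly output; alpha near 1
    favors aggressive noise removal for human listening."""
    return (1.0 - alpha) * mild + alpha * strong

mild = np.random.randn(16000)     # stand-ins for enhanced 1-second waveforms
strong = np.random.randn(16000)
for_asr = blend_enhanced(mild, strong, 0.2)
for_humans = blend_enhanced(mild, strong, 0.8)
```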
{"title":"Controllable Conformer for Speech Enhancement and Recognition","authors":"Zilu Guo;Jun Du;Sabato Marco Siniscalchi;Jia Pan;Qingfeng Liu","doi":"10.1109/LSP.2024.3505794","DOIUrl":"https://doi.org/10.1109/LSP.2024.3505794","url":null,"abstract":"We propose a novel approach to speech enhancement, termed Controllable ConforMer for Speech Enhancement (CCMSE), which leverages a Conformer-based architecture integrated with a control factor embedding module. Our method is designed to optimize speech quality for both human auditory perception and automatic speech recognition (ASR). It is observed that while mild denoising typically preserves speech naturalness, stronger denoising can improve human auditory tasks but often at the cost of ASR accuracy due to increased distortion. To address this, we introduce an algorithm that balances these trade-offs. By utilizing differential equations to interpolate between outputs at varying levels of denoising intensity, our method effectively combines the robustness of mild denoising with the clarity of stronger denoising, resulting in enhanced speech that is well-suited for both human and machine listeners. Experimental results on the CHiME-4 dataset validate the effectiveness of our approach.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"156-160"},"PeriodicalIF":3.2,"publicationDate":"2024-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142844342","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-11-22 | DOI: 10.1109/LSP.2024.3504374
Hui Luo;Xin Liu;Jian Sun;Yang Zhang
Vector quantized variational autoencoders, as variants of variational autoencoders, effectively capture discrete representations by quantizing continuous latent spaces and are widely used in generative tasks. However, these models still face limitations in handling complex image reconstruction, particularly in preserving high-quality details. Moreover, quaternion neural networks have shown unique advantages in handling multi-dimensional data, indicating that integrating quaternion approaches could potentially improve the performance of these autoencoders. To this end, we propose QVQ-VAE, a lightweight network in the quaternion domain that introduces a quaternion-based quantization layer and training strategy to improve reconstruction precision. By fully leveraging quaternion operations, QVQ-VAE reduces the number of model parameters, thereby lowering computational resource demands. Extensive evaluations on face and general object reconstruction tasks show that QVQ-VAE consistently outperforms existing methods while using significantly fewer parameters.
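As a rough sketch of a quaternion-aware quantization layer, the code below groups latent channels into quaternions (4 real components) and snaps each to its nearest codebook entry with a straight-through gradient; shapes and names are illustrative assumptions, not the QVQ-VAE design.

```python
import torch

def quaternion_quantize(z, codebook):
    """Quantize a latent map (B, C, H, W) whose channels form C/4 quaternions,
    snapping each quaternion to its nearest codebook entry (K, 4)."""
    B, C, H, W = z.shape
    assert C % 4 == 0
    q = z.permute(0, 2, 3, 1).reshape(-1, C // 4, 4)        # (N, C/4, 4)
    # Squared distance between every latent quaternion and every code.
    d = ((q[:, :, None, :] - codebook[None, None]) ** 2).sum(-1)
    idx = d.argmin(-1)                                       # nearest code index
    zq = codebook[idx].reshape(B, H, W, C).permute(0, 3, 1, 2)
    # Straight-through estimator so gradients still reach the encoder.
    return z + (zq - z).detach(), idx

z, codes = torch.randn(1, 8, 4, 4), torch.randn(32, 4)
zq, idx = quaternion_quantize(z, codes)
```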
{"title":"Quaternion Vector Quantized Variational Autoencoder","authors":"Hui Luo;Xin Liu;Jian Sun;Yang Zhang","doi":"10.1109/LSP.2024.3504374","DOIUrl":"https://doi.org/10.1109/LSP.2024.3504374","url":null,"abstract":"Vector quantized variational autoencoders, as variants of variational autoencoders, effectively capture discrete representations by quantizing continuous latent spaces and are widely used in generative tasks. However, these models still face limitations in handling complex image reconstruction, particularly in preserving high-quality details. Moreover, quaternion neural networks have shown unique advantages in handling multi-dimensional data, indicating that integrating quaternion approaches could potentially improve the performance of these autoencoders. To this end, we propose QVQ-VAE, a lightweight network in the quaternion domain that introduces a quaternion-based quantization layer and training strategy to improve reconstruction precision. By fully leveraging quaternion operations, QVQ-VAE reduces the number of model parameters, thereby lowering computational resource demands. Extensive evaluations on face and general object reconstruction tasks show that QVQ-VAE consistently outperforms existing methods while using significantly fewer parameters.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"151-155"},"PeriodicalIF":3.2,"publicationDate":"2024-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142844409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-11-21 | DOI: 10.1109/LSP.2024.3504339
Jie Gao;Shengqi Zhu;Lan Lan;Jinxin Sui;Ximin Li
Range ambiguity and range dependence seriously degrade the performance of space-time adaptive processing (STAP). In this regard, an adaptive range-ambiguous clutter separation method suitable for element-pulse coding (EPC) multiple-input multiple-output (MIMO) radar is developed in this letter. By introducing the EPC factor in both transmit elements and pulses, clutter located in different range-ambiguous regions can be distinguished in the transmit spatial frequency dimension. In particular, the EPC factor is designed to ensure the separation performance for range-ambiguous clutter. Moreover, an approach based on reweighted atomic norm minimization (RANM) is developed to separate the range-ambiguous clutter, leveraging the transmit spatial frequencies of clutter located in the various range-ambiguity regions. Furthermore, after clutter separation, the clutter is canceled via STAP individually in each range-ambiguous region. A series of simulation results validates the efficacy of the proposed approach.
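The core EPC mechanism can be sketched in a few lines: transmit element m applies an extra pulse-dependent phase 2*pi*gamma*m*k, which offsets clutter from different range-ambiguous regions in the transmit spatial frequency dimension. The factor value below is arbitrary, and the RANM separation step is not reproduced.

```python
import numpy as np

def epc_weights(n_tx, n_pulses, gamma):
    """Element-pulse coding phases: element m on pulse k is weighted by
    exp(j*2*pi*gamma*m*k), so returns from successive ambiguous range
    regions acquire distinct transmit spatial frequency offsets."""
    m = np.arange(n_tx)[:, None]       # transmit element index
    k = np.arange(n_pulses)[None, :]   # pulse index
    return np.exp(1j * 2 * np.pi * gamma * m * k)

W = epc_weights(n_tx=8, n_pulses=64, gamma=0.1)   # (elements, pulses) phase codes
```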
{"title":"Range-Ambiguous Clutter Separation via Reweighted Atomic Norm Minimization With EPC-MIMO Radar","authors":"Jie Gao;Shengqi Zhu;Lan Lan;Jinxin Sui;Ximin Li","doi":"10.1109/LSP.2024.3504339","DOIUrl":"https://doi.org/10.1109/LSP.2024.3504339","url":null,"abstract":"The existence of range ambiguity and range dependence will seriously deteriorate the performance of space-time adaptive processing (STAP). In this regard, an adaptive range-ambiguous clutter separation method suitable for the element-pulse coding (EPC)-multiple-input multiple-output (MIMO) radar is developed in this letter. By introducing the EPC factor in both transmit elements and pulses, the clutter located in different range-ambiguous regions can be distinguished in the transmit spatial frequency dimension. Particularly, to ensure the separated performance of range-ambiguous clutter, the EPC factor is designed. Moreover, an approach on the basis of reweighted atomic norm minimization (RANM) is developed to separate the range-ambiguous clutter, leveraging the transmit spatial frequencies of clutter located in various range ambiguity areas. Furthermore, after clutter separation, the clutter is canceled via STAP individually in each range ambiguous region. A series of simulation results validate the efficacy of the proposed approach.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"146-150"},"PeriodicalIF":3.2,"publicationDate":"2024-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142844358","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}