Pub Date: 2024-09-06 | DOI: 10.1109/LSP.2024.3455991
Ye Liu;Yan Pan;Jian Yin
Deep hashing algorithms transform high-dimensional features into low-dimensional hash codes, reducing storage space and improving computational efficiency in both traditional information retrieval (IR) and large-model retrieval-augmented generation (RAG) scenarios. In recent years, pre-trained convolutional or transformer networks have commonly been chosen as the backbone of deep hashing frameworks: local loss constraints among training samples are incorporated, and the model is then fine-tuned to generate hash codes. Because constraints among training samples carry only limited local information, we propose novel anchor and structural constraints that serve as internal global loss constraints on a vision transformer network, and we augment external information by integrating a large vision-language model, thereby enhancing the quality of the generated hash codes. Additionally, to improve the scalability of the framework, we incorporate an adapter module that extends its application from the image domain to the audio domain. Comparative experiments and ablation analysis on various image and audio datasets confirm that the proposed method achieves state-of-the-art retrieval results.
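The abstract does not specify the binarization step; as a generic illustration of what deep hashing frameworks typically do, the following sketch maps continuous backbone features to binary codes via the sign of a linear projection and compares codes by Hamming distance (the projection, dimensions, and function names are illustrative assumptions, not the authors' design):

```python
import numpy as np

def hash_codes(features, projection):
    """Binarize continuous features: sign of a linear projection -> 0/1 bits."""
    return (features @ projection > 0).astype(np.uint8)

def hamming_distance(a, b):
    """Number of differing bits between two binary codes."""
    return int(np.count_nonzero(a != b))

rng = np.random.default_rng(0)
projection = rng.standard_normal((512, 64))   # 512-d features -> 64-bit codes
features = rng.standard_normal((3, 512))      # e.g. backbone outputs
codes = hash_codes(features, projection)
```

Retrieval then ranks database items by Hamming distance to the query code, which is far cheaper than comparing the original 512-d real-valued features.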
Title: "Enhancing Multi-Label Deep Hashing for Image and Audio With Joint Internal Global Loss Constraints and Large Vision-Language Model" (IEEE Signal Processing Letters, IF 3.2, Journal Article)
Pub Date: 2024-09-06 | DOI: 10.1109/LSP.2024.3456006
Xinxin Zhang;Wenjing Shang;Qiangchang Wang;Yongshun Gong;Qifang Liu
In this letter, we propose a precise algorithm that eliminates reflections from two images by exploiting temporal and spatial priors. For the temporal prior, we compute the motion information between the reflection layers of the two reflection-contaminated input images. Unlike many popular multi-image reflection removal methods, our algorithm does not assume that the two input images are captured under similar lighting conditions and identical camera settings. Furthermore, it is robust to differences between the two reflection layers, such as moving objects and differing reflections. For the spatial term, a sparsity gradient regularization enforces the spatial smoothness of the transmission and reflection layers. Importantly, the proposed algorithm relies on neither additional training data nor high-performance computing devices. Experimental results on both synthetic images and real-world photographs demonstrate that the proposed algorithm achieves state-of-the-art performance.
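As a hedged illustration of the spatial term, the sparsity gradient regularization described above can be sketched as an L1 penalty on image gradients, which is zero for spatially smooth layers (the exact weighting and gradient operator used in the letter are not specified, so this is only a plausible minimal form):

```python
import numpy as np

def gradient_l1(img):
    """L1 sparsity penalty on horizontal and vertical image gradients."""
    return np.abs(np.diff(img, axis=1)).sum() + np.abs(np.diff(img, axis=0)).sum()

flat = np.full((8, 8), 0.5)                       # smooth layer: zero penalty
noisy = np.random.default_rng(1).random((8, 8))   # textured layer: positive penalty
```

Minimizing such a penalty on the recovered transmission and reflection layers encourages piecewise-smooth solutions, which is why it is a common prior in layer-separation objectives.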
Title: "Spatio-Temporal Multi-Image Reflection Removal" (IEEE Signal Processing Letters, IF 3.2, Journal Article)
Pub Date: 2024-09-05 | DOI: 10.1109/LSP.2024.3455234
Jinhui Li;Xiaorun Li;Shuhan Chen
Self-supervised learning effectively leverages information from unlabeled data to extract spatial-spectral features that are both representative and discriminative, partially addressing the high cost of data annotation in hyperspectral image classification (HSIC). Inspired by the success of redundancy-reduction-based self-supervised learning in other domains, we introduce it into HSIC and propose a spatial-spectral feature extraction network, HyperBT, that reduces redundancy more effectively. Specifically, we add the off-diagonal terms of the cross-covariance matrix to the loss function and introduce new data augmentation methods, including band bisection and edge weakening. Experimental results demonstrate that our method achieves high classification accuracy, surpassing many state-of-the-art methods, and ablation experiments validate the effectiveness of each component of the loss function.
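The off-diagonal cross-covariance penalty described above follows the redundancy-reduction (Barlow Twins) recipe: push the cross-correlation matrix of two augmented views toward the identity. A minimal NumPy sketch, with an assumed weighting `lam` and a batch normalization that may differ from HyperBT's exact formulation:

```python
import numpy as np

def redundancy_reduction_loss(z1, z2, lam=5e-3):
    """Push the cross-correlation matrix of two views toward the identity:
    diagonal terms -> 1 (invariance), off-diagonal terms -> 0 (decorrelation)."""
    z1 = (z1 - z1.mean(0)) / (z1.std(0) + 1e-8)
    z2 = (z2 - z2.mean(0)) / (z2.std(0) + 1e-8)
    c = z1.T @ z2 / len(z1)                      # cross-correlation matrix
    on_diag = ((np.diag(c) - 1.0) ** 2).sum()
    off_diag = (c ** 2).sum() - (np.diag(c) ** 2).sum()
    return on_diag + lam * off_diag

rng = np.random.default_rng(0)
z = rng.standard_normal((256, 32))
loss_same = redundancy_reduction_loss(z, z)       # identical views: small loss
loss_indep = redundancy_reduction_loss(z, rng.standard_normal((256, 32)))
```

Identical views yield a correlation matrix close to the identity and hence a near-zero loss, whereas independent views are penalized heavily on the diagonal terms.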
Title: "HyperBT: Redundancy Reduction-Based Self-Supervised Learning for Hyperspectral Image Classification" (IEEE Signal Processing Letters, IF 3.2, Journal Article)
Pub Date: 2024-09-05 | DOI: 10.1109/LSP.2024.3455230
Ye Zhu;Jian Liu;Yang Yu;Yingchun Guo;Xiaoke Hao
Recent developments in image editing techniques pose serious challenges to the credibility of multimedia data. Although some deep learning methods have achieved impressive results, they often fail to detect subtle edge artefacts, and current mainstream methods focus mainly on foreground content while ignoring background content, which also carries abundant manipulation-related information. To address this issue, this letter proposes a progressive mask transformer with an edge enhancement network for image manipulation localization. Specifically, an edge enhancement flow is introduced to detect subtle manipulated edge artefacts and guide the localization of manipulated regions. The manipulated, genuine, and global features are then progressively refined by a progressive mask transformer module. Extensive experiments on the NIST16, Coverage, CASIA, and IMD20 datasets verify the effectiveness of our method; the results demonstrate that it outperforms state-of-the-art methods by a wide margin on commonly used evaluation metrics.
Title: "Progressive Mask Transformer With Edge Enhancement for Image Manipulation Localization" (IEEE Signal Processing Letters, IF 3.2, Journal Article)
Pub Date: 2024-09-03 | DOI: 10.1109/LSP.2024.3453655
Panqi Chen;Lei Cheng
This letter introduces a structured high-rank tensor approach for estimating sub-6G uplink channels in multi-user multiple-input multiple-output (MU-MIMO) systems. To tackle the difficulty of channel estimation in sub-6G bands with hundreds of sub-paths, our approach fully exploits the physical structure of the channel and links the sub-6G channel model to a high-rank four-dimensional (4D) tensor Canonical Polyadic Decomposition (CPD) in which three factor matrices are Vandermonde-constrained. Accordingly, a stronger uniqueness property is derived in this work. This model supports an efficient one-pass algorithm for estimating sub-path parameters, which ensures plug-in compatibility with the widely used baseline. Our method performs much better than state-of-the-art tensor-based techniques in simulations adhering to the 3GPP-R18 5G protocols.
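A Vandermonde-constrained factor means each CPD factor column is a geometric progression, the structure that a uniform linear array steering matrix imposes. A small NumPy sketch of such a factor and a rank-2 4D CPD tensor built from it (sizes and generators are illustrative, not taken from the letter):

```python
import numpy as np

def vandermonde_factor(generators, n_rows):
    """Column r is [1, z_r, z_r**2, ...]: a geometric progression per sub-path."""
    z = np.asarray(generators)
    return z[None, :] ** np.arange(n_rows)[:, None]

# two sub-paths with unit-modulus generators derived from spatial frequencies
A = vandermonde_factor(np.exp(2j * np.pi * np.array([0.10, 0.35])), n_rows=8)

# assemble a rank-2 4D CPD tensor from four factor matrices
rng = np.random.default_rng(0)
B, C, D = (rng.standard_normal((4, 2)) for _ in range(3))
T = np.einsum('ir,jr,kr,lr->ijkl', A, B, C, D)
```

The geometric-progression structure is what allows shift-invariance (ESPRIT-style) tricks and the stronger uniqueness guarantees that Vandermonde-constrained CPD enjoys over unconstrained CPD.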
Title: "Estimating Channels With Hundreds of Sub-Paths for MU-MIMO Uplink: A Structured High-Rank Tensor Approach" (IEEE Signal Processing Letters, IF 3.2, Journal Article)
Pub Date: 2024-09-03 | DOI: 10.1109/LSP.2024.3453201
Chen Mao;Chong Tan;Jingqi Hu;Min Zheng
Person re-identification (ReID), a crucial technology in the security field, plays an important role in security detection and people counting. Current security and monitoring systems rely largely on visual information, which may infringe on personal privacy and, in certain scenarios, be susceptible to interference from pedestrian appearance and clothing. Meanwhile, the widespread deployment of routers offers new possibilities for ReID. This letter introduces a method based on WiFi Channel State Information (CSI) that leverages the multipath propagation characteristics of WiFi signals to distinguish pedestrians. We propose a two-stream network structure capable of processing variable-length data, which analyzes the amplitude of WiFi signals in the time domain and their phase in the frequency domain, fuses time-frequency information through continuous lateral connections, and employs advanced objective functions for representation and metric learning. Tested on a dataset collected in the real world, our method achieves 93.68% mAP and 98.13% Rank-1 accuracy.
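The two input streams described above start from the complex CSI matrix: amplitude for the time-domain stream and phase for the frequency-domain stream. A minimal sketch of that front end, assuming synthetic complex CSI of shape (time, subcarriers), since the letter's exact preprocessing is not given:

```python
import numpy as np

# synthetic complex CSI: rows are time samples, columns are subcarriers
rng = np.random.default_rng(0)
csi = rng.standard_normal((100, 30)) + 1j * rng.standard_normal((100, 30))

amplitude = np.abs(csi)                     # feeds the time-domain stream
phase = np.unwrap(np.angle(csi), axis=1)    # unwrapped across subcarriers
```

Unwrapping the phase along the subcarrier axis removes the artificial 2π jumps of `np.angle`, a common step before phase features are used, though real CSI pipelines usually also need sanitization of hardware phase offsets.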
Title: "Time-Frequency Analysis of Variable-Length WiFi CSI Signals for Person Re-Identification" (IEEE Signal Processing Letters, IF 3.2, Journal Article)
Pub Date: 2024-09-03 | DOI: 10.1109/LSP.2024.3453662
Yan Pan;Shunwai Zhang
In this letter, we propose a novel reconfigurable intelligent surface (RIS)-assisted coded cooperation system based on polar codes to pursue ultra-reliable, global-coverage transmission. First, we establish the RIS-assisted coded cooperation system based on polar codes. The polar codes employed at the source and relay are jointly designed by the Plotkin construction method, and joint decoding is performed at the destination. Subsequently, we derive theoretical expressions for the ergodic capacity (EC) under Nakagami-$m$