Pub Date: 2024-09-09, DOI: 10.1109/LSP.2024.3456636
Yihao Li;Ru Zhang;Jianyi Liu;Qi Lei
With ongoing advances in natural language technology, text steganography has made notable progress. However, existing methods concentrate primarily on the probability distribution between words and often overlook comprehensive control over text semantics. For longer texts in particular, these methods struggle to preserve coherence and contextual consistency, which increases the risk of detection in practical applications. To improve steganographic security, we propose a semantically controllable long-text steganography framework based on prompt engineering and knowledge graph (KG) integration that requires no additional training. The framework uses triplets from the KG together with task descriptions to construct prompts, directing the large language model (LLM) to generate text that aligns with the triplet content. The model then embeds secret information by encoding the candidate pools established around the sampled target words. Experimental results demonstrate that the framework ensures the concealment of steganographic text while maintaining the expected relevance and consistency of the content. Moreover, it can be flexibly adapted to various application scenarios, showing its potential and advantages in practical implementations.
{"title":"A Semantic Controllable Long Text Steganography Framework Based on LLM Prompt Engineering and Knowledge Graph","authors":"Yihao Li;Ru Zhang;Jianyi Liu;Qi Lei","doi":"10.1109/LSP.2024.3456636","journal":{"name":"IEEE Signal Processing Letters"},"publicationDate":"2024-09-09","publicationTypes":"Journal Article"}
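The steganography abstract above says that secret information is embedded by encoding candidate pools built around sampled target words. A minimal sketch of that general idea follows, under loud assumptions: the pools here are hand-written word lists rather than LLM outputs, each pool carries a fixed number of bits given by its index, and the names `embed_bits` and `extract_bits` are hypothetical. It is not the letter's actual coding scheme.

```python
def pool_capacity(pool):
    """Number of whole bits a pool can carry: floor(log2(len(pool)))."""
    return len(pool).bit_length() - 1


def embed_bits(candidate_pools, bits):
    """Pick one word per pool so that the word's index encodes the next bits."""
    words, pos = [], 0
    for pool in candidate_pools:
        k = pool_capacity(pool)
        chunk = bits[pos:pos + k].ljust(k, "0")  # zero-pad once the payload runs out
        index = int(chunk, 2) if k else 0
        words.append(pool[index])
        pos += k
    return words


def extract_bits(candidate_pools, words):
    """Recover the bit string from the index of each chosen word in its pool."""
    out = []
    for pool, word in zip(candidate_pools, words):
        k = pool_capacity(pool)
        if k:
            out.append(format(pool.index(word), f"0{k}b"))
    return "".join(out)


# Hypothetical pools; in the framework these would come from the language model.
pools = [
    ["river", "stream", "creek", "brook"],  # carries 2 bits
    ["flows", "runs"],                      # carries 1 bit
    ["north", "south", "east", "west"],     # carries 2 bits
]
stego_words = embed_bits(pools, "10011")
print(stego_words)
print(extract_bits(pools, stego_words))
```

The receiver can only decode if it rebuilds exactly the same pools in the same order, which is why both sides would need the same model, prompt, and sampling procedure.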
Dynamic facial expression recognition (DFER) plays a vital role in understanding human emotions and behaviors. Existing efforts tend to follow a single-modality self-supervised pretraining paradigm, which limits the representation ability of models. In addition, coarse-grained temporal modeling struggles to capture subtle facial expression representations from various inputs. In this letter, we propose a novel two-stage method for DFER, termed the fine-grained temporal-enhanced transformer (FTET-DFER). First, we exploit the inherent correlation between the visual and auditory modalities in real videos to capture temporally dense representations, such as facial movements and expressions, through self-supervised audio-visual learning. Second, we utilize the learned embeddings as targets to perform DFER. In addition, we design the FTET block to learn fine-grained temporal-enhanced facial expression features based on intra-clip locally-enhanced relations as well as inter-clip locally-enhanced global relationships in videos. Extensive experiments show that FTET-DFER outperforms state-of-the-art methods in both within-dataset and cross-dataset evaluations.
{"title":"Fine-Grained Temporal-Enhanced Transformer for Dynamic Facial Expression Recognition","authors":"Yaning Zhang;Jiahe Zhang;Linlin Shen;Zitong Yu;Zan Gao","doi":"10.1109/LSP.2024.3456668","journal":{"name":"IEEE Signal Processing Letters"},"publicationDate":"2024-09-09","publicationTypes":"Journal Article"}
This letter introduces a speed-improved imaging technique for a real-aperture MIMO system engaged in off-focus imaging. The existing off-focus imaging algorithm is time-intensive because of its extensive interpolation and its reliance on traditional frequency-domain back-projection algorithms (FDBPA). To address this limitation, the least squares (LS) method is used to fit the non-analytical single-trip history function to a second-order polynomial. Both the MIMO array and the wideband received chirp signal are subdivided into multiple sub-arrays and sub-bands, from which low-resolution images are reconstructed using FDBPA. Because the wavenumber spectra of these low-resolution images are band-limited, they can be combined to obtain the global high-resolution image. Simulations and experiments confirm the efficacy of the proposed technique, called Fusion-BPA, which is a fast, state-of-the-art algorithm for the non-linear scene imaging problem.
{"title":"Speed-Improved Off-Focus Imaging Technique for Real-Aperture Imaging System Based on Wavenumber Spectrum Fusion","authors":"WenRui Zhang;ShiYou Wu;YiCai Ji;Chao Li;GuangYou Fang","doi":"10.1109/LSP.2024.3456635","journal":{"name":"IEEE Signal Processing Letters"},"publicationDate":"2024-09-09","publicationTypes":"Journal Article"}
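The off-focus imaging abstract above mentions a least-squares fit of the single-trip history function to a second-order polynomial. The sketch below illustrates only that fitting step, in plain Python, by solving the 3x3 normal equations. The hyperbolic range function used as input is a stand-in assumption (the letter's history function is described as non-analytical and is not given here), and `fit_quadratic` and `R0` are hypothetical names.

```python
import math


def fit_quadratic(xs, ys):
    """Least-squares fit of y ~ a0 + a1*x + a2*x**2 via the normal equations."""
    # Power sums give the entries of A^T A and A^T y for the basis 1, x, x^2.
    s = [sum(x ** p for x in xs) for p in range(5)]
    t = [sum(y * x ** p for x, y in zip(xs, ys)) for p in range(3)]
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Gaussian elimination with partial pivoting on the augmented matrix.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    # Back substitution.
    coef = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        tail = sum(m[i][j] * coef[j] for j in range(i + 1, 3))
        coef[i] = (m[i][3] - tail) / m[i][i]
    return coef


# Stand-in history function (assumption): one-way range to a point at depth R0.
R0 = 10.0
xs = [i / 10 for i in range(-10, 11)]
ys = [math.sqrt(R0 ** 2 + x * x) for x in xs]
a0, a1, a2 = fit_quadratic(xs, ys)
# a2 should land close to 1 / (2 * R0), the leading Taylor coefficient.
print(a0, a1, a2)
```

A closed-form quadratic of this kind is what lets later processing avoid per-pixel interpolation of the exact history; how the letter uses the fitted coefficients is not shown here.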
Pub Date: 2024-09-09, DOI: 10.1109/LSP.2024.3456632
Zhiming Zhang;Jiao Liu;Yongfeng Dong;Jun Zhang
Image retrieval aims to find the most semantically similar images in a database. Existing deep hash-based retrieval algorithms use data augmentation strategies to generate generalized hash codes. However, simple data augmentation improves the accuracy of hash codes only from the perspective of sample diversity, without fully utilizing the inherent characteristics of the images. In this letter, we explore the frequency domain information of images and propose a Frequency Domain Auxiliary Network (FDANet) for deep hash retrieval. To capture frequency domain information that can cope with image transformations, we develop the spectrum enhancement module (SEM) in FDANet. The SEM uses the Fourier transform to extract the amplitude component, which reflects the low-level statistics of the image. Then, leveraging the extracted amplitude components, the retrieval network enhances its perception of regions undergoing relative changes in the original spatial domain. Experiments on several image retrieval benchmarks demonstrate that our method outperforms other state-of-the-art hash algorithms on the test metrics.
{"title":"A Frequency Domain Auxiliary Network for Image Retrieval","authors":"Zhiming Zhang;Jiao Liu;Yongfeng Dong;Jun Zhang","doi":"10.1109/LSP.2024.3456632","journal":{"name":"IEEE Signal Processing Letters"},"publicationDate":"2024-09-09","publicationTypes":"Journal Article"}
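The FDANet abstract above describes extracting the amplitude component with the Fourier transform. As a rough illustration of what that component is, the sketch below computes a naive discrete Fourier transform of a single 1-D row of pixel values and separates amplitude from phase. The 1-D setting, the sample row, and the function names are assumptions made for brevity; the SEM itself operates on images inside a network and is not reproduced here.

```python
import cmath


def dft(signal):
    """Naive O(n^2) discrete Fourier transform of a real or complex sequence."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def amplitude_and_phase(signal):
    """Split the spectrum into its amplitude and phase components."""
    spectrum = dft(signal)
    return [abs(c) for c in spectrum], [cmath.phase(c) for c in spectrum]


# Hypothetical row of pixel intensities and a circularly shifted copy of it.
row = [1, 2, 4, 3, 0, 0, 1, 2]
shifted = row[3:] + row[:3]

amp_row, phase_row = amplitude_and_phase(row)
amp_shifted, phase_shifted = amplitude_and_phase(shifted)

# By the shift theorem, a circular shift changes only the phase, so the two
# amplitude spectra agree up to floating-point error.
print(max(abs(a - b) for a, b in zip(amp_row, amp_shifted)))
```

This shift invariance is one well-known property of the amplitude spectrum; whether it is the specific property the letter relies on to cope with image transformations is not stated in the abstract.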
Pub Date: 2024-09-06, DOI: 10.1109/LSP.2024.3455990
Wenting Li;Zhuosheng Zhang;Rui Zhang
In this letter, we introduce an accurate group delay estimator and two high-resolution time-frequency analysis methods to characterize fast frequency-varying signals. First, we explore the limitations of the time-reassigned synchrosqueezing wavelet transform and its multi-synchrosqueezing variant in dealing with fast frequency-varying signals. Second, we present the Newton group delay estimator, based on wavelet transform properties and Newton's method. Building on this estimator, we introduce the Newton time-reassigned synchrosqueezing wavelet transform, which improves the readability of the time-frequency representation by reassigning the wavelet transform coefficients onto the group delay trajectories along the time direction, and we derive its reconstruction formula. Moreover, we propose the Newton time-reassigned multi-synchrosqueezing wavelet transform, which applies multiple squeezing operations to achieve a more concentrated time-frequency representation and accurate signal reconstruction. Finally, we employ synthetic and real signals to verify the effectiveness of the proposed methods in terms of time-frequency energy concentration, group delay estimation, and signal reconstruction.
{"title":"Newton Time-Reassigned Multi-Synchrosqueezing Wavelet Transform","authors":"Wenting Li;Zhuosheng Zhang;Rui Zhang","doi":"10.1109/LSP.2024.3455990","journal":{"name":"IEEE Signal Processing Letters"},"publicationDate":"2024-09-06","publicationTypes":"Journal Article"}
Recently, matrix embedding (ME), a well-known steganographic technique, was employed in reversible data hiding (RDH) for the first time, improving the performance of single histogram modification (SHM) methods. In this letter, the ME-based RDH strategy is extended from SHM to the more effective multiple histograms modification (MHM) to further improve reversible embedding performance. The capacity-distortion model is first established for this new scenario. Then, some theoretical results on payload partition and expansion-bin determination are given. Finally, based on these theoretical results, an efficient RDH method with low computational complexity is proposed. Experimental results show that the proposed method achieves better visual quality than some state-of-the-art methods.
{"title":"Matrix Embedding Based Multiple Histograms Modification for Efficient Reversible Data Hiding","authors":"Xiang Li;Mengyao Xiao;Xiaolong Li;Shijun Xiang;Yao Zhao","doi":"10.1109/LSP.2024.3455995","journal":{"name":"IEEE Signal Processing Letters"},"publicationDate":"2024-09-06","publicationTypes":"Journal Article"}
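For readers unfamiliar with matrix embedding, named in the abstract above as a well-known steganographic technique, the following sketch shows its classical form based on the [7,4] Hamming code: three message bits are carried by seven cover bits while at most one cover bit is changed. This is the textbook construction only; it is not reversible on its own, and it is not the letter's ME-based MHM scheme. The function names are hypothetical.

```python
def syndrome(bits):
    """Syndrome of 7 bits under the Hamming parity-check matrix whose
    i-th column (1-indexed) is the binary representation of i."""
    s = 0
    for position, bit in enumerate(bits, start=1):
        if bit:
            s ^= position
    return s


def me_embed(cover_bits, message):
    """Embed a 3-bit message (an integer 0..7) by flipping at most one bit."""
    flip = syndrome(cover_bits) ^ message
    stego = list(cover_bits)
    if flip:
        stego[flip - 1] ^= 1  # flipping position `flip` XORs the syndrome with `flip`
    return stego


def me_extract(stego_bits):
    """The message is simply the syndrome of the stego bits."""
    return syndrome(stego_bits)


cover = [1, 0, 1, 1, 0, 0, 1]  # for example, the LSBs of seven cover elements
stego = me_embed(cover, 0b101)
print(stego, me_extract(stego))
```

Compared with writing three bits directly into three cover elements, this changes fewer elements on average, which is the efficiency gain that makes matrix embedding attractive as a building block.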
Pub Date: 2024-09-06, DOI: 10.1109/LSP.2024.3456005
Xingyuan Liang;Shijun Xiang
In the reversible data hiding (RDH) community, researchers often embed bits by shifting the prediction-error histogram under the mean-square error metric (MSEM). This causes more pixel distortion in smooth areas. Considering that the human eye is more sensitive to distortion in smooth areas, in this letter we propose a new histogram shifting strategy for RDH based on the general distortion metric (GDM). With the GDM, data can be embedded by first modifying pixels in texture areas. Through both theoretical analysis and experimental testing, we show that the proposed GDM-based histogram shifting strategy further improves the visual quality of marked images, yielding higher SSIM values than typical MSEM-based histogram shifting methods.
{"title":"General Distortion Metric Based Histogram Shifting for Reversible Data Hiding","authors":"Xingyuan Liang;Shijun Xiang","doi":"10.1109/LSP.2024.3456005","journal":{"name":"IEEE Signal Processing Letters"},"publicationDate":"2024-09-06","publicationTypes":"Journal Article"}
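The abstract above contrasts its GDM-based strategy with conventional prediction-error histogram shifting. The sketch below shows only that conventional baseline in its simplest form: the peak bin at 0 is expanded to carry one bit, positive bins are shifted by one to make room, and negative bins are untouched. The predictor, pixel overflow handling, and the letter's texture-first ordering under the GDM are all omitted, and the function names are hypothetical.

```python
def hs_embed(errors, bits):
    """Embed bits into prediction errors by expanding bin 0 and shifting bins > 0."""
    queue = list(bits)
    marked = []
    for e in errors:
        if e == 0:
            # Expandable bin: becomes 0 or 1 depending on the bit (0 if none left).
            marked.append(queue.pop(0) if queue else 0)
        elif e > 0:
            marked.append(e + 1)  # shifted right to free bin 1
        else:
            marked.append(e)      # negative bins are left untouched
    if queue:
        raise ValueError("payload exceeds capacity (number of zero errors)")
    return marked


def hs_extract(marked, n_bits):
    """Recover the first n_bits payload bits and the original prediction errors."""
    bits, errors = [], []
    for e in marked:
        if e in (0, 1):
            bits.append(e)
            errors.append(0)
        elif e > 1:
            errors.append(e - 1)
        else:
            errors.append(e)
    return bits[:n_bits], errors


errors = [0, 2, -1, 0, 1, 0, -3, 0]
payload = [1, 0, 1]
marked = hs_embed(errors, payload)
print(marked)
print(hs_extract(marked, len(payload)))
```

Every modified error moves by at most one, so this baseline minimizes squared error without regard to where in the image the change falls, which is the behavior the letter's GDM-based ordering is meant to improve on.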
Pub Date: 2024-09-06, DOI: 10.1109/LSP.2024.3455994
Tianpeng Liu;Yun Cheng;Junpeng Shi;Zhen Liu;Yongxiang Liu
A sparsity-based adaptive beamforming (ABF) method is introduced to effectively process coherent signals with polarized sensor arrays (PSA). This method exploits the spatial sparsity of observed signals by transforming it into row-sparsity within a waveform-polarization composite matrix through data reorganization. This row-sparsity is subsequently cast as an $\ell_{2,1}$