Pub Date : 2024-09-10  DOI: 10.1109/LSP.2024.3457449
Compressed Line Spectral Estimation Using Covariance: A Sparse Reconstruction Perspective
Jiahui Cao;Zhibo Yang;Xuefeng Chen
Efficient line spectral estimation methods applicable to sub-Nyquist sampling are drawing considerable attention in both academia and industry. In this letter, we propose an enhanced compressed sensing (CS) framework for line spectral estimation, termed sparsity-based compressed covariance sensing (SCCS). In terms of sampling, SCCS is implemented by periodic non-uniform sampling; in terms of recovery, SCCS performs compressed line spectral recovery using covariance information. Thanks to its dual priors on sparsity and structure, SCCS theoretically outperforms CS in compressed line spectral estimation. We explain this superiority from the mutual incoherence perspective: the sensing matrix in SCCS has lower mutual coherence than that in classic CS. Extensive experimental results are highly consistent with this theoretical analysis. Overall, SCCS opens new avenues for line spectral estimation.
{"title":"Compressed Line Spectral Estimation Using Covariance: A Sparse Reconstruction Perspective","authors":"Jiahui Cao;Zhibo Yang;Xuefeng Chen","doi":"10.1109/LSP.2024.3457449","DOIUrl":"10.1109/LSP.2024.3457449","url":null,"abstract":"Efficient line spectral estimation methods applicable to sub-Nyquist sampling are drawing considerable attention in both academia and industry. In this letter, we propose an enhanced compressed sensing (CS) framework for line spectral estimation, termed sparsity-based compressed covariance sensing (SCCS). In terms of sampling, SCCS is implemented by periodic non-uniform sampling; In terms of recovery, SCCS focuses on compressed line spectral recovery using covariance information. Due to the dual priors on sparsity and structure, SCCS theoretically performs better than CS in compressed line spectral estimation. We explain this superiority from the mutual incoherence perspective: the sensing matrix in SCCS has a lower mutual coherence than that in classic CS. Extensive experimental results show a high consistency with the theoretical inference. All in all, SCCS opens many avenues for line spectral estimation.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":null,"pages":null},"PeriodicalIF":3.2,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142218753","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-09-10  DOI: 10.1109/LSP.2024.3457244
Maximum Entropy Attack on Decision Fusion With Herding Behaviors
Yiqing Lin;H. Vicky Zhao
The reliability and security of distributed detection systems have become increasingly important due to their growing prevalence in various applications. As human-machine systems advance, human factors such as herding behavior increasingly influence the decision fusion process of these systems, and the presence of malicious users further highlights the need to address security concerns. In this paper, we propose a maximum entropy attack that exploits users' herding behavior to amplify the damage attackers can inflict. Unlike prior works that try to maximize the fusion error rate, the proposed attack maximizes the entropy of the system states inferred by the fusion center, rendering the fusion results no more informative than a random coin toss. Moreover, we design static and dynamic attack modes that maximize the entropy of the fusion results at steady state and during the dynamic evolution stage, respectively. Simulation results show that the proposed attack drives the fusion accuracy to hover around 50% and that existing fusion rules cannot resist it, demonstrating its effectiveness.
{"title":"Maximum Entropy Attack on Decision Fusion With Herding Behaviors","authors":"Yiqing Lin;H. Vicky Zhao","doi":"10.1109/LSP.2024.3457244","DOIUrl":"10.1109/LSP.2024.3457244","url":null,"abstract":"The reliability and security of distributed detection systems have become increasingly important due to their growing prevalence in various applications. As advancements in human-machine systems continue, human factors, such as herding behaviors, are becoming influential in decision fusion process of these systems. The presence of malicious users further highlights the necessity to mitigate security concerns. In this paper, we propose a maximum entropy attack exploring the herding behaviors of users to amplify the hazard of attackers. Different from prior works that try to maximize the fusion error rate, the proposed attack maximizes the entropy of inferred system states from the fusion center, making the fusion results the same as a random coin toss. Moreover, we design static and dynamic attack modes to maximize the entropy of fusion results at the steady state and during the dynamic evolution stage, respectively. Simulation results show that the proposed attack strategy can cause the fusion accuracy to hover around 50% and existing fusion rules cannot resist our proposed attack, demonstrating its effectiveness.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":null,"pages":null},"PeriodicalIF":3.2,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142227365","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-09-10  DOI: 10.1109/LSP.2024.3457862
Kalman-SSM: Modeling Long-Term Time Series With Kalman Filter Structured State Spaces
Zheng Zhou;Xu Guo;Yu-Jie Xiong;Chun-Ming Xia
In time series forecasting, time series are often modeled as linear time-varying systems, which facilitates analyzing and modeling them from a structural-state perspective. Due to the non-stationarity of and noise interference in real-world data, existing models struggle to predict long-term time series effectively. To address this issue, we propose a novel model that integrates the Kalman filter with a state space model (SSM) approach to improve the accuracy of long-term time series forecasting. The Kalman filter requires recursive computation, whereas the SSM approach reformulates the Kalman filtering process in convolutional form, simplifying training and improving efficiency. Our Kalman-SSM model forecasts by estimating the future state of a dynamic system from a sequence of noisy time series observations. On real-world datasets, Kalman-SSM demonstrates competitive performance and satisfactory efficiency compared with state-of-the-art (SOTA) models.
{"title":"Kalman-SSM: Modeling Long-Term Time Series With Kalman Filter Structured State Spaces","authors":"Zheng Zhou;Xu Guo;Yu-Jie Xiong;Chun-Ming Xia","doi":"10.1109/LSP.2024.3457862","DOIUrl":"10.1109/LSP.2024.3457862","url":null,"abstract":"In the field of time series forecasting, time series are often considered as linear time-varying systems, which facilitates the analysis and modeling of time series from a structural state perspective. Due to the non-stationary nature and noise interference in real-world data, existing models struggle to predict long-term time series effectively. To address this issue, we propose a novel model that integrates the Kalman filter with a state space model (SSM) approach to enhance the accuracy of long-term time series forecasting. The Kalman filter requires recursive computation, whereas the SSM approach reformulates the Kalman filtering process into a convolutional form, simplifying training and enhancing model efficiency. Our Kalman-SSM model estimates the future state of dynamic systems for forecasting by utilizing a series of time series data containing noise. In real-world datasets, the Kalman-SSM has demonstrated competitive performance and satisfactory efficiency in comparison to state-of-the-art (SOTA) models.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":null,"pages":null},"PeriodicalIF":3.2,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142218752","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-09-10  DOI: 10.1109/LSP.2024.3458792
Improving Visual Representations of Masked Autoencoders With Artifacts Suppression
Zhengwei Miao;Hui Luo;Dongxu Liu;Jianlin Zhang
Recently, Masked Autoencoders (MAE) have gained attention for their ability to learn visual representations efficiently through pretext tasks. However, little research has evaluated the visual representations of pre-trained MAE during the fine-tuning process. In this study, we address this gap by examining the attention maps within each block of a pre-trained MAE during fine-tuning. We observe artifacts in pre-trained models, which appear as strong responses in the attention maps of shallow blocks and may degrade MAE's transfer performance. We trace the cause of these artifacts to the asymmetry between the pre-training and fine-tuning processes. To suppress them, we propose a novel semantic masking strategy that preserves complete and continuous semantic information within the visible patches while maintaining randomness to facilitate robust representation learning. Experimental results demonstrate that the proposed masking strategy improves performance on various downstream tasks while reducing artifacts: a 3.2% improvement in linear probing and a 0.5% improvement in fine-tuning on ImageNet-1K, and a 0.6% gain in semantic segmentation on ADE20K.
{"title":"Improving Visual Representations of Masked Autoencoders With Artifacts Suppression","authors":"Zhengwei Miao;Hui Luo;Dongxu Liu;Jianlin Zhang","doi":"10.1109/LSP.2024.3458792","DOIUrl":"10.1109/LSP.2024.3458792","url":null,"abstract":"Recently, Masked Autoencoders (MAE) have gained attention for their abilities to generate visual representations efficiently through pretext tasks. However, there has been little research evaluating the visual representations obtained by pre-trained MAE during the fine-tuning process. In this study, we address the gap by examining the attention maps within each block of the pre-trained MAE during the fine-tuning process. We observed artifacts in pre-trained models, which appear as significant responses in the attention maps of shallow blocks. These artifacts may negatively impact the transfer ability performance of MAE. To address this issue, we localize the cause of these artifacts to the asymmetry between the pre-training and fine-tuning processes. To suppress these artifacts, we propose a novel semantic masking strategy. This strategy aims to preserve complete and continuous semantic information within visible patches while maintaining randomness to facilitate robust representation learning. Experimental results demonstrate that the proposed masking strategy improves the performance of various downstream tasks while reducing artifacts. Specifically, we observed a 3.2% improvement in linear probing, a 0.5% enhancement in fine-tuning on Imagenet1K, and a 0.6% increase in semantic segmentation on ADE20K.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":null,"pages":null},"PeriodicalIF":3.2,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142218751","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-09-10  DOI: 10.1109/LSP.2024.3456673
HiFi-GANw: Watermarked Speech Synthesis via Fine-Tuning of HiFi-GAN
Xiangyu Cheng;Yaofei Wang;Chang Liu;Donghui Hu;Zhaopin Su
Advances in speech synthesis technology bring generated speech ever closer to natural human voices, but they also introduce risks such as the dissemination of false information and voice impersonation. It is therefore important to detect potential misuse of released speech content. This letter introduces an active strategy that combines audio watermarking with the HiFi-GAN vocoder to embed an invisible watermark in all synthesized speech for detection purposes. We first pre-train a watermark extraction network as the watermark extractor, and then fine-tune the HiFi-GAN generator using the extractor's watermark extraction loss together with a speech quality loss, ensuring that the watermark can be extracted from the synthesized speech. We evaluate the imperceptibility and robustness of the watermark across various speech synthesis models. Experimental results demonstrate that our method withstands various attacks and exhibits excellent imperceptibility. Moreover, the method is universal and compatible with various vocoder-based speech synthesis models.
{"title":"HiFi-GANw: Watermarked Speech Synthesis via Fine-Tuning of HiFi-GAN","authors":"Xiangyu Cheng;Yaofei Wang;Chang Liu;Donghui Hu;Zhaopin Su","doi":"10.1109/LSP.2024.3456673","DOIUrl":"10.1109/LSP.2024.3456673","url":null,"abstract":"Advancements in speech synthesis technology bring generated speech closer to natural human voices, but they also introduce a series of potential risks, such as the dissemination of false information and voice impersonation. Therefore, it becomes significant to detect any potential misuse of the released speech content. This letter introduces an active strategy that combines audio watermarking with the HiFi-GAN vocoder to embed an invisible watermark in all synthesized speech for detection purposes. We first pre-train a watermark extraction network as the watermark extractor, and then use the watermark extraction loss and speech quality loss of the extractor to adjust the HiFi-GAN generator to ensure that the watermark can be extracted from the synthesized speech. We evaluate the imperceptibility and robustness of the watermark across various speech synthesis models. The experimental results demonstrate that our method effectively withstands various attacks and exhibits excellent imperceptibility. Moreover, our method is universal and compatible with various vocoder-based speech synthesis models.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":null,"pages":null},"PeriodicalIF":3.2,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142218754","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This letter presents a novel method for few-shot semantic segmentation of three-modal images. Some previous efforts fuse the modalities before feature correlation, but this alters the original visual information that is useful for subsequent feature matching. Others are built on early correlation learning, which can lose detail and thereby weaken multi-modal integration. To address these challenges, we build a novel interactive fusion and correlation network (IFCNet). Specifically, the proposed fusing and correlating (FC) module performs feature correlation and attention-based multi-modal fusion interactively, which establishes effective inter-modal complementarity and benefits intra-modal query-support correlation. Furthermore, we add a multi-modal correlation (MC) module, which leverages multi-layer cosine similarity maps to enrich multi-modal visual correspondence. Experiments on the VDT-2048-5 $^{i}$
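The cosine similarity maps used by the MC module follow a standard construction: normalize query and support feature vectors along the channel dimension and take all pairwise inner products. A minimal PyTorch sketch of one such map (shapes and names are illustrative, not the paper's implementation, which stacks such maps over multiple layers):

```python
import torch
import torch.nn.functional as F

def cosine_correlation(query_feat, support_feat, support_mask):
    """Pairwise cosine similarities between query and masked support features.

    query_feat:   (B, C, Hq, Wq) query feature map
    support_feat: (B, C, Hs, Ws) support feature map
    support_mask: (B, 1, Hs, Ws) binary mask, 1 on the support object
    returns:      (B, Hq*Wq, Hs*Ws) similarity map
    """
    q = F.normalize(query_feat.flatten(2), dim=1)                    # (B, C, Nq)
    s = F.normalize((support_feat * support_mask).flatten(2), dim=1)  # (B, C, Ns)
    return torch.einsum("bcq,bcs->bqs", q, s)

corr = cosine_correlation(torch.randn(1, 256, 32, 32),
                          torch.randn(1, 256, 32, 32),
                          (torch.rand(1, 1, 32, 32) > 0.5).float())
print(corr.shape)  # torch.Size([1, 1024, 1024])
```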