
Latest Articles from IEEE Signal Processing Letters

Compressed Line Spectral Estimation Using Covariance: A Sparse Reconstruction Perspective
IF 3.2 CAS Zone 2 (Engineering & Technology) Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2024-09-10 DOI: 10.1109/LSP.2024.3457449
Jiahui Cao;Zhibo Yang;Xuefeng Chen
Efficient line spectral estimation methods applicable to sub-Nyquist sampling are drawing considerable attention in both academia and industry. In this letter, we propose an enhanced compressed sensing (CS) framework for line spectral estimation, termed sparsity-based compressed covariance sensing (SCCS). In terms of sampling, SCCS is implemented by periodic non-uniform sampling; in terms of recovery, it focuses on compressed line spectral recovery using covariance information. Owing to its dual priors on sparsity and structure, SCCS theoretically outperforms CS in compressed line spectral estimation. We explain this superiority from the mutual incoherence perspective: the sensing matrix in SCCS has lower mutual coherence than that in classic CS. Extensive experimental results are highly consistent with the theoretical analysis. All in all, SCCS opens many avenues for line spectral estimation.
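As a rough illustration of the mutual-coherence measure the abstract invokes (computed here on a random Gaussian matrix, not the authors' actual SCCS sensing matrices), a minimal NumPy sketch:

```python
import numpy as np

def mutual_coherence(A):
    """Largest absolute inner product between distinct normalized columns."""
    A = A / np.linalg.norm(A, axis=0, keepdims=True)
    G = np.abs(A.T @ A)          # Gram matrix of column correlations
    np.fill_diagonal(G, 0.0)     # ignore self-correlations
    return float(G.max())

# illustrative random sensing matrix: 32 measurements, 128 dictionary atoms
rng = np.random.default_rng(0)
A = rng.standard_normal((32, 128))
mu = mutual_coherence(A)
```

A lower value of `mu` gives better sparse-recovery guarantees, which is the sense in which the letter argues SCCS improves on classic CS.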
Citations: 0
Maximum Entropy Attack on Decision Fusion With Herding Behaviors
IF 3.2 CAS Zone 2 (Engineering & Technology) Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2024-09-10 DOI: 10.1109/LSP.2024.3457244
Yiqing Lin;H. Vicky Zhao
The reliability and security of distributed detection systems have become increasingly important due to their growing prevalence in various applications. As human-machine systems continue to advance, human factors, such as herding behaviors, are becoming influential in the decision fusion process of these systems. The presence of malicious users further highlights the need to mitigate security concerns. In this paper, we propose a maximum entropy attack that exploits the herding behaviors of users to amplify the damage caused by attackers. Different from prior works that try to maximize the fusion error rate, the proposed attack maximizes the entropy of the system states inferred by the fusion center, making the fusion results no better than a random coin toss. Moreover, we design static and dynamic attack modes to maximize the entropy of the fusion results at the steady state and during the dynamic evolution stage, respectively. Simulation results show that the proposed attack strategy causes the fusion accuracy to hover around 50% and that existing fusion rules cannot resist it, demonstrating its effectiveness.
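The "random coin toss" claim can be made concrete with the binary Shannon entropy, which is maximized exactly when the fused decision is 50-50 (a standard identity, not code from the letter):

```python
import math

def binary_entropy(p):
    """Shannon entropy (in bits) of a binary fusion outcome with P(H1) = p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))
```

`binary_entropy(0.5)` equals 1 bit, the maximum; driving the fusion center's inferred state toward this point is what makes its output uninformative.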
Citations: 0
Kalman-SSM: Modeling Long-Term Time Series With Kalman Filter Structured State Spaces
IF 3.2 CAS Zone 2 (Engineering & Technology) Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2024-09-10 DOI: 10.1109/LSP.2024.3457862
Zheng Zhou;Xu Guo;Yu-Jie Xiong;Chun-Ming Xia
In the field of time series forecasting, time series are often modeled as linear time-varying systems, which facilitates their analysis and modeling from a structural-state perspective. Owing to the non-stationary nature of real-world data and its noise interference, existing models struggle to predict long-term time series effectively. To address this issue, we propose a novel model that integrates the Kalman filter with a state space model (SSM) approach to enhance the accuracy of long-term time series forecasting. The Kalman filter requires recursive computation, whereas the SSM approach reformulates the Kalman filtering process in convolutional form, simplifying training and improving model efficiency. Our Kalman-SSM model estimates the future state of a dynamic system for forecasting from a noisy time series. On real-world datasets, Kalman-SSM demonstrates competitive performance and satisfactory efficiency compared with state-of-the-art (SOTA) models.
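The recursive-to-convolutional reformulation mentioned here is the standard SSM unrolling trick: for a linear time-invariant system the output equals the input convolved with the kernel K[k] = C A^k B. A toy sketch (illustrative matrices, not the Kalman-SSM's learned parameters) showing the two evaluations agree:

```python
import numpy as np

def ssm_kernel(A, B, C, L):
    """Impulse response K[k] = C A^k B of x_t = A x_{t-1} + B u_t, y_t = C x_t."""
    K, Ak = [], np.eye(A.shape[0])
    for _ in range(L):
        K.append((C @ Ak @ B).item())
        Ak = A @ Ak
    return np.array(K)

A = np.array([[0.9, 0.1], [0.0, 0.8]])   # toy stable state matrix
B = np.array([[1.0], [0.5]])
C = np.array([[1.0, 0.0]])
L = 16
u = np.random.default_rng(1).standard_normal(L)

# recursive evaluation (Kalman-style state update)
x, y_rec = np.zeros((2, 1)), []
for t in range(L):
    x = A @ x + B * u[t]
    y_rec.append((C @ x).item())

# convolutional evaluation with the unrolled kernel
y_conv = np.convolve(u, ssm_kernel(A, B, C, L))[:L]
```

The convolutional form is what lets such models be trained in parallel over the whole sequence instead of step by step.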
Citations: 0
Improving Visual Representations of Masked Autoencoders With Artifacts Suppression
IF 3.2 CAS Zone 2 (Engineering & Technology) Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2024-09-10 DOI: 10.1109/LSP.2024.3458792
Zhengwei Miao;Hui Luo;Dongxu Liu;Jianlin Zhang
Recently, Masked Autoencoders (MAE) have gained attention for their ability to learn visual representations efficiently through pretext tasks. However, little research has evaluated the visual representations of pre-trained MAE during fine-tuning. In this study, we address this gap by examining the attention maps within each block of a pre-trained MAE during fine-tuning. We observe artifacts in pre-trained models, which appear as strong responses in the attention maps of shallow blocks. These artifacts may degrade the transferability of MAE. We trace their cause to the asymmetry between the pre-training and fine-tuning processes. To suppress them, we propose a novel semantic masking strategy that preserves complete and continuous semantic information within the visible patches while maintaining enough randomness to facilitate robust representation learning. Experimental results demonstrate that the proposed masking strategy improves performance on various downstream tasks while reducing artifacts. Specifically, we observed a 3.2% improvement in linear probing, a 0.5% gain after fine-tuning on ImageNet-1K, and a 0.6% increase in semantic segmentation on ADE20K.
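For context, standard MAE pre-training hides a random subset of image patches; a minimal sketch of that random baseline selector follows (the letter's semantic strategy replaces this purely random choice with semantically grouped patches, whose construction the abstract does not detail):

```python
import numpy as np

def random_patch_mask(num_patches, mask_ratio, rng):
    """Return sorted indices of the patches kept visible under random masking."""
    n_keep = int(num_patches * (1.0 - mask_ratio))
    keep = rng.permutation(num_patches)[:n_keep]
    return np.sort(keep)

# a 14x14 ViT patch grid with the usual 75% mask ratio
keep = random_patch_mask(196, 0.75, np.random.default_rng(0))
```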
Citations: 0
HiFi-GANw: Watermarked Speech Synthesis via Fine-Tuning of HiFi-GAN
IF 3.2 CAS Zone 2 (Engineering & Technology) Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2024-09-10 DOI: 10.1109/LSP.2024.3456673
Xiangyu Cheng;Yaofei Wang;Chang Liu;Donghui Hu;Zhaopin Su
Advancements in speech synthesis technology bring generated speech closer to natural human voices, but they also introduce a series of potential risks, such as the dissemination of false information and voice impersonation. It therefore becomes important to detect any potential misuse of released speech content. This letter introduces an active strategy that combines audio watermarking with the HiFi-GAN vocoder to embed an invisible watermark in all synthesized speech for detection purposes. We first pre-train a watermark extraction network as the extractor, and then use its watermark extraction loss and speech quality loss to fine-tune the HiFi-GAN generator so that the watermark can be extracted from the synthesized speech. We evaluate the imperceptibility and robustness of the watermark across various speech synthesis models. The experimental results demonstrate that our method effectively withstands various attacks and exhibits excellent imperceptibility. Moreover, our method is universal and compatible with various vocoder-based speech synthesis models.
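One common way to score such a watermark extractor under attack (an illustrative robustness metric, not necessarily the exact loss used in the letter) is the bit error rate between embedded and recovered watermark bits:

```python
import numpy as np

def bit_error_rate(embedded, extracted):
    """Fraction of watermark bits the extractor recovers incorrectly."""
    embedded = np.asarray(embedded)
    extracted = np.asarray(extracted)
    return float(np.mean(embedded != extracted))
```

A robust watermark keeps this rate near zero even after the audio is compressed, resampled, or otherwise perturbed.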
Citations: 0
Interactive Fusion and Correlation Network for Three-Modal Images Few-Shot Semantic Segmentation
IF 3.2 CAS Zone 2 (Engineering & Technology) Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2024-09-09 DOI: 10.1109/LSP.2024.3456634
Haolan He;Xianguo Dong;Xiaofei Zhou;Bo Wang;Jiyong Zhang
This letter presents a novel method for three-modal images few-shot semantic segmentation. Some previous efforts fuse the modalities before feature correlation, which alters the original visual information that is useful for subsequent feature matching. Others are built on early correlation learning, which can cause loss of detail and thereby degrade multi-modal integration. To address these challenges, we build a novel interactive fusion and correlation network (IFCNet). Specifically, the proposed fusing and correlating (FC) module performs feature correlating and attention-based multi-modal fusing interactively, which establishes effective inter-modal complementarity and benefits intra-modal query-support correlation. Furthermore, we add a multi-modal correlation (MC) module, which leverages multi-layer cosine similarity maps to enrich multi-modal visual correspondence. Experiments on the VDT-2048-5$^{i}$ dataset demonstrate the network's superior performance; it outperforms existing state-of-the-art methods in both 1-shot and 5-shot settings. The study also includes an ablation analysis validating the contributions of the FC and MC modules to overall segmentation accuracy.
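The cosine similarity maps mentioned here are all-pairs cosine correlations between query and support feature positions; a minimal single-layer sketch (toy shapes, not IFCNet's actual feature dimensions):

```python
import numpy as np

def cosine_correlation(query, support, eps=1e-8):
    """All-pairs cosine similarity between flattened feature maps.

    query: (C, Nq), support: (C, Ns) -> (Nq, Ns) correspondence map.
    """
    q = query / (np.linalg.norm(query, axis=0, keepdims=True) + eps)
    s = support / (np.linalg.norm(support, axis=0, keepdims=True) + eps)
    return q.T @ s

rng = np.random.default_rng(0)
corr = cosine_correlation(rng.standard_normal((64, 25)),
                          rng.standard_normal((64, 36)))
```

Stacking such maps from multiple backbone layers is what "multi-layer" correspondence refers to.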
Citations: 0
Spatial Adaptive Filter Network With Scale-Sharing Convolution for Image Demoiréing
IF 3.2 CAS Zone 2 (Engineering & Technology) Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2024-09-09 DOI: 10.1109/LSP.2024.3451948
Yong Xu;Zhiyu Wei;Ruotao Xu;Zihan Zhou;Zhuliang Yu
Removing moiré patterns is a challenging task, as they are a spatially varying degradation that varies in shape, color, and scale. Existing image restoration models often rely on static convolutional neural network (CNN) architectures and are hence potentially suboptimal for addressing the diverse manifestations of moiré patterns across images and spatial positions. To this end, we propose a spatially adaptive neural network for image demoiréing. This network introduces a dual-branch filter prediction module that predicts pixel-wise adaptive filters capable of handling moiré patterns of varying orientations and color-shift issues. To further tackle the challenge of scale variability, a scale-sharing convolution module is proposed, applying the pixel-wise adaptive filters with multiple dilations to handle moiré patterns of different sizes but similar shapes effectively. In extensive evaluations on three benchmark datasets, our model consistently outperforms existing methods, yielding a PSNR improvement of over 0.37 dB across all evaluated datasets with additional benefits in model size.
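"Pixel-wise adaptive filters" means every pixel is filtered with its own predicted kernel rather than one shared convolution kernel. A naive reference sketch of that application step (the network that predicts the kernels is omitted; shapes are illustrative):

```python
import numpy as np

def apply_pixelwise_filters(img, kernels):
    """Filter each pixel of a single-channel image with its own k x k kernel.

    img: (H, W); kernels: (H, W, k, k), one predicted filter per pixel.
    """
    H, W = img.shape
    k = kernels.shape[-1]
    r = k // 2
    pad = np.pad(img, r, mode="edge")
    out = np.empty_like(img, dtype=float)
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(pad[i:i + k, j:j + k] * kernels[i, j])
    return out
```

Real implementations vectorize this (e.g. as an unfold-plus-weighted-sum); the loop form just makes the per-pixel behavior explicit.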
Citations: 0
Two-Stream Temporal Feature Aggregation Based on Clustering for Few-Shot Action Recognition
IF 3.2 CAS Zone 2 (Engineering & Technology) Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2024-09-09 DOI: 10.1109/LSP.2024.3456670
Long Deng;Ao Li;Bingxin Zhou;Yongxin Ge
The metric learning paradigm has achieved notable success in few-shot action recognition; however, it faces unaddressed challenges: (1) limited training data can impede the exploration of temporal action relations, and (2) precision declines in the presence of outliers during frame-level feature alignment. To address these challenges, we propose a two-stream temporal feature aggregation method based on clustering, incorporating a temporal augmentation module (TAM) and a feature aggregation module (FAM). The TAM adeptly integrates three consecutive grayscale frames into the original RGB frame through weighted summation, thereby addressing color-related misguidance and enhancing temporal information extraction. Meanwhile, the FAM employs clustering to aggregate the frame-level features into high-level semantic sub-actions and replaces the original features with the cluster centers to mitigate the adverse impact of outliers on model performance. Experimental results on benchmark datasets demonstrate the effectiveness of our method in few-shot action recognition. We validate the proposed approach with comprehensive ablation experiments.
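The FAM's aggregation step, clustering frame features and keeping only the cluster centers, can be sketched with a naive k-means (toy 2-D features and a deterministic first-k initialization for reproducibility; the letter does not specify its clustering algorithm):

```python
import numpy as np

def aggregate_frames(features, k, iters=10):
    """Cluster (T, D) frame-level features into k sub-action centers.

    Naive k-means, deterministically seeded with the first k frames.
    """
    centers = features[:k].astype(float).copy()
    for _ in range(iters):
        dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = features[labels == c].mean(axis=0)
    return centers

# two well-separated "sub-actions" in a toy 2-D feature space
frames = np.array([[0.0, 0.0], [10.0, 10.0], [0.1, 0.0],
                   [9.9, 10.0], [0.0, 0.1], [10.0, 9.9]])
centers = aggregate_frames(frames, k=2)
```

Replacing T frame features with k centers both shortens the sequence to align and averages away outlier frames, which is the robustness argument in the abstract.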
Citations: 0
Distributed Online Ordinal Regression Based on VUS Maximization
IF 3.2 CAS Zone 2 (Engineering & Technology) Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2024-09-09 DOI: 10.1109/LSP.2024.3456629
Huan Liu;Jiankai Tu;Chunguang Li
Ordinal regression (OR) is a multi-class classification problem with ordered labels. The objective functions of most OR methods are based on the misclassification error. The volume under the ROC surface (VUS) is a measure for OR that quantifies the ranking ability of OR models, and it can also be used as an objective function. In practice, data may be collected by multiple nodes in a distributed and online manner and be difficult to process centrally. In this paper, we develop a VUS-based distributed online OR method. Computing the VUS requires a sequence of data from all categories, but the available online data may not cover all categories, and the required data may be distributed across different nodes. Besides, existing approximation methods for the VUS are ill-suited to OR. To address these issues, we first propose two new surrogate losses for the VUS in OR. We then derive their decomposed formulations and propose distributed online OR algorithms based on VUS maximization (dVMOR). Experimental results demonstrate their effectiveness.
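For the three-class case, the empirical VUS has a simple combinatorial reading: the fraction of cross-class score triples that are correctly ordered. A brute-force sketch of that definition (the letter's surrogate losses replace this non-differentiable count):

```python
from itertools import product

def vus_three_class(s1, s2, s3):
    """Empirical VUS for three ordered classes: the fraction of cross-class
    triples (a, b, c), one score per class, satisfying a < b < c."""
    total = correct = 0
    for a, b, c in product(s1, s2, s3):
        total += 1
        correct += (a < b < c)
    return correct / total
```

Note the count needs scores from all three categories at once, which is exactly why online data that misses categories, or data spread across nodes, makes direct VUS computation hard.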
Citations: 0
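The quantity this letter maximizes is the volume under the ROC surface (VUS). As background on what is being optimized — not a sketch of the paper's dVMOR algorithm or its surrogate losses — the empirical VUS of a scoring function is the fraction of cross-class tuples (one sample drawn from each ordered class) whose scores come out in the correct order. A brute-force Python illustration; the function name and toy data are hypothetical:

```python
import itertools

def empirical_vus(scores, labels, num_classes):
    """Empirical VUS: fraction of cross-class tuples (one sample per
    ordered class) whose scores are strictly increasing, i.e. the
    probability that the scorer ranks one random sample from each
    class in the correct ordinal order."""
    per_class = [[s for s, y in zip(scores, labels) if y == k]
                 for k in range(num_classes)]
    correct = 0
    total = 0
    # Enumerate every tuple with one sample from each class.
    for tup in itertools.product(*per_class):
        total += 1
        if all(tup[i] < tup[i + 1] for i in range(num_classes - 1)):
            correct += 1
    return correct / total

# Tiny example: 3 ordered classes; scores mostly respect the order.
scores = [0.1, 0.2, 0.5, 0.6, 0.9, 0.4]
labels = [0,   0,   1,   1,   2,   2]
vus = empirical_vus(scores, labels, num_classes=3)  # -> 0.5
```

The cost of this enumeration grows as the product of the class sizes, and the exact VUS does not decompose across samples or nodes, which hints at why decomposable surrogate losses matter in a distributed online setting.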
Fast H.266/VVC Intra Coding by Early Skipping Joint Coding of Chroma Residuals
IF 3.2 CAS Tier 2 (Engineering & Technology) Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2024-09-09 DOI: 10.1109/LSP.2024.3456631
Yifan Wang;Shixuan Feng;Wei Zhang;Kai Li;Fuzheng Yang
Versatile Video Coding (H.266/VVC) significantly enhances compression performance but also increases encoding complexity. Numerous fast intra coding algorithms have been proposed to balance the performance gains with the increased complexity introduced by various advanced tools. However, few fast algorithms are specific to Joint Coding of Chroma Residuals (JCCR), a newly introduced technique that requires additional rate-distortion optimization (RDO) processes. To address this gap, this letter introduces a fast intra coding algorithm that reduces complexity by early skipping unnecessary JCCR. Specifically, we propose skipping JCCR when the correlation between chroma components is low, and we construct a JCCR normalized reconstruction distortion to measure this correlation. The skip conditions are determined by statistical analysis, intelligently reducing the number of RDO processes for chroma components. Experimental results show that, compared to the Fraunhofer Versatile Video Encoder (VVenC), our algorithm achieves encoding time savings of 1.68% and 2.82% under Random Access (RA) and All Intra (AI) settings, respectively, with a performance loss of 0.03% and 0.09% only, demonstrating the effectiveness of our fast algorithm.
{"title":"Fast H.266/VVC Intra Coding by Early Skipping Joint Coding of Chroma Residuals","authors":"Yifan Wang;Shixuan Feng;Wei Zhang;Kai Li;Fuzheng Yang","doi":"10.1109/LSP.2024.3456631","DOIUrl":"10.1109/LSP.2024.3456631","url":null,"abstract":"Versatile Video Coding (H.266/VVC) significantly enhances compression performance but also increases encoding complexity. Numerous fast intra coding algorithms have been proposed to balance the performance gains with the increased complexity introduced by various advanced tools. However, few fast algorithms are specific to Joint Coding of Chroma Residuals (JCCR), a newly introduced technique that requires additional rate-distortion optimization (RDO) processes. To address this gap, this letter introduces a fast intra coding algorithm that reduces complexity by early skipping unnecessary JCCR. Specifically, we propose skipping JCCR when the correlation between chroma components is low, and we construct a JCCR normalized reconstruction distortion to measure this correlation. The skip conditions are determined by statistical analysis, intelligently reducing the number of RDO processes for chroma components. Experimental results show that, compared to the Fraunhofer Versatile Video Encoder (VVenC), our algorithm achieves encoding time savings of 1.68% and 2.82% under Random Access (RA) and All Intra (AI) settings, respectively, with a performance loss of 0.03% and 0.09% only, demonstrating the effectiveness of our fast algorithm.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":null,"pages":null},"PeriodicalIF":3.2,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142218760","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
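The skip rule described in the abstract above — measure the Cb/Cr correlation through a normalized reconstruction distortion and drop the JCCR rate-distortion passes when the correlation is weak — can be caricatured in a few lines. The function, the -Cb joint model, and the threshold below are illustrative assumptions, not the letter's statistically derived conditions or VVenC's actual code:

```python
def jccr_skip_decision(res_cb, res_cr, threshold=0.9):
    """Toy early-skip test for JCCR (illustrative only).

    JCCR jointly codes the chroma residuals by deriving Cr from Cb
    (typically with opposite sign, Cr ~ -Cb). If approximating Cr by
    -Cb leaves most of the Cr energy as error, the two components are
    weakly correlated and the extra JCCR rate-distortion passes are
    unlikely to pay off, so the encoder can skip them.
    """
    cr_energy = sum(c * c for c in res_cr)
    if cr_energy == 0:  # nothing to jointly code; skip the extra passes
        return True
    # Reconstruction distortion of the joint model Cr_hat = -Cb,
    # normalised by the Cr energy (a stand-in for the letter's measure).
    joint_distortion = sum((cb + cr) ** 2 for cb, cr in zip(res_cb, res_cr))
    return joint_distortion / cr_energy > threshold  # True -> skip JCCR

# Strongly anti-correlated residuals: JCCR stays in the RDO search.
res_cb = [4, -2, 3, 1]
skip_correlated = jccr_skip_decision(res_cb, [-4, 2, -3, -1])    # False

# Unrelated residuals: the joint model fits poorly, so skip JCCR.
skip_uncorrelated = jccr_skip_decision(res_cb, [1, 5, -2, 4])    # True
```

The point of such a test is that it is far cheaper than the JCCR RDO passes it replaces: a single pass over the residual block decides whether the full rate-distortion search for the joint modes is worth running.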
Book学术
Literature assistance · Smart journal selection · Latest publications · Assistance guidelines · Contact us: info@booksci.cn
Book学术 provides a free academic resource search service covering Chinese and English literature for scholars in China and abroad, and is committed to delivering the most convenient, high-quality experience.
Copyright © 2023 Book学术 All rights reserved.
京公网安备 11010802042870号 京ICP备2023020795号-1