
IEEE Signal Processing Letters: Latest Articles

PFCNet: Enhancing Rail Surface Defect Detection With Pixel-Aware Frequency Conversion Networks
IF 3.2; Zone 2, Engineering Technology; Q2 ENGINEERING, ELECTRICAL & ELECTRONIC; Pub Date: 2025-01-06; DOI: 10.1109/LSP.2025.3525855
Yue Wu;Fangfang Qiang;Wujie Zhou;Weiqing Yan
Applying computer vision techniques to rail surface defect detection (RSDD) is crucial for preventing catastrophic accidents. However, challenges such as complex backgrounds and irregular defect shapes persist. Previous methods have focused on extracting salient object information from a pixel perspective, thereby neglecting valuable high- and low-frequency image information, which can better capture global structural information. In this study, we design a pixel-aware frequency conversion network (PFCNet) to explore RSDD from a frequency domain perspective. We use different attention mechanisms and frequency enhancement for high-level and shallow features to explore local details and global structures comprehensively. In addition, we design a dual-control reorganization module to refine the features across levels. We conducted extensive experiments on an industrial RGB-D dataset (NEU RSDDS-AUG), and PFCNet achieved superior performance.
IEEE Signal Processing Letters, vol. 32, pp. 606-610
Citations: 0
Piecewise Student's t-distribution Mixture Model-Based Estimation for NAND Flash Memory Channels
IF 3.2; Zone 2, Engineering Technology; Q2 ENGINEERING, ELECTRICAL & ELECTRONIC; Pub Date: 2025-01-06; DOI: 10.1109/LSP.2024.3521326
Cheng Wang;Zhen Mei;Jun Li;Kui Cai;Lingjun Kong
Accurate modeling and estimation of the threshold voltages of the flash memory can facilitate the efficient design of channel codes and detectors. However, most flash memory channel models are based on Gaussian distributions, which fail to capture certain key properties of the threshold voltages, such as their heavy tails. To enhance the model accuracy, we first propose a piecewise Student's t-distribution mixture model (PSTMM), which features degrees of freedom to control the left and right tails of the voltage distributions. We further propose a PSTMM-based expectation maximization (PSTMM-EM) algorithm to estimate model parameters for flash memories by alternately computing the expected values of the missing data and maximizing the likelihood function with respect to the model parameters. Simulation results demonstrate that our proposed algorithm exhibits superior stability and can effectively extend the flash memory lifespan by 1700 program/erase (PE) cycles compared with the existing parameter estimation algorithms.
IEEE Signal Processing Letters, vol. 32, pp. 451-455
Citations: 0
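The abstract above contrasts Gaussian models with Student's t-distributions, whose degrees of freedom control how heavy the tails are. The following is a minimal sketch of that one point only: it evaluates the standard (non-piecewise) Student's t density at a point far from the center and compares it with a standard Gaussian. It is not the PSTMM or the PSTMM-EM algorithm from the paper; the function names, the evaluation point, and the chosen degrees of freedom are assumptions made for illustration.

```python
import math


def student_t_pdf(x, nu):
    """Density of the standard Student's t-distribution with nu degrees of freedom."""
    coeff = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return coeff * (1 + x * x / nu) ** (-(nu + 1) / 2)


def gaussian_pdf(x):
    """Density of the standard Gaussian distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


# Hypothetical evaluation point far out in the tail (illustrative choice).
tail_point = 5.0

heavy = student_t_pdf(tail_point, nu=3)    # few degrees of freedom: heavy tail
light = student_t_pdf(tail_point, nu=30)   # many degrees of freedom: lighter tail
gauss = gaussian_pdf(tail_point)           # Gaussian tail decays fastest

print(f"t (nu=3):  {heavy:.3e}")
print(f"t (nu=30): {light:.3e}")
print(f"Gaussian:  {gauss:.3e}")
```

Lower degrees of freedom leave more probability mass in the tail, which is the property a Gaussian model cannot represent. A model that uses separate degrees of freedom on each side of the mode, as the abstract describes, could therefore fit asymmetric tails.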
Noise Covariance Matrix Estimation in Block-Correlated Noise Field for Direction Finding
IF 3.2; Zone 2, Engineering Technology; Q2 ENGINEERING, ELECTRICAL & ELECTRONIC; Pub Date: 2025-01-06; DOI: 10.1109/LSP.2025.3525898
Majdoddin Esfandiari;Sergiy A. Vorobyov
A noise covariance matrix estimation approach for direction finding in an unknown noise field is proposed. It is applicable to the practically important cases of nonuniform and block-diagonal sensor noise. It is based on an alternating procedure that can be adjusted for a specific noise type. Numerical simulations are conducted to establish the generality and superiority of the proposed approach over the existing state-of-the-art methods, especially in challenging scenarios.
IEEE Signal Processing Letters, vol. 32, pp. 531-535. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10824965
Citations: 0
Bridging the Modality Gap in Multimodal Eye Disease Screening: Learning Modality Shared-Specific Features via Multi-Level Regularization
IF 3.2; Zone 2, Engineering Technology; Q2 ENGINEERING, ELECTRICAL & ELECTRONIC; Pub Date: 2025-01-06; DOI: 10.1109/LSP.2025.3526094
Jiayue Zhao;Shiman Li;Yi Hao;Chenxi Zhang
Color fundus photography (CFP) and optical coherence tomography (OCT) are two common modalities used in eye disease screening, providing crucial complementary information for the diagnosis of eye diseases. However, existing multimodal learning methods cannot fully leverage the information from each modality due to the large dimensional and semantic gap between 2D CFP and 3D OCT images, leading to suboptimal classification performance. To bridge the modality gap and fully exploit the information from each modality, we propose a novel feature disentanglement method that decomposes features into modality-shared and modality-specific components. We design a multi-level regularization strategy including intra-modality, inter-modality, and intra-inter-modality regularization to facilitate the effective learning of the modality shared-specific features. Our method achieves state-of-the-art performance on two eye disease diagnosis tasks using two publicly available datasets. Our method promises to serve as a useful tool for multimodal eye disease diagnosis.
IEEE Signal Processing Letters, vol. 32, pp. 586-590
Citations: 0
Consensus Iterated Posterior Linearization Filter for Distributed State Estimation
IF 3.2; Zone 2, Engineering Technology; Q2 ENGINEERING, ELECTRICAL & ELECTRONIC; Pub Date: 2025-01-06; DOI: 10.1109/LSP.2025.3526092
Ángel F. García-Fernández;Giorgio Battistelli
This paper presents the consensus iterated posterior linearisation filter (IPLF) for distributed state estimation. The consensus IPLF algorithm is based on a measurement model described by its conditional mean and covariance given the state, and performs iterated statistical linear regressions of the measurements with respect to the current approximation of the posterior to improve estimation performance. Three variants of the algorithm are presented based on the type of consensus that is used: consensus on information, consensus on measurements, and hybrid consensus on measurements and information. Simulation results show the benefits of the proposed algorithm in distributed state estimation.
IEEE Signal Processing Letters, vol. 32, pp. 561-565
Citations: 0
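The consensus variants named in the abstract above all rely on nodes repeatedly exchanging quantities with their neighbors until they agree. The sketch below shows only that generic building block: average consensus on scalar values over a small network, using Metropolis weights. It is not the consensus IPLF itself; the network topology, the initial values, and the function name are assumptions chosen for illustration.

```python
def consensus_step(values, neighbors):
    """One synchronous averaging step with Metropolis weights.

    values:    list of scalars, one per node
    neighbors: dict mapping node index to a list of neighbor indices
    """
    new_values = []
    for i, x_i in enumerate(values):
        update = x_i
        for j in neighbors[i]:
            # Symmetric weight based on the larger of the two node degrees.
            w_ij = 1.0 / (1 + max(len(neighbors[i]), len(neighbors[j])))
            update += w_ij * (values[j] - x_i)
        new_values.append(update)
    return new_values


# Hypothetical four-node line network: 0 - 1 - 2 - 3.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
values = [1.0, 4.0, 2.0, 9.0]          # local quantities held by each node
network_mean = sum(values) / len(values)

for _ in range(200):
    values = consensus_step(values, neighbors)

print(values, network_mean)
```

Because the weights are symmetric, the sum over all nodes is preserved at every step, so each node converges to the network-wide average while communicating only with its direct neighbors.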
Cramér-Rao Bounds and Resolution Benefits of Sparse Arrays in Measurement-Dependent SNR Regimes
IF 3.2; Zone 2, Engineering Technology; Q2 ENGINEERING, ELECTRICAL & ELECTRONIC; Pub Date: 2025-01-03; DOI: 10.1109/LSP.2024.3525400
Sina Shahsavari;Piya Pal
This paper derives a new non-asymptotic characterization of the Cramér-Rao Bound (CRB) of any sparse array as a function of the angular separation between two far-field narrowband sources in certain regimes characterized by a low Signal-to-Noise Ratio (SNR). The primary contribution is the derivation of matching upper and lower bounds on the CRB in a certain measurement-dependent SNR (MD-SNR) regime, where one can zoom into progressively lower SNR as the number of sensors increases. This tight characterization helps to establish that sparse arrays such as nested and coprime arrays provably exhibit lower CRB compared to Uniform Linear Arrays (ULAs) in the specified SNR regime.
IEEE Signal Processing Letters, vol. 32, pp. 601-605
Citations: 0
Frequency Decoupled Masked Auto-Encoder for Self-Supervised Skeleton-Based Action Recognition
IF 3.2; Zone 2, Engineering Technology; Q2 ENGINEERING, ELECTRICAL & ELECTRONIC; Pub Date: 2025-01-03; DOI: 10.1109/LSP.2024.3525398
Ye Liu;Tianhao Shi;Mingliang Zhai;Jun Liu
In 3D skeleton-based action recognition, the limited availability of supervised data has driven interest in self-supervised learning methods. The reconstruction paradigm using masked auto-encoder (MAE) is an effective and mainstream self-supervised learning approach. However, recent studies indicate that MAE models tend to focus on features within a certain frequency range, which may result in the loss of important information. To address this issue, we propose a frequency decoupled MAE. Specifically, by incorporating a scale-specific frequency feature reconstruction module, we delve into leveraging frequency information as a direct and explicit target for reconstruction, which augments the MAE's capability to discern and accurately reproduce diverse frequency attributes within the data. Moreover, in order to address the issue of unstable gradient updates caused by more complex optimization objectives with frequency reconstruction, we introduce a dual-path network combined with an exponential moving average (EMA) parameter updating strategy to guide the model in stabilizing the training process. We have conducted extensive experiments which have demonstrated the effectiveness of the proposed method.
IEEE Signal Processing Letters, vol. 32, pp. 546-550
Citations: 0
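The abstract above mentions an exponential moving average (EMA) parameter updating strategy for stabilizing training in a dual-path network. The sketch below shows the generic EMA update rule only, applied to plain lists of numbers standing in for network parameters. The momentum value, the parameter vectors, and the names `online` and `target` are assumptions for illustration and are not taken from the paper.

```python
def ema_update(target, online, momentum=0.99):
    """Move each target parameter a small step toward its online counterpart.

    target_new = momentum * target_old + (1 - momentum) * online
    """
    return [momentum * t + (1.0 - momentum) * o for t, o in zip(target, online)]


# Hypothetical parameters: the online path is trained by gradients (held fixed
# here for simplicity), while the target path only follows it through EMA.
online = [1.0, -2.0, 0.5]
target = [0.0, 0.0, 0.0]

for _ in range(500):
    target = ema_update(target, online, momentum=0.99)

print(target)
```

A momentum close to 1 makes the target path change slowly, so it acts as a smoothed copy of the online path rather than following every noisy gradient step.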
Transformer-Prompted Network: Efficient Audio–Visual Segmentation via Transformer and Prompt Learning
IF 3.2; Zone 2, Engineering Technology; Q2 ENGINEERING, ELECTRICAL & ELECTRONIC; Pub Date: 2025-01-03; DOI: 10.1109/LSP.2024.3524120
Yusen Wang;Xiaohong Qian;Wujie Zhou
Audio–visual segmentation (AVS) is a challenging task that focuses on segmenting sound-producing objects within video frames by leveraging audio signals. Existing convolutional neural networks (CNNs) and Transformer-based methods extract features separately from modality-specific encoders and then use fusion modules to integrate the visual and auditory features. We propose an effective Transformer-prompted network, TPNet, which utilizes prompt learning with a Transformer to guide the CNN in addressing AVS tasks. Specifically, during feature encoding, we incorporate a frequency-based prompt-supplement module to fine-tune and enhance the encoded features through frequency-domain methods. Furthermore, during audio–visual fusion, we integrate a self-supplementing cross-fusion module that uses self-attention, two-dimensional selective scanning, and cross-attention mechanisms to merge and enhance audio–visual features effectively. The prompt features undergo the same processing in cross-modal fusion, further refining the fused features to achieve more accurate segmentation results. Finally, we apply self-knowledge distillation to the network, further enhancing the model performance. Extensive experiments on the AVSBench dataset validate the effectiveness of TPNet.
IEEE Signal Processing Letters, vol. 32, pp. 516-520
Citations: 0
Secure Degree of Freedom Bound of Secret-Key Capacity for Two-Way Wiretap Channel
IF 3.2; Zone 2, Engineering Technology; Q2 ENGINEERING, ELECTRICAL & ELECTRONIC; Pub Date: 2025-01-03; DOI: 10.1109/LSP.2024.3525406
Qingpeng Liang;Linsong Du;Yanzhi Wu;Zheng Ma
This letter focuses on the optimal sum secure degree of freedom (SDoF) in a two-way wiretap channel (TW-WC), wherein two legitimate full-duplex multiple-antenna nodes cooperate with each other and are wiretapped by a multiple-antenna eavesdropper simultaneously. It aims to find the optimal sum SDoF pertaining to secret-key capacity for the TW-WC. First, we analyze the upper and lower bounds of the optimal sum SDoF by establishing their equivalence to the expression of the optimal SDoF corresponding to the secrecy rate for the TW-WC. Subsequently, in scenarios where the legitimate nodes are configured with an equal number of transmit and receive antennas, it is shown that the upper and lower bounds of the optimal SDoF converge. Furthermore, the findings suggest that a higher SDoF can be achieved than in existing works, thereby indicating an improvement in secure spectral efficiency.
IEEE Signal Processing Letters, vol. 32, pp. 581-585
Citations: 0
A Study on the Optimality of Downlink Hybrid NOMA
IF 3.2; Zone 2, Engineering Technology; Q2 ENGINEERING, ELECTRICAL & ELECTRONIC; Pub Date: 2025-01-03; DOI: 10.1109/LSP.2024.3524096
Zhiguo Ding
The key idea of hybrid non-orthogonal multiple access (NOMA) is to allow users to use the bandwidth resources to which they cannot have access in orthogonal multiple access (OMA) based legacy networks while still guaranteeing its compatibility with the legacy network. However, in a conventional hybrid NOMA downlink network, some users have access to more bandwidth resources than others, which leads to a potential performance loss. So what if the users can access the same amount of bandwidth resources? This letter focuses on a simple two-user scenario, and develops analytical and simulation results to reveal that for this considered scenario, conventional hybrid NOMA is still an optimal transmission strategy.
IEEE Signal Processing Letters, vol. 32, pp. 511-515
Citations: 0