
IEEE Signal Processing Letters: Latest Publications

SPTPCA: Structure-Preserving Tensor Principal Component Analysis for Hyperspectral Dimensionality Reduction
IF 3.9 | CAS Tier 2 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-11-21 | DOI: 10.1109/LSP.2025.3635010
Alaa El Ichi;Olga Assainova;Nadine Abdallah Saab;Nesma Settouti;Marwa El Bouz;Mohammed El Amine Bechar
Hyperspectral imaging generates high-dimensional data with complex spatial-spectral correlations that pose significant dimensionality reduction challenges. Principal Component Analysis (PCA) flattens the natural multidimensional tensor structure into vectors, causing loss of critical spatial relationships. Existing tensor methods including Tucker decomposition and Tensor Train (TT) provide low-rank approximations but do not extend PCA’s variance optimization framework to tensor domains. In this paper, we present Structure-Preserving Tensor Principal Component Analysis (SPTPCA), a dimensionality reduction method based on the generalized tensor $\ast_{\mathcal{L}}$-product framework that addresses this gap. Unlike standard PCA, SPTPCA operates directly on tensor representations, preserving natural structure and spatial-spectral correlations while maintaining variance optimization properties. Experimental validation on the Indian Pines dataset demonstrates MSE reductions of 7.9–50.0% and PSNR improvements of 0.35–2.59 dB across different numbers of components, establishing a mathematically rigorous framework for structure-preserving hyperspectral dimensionality reduction.
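To make the baseline concrete, here is a minimal sketch of the standard PCA pipeline that SPTPCA is compared against: the hyperspectral cube is flattened to a pixel-by-band matrix, reduced to k spectral components, reconstructed, and scored with the same MSE/PSNR metrics reported above. The synthetic cube, the choice of k, and the function names are assumptions for illustration only, not the authors' code.

```python
import numpy as np

def pca_reduce_reconstruct(cube, k):
    """Flatten an (H, W, B) hyperspectral cube to (H*W, B), keep k principal
    components of the spectral dimension, and reconstruct the cube."""
    H, W, B = cube.shape
    X = cube.reshape(-1, B).astype(np.float64)       # pixels x bands
    mean = X.mean(axis=0)
    Xc = X - mean
    cov = Xc.T @ Xc / (Xc.shape[0] - 1)              # B x B band covariance
    eigvals, eigvecs = np.linalg.eigh(cov)
    components = eigvecs[:, ::-1][:, :k]             # top-k eigenvectors
    scores = Xc @ components                         # reduced representation
    X_hat = scores @ components.T + mean             # reconstruction
    return X_hat.reshape(H, W, B)

def mse_psnr(cube, cube_hat):
    mse = np.mean((cube - cube_hat) ** 2)
    peak = cube.max() - cube.min()                   # dynamic range as peak
    psnr = 10.0 * np.log10(peak ** 2 / mse)
    return mse, psnr

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for a scene such as Indian Pines (145 x 145, 200 bands).
    cube = rng.normal(size=(145, 145, 200)).cumsum(axis=2)  # spectrally correlated
    cube_hat = pca_reduce_reconstruct(cube, k=30)
    mse, psnr = mse_psnr(cube, cube_hat)
    print(f"PCA baseline with 30 components: MSE={mse:.4f}, PSNR={psnr:.2f} dB")
```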
{"title":"SPTPCA: Structure-Preserving Tensor Principal Component Analysis for Hyperspectral Dimensionality Reduction","authors":"Alaa El Ichi;Olga Assainova;Nadine Abdallah Saab;Nesma Settouti;Marwa El Bouz;Mohammed El Amine Bechar","doi":"10.1109/LSP.2025.3635010","DOIUrl":"https://doi.org/10.1109/LSP.2025.3635010","url":null,"abstract":"Hyperspectral imaging generates high-dimensional data with complex spatial-spectral correlations that pose significant dimensionality reduction challenges. Principal Component Analysis (PCA) flattens the natural multidimensional tensor structure into vectors, causing loss of critical spatial relationships. Existing tensor methods including Tucker decomposition and Tensor Train (TT) provide low-rank approximations but do not extend PCA’s variance optimization framework to tensor domains. In this paper, we present Structure-Preserving Tensor Principal Component Analysis (SPTPCA), a dimensionality reduction method based on the generalized tensor <inline-formula><tex-math>$ast _{mathcal {L}}$</tex-math></inline-formula> -product framework that addresses this gap. Unlike standard PCA, SPTPCA operates directly on tensor representations, preserving natural structure and spatial-spectral correlations while maintaining variance optimization properties. Experimental validation on the Indian Pines dataset demonstrates MSE reductions of 7.9–50.0% and PSNR improvements of 0.35–2.59 dB across different numbers of components, establishing a mathematically rigorous framework for structure-preserving hyperspectral dimensionality reduction.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"4469-4472"},"PeriodicalIF":3.9,"publicationDate":"2025-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145612077","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Zero-Shot Interpretable Image Steganalysis for Invertible Image Hiding
IF 3.9 | CAS Tier 2 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-11-17 | DOI: 10.1109/LSP.2025.3633618
Hao Wang;Yiming Yao;Yaguang Xie;Tong Qiao;Zhidong Zhao
Image steganalysis, which aims at detecting secret information concealed within images, has become a critical countermeasure for assessing the security of steganography methods, especially the emerging invertible image hiding approaches. However, prior studies merely classify input images into two categories (i.e., stego or cover) and typically conduct steganalysis under the constraint that training and testing data must follow a similar distribution, thereby hindering their application in real-world scenarios. To overcome these shortcomings, we propose a novel interpretable image steganalysis framework tailored for invertible image hiding schemes under a challenging zero-shot setting. Specifically, we integrate image hiding, revealing, and steganalysis into a unified framework, endowing the steganalysis component with the capability to recover the secret information embedded in stego images. Additionally, we elaborate a simple yet effective residual augmentation strategy for generating stego images to further enhance the generalizability of the steganalyzer in cross-dataset and cross-architecture scenarios. Extensive experiments on benchmark datasets demonstrate that our proposed approach significantly outperforms the existing steganalysis techniques for invertible image hiding schemes.
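The residual augmentation strategy is not detailed in the abstract; the sketch below shows one plausible reading of the idea, in which the stego-minus-cover residual from known pairs is pasted onto unseen covers to synthesize extra stego-like training samples. The pairing rule, the scale parameter, and all function names are hypothetical, not the authors' method.

```python
import numpy as np

def residual_augment(cover_pairs, new_covers, scale=1.0, rng=None):
    """Hypothetical residual augmentation: take the stego-minus-cover residual
    from known (cover, stego) pairs and add a randomly chosen residual to
    unseen covers to synthesize additional stego-like samples."""
    rng = rng or np.random.default_rng(0)
    residuals = [stego.astype(np.float32) - cover.astype(np.float32)
                 for cover, stego in cover_pairs]
    augmented = []
    for cover in new_covers:
        r = residuals[rng.integers(len(residuals))]
        sample = np.clip(cover.astype(np.float32) + scale * r, 0, 255)
        augmented.append(sample.astype(np.uint8))
    return augmented

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    cover = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
    stego = np.clip(cover + rng.integers(-2, 3, size=(64, 64)), 0, 255).astype(np.uint8)
    extra_cover = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
    aug = residual_augment([(cover, stego)], [extra_cover], rng=rng)
    print("augmented stego sample:", aug[0].shape, aug[0].dtype)
```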
{"title":"Zero-Shot Interpretable Image Steganalysis for Invertible Image Hiding","authors":"Hao Wang;Yiming Yao;Yaguang Xie;Tong Qiao;Zhidong Zhao","doi":"10.1109/LSP.2025.3633618","DOIUrl":"https://doi.org/10.1109/LSP.2025.3633618","url":null,"abstract":"Image steganalysis, which aims at detecting secret information concealed within images, has become a critical countermeasure for assessing the security of steganography methods, especially the emerging invertible image hiding approaches. However, prior studies merely classify input images into two categories (i.e., stego or cover) and typically conduct steganalysis under the constraint that training and testing data must follow similar distribution, thereby hindering their application in real-world scenarios. To overcome these shortcomings, we propose a novel interpretable image steganalysis framework tailored for invertible image hiding schemes under a challenging zero-shot setting. Specifically, we integrate image hiding, revealing, and steganalysis into a unified framework, endowing the steganalysis component with the capability to recover the secret information embedded in stego images. Additionally, we elaborate a simple yet effective residual augmentation strategy for generating stego images to further enhance the generalizability of the steganalyzer in cross-dataset and cross-architecture scenarios. Extensive experiments on benchmark datasets demonstrate that our proposed approach significantly outperforms the existing steganalysis techniques for invertible image hiding schemes.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"4434-4438"},"PeriodicalIF":3.9,"publicationDate":"2025-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145612085","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Velocity2DMs: A Contextual Modeling Approach to Dynamics Marking Prediction in Piano Performance
IF 3.9 | CAS Tier 2 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-11-17 | DOI: 10.1109/LSP.2025.3633579
Hyon Kim;Emmanouil Benetos;Xavier Serra
Expressive dynamics in music performance are subjective and context-dependent, yet most symbolic models treat Dynamics Markings (DMs) as static with fixed MIDI velocities. This paper proposes a method for predicting DMs in piano performance by combining MusicXML score information with performance MIDI data through a novel tokenization scheme and an adapted RoBERTa-based Masked Language Model (MLM). Our approach focuses on contextual aggregated MIDI velocities and corresponding DMs, accounting for subjective interpretations of pianists. Note-level features are serialized and translated into a sequence of tokens to predict both constant (e.g., mp, ff) and non-constant DMs (e.g., crescendo, fp). Evaluation across three expert performance datasets shows that the model effectively learns dynamics transitions from contextual note blocks and generalizes beyond constant markings. This is the first study to model both constant and non-constant dynamics in a unified framework using contextual sequence learning. The results suggest promising applications for expressive music analysis, performance modeling, and computer-assisted music education.
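As a rough illustration of how note-level features might be serialized into tokens alongside dynamics markings, consider the toy scheme below. The vocabulary, velocity bucketing, and field names are hypothetical; they only mirror the general idea described above, not the paper's actual tokenization.

```python
# Toy serialization of note-level features into tokens for an MLM-style
# predictor of dynamics markings (DMs). All names and sizes are assumptions.

DM_LABELS = ["pp", "p", "mp", "mf", "f", "ff", "cresc", "dim", "fp"]

def velocity_bucket(mean_velocity, n_buckets=16):
    """Quantize an aggregated MIDI velocity (0-127) into a coarse bucket."""
    return min(int(mean_velocity / 128 * n_buckets), n_buckets - 1)

def tokenize_block(notes, marking):
    """Turn one contextual block of notes into a flat token sequence.

    notes   -- list of dicts with 'pitch' and 'velocity' (0-127)
    marking -- the DM attached to this block, or None if it is the target
               to be masked and predicted
    """
    tokens = ["<BLOCK>"]
    mean_vel = sum(n["velocity"] for n in notes) / len(notes)
    tokens.append(f"VEL_{velocity_bucket(mean_vel)}")
    for n in notes:
        tokens.append(f"PITCH_{n['pitch']}")
    tokens.append(f"DM_{marking}" if marking is not None else "<MASK>")
    return tokens

if __name__ == "__main__":
    block = [{"pitch": 60, "velocity": 42}, {"pitch": 64, "velocity": 48},
             {"pitch": 67, "velocity": 45}]
    print(tokenize_block(block, "mp"))   # training example with a known DM
    print(tokenize_block(block, None))   # inference example with a masked DM
```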
{"title":"Velocity2DMs: A Contextual Modeling Approach to Dynamics Marking Prediction in Piano Performance","authors":"Hyon Kim;Emmanouil Benetos;Xavier Serra","doi":"10.1109/LSP.2025.3633579","DOIUrl":"https://doi.org/10.1109/LSP.2025.3633579","url":null,"abstract":"Expressive dynamics in music performance are subjective and context-dependent, yet most symbolic models treat Dynamics Markings (DMs) as static with fixed MIDI velocities. This paper proposes a method for predicting DMs in piano performance by combining MusicXML score information with performance MIDI data through a novel tokenization scheme and an adapted RoBERTa-based Masked Language Model (MLM). Our approach focuses on contextual aggregated MIDI velocities and corresponding DMs, accounting for subjective interpretations of pianists. Note-level features are serialized and translated into a sequence of tokens to predict both constant (e.g., <italic>mp</i>, <italic>ff</i>) and non-constant DMs (e.g., <italic>crescendo</i>, <italic>fp</i>). Evaluation across three expert performance datasets shows that the model effectively learns dynamics transitions from contextual note blocks and generalizes beyond constant markings. This is the first study to model both constant and non-constant dynamics in a unified framework using contextual sequence learning. The results suggest promising applications for expressive music analysis, performance modeling, and computer-assisted music education.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"4459-4463"},"PeriodicalIF":3.9,"publicationDate":"2025-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11250595","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145612046","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Compositional Distributed Learning for Multi-View Perception: A Maximal Coding Rate Reduction Perspective
IF 3.9 | CAS Tier 2 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-11-17 | DOI: 10.1109/LSP.2025.3633169
Zhuojun Tian;Mehdi Bennis
In this letter, we formulate a compositional distributed learning framework for multi-view perception by leveraging the maximal coding rate reduction principle combined with subspace basis fusion. In the proposed algorithm, each agent conducts a periodic singular value decomposition on its learned subspaces and exchanges truncated basis matrices, based on which the fused subspaces are obtained. By introducing a projection matrix and minimizing the distance between the outputs and its projection, the learned representations are enforced towards the fused subspaces. It is proved that the trace on the coding-rate change is bounded and the consistency of basis fusion is guaranteed theoretically. Numerical simulations validate that the proposed algorithm achieves high classification accuracy while maintaining representations' diversity, compared to baselines showing correlated subspaces and coupled representations.
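For reference, the coding-rate quantity at the heart of the maximal coding rate reduction principle, together with one plausible SVD-based basis-fusion rule, can be sketched as follows. The coding-rate expression follows the standard MCR formulation; the fusion rule and the toy two-agent setup are assumptions for illustration, not the letter's exact algorithm.

```python
import numpy as np

def coding_rate(Z, eps=0.5):
    """Coding rate R(Z, eps) = 1/2 * logdet(I + d/(n*eps^2) * Z @ Z.T) for a
    d x n matrix of representations (the quantity maximized in MCR^2)."""
    d, n = Z.shape
    return 0.5 * np.linalg.slogdet(np.eye(d) + d / (n * eps ** 2) * Z @ Z.T)[1]

def truncated_basis(Z, r):
    """Local step assumed for each agent: keep the top-r left singular
    vectors of its learned representations as a subspace basis."""
    U, _, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :r]

def fuse_bases(bases, r):
    """Fuse exchanged bases by orthonormalizing their concatenation with an
    SVD and truncating again to rank r (one plausible fusion rule)."""
    U, _, _ = np.linalg.svd(np.concatenate(bases, axis=1), full_matrices=False)
    return U[:, :r]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n, r = 32, 200, 4
    # Two agents observing correlated views of the same low-rank structure.
    shared = rng.normal(size=(d, r)) @ rng.normal(size=(r, n))
    Z1 = shared + 0.05 * rng.normal(size=(d, n))
    Z2 = shared + 0.05 * rng.normal(size=(d, n))
    fused = fuse_bases([truncated_basis(Z1, r), truncated_basis(Z2, r)], r)
    proj = fused @ fused.T               # projector onto the fused subspace
    print("coding rate of Z1:          ", round(coding_rate(Z1), 3))
    print("coding rate of projected Z1:", round(coding_rate(proj @ Z1), 3))
```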
{"title":"Compositional Distributed Learning for Multi-View Perception: A Maximal Coding Rate Reduction Perspective","authors":"Zhuojun Tian;Mehdi Bennis","doi":"10.1109/LSP.2025.3633169","DOIUrl":"https://doi.org/10.1109/LSP.2025.3633169","url":null,"abstract":"In this letter, we formulate a compositional distributed learning framework for multi-view perception by leveraging the maximal coding rate reduction principle combined with subspace basis fusion. In the proposed algorithm, each agent conducts a periodic singular value decomposition on its learned subspaces and exchanges truncated basis matrices, based on which the fused subspaces are obtained. By introducing a projection matrix and minimizing the distance between the outputs and its projection, the learned representations are enforced towards the fused subspaces. It is proved that the trace on the coding-rate change is bounded and the consistency of basis fusion is guaranteed theoretically. Numerical simulations validate that the proposed algorithm achieves high classification accuracy while maintaining representations' diversity, compared to baselines showing correlated subspaces and coupled representations.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"4409-4413"},"PeriodicalIF":3.9,"publicationDate":"2025-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145612084","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Improving Noisy Sensor Positions Using Noisy Inter-Sensor AOA Measurements
IF 3.9 | CAS Tier 2 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-11-17 | DOI: 10.1109/LSP.2025.3634030
Yanbin Zou;Binhan Liao
In this article, we investigate the problem of improving noisy sensor positions using inter-sensor angle-of-arrival (AOA) measurements, a problem that is highly non-linear. First, we present a Cramér-Rao lower bound (CRLB) analysis to show that incorporating inter-sensor AOA measurements refines the accuracy of the sensor positions. Second, we propose two weighted least-squares (WLS) solutions to the problem. The first resorts to the Tikhonov regularization method, as the formulated regressor matrix is not of full column rank; the second (called the improved WLS solution), derived from the maximum likelihood estimator, avoids choosing a regularization factor. Finally, simulation results show that the performance of the improved WLS solution is close to the CRLB and better than that of the regularization-based WLS solution irrespective of the choice of regularization factor.
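The Tikhonov-regularized WLS step has the familiar closed form sketched below; the construction of the regressor from the AOA measurements is omitted, and the toy rank-deficient regressor is an assumption used only to show the linear-algebra step, not the letter's estimator.

```python
import numpy as np

def wls_tikhonov(A, b, W, lam):
    """Tikhonov-regularized weighted least squares:
    x = argmin (b - A x)^T W (b - A x) + lam * ||x||^2,
    useful when A is not of full column rank."""
    d = A.shape[1]
    return np.linalg.solve(A.T @ W @ A + lam * np.eye(d), A.T @ W @ b)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy rank-deficient regressor (third column duplicates the first).
    A = rng.normal(size=(50, 3))
    A[:, 2] = A[:, 0]
    x_true = np.array([1.0, -2.0, 1.0])
    b = A @ x_true + 0.01 * rng.normal(size=50)
    W = np.eye(50)                      # weights, e.g. inverse noise covariance
    x_hat = wls_tikhonov(A, b, W, lam=1e-3)
    print("estimate:", np.round(x_hat, 3))
```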
{"title":"Improving Noisy Sensor Positions Using Noisy Inter-Sensor AOA Measurements","authors":"Yanbin Zou;Binhan Liao","doi":"10.1109/LSP.2025.3634030","DOIUrl":"https://doi.org/10.1109/LSP.2025.3634030","url":null,"abstract":"In this article, we investigate the problem of improving noisy sensor positions using inter-sensor angle-of-arrival (AOA) measurements, which is highly non-linear. First, we present the Cram <inline-formula><tex-math>$acute{text{e}}$</tex-math></inline-formula> r-Rao lower bounds (CRLB) analysis to show that the incorporating of inter-sensor AOA measurements refines the accuracy of sensor positions. Second, we proposed two weighted least-squares (WLS) solutions to solve the problem. The one resorts to the Tikhonov regularization method as the formulated regressor is not a column full rank matrix, and the other one (called improved WLS solution) derived from the maximum likelihood estimator, avoids choosing regularization factor. Finally, simulation results show that the performance of the improved WLS solution is close to the CRLB and better than the regularization-based WLS solution irrespective of the choice of regularization factor.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"4449-4453"},"PeriodicalIF":3.9,"publicationDate":"2025-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145612091","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
NAS-GS: Normal Alignment and Surface-Constrained Optimization of 3DGS for High-Fidelity Surface Reconstruction
IF 3.9 | CAS Tier 2 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-11-14 | DOI: 10.1109/LSP.2025.3632760
Jianyu Ding;Yichen Song;Shuai Liu;Xiaonan Mao;Jie Yang;Wei Liu
Multi-view surface reconstruction is essential for accurate 3D modeling and high-quality novel view synthesis. Neural implicit methods, such as NeRF, exhibit excellent rendering but struggle with precise surface extraction. While 3D Gaussian splatting (3DGS) provides efficient explicit representations, it suffers from geometric inaccuracies due to misalignment and lack of strong surface constraints, especially in real-world scenarios. These issues stem from unordered Gaussian primitives, which tend to cause surface drift, redundancy, and blurred boundaries. To overcome these limitations, we propose a surface-aware Gaussian aggregation framework featuring an adaptive normal alignment loss that integrates rendered, depth-based, and monocular normals to enforce robust surface orientation supervision. Additionally, our surface-guided optimization strategy aligns Gaussian primitives precisely to surfaces by exploiting combined rendered and predicted geometric information. Extensive experiments demonstrate our approach achieves state-of-the-art surface reconstruction accuracy alongside superior novel view synthesis, with ablation studies confirming the efficacy of our contributions.
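One plausible form of a normal alignment loss that combines rendered, depth-derived, and monocular normals is sketched below. The cosine-based penalty and the fixed weights are assumptions for illustration; the letter's adaptive weighting is not reproduced here.

```python
import numpy as np

def cosine_loss(n_pred, n_ref):
    """Mean (1 - cosine similarity) between two per-pixel normal maps (H, W, 3)."""
    dot = np.sum(n_pred * n_ref, axis=-1)
    return np.mean(1.0 - np.clip(dot, -1.0, 1.0))

def normal_alignment_loss(n_rendered, n_depth, n_mono, weights=(1.0, 1.0, 1.0)):
    """Penalize disagreement of the rendered normals with depth-derived and
    monocular normals, plus disagreement between the two references."""
    w_rd, w_rm, w_dm = weights
    return (w_rd * cosine_loss(n_rendered, n_depth)
            + w_rm * cosine_loss(n_rendered, n_mono)
            + w_dm * cosine_loss(n_depth, n_mono))

def unit(v):
    """Normalize vectors along the last axis."""
    return v / (np.linalg.norm(v, axis=-1, keepdims=True) + 1e-8)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H, W = 32, 32
    n_rendered = unit(rng.normal(size=(H, W, 3)))
    n_depth = unit(n_rendered + 0.1 * rng.normal(size=(H, W, 3)))
    n_mono = unit(n_rendered + 0.2 * rng.normal(size=(H, W, 3)))
    loss = normal_alignment_loss(n_rendered, n_depth, n_mono)
    print("alignment loss:", round(float(loss), 4))
```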
{"title":"NAS-GS: Normal Alignment and Surface-Constrained Optimization of 3DGS for High-Fidelity Surface Reconstruction","authors":"Jianyu Ding;Yichen Song;Shuai Liu;Xiaonan Mao;Jie Yang;Wei Liu","doi":"10.1109/LSP.2025.3632760","DOIUrl":"https://doi.org/10.1109/LSP.2025.3632760","url":null,"abstract":"Multi-view surface reconstruction is essential for accurate 3D modeling and high-quality novel view synthesis. Neural implicit methods, such as NeRF, exhibit excellent rendering but struggle with precise surface extraction. While 3D Gaussian splatting (3DGS) provides efficient explicit representations, it suffers from geometric inaccuracies due to misalignment and lack of strong surface constraints, especially in real-world scenarios. These issues stem from unordered Gaussian primitives, may tend to surface drift, redundancy, and blurred boundaries. To overcome these limitations, we propose a surface-aware Gaussian aggregation framework featuring an adaptive normal alignment loss that integrates rendered, depth-based, and monocular normals to enforce robust surface orientation supervision. Additionally, our surface-guided optimization strategy aligns Gaussian primitives precisely to surfaces by exploiting combined rendered and predicted geometric information. Extensive experiments demonstrate our approach achieves state-of-the-art surface reconstruction accuracy alongside superior novel view synthesis, with ablation studies confirming the efficacy of our contributions.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"4399-4403"},"PeriodicalIF":3.9,"publicationDate":"2025-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145612088","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
ConMSDMamba: Multi-Scale Dilated Mamba Based on Conformer for Speech Emotion Recognition
IF 3.9 | CAS Tier 2 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-11-14 | DOI: 10.1109/LSP.2025.3632762
Guangyuan Qian;Zhenchun Lei;Sihong Liu;Changhong Liu;Aiwen Jiang
Although the Conformer model excels in speech processing, its core self-attention mechanism is limited in capturing multi-scale temporal dynamics and lacks explicit modeling of frequency-domain features, both crucial for Speech Emotion Recognition (SER). To address this, we propose ConMSDMamba, a novel Conformer-based architecture for SER. Specifically, to overcome the single-scale limitation of the original self-attention, we introduce a multi-scale dilated structure with parallel dilated convolutions to capture diverse temporal contexts. We further find that combining this structure with bidirectional Mamba models long-range temporal dependencies more efficiently than multi-head self-attention. Furthermore, to complement the Conformer’s time-domain focus, we design a time-frequency convolution module that incorporates a wavelet-based branch for joint time-frequency perception. Experimental results on the widely used IEMOCAP and MELD datasets demonstrate that ConMSDMamba outperforms state-of-the-art methods.
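The multi-scale dilated idea can be illustrated with a generic PyTorch block that runs parallel 1-D convolutions at several dilation rates over a frame sequence and projects the concatenation back to the model dimension. The dilation rates, kernel size, and module name are assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class MultiScaleDilatedBlock(nn.Module):
    """Parallel 1-D convolutions with different dilation rates over a frame
    sequence, concatenated and projected back to the model dimension. A
    generic sketch of the multi-scale dilated idea, not the paper's module."""

    def __init__(self, dim, kernel_size=3, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv1d(dim, dim, kernel_size,
                      padding=d * (kernel_size - 1) // 2, dilation=d)
            for d in dilations
        ])
        self.proj = nn.Conv1d(dim * len(dilations), dim, kernel_size=1)

    def forward(self, x):                      # x: (batch, time, dim)
        x = x.transpose(1, 2)                  # -> (batch, dim, time)
        y = torch.cat([branch(x) for branch in self.branches], dim=1)
        return self.proj(y).transpose(1, 2)    # -> (batch, time, dim)

if __name__ == "__main__":
    block = MultiScaleDilatedBlock(dim=80)
    frames = torch.randn(2, 300, 80)           # e.g. 300 acoustic frames
    print(block(frames).shape)                 # torch.Size([2, 300, 80])
```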
{"title":"ConMSDMamba: Multi-Scale Dilated Mamba Based on Conformer for Speech Emotion Recognition","authors":"Guangyuan Qian;Zhenchun Lei;Sihong Liu;Changhong Liu;Aiwen Jiang","doi":"10.1109/LSP.2025.3632762","DOIUrl":"https://doi.org/10.1109/LSP.2025.3632762","url":null,"abstract":"Although the Conformer model excels in speech processing, its core self-attention mechanism is limited in capturing multi-scale temporal dynamics and lacks explicit modeling of frequency-domain features, both crucial for Speech Emotion Recognition (SER). To address this, we propose ConMSDMamba, a novel Conformer-based architecture for SER. Specifically, to overcome the single-scale limitation of the original self-attention, we introduce a multi-scale dilated structure with parallel dilated convolutions to capture diverse temporal contexts. We further find that combining this structure with bidirectional Mamba models long-range temporal dependencies more efficiently than multi-head self-attention. Furthermore, to complement the Conformer’s time-domain focus, we design a time-frequency convolution module that incorporates a wavelet-based branch for joint time-frequency perception. Experimental results on the widely used IEMOCAP and MELD datasets demonstrate that ConMSDMamba outperforms state-of-the-art methods.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"4379-4383"},"PeriodicalIF":3.9,"publicationDate":"2025-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145560683","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Video Steganography With Optimized Robust Modulation Paths for Lossy Channels
IF 3.9 | CAS Tier 2 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-11-14 | DOI: 10.1109/LSP.2025.3632809
Jie Gan;Hong Zhang;Zihao Guo;Yun Cao
Social networks provide an ideal channel for covert communication due to their one-to-many broadcasting nature and the concealment of communication links. Videos, with their rich content and high embedding capacity, serve as suitable carriers for steganography. However, video transcoding performed by social networks often invalidates traditional steganographic methods. To address this challenge, we propose a novel framework based on optimized robust modulation paths. Specifically, we analyze the influence of modulation types on the robustness of embedding units, introduce a cost assignment method to quantify the embedding impact, and develop an optimization strategy to identify robust modulation paths. Experimental results demonstrate that the proposed method achieves an average bit error rate below 0.5% across mainstream social networks, outperforming state-of-the-art methods in terms of robustness while maintaining sufficient steganographic security.
{"title":"Video Steganography With Optimized Robust Modulation Paths for Lossy Channels","authors":"Jie Gan;Hong Zhang;Zihao Guo;Yun Cao","doi":"10.1109/LSP.2025.3632809","DOIUrl":"https://doi.org/10.1109/LSP.2025.3632809","url":null,"abstract":"Social networks provide an ideal channel for covert communication due to their one-to-many broadcasting nature and the concealment of communication links. Videos, with their rich content and high embedding capacity, serve as suitable carriers for steganography. However, video transcoding performed by social networks often invalidates traditional steganographic methods. To address this challenge, we propose a novel framework based on optimized robust modulation paths. Specifically, we analyze the influence of modulation types on the robustness of embedding units, introduce a cost assignment method to quantify the embedding impact, and develop an optimization strategy to identify robust modulation paths. Experimental results demonstrate that the proposed method achieves an average bit error rate below 0.5% across mainstream social networks, outperforming state-of-the-art methods in terms of robustness while maintaining sufficient steganographic security.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"4404-4408"},"PeriodicalIF":3.9,"publicationDate":"2025-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145612109","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Tube Arrangement Using a Mesh-Based Sorting Approach in Video Synopsis
IF 3.9 | CAS Tier 2 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-11-12 | DOI: 10.1109/LSP.2025.3631432
Vyasdev;Koustuv Saha;Jeel Patel;Ansuman Mahapatra;Priyadharshini S
As the amount of surveillance video keeps growing rapidly, monitoring it has become harder and takes more time. Video synopsis helps by making a compact version of the original video, eliminating spatial and temporal redundancies while retaining all critical activities. In contrast to conventional approaches that rely on traditional tube rearrangement strategies, this work proposes a novel mesh-based tube sorting algorithm within a comprehensive video synopsis pipeline. The framework includes object segmentation and tracking using the YOLO11 segmentation model, followed by tube extraction and rearrangement using custom algorithms. Furthermore, a new evaluation metric is introduced to measure tube rearrangement algorithms, with lower computational complexity than existing metrics in the domain of video synopsis. To improve accuracy, an interpolation algorithm is also proposed to reconstruct broken object tubes caused by detection or segmentation errors, resulting in a more efficient and robust video synopsis framework.
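The letter's interpolation algorithm is not given in the abstract; below is a minimal linear-interpolation sketch of the general idea of reconstructing broken object tubes, using a hypothetical frame-indexed bounding-box format.

```python
def interpolate_tube(tube):
    """Fill missing frames in an object tube by linear interpolation of the
    bounding boxes at the nearest observed frames on either side of each gap.

    tube -- dict mapping frame index -> (x, y, w, h); gaps are absent keys
    """
    frames = sorted(tube)
    filled = dict(tube)
    for a, b in zip(frames, frames[1:]):
        if b - a <= 1:
            continue                              # no gap between a and b
        box_a, box_b = tube[a], tube[b]
        for f in range(a + 1, b):
            t = (f - a) / (b - a)
            filled[f] = tuple((1 - t) * va + t * vb
                              for va, vb in zip(box_a, box_b))
    return filled

if __name__ == "__main__":
    # A tube broken between frames 2 and 6 (e.g. caused by a missed detection).
    broken = {0: (10, 20, 30, 40), 2: (14, 22, 30, 40), 6: (30, 30, 30, 40)}
    fixed = interpolate_tube(broken)
    for f in sorted(fixed):
        print(f, tuple(round(v, 1) for v in fixed[f]))
```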
{"title":"Tube Arrangement Using a Mesh-Based Sorting Approach in Video Synopsis","authors":"Vyasdev;Koustuv Saha;Jeel Patel;Ansuman Mahapatra;Priyadharshini S","doi":"10.1109/LSP.2025.3631432","DOIUrl":"https://doi.org/10.1109/LSP.2025.3631432","url":null,"abstract":"As the amount of surveillance video keeps growing rapidly, monitoring it has become harder and takes more time. Video synopsis helps by making a compact version of the original video, eliminating spatial and temporal redundancies while retaining all critical activities. In contrast to conventional approaches that rely on traditional tube rearrangement strategies, this work proposes a novel mesh-based tube sorting algorithm within a comprehensive video synopsis pipeline. The framework includes object segmentation and tracking using the YOLO11 segmentation model, followed by tube extraction and rearrangement using custom algorithms. Furthermore, a new evaluation metric is introduced to measure tube rearrangement algorithms, with lower computational complexity than existing metrics in the domain of video synopsis. To improve accuracy, an interpolation algorithm is also proposed to reconstruct broken object tubes caused by detection or segmentation errors, resulting in a more efficient and robust video synopsis framework.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"4389-4393"},"PeriodicalIF":3.9,"publicationDate":"2025-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145612086","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
General Pruning Criteria for Fast SBL
IF 3.9 | CAS Tier 2 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-11-12 | DOI: 10.1109/LSP.2025.3632230
Jakob Möderl;Erik Leitinger;Bernard Henri Fleury
Sparse Bayesian learning (SBL) associates to each weight in the underlying linear model a hyperparameter by assuming that each weight is Gaussian distributed with zero mean and precision (inverse variance) equal to its associated hyperparameter. The method estimates the hyperparameters by marginalizing out the weights and performing (marginalized) maximum likelihood (ML) estimation. In doing so, SBL drives many hyperparameter estimates to infinity, effectively setting the estimates of the corresponding weights to zero (i.e., pruning the corresponding weights from the model) and thereby yielding a sparse estimate of the weight vector. In this letter, we analyze the marginal likelihood as a function of a single hyperparameter while keeping the others fixed, when the Gaussian assumptions on the noise samples and the weight distribution that underlies the derivation of SBL are relaxed. We derive sufficient conditions that lead, on the one hand, to finite hyperparameter estimates and, on the other, to infinite ones. Finally, we show that in the Gaussian case, the two conditions are complementary and reduce to the pruning condition of fast SBL (F-SBL). Thereby, our results offer a novel insight into the fundamental internal features that lead to the pruning mechanism of F-SBL.
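For the Gaussian case mentioned at the end, the classical fast-SBL pruning test can be sketched numerically. The sparsity/quality quantities below follow the standard fast marginal-likelihood formulation; the noiseless toy signal and variable names are assumptions for illustration only.

```python
import numpy as np

def fsbl_pruning_test(phi_i, y, C_minus_i):
    """Classical fast-SBL (F-SBL) test for a single basis vector phi_i.

    s_i (sparsity) and q_i (quality) are computed with respect to the model
    covariance C_minus_i that excludes phi_i. In the Gaussian case the basis
    is kept with finite precision alpha_i = s_i**2 / (q_i**2 - s_i) iff
    q_i**2 > s_i, and pruned (alpha_i -> infinity) otherwise.
    """
    Cinv_phi = np.linalg.solve(C_minus_i, phi_i)
    s_i = phi_i @ Cinv_phi
    q_i = y @ Cinv_phi
    if q_i ** 2 > s_i:
        return s_i ** 2 / (q_i ** 2 - s_i)   # keep: finite precision estimate
    return np.inf                            # prune: weight forced to zero

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, sigma2 = 100, 0.1
    phi1 = rng.normal(size=n)                # basis actually present in y
    phi2 = rng.normal(size=n)                # irrelevant candidate basis
    y = 2.0 * phi1                           # noiseless toy signal for clarity

    alpha1 = fsbl_pruning_test(phi1, y, sigma2 * np.eye(n))
    print("relevant basis:   alpha =", round(alpha1, 3))

    # Test phi2 against a model already containing phi1 with precision alpha1.
    C_with_phi1 = sigma2 * np.eye(n) + np.outer(phi1, phi1) / alpha1
    print("irrelevant basis: alpha =", fsbl_pruning_test(phi2, y, C_with_phi1))
```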
{"title":"General Pruning Criteria for Fast SBL","authors":"Jakob Möderl;Erik Leitinger;Bernard Henri Fleury","doi":"10.1109/LSP.2025.3632230","DOIUrl":"https://doi.org/10.1109/LSP.2025.3632230","url":null,"abstract":"Sparse Bayesian learning (SBL) associates to each weight in the underlying linear model a hyperparameter by assuming that each weight is Gaussian distributed with zero mean and precision (inverse variance) equal to its associated hyperparameter. The method estimates the hyperparameters by marginalizing out the weights and performing (marginalized) maximum likelihood (ML) estimation. sparse Bayesian learning (SBL) returns many hyperparameter estimates to diverge to infinity, effectively setting the estimates of the corresponding weights to zero (i.e., pruning the corresponding weights from the model) and thereby yielding a sparse estimate of the weight vector. In this letter, we analyze the marginal likelihood as function of a single hyperparameter while keeping the others fixed, when the Gaussian assumptions on the noise samples and the weight distribution that underlies the derivation of SBL are relaxed. We derive sufficient conditions that lead, on the one hand, to finite hyperparameter estimates and, on the other, to infinite ones. Finally, we show that in the Gaussian case, the two conditions are complementary and reduce to the pruning condition of fast SBL (F-SBL). Thereby, our results offer a novel insight into the fundamental internal features that lead to the pruning mechanism of F-SBL.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"4374-4378"},"PeriodicalIF":3.9,"publicationDate":"2025-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11244229","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145560682","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0