Pub Date : 2025-11-21 | DOI: 10.1109/LSP.2025.3635010
Alaa El Ichi;Olga Assainova;Nadine Abdallah Saab;Nesma Settouti;Marwa El Bouz;Mohammed El Amine Bechar
Hyperspectral imaging generates high-dimensional data with complex spatial-spectral correlations that pose significant dimensionality reduction challenges. Principal Component Analysis (PCA) flattens the natural multidimensional tensor structure into vectors, causing loss of critical spatial relationships. Existing tensor methods including Tucker decomposition and Tensor Train (TT) provide low-rank approximations but do not extend PCA's variance optimization framework to tensor domains. In this paper, we present Structure-Preserving Tensor Principal Component Analysis (SPTPCA), a dimensionality reduction method based on the generalized tensor $\ast_{\mathcal{L}}$-product framework that addresses this gap. Unlike standard PCA, SPTPCA operates directly on tensor representations, preserving natural structure and spatial-spectral correlations while maintaining variance optimization properties. Experimental validation on the Indian Pines dataset demonstrates MSE reductions of 7.9–50.0% and PSNR improvements of 0.35–2.59 dB across different numbers of components, establishing a mathematically rigorous framework for structure-preserving hyperspectral dimensionality reduction.
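The flattening step the abstract criticizes can be made concrete with a minimal sketch (ours, not the paper's code): standard PCA reshapes an H × W × B hyperspectral cube into an (H·W) × B matrix of pixel spectra, at which point the rows are interchangeable samples and spatial adjacency is gone.

```python
# Sketch of the vectorization standard PCA performs on a hyperspectral cube.
# An H x W x B cube (nested lists) becomes a list of H*W pixel spectra;
# the fact that two pixels were neighbours in the image is discarded.

def flatten_cube(cube):
    """Flatten an H x W x B cube into a (H*W)-long list of B-dim spectra."""
    return [cube[i][j] for i in range(len(cube)) for j in range(len(cube[0]))]

# A toy 2 x 2 x 3 "cube": 4 pixels, 3 spectral bands each.
cube = [
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]],
]
X = flatten_cube(cube)
# Rows (1,2,3) and (7,8,9) were vertically adjacent pixels; after
# flattening they are just two of four interchangeable samples.
```

SPTPCA, by contrast, operates on the cube directly under the $\ast_{\mathcal{L}}$-product, so this reshape never happens.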
Title: SPTPCA: Structure-Preserving Tensor Principal Component Analysis for Hyperspectral Dimensionality Reduction
IEEE Signal Processing Letters, vol. 32, pp. 4469–4472
Image steganalysis, which aims at detecting secret information concealed within images, has become a critical countermeasure for assessing the security of steganography methods, especially the emerging invertible image hiding approaches. However, prior studies merely classify input images into two categories (i.e., stego or cover) and typically conduct steganalysis under the constraint that training and testing data must follow a similar distribution, thereby hindering their application in real-world scenarios. To overcome these shortcomings, we propose a novel interpretable image steganalysis framework tailored for invertible image hiding schemes under a challenging zero-shot setting. Specifically, we integrate image hiding, revealing, and steganalysis into a unified framework, endowing the steganalysis component with the capability to recover the secret information embedded in stego images. Additionally, we design a simple yet effective residual augmentation strategy for generating stego images to further enhance the generalizability of the steganalyzer in cross-dataset and cross-architecture scenarios. Extensive experiments on benchmark datasets demonstrate that our proposed approach significantly outperforms the existing steganalysis techniques for invertible image hiding schemes.
Title: Zero-Shot Interpretable Image Steganalysis for Invertible Image Hiding
Pub Date : 2025-11-17 | DOI: 10.1109/LSP.2025.3633618
Authors: Hao Wang;Yiming Yao;Yaguang Xie;Tong Qiao;Zhidong Zhao
IEEE Signal Processing Letters, vol. 32, pp. 4434–4438
Pub Date : 2025-11-17 | DOI: 10.1109/LSP.2025.3633579
Hyon Kim;Emmanouil Benetos;Xavier Serra
Expressive dynamics in music performance are subjective and context-dependent, yet most symbolic models treat Dynamics Markings (DMs) as static with fixed MIDI velocities. This paper proposes a method for predicting DMs in piano performance by combining MusicXML score information with performance MIDI data through a novel tokenization scheme and an adapted RoBERTa-based Masked Language Model (MLM). Our approach focuses on contextual aggregated MIDI velocities and corresponding DMs, accounting for subjective interpretations of pianists. Note-level features are serialized and translated into a sequence of tokens to predict both constant (e.g., mp, ff) and non-constant DMs (e.g., crescendo, fp). Evaluation across three expert performance datasets shows that the model effectively learns dynamics transitions from contextual note blocks and generalizes beyond constant markings. This is the first study to model both constant and non-constant dynamics in a unified framework using contextual sequence learning. The results suggest promising applications for expressive music analysis, performance modeling, and computer-assisted music education.
Title: Velocity2DMs: A Contextual Modeling Approach to Dynamics Marking Prediction in Piano Performance
IEEE Signal Processing Letters, vol. 32, pp. 4459–4463 (open access: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11250595)
Pub Date : 2025-11-17 | DOI: 10.1109/LSP.2025.3633169
Zhuojun Tian;Mehdi Bennis
In this letter, we formulate a compositional distributed learning framework for multi-view perception by leveraging the maximal coding rate reduction principle combined with subspace basis fusion. In the proposed algorithm, each agent conducts a periodic singular value decomposition on its learned subspaces and exchanges truncated basis matrices, from which the fused subspaces are obtained. By introducing a projection matrix and minimizing the distance between the outputs and their projection, the learned representations are driven towards the fused subspaces. It is proved that the change in the coding-rate trace is bounded and the consistency of basis fusion is guaranteed theoretically. Numerical simulations validate that the proposed algorithm achieves high classification accuracy while maintaining the diversity of representations, compared to baselines showing correlated subspaces and coupled representations.
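The projection step described above can be illustrated with a toy sketch (ours, not the paper's algorithm): given an orthonormal fused basis $B$, the projection matrix is $P = BB^{\top}$, and the loss term penalises the distance between a representation $y$ and its projection $Py$ onto the fused subspace.

```python
# Toy illustration of projecting a representation onto a fused subspace,
# assuming the fused basis vectors are already orthonormal.

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def projection_matrix(basis):
    """P = B B^T for a list of orthonormal basis vectors (rows of `basis`)."""
    n = len(basis[0])
    return [[sum(b[i] * b[j] for b in basis) for j in range(n)] for i in range(n)]

# Fused subspace: the x-y plane in R^3 (orthonormal basis e1, e2).
B = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
P = projection_matrix(B)

y = [3.0, 4.0, 2.0]
Py = matvec(P, y)  # projection of y onto the fused subspace
residual = sum((a - b) ** 2 for a, b in zip(y, Py)) ** 0.5
# `residual` is the out-of-subspace component that the loss drives to zero.
```

In the actual method the basis comes from truncated SVDs exchanged between agents; here it is fixed by hand to keep the sketch self-contained.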
Title: Compositional Distributed Learning for Multi-View Perception: A Maximal Coding Rate Reduction Perspective
IEEE Signal Processing Letters, vol. 32, pp. 4409–4413
Pub Date : 2025-11-17 | DOI: 10.1109/LSP.2025.3634030
Yanbin Zou;Binhan Liao
In this article, we investigate the problem of improving noisy sensor positions using inter-sensor angle-of-arrival (AOA) measurements, which is highly non-linear. First, we present a Cramér–Rao lower bound (CRLB) analysis to show that incorporating inter-sensor AOA measurements refines the accuracy of sensor positions. Second, we propose two weighted least-squares (WLS) solutions to the problem. One resorts to Tikhonov regularization, since the formulated regressor is not of full column rank; the other (called the improved WLS solution), derived from the maximum likelihood estimator, avoids choosing a regularization factor. Finally, simulation results show that the performance of the improved WLS solution is close to the CRLB and better than that of the regularization-based WLS solution irrespective of the choice of regularization factor.
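The Tikhonov-regularised WLS step mentioned above can be sketched in a few lines (a hedged illustration, not the paper's estimator): when the regressor $A$ is column-rank deficient, $A^{\top}WA$ is singular, but the ridge-regularised normal equations $(A^{\top}WA + \lambda I)x = A^{\top}Wb$ remain solvable.

```python
# Tikhonov-regularised weighted least squares for a 2-unknown problem,
# solved via Cramer's rule on the regularised normal equations.

def tikhonov_wls_2col(A, W, b, lam):
    """Solve (A^T W A + lam*I) x = A^T W b for diagonal weights W."""
    m = [[0.0, 0.0], [0.0, 0.0]]
    r = [0.0, 0.0]
    for (a1, a2), w, y in zip(A, W, b):
        m[0][0] += w * a1 * a1; m[0][1] += w * a1 * a2
        m[1][0] += w * a2 * a1; m[1][1] += w * a2 * a2
        r[0] += w * a1 * y;     r[1] += w * a2 * y
    m[0][0] += lam; m[1][1] += lam           # ridge term restores invertibility
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [(m[1][1] * r[0] - m[0][1] * r[1]) / det,
            (m[0][0] * r[1] - m[1][0] * r[0]) / det]

# Columns of A are linearly dependent, so plain WLS has no unique solution;
# the regularization factor lam makes the system well posed.
A = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
x = tikhonov_wls_2col(A, W=[1.0, 1.0, 1.0], b=[1.0, 2.0, 3.0], lam=0.1)
```

The sensitivity of `x` to `lam` is exactly the drawback the letter's improved WLS solution is designed to avoid.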
Title: Improving Noisy Sensor Positions Using Noisy Inter-Sensor AOA Measurements
IEEE Signal Processing Letters, vol. 32, pp. 4449–4453
Pub Date : 2025-11-14 | DOI: 10.1109/LSP.2025.3632760
Jianyu Ding;Yichen Song;Shuai Liu;Xiaonan Mao;Jie Yang;Wei Liu
Multi-view surface reconstruction is essential for accurate 3D modeling and high-quality novel view synthesis. Neural implicit methods, such as NeRF, exhibit excellent rendering but struggle with precise surface extraction. While 3D Gaussian splatting (3DGS) provides efficient explicit representations, it suffers from geometric inaccuracies due to misalignment and a lack of strong surface constraints, especially in real-world scenarios. These issues stem from unordered Gaussian primitives, which tend to cause surface drift, redundancy, and blurred boundaries. To overcome these limitations, we propose a surface-aware Gaussian aggregation framework featuring an adaptive normal alignment loss that integrates rendered, depth-based, and monocular normals to enforce robust surface orientation supervision. Additionally, our surface-guided optimization strategy aligns Gaussian primitives precisely to surfaces by exploiting combined rendered and predicted geometric information. Extensive experiments demonstrate our approach achieves state-of-the-art surface reconstruction accuracy alongside superior novel view synthesis, with ablation studies confirming the efficacy of our contributions.
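One common way to realise a normal-alignment term of the kind described above is a weighted sum of cosine penalties; the toy sketch below is our illustration under that assumption, not the paper's actual loss, and the weights are hypothetical placeholders for its adaptive scheme.

```python
# Toy normal-alignment term: penalise misalignment between a rendered
# normal and reference normals (e.g. depth-based and monocular) via
# 1 - cosine similarity, with per-source weights.

def unit(v):
    n = sum(c * c for c in v) ** 0.5
    return [c / n for c in v]

def normal_alignment_loss(n_rendered, refs, weights):
    nr = unit(n_rendered)
    loss = 0.0
    for ref, w in zip(refs, weights):
        cos = sum(a * b for a, b in zip(nr, unit(ref)))
        loss += w * (1.0 - cos)      # 0 when perfectly aligned
    return loss

n_depth = [0.0, 0.0, 1.0]            # hypothetical depth-derived normal
n_mono = [0.0, 0.0, 1.0]             # hypothetical monocular prediction
aligned = normal_alignment_loss([0.0, 0.0, 2.0], [n_depth, n_mono], [0.5, 0.5])
# `aligned` is 0.0 because all three normals point the same way.
```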
Title: NAS-GS: Normal Alignment and Surface-Constrained Optimization of 3DGS for High-Fidelity Surface Reconstruction
IEEE Signal Processing Letters, vol. 32, pp. 4399–4403
Although the Conformer model excels in speech processing, its core self-attention mechanism is limited in capturing multi-scale temporal dynamics and lacks explicit modeling of frequency-domain features, both crucial for Speech Emotion Recognition (SER). To address this, we propose ConMSDMamba, a novel Conformer-based architecture for SER. Specifically, to overcome the single-scale limitation of the original self-attention, we introduce a multi-scale dilated structure with parallel dilated convolutions to capture diverse temporal contexts. We further find that combining this structure with bidirectional Mamba models long-range temporal dependencies more efficiently than multi-head self-attention. Furthermore, to complement the Conformer’s time-domain focus, we design a time-frequency convolution module that incorporates a wavelet-based branch for joint time-frequency perception. Experimental results on the widely used IEMOCAP and MELD datasets demonstrate that ConMSDMamba outperforms state-of-the-art methods.
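The multi-scale dilated structure can be illustrated with a minimal sketch (ours, not the paper's model): the same kernel applied with different dilation rates reads the input at different temporal spans, giving parallel branches with growing receptive fields.

```python
# 'Valid' 1-D dilated convolution (correlation form): taps of the kernel
# are spaced `dilation` samples apart, widening the temporal context
# without adding parameters.

def dilated_conv1d(x, kernel, dilation):
    span = (len(kernel) - 1) * dilation
    return [sum(k * x[i + j * dilation] for j, k in enumerate(kernel))
            for i in range(len(x) - span)]

x = [1, 2, 3, 4, 5, 6, 7, 8]
kernel = [1, 1, 1]
# Parallel branches with receptive fields of 3, 5, and 7 samples.
branches = [dilated_conv1d(x, kernel, d) for d in (1, 2, 3)]
```

In the actual architecture such branches are fused with further modules (bidirectional Mamba, time-frequency convolution); the sketch only shows how dilation widens the temporal context.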
Title: ConMSDMamba: Multi-Scale Dilated Mamba Based on Conformer for Speech Emotion Recognition
Pub Date : 2025-11-14 | DOI: 10.1109/LSP.2025.3632762
Authors: Guangyuan Qian;Zhenchun Lei;Sihong Liu;Changhong Liu;Aiwen Jiang
IEEE Signal Processing Letters, vol. 32, pp. 4379–4383
Pub Date : 2025-11-14 | DOI: 10.1109/LSP.2025.3632809
Jie Gan;Hong Zhang;Zihao Guo;Yun Cao
Social networks provide an ideal channel for covert communication due to their one-to-many broadcasting nature and the concealment of communication links. Videos, with their rich content and high embedding capacity, serve as suitable carriers for steganography. However, video transcoding performed by social networks often invalidates traditional steganographic methods. To address this challenge, we propose a novel framework based on optimized robust modulation paths. Specifically, we analyze the influence of modulation types on the robustness of embedding units, introduce a cost assignment method to quantify the embedding impact, and develop an optimization strategy to identify robust modulation paths. Experimental results demonstrate that the proposed method achieves an average bit error rate below 0.5% across mainstream social networks, outperforming state-of-the-art methods in terms of robustness while maintaining sufficient steganographic security.
Title: Video Steganography With Optimized Robust Modulation Paths for Lossy Channels
IEEE Signal Processing Letters, vol. 32, pp. 4404–4408
Pub Date : 2025-11-12 | DOI: 10.1109/LSP.2025.3631432
Vyasdev;Koustuv Saha;Jeel Patel;Ansuman Mahapatra;Priyadharshini S
As the amount of surveillance video keeps growing rapidly, monitoring it has become harder and more time-consuming. Video synopsis helps by making a compact version of the original video, eliminating spatial and temporal redundancies while retaining all critical activities. In contrast to conventional approaches that rely on traditional tube rearrangement strategies, this work proposes a novel mesh-based tube sorting algorithm within a comprehensive video synopsis pipeline. The framework includes object segmentation and tracking using the YOLO11 segmentation model, followed by tube extraction and rearrangement using custom algorithms. Furthermore, a new evaluation metric is introduced to assess tube rearrangement algorithms, with lower computational complexity than existing metrics in the domain of video synopsis. To improve accuracy, an interpolation algorithm is also proposed to reconstruct broken object tubes caused by detection or segmentation errors, resulting in a more efficient and robust video synopsis framework.
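The tube-repair idea can be sketched with linear interpolation of bounding boxes (a hedged illustration under that assumption; the paper's interpolation algorithm may differ): frames where a tracked object's box is missing are filled from the nearest detected boxes.

```python
# Repair a broken object tube: boxes missing in some frames (None) are
# filled by linear interpolation between the nearest detected boxes.

def interpolate_tube(tube):
    """tube: list of (x, y, w, h) boxes or None; returns a gap-filled copy."""
    out = list(tube)
    known = [i for i, b in enumerate(tube) if b is not None]
    for lo, hi in zip(known, known[1:]):
        for i in range(lo + 1, hi):
            t = (i - lo) / (hi - lo)   # fraction of the way through the gap
            out[i] = tuple(a + t * (b - a) for a, b in zip(tube[lo], tube[hi]))
    return out

# Two frames of a 4-frame tube were lost to a segmentation failure.
tube = [(0.0, 0.0, 10.0, 10.0), None, None, (6.0, 3.0, 10.0, 10.0)]
fixed = interpolate_tube(tube)
```

Leading or trailing gaps (before the first or after the last detection) are deliberately left as `None`, since there is no second endpoint to interpolate from.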
Title: Tube Arrangement Using a Mesh-Based Sorting Approach in Video Synopsis
IEEE Signal Processing Letters, vol. 32, pp. 4389–4393
Pub Date : 2025-11-12 | DOI: 10.1109/LSP.2025.3632230
Jakob Möderl;Erik Leitinger;Bernard Henri Fleury
Sparse Bayesian learning (SBL) associates to each weight in the underlying linear model a hyperparameter by assuming that each weight is Gaussian distributed with zero mean and precision (inverse variance) equal to its associated hyperparameter. The method estimates the hyperparameters by marginalizing out the weights and performing (marginalized) maximum likelihood (ML) estimation. SBL lets many hyperparameter estimates diverge to infinity, effectively setting the estimates of the corresponding weights to zero (i.e., pruning the corresponding weights from the model) and thereby yielding a sparse estimate of the weight vector. In this letter, we analyze the marginal likelihood as a function of a single hyperparameter while keeping the others fixed, when the Gaussian assumptions on the noise samples and the weight distribution that underlie the derivation of SBL are relaxed. We derive sufficient conditions that lead, on the one hand, to finite hyperparameter estimates and, on the other, to infinite ones. Finally, we show that in the Gaussian case, the two conditions are complementary and reduce to the pruning condition of fast SBL (F-SBL). Thereby, our results offer a novel insight into the fundamental internal features that lead to the pruning mechanism of F-SBL.
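For reference, the Gaussian-case pruning rule that the letter recovers is the classic fast-SBL condition of Tipping and Faul: with sparsity factor $s_i$ and quality factor $q_i$ for basis $i$, the marginal likelihood has a finite maximiser $\alpha_i = s_i^2/(q_i^2 - s_i)$ only when $q_i^2 > s_i$, and otherwise $\alpha_i \to \infty$. A minimal sketch (not the letter's generalized criteria):

```python
# Fast-SBL hyperparameter update for a single basis function, given its
# sparsity factor s and quality factor q.
import math

def fsbl_alpha(s, q):
    """ML hyperparameter estimate; math.inf means the weight is pruned."""
    if q * q > s:
        return s * s / (q * q - s)   # finite precision: weight kept in the model
    return math.inf                  # alpha -> infinity: weight forced to zero

kept = fsbl_alpha(s=1.0, q=2.0)      # q^2 = 4 > 1: finite alpha, weight kept
pruned = fsbl_alpha(s=2.0, q=1.0)    # q^2 = 1 <= 2: weight pruned
```

The letter's contribution is to characterise when analogues of these two branches arise once the Gaussian assumptions are relaxed.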
Title: General Pruning Criteria for Fast SBL
IEEE Signal Processing Letters, vol. 32, pp. 4374–4378 (open access: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11244229)