Human-Machine Vision Collaboration Based Rate Control Scheme for VVC
Zeming Zhao; Xiaohai He; Xiaodong Bi; Hong Yang; Shuhua Xiong
Pub Date: 2025-11-28 | DOI: 10.1109/LSP.2025.3638597 | IEEE Signal Processing Letters, vol. 33, pp. 126-130
With the widespread adoption of smart terminals, compressed video is increasingly consumed at the receiver for purposes beyond human viewing. Conventional video coding standards are optimized primarily for human visual perception and often fail to accommodate the distinct requirements of machine vision. To satisfy perceptual and analytical demands simultaneously, we propose a novel rate control scheme based on Versatile Video Coding (VVC) for human-machine vision collaborative video coding. Specifically, we employ the You Only Look Once (YOLO) network to extract task-relevant features for machine vision and formulate a detection feature weight from these features. Leveraging the feature weight and the spatial location of Coding Tree Units (CTUs), we propose a region classification algorithm that partitions a frame into a machine vision-sensitive region (MVSR) and a machine vision non-sensitive region (MVNR). Subsequently, we develop a refined bit allocation strategy that operates at both the region level and the CTU level, improving the precision and effectiveness of the rate control. Experimental results demonstrate that the scheme improves machine-task detection accuracy while preserving perceptual quality for human observers, effectively meeting the dual encoding requirements of human and machine vision.
{"title":"Human-Machine Vision Collaboration Based Rate Control Scheme for VVC","authors":"Zeming Zhao;Xiaohai He;Xiaodong Bi;Hong Yang;Shuhua Xiong","doi":"10.1109/LSP.2025.3638597","DOIUrl":"https://doi.org/10.1109/LSP.2025.3638597","url":null,"abstract":"With the widespread adoption of smart terminals, compressed video is increasingly utilized in the receiver for purposes beyond human vision. Conventional video coding standards are optimized primarily for human visual perception and often fail to accommodate the distinct requirements of machine vision. To simultaneously satisfy the perceptual needs and the analytical demands, we propose a novel rate control scheme based on Versatile Video Coding (VVC) for human-machine vision collaborative video coding. Specifically, we employ the You Only Look Once (YOLO) network to extract task-relevant features for machine vision and formulate a detection feature weight based on these features. Leveraging the feature weight and the spatial location information of Coding Tree Units (CTUs), we propose a region classification algorithm that partitions a frame into machine vision-sensitive region (MVSR) and machine vision non-sensitive region (MVNR). Subsequently, we develop an enhanced and refined bit allocation strategy that performs region-level and CTU-level bit allocation, thereby improving the precision and effectiveness of the rate control. Experimental results demonstrate that the scheme improves machine task detection accuracy while preserving perceptual quality for human observers, effectively meeting the dual encoding requirements of human and machine vision.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"33 ","pages":"126-130"},"PeriodicalIF":3.9,"publicationDate":"2025-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145778285","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lightweight Attention-Enhanced Multi-Scale Detector for Robust Small Object Detection in UAV
Haitao Yang; Yingzhuo Xiong; Dongliang Zhang; Xiai Yan; Xuran Hu
Pub Date: 2025-11-27 | DOI: 10.1109/LSP.2025.3637728 | IEEE Signal Processing Letters, vol. 33, pp. 271-275
Small-object detection in uncrewed aerial vehicle (UAV) imagery remains challenging due to limited resolution, complex backgrounds, scale variation, and strict real-time constraints. Existing lightweight detectors often struggle to retain fine details while ensuring efficiency, reducing robustness in UAV applications. This letter proposes a lightweight multi-scale framework integrating Partial Dilated Convolution (PDC), a Triplet Focus Attention Module (TFAM), a Multi-Scale Feature Fusion (MSFF) branch, and a bidirectional feature pyramid network (BiFPN). PDC enlarges receptive-field diversity while preserving local texture, TFAM jointly enhances spatial, channel, and coordinate attention, and MSFF with BiFPN achieves efficient cross-scale fusion. On VisDrone2019, our model reaches 52.7% mAP50 with 6.01M parameters at 148 FPS, and on HIT-UAV it yields 85.2% mAP50 at 155 FPS, surpassing state-of-the-art UAV detectors in accuracy and efficiency. Visualizations further verify robustness in low-light, dense, and scale-varying UAV scenes.
{"title":"Lightweight Attention-Enhanced Multi-Scale Detector for Robust Small Object Detection in UAV","authors":"Haitao Yang;Yingzhuo Xiong;Dongliang Zhang;Xiai Yan;Xuran Hu","doi":"10.1109/LSP.2025.3637728","DOIUrl":"https://doi.org/10.1109/LSP.2025.3637728","url":null,"abstract":"Small-object detection in uncrewed aerial vehicle (UAV) imagery remains challenging due to limited resolution, complex backgrounds, scale variation, and strict real-time constraints. Existing lightweight detectors often struggle to retain fine details while ensuring efficiency, reducing robustness in UAV applications. This letter proposes a lightweight multi-scale framework integrating Partial Dilated Convolution (PDC), a Triplet Focus Attention Module (TFAM), a Multi-Scale Feature Fusion (MSFF) branch, and a bidirectional BiFPN. PDC enlarges receptive field diversity while preserving local texture, TFAM jointly enhances spatial, channel, and coordinate attention, and MSFF with BiFPN achieves efficient cross-scale fusion. On VisDrone2019, our model reaches 52.7% mAP50 with 6.01M parameters and 148 FPS, and on HIT-UAV yields 85.2% mAP50 and 155 FPS, surpassing state-of-the-art UAV detectors in accuracy and efficiency. Visualization further verifies robustness under low-light, dense, and scale-varying UAV scenes.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"33 ","pages":"271-275"},"PeriodicalIF":3.9,"publicationDate":"2025-11-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145830859","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Optimizing In-Context Learning for Efficient Full Conformal Prediction
Weicao Deng; Sangwoo Park; Min Li; Osvaldo Simeone
Pub Date: 2025-11-24 | DOI: 10.1109/LSP.2025.3636762 | IEEE Signal Processing Letters, vol. 33, pp. 311-315
Reliable uncertainty quantification is critical for trustworthy AI. Conformal Prediction (CP) provides prediction sets with distribution-free coverage guarantees, but its two main variants face complementary limitations. Split CP (SCP) suffers from data inefficiency due to dataset partitioning, while full CP (FCP) improves data efficiency at the cost of prohibitive retraining complexity. Recent approaches based on meta-learning or in-context learning (ICL) partially mitigate these drawbacks. However, they rely on training procedures not specifically tailored to CP, which may yield large prediction sets. We introduce an efficient FCP framework, termed enhanced ICL-based FCP (E-ICL+FCP), which employs a permutation-invariant Transformer-based ICL model trained with a CP-aware loss. By simulating the multiple retrained models required by FCP without actual retraining, E-ICL+FCP preserves coverage while markedly reducing both inefficiency and computational overhead. Experiments on synthetic and real tasks demonstrate that E-ICL+FCP attains superior efficiency-coverage trade-offs compared to existing SCP and FCP baselines.
{"title":"Optimizing In-Context Learning for Efficient Full Conformal Prediction","authors":"Weicao Deng;Sangwoo Park;Min Li;Osvaldo Simeone","doi":"10.1109/LSP.2025.3636762","DOIUrl":"https://doi.org/10.1109/LSP.2025.3636762","url":null,"abstract":"Reliable uncertainty quantification is critical for trustworthy AI. Conformal Prediction (CP) provides prediction sets with distribution-free coverage guarantees, but its two main variants face complementary limitations. Split CP (SCP) suffers from data inefficiency due to dataset partitioning, while full CP (FCP) improves data efficiency at the cost of prohibitive retraining complexity. Recent approaches based on meta-learning or in-context learning (ICL) partially mitigate these drawbacks. However, they rely on training procedures not specifically tailored to CP, which may yield large prediction sets. We introduce an efficient FCP framework, termed enhanced ICL-based FCP (E-ICL+FCP), which employs a permutation-invariant Transformer-based ICL model trained with a CP-aware loss. By simulating the multiple retrained models required by FCP without actual retraining, E-ICL+FCP preserves coverage while markedly reducing both inefficiency and computational overhead. Experiments on synthetic and real tasks demonstrate that E-ICL+FCP attains superior efficiency-coverage trade-offs compared to existing SCP and FCP baselines.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"33 ","pages":"311-315"},"PeriodicalIF":3.9,"publicationDate":"2025-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145886641","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Study on an Intelligent Screening Method for Polycystic Ovary Syndrome Based on Deep Physics-Informed Neural Network
Yu Gong; Danji Wang; Chao Wu; Man Ni; Shengli Li; Yang Liu; Ziyuan Shen; Zhidong Su; Xiaoxiao Liu; Huiping Zhou; Huijie Zhang
Pub Date: 2025-11-24 | DOI: 10.1109/LSP.2025.3636719 | IEEE Signal Processing Letters, vol. 33, pp. 266-270
Polycystic ovary syndrome (PCOS) not only causes anovulation in women but also severely affects their physical and mental health. Clinically, diagnostic delays often cause patients to miss optimal treatment windows. As a non-invasive detection technique, Raman spectroscopy has been used for screening this disease. In this letter, the Raman spectra of follicular fluid and plasma from women with PCOS are examined using a deep physics-informed neural network. The results demonstrate that, by incorporating physical priors and integrating multi-domain spectral information, the proposed method achieves accuracies of 96.25% in detecting PCOS from plasma samples and 90.00% from follicular fluid samples.
{"title":"Study on an Intelligent Screening Method for Polycystic Ovary Syndrome Based on Deep PhysicsInformed Neural Network","authors":"Yu Gong;Danji Wang;Chao Wu;Man Ni;Shengli Li;Yang Liu;Ziyuan Shen;Zhidong Su;Xiaoxiao Liu;Huiping Zhou;Huijie Zhang","doi":"10.1109/LSP.2025.3636719","DOIUrl":"https://doi.org/10.1109/LSP.2025.3636719","url":null,"abstract":"Polycystic ovary syndrome (PCOS) not only causes anovulation in women but also severely affects their physical and mental health. Clinically, diagnostic delays often cause patients to miss optimal treatment windows. As a non-invasive detection technique, Raman spectroscopy has been used for screening this disease. In this letter, the Raman spectra of follicular fluid and plasma from women which PCOS are examined using a deep physics-informed neural network. The results demonstrate that by incorporating physical priors and integrating multi-domain spectral information, the proposed method achieves accuracies of 96.25<inline-formula><tex-math>$%$</tex-math></inline-formula> in detecting PCOS from plasma samples and 90.00<inline-formula><tex-math>$%$</tex-math></inline-formula> from follicular fluid samples.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"33 ","pages":"266-270"},"PeriodicalIF":3.9,"publicationDate":"2025-11-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145830869","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wide Field-of-View MMW SISO-SAR Image Reconstruction Based on Curved Linear Array
Hao Wu; Fengjiao Gan; Xu Chen
Pub Date: 2025-11-21 | DOI: 10.1109/LSP.2025.3635004 | IEEE Signal Processing Letters, vol. 32, pp. 4464-4468
This letter presents a wide field-of-view (FoV) millimeter-wave array synthetic aperture radar (SAR) imaging system based on a curved linear array. The proposed system retains the low-cost advantage of planar scanning-array SARs while offering a broader viewing angle. However, the significant disparity in spatial sampling density across different regions of the sampling aperture results in suboptimal imaging performance when employing the classical back-projection algorithm (BPA). To address this issue, we introduce a measurement-fusion imaging algorithm tailored to this system, which constructs uniformly sampled sub-apertures and calculates spatial grid weights. This approach significantly enhances image integrity and mitigates artifacts and sidelobes. Experiments demonstrate high-quality imaging with an extended FoV.
{"title":"Wide Field-of-View MMW SISO-SAR Image Reconstruction Based on Curved Linear Array","authors":"Hao Wu;Fengjiao Gan;Xu Chen","doi":"10.1109/LSP.2025.3635004","DOIUrl":"https://doi.org/10.1109/LSP.2025.3635004","url":null,"abstract":"This letter presents a wide field-of-view (FoV) millimeter-wave array synthetic aperture radar (SAR) imaging system based on curved linear array. The proposed system retains the low-cost advantage of planar scanning array SARs while offering a broader viewing angle. However, the significant disparity in spatial sampling density across different regions of the sampling aperture results in suboptimal imaging performance when employing the classical back-projection algorithm (BPA). To address this issue, we introduce a measurement-fusion imaging algorithm tailored for this system, which involves constructing uniformly sampled sub-apertures and calculating spatial grid weights. This approach significantly enhances image integrity and mitigates artifacts and sidelobes. Experiments demonstrate high-quality imaging with an extended FoV.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"4464-4468"},"PeriodicalIF":3.9,"publicationDate":"2025-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145612121","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SPTPCA: Structure-Preserving Tensor Principal Component Analysis for Hyperspectral Dimensionality Reduction
Alaa El Ichi; Olga Assainova; Nadine Abdallah Saab; Nesma Settouti; Marwa El Bouz; Mohammed El Amine Bechar
Pub Date: 2025-11-21 | DOI: 10.1109/LSP.2025.3635010 | IEEE Signal Processing Letters, vol. 32, pp. 4469-4472
Hyperspectral imaging generates high-dimensional data with complex spatial-spectral correlations that pose significant dimensionality reduction challenges. Principal Component Analysis (PCA) flattens the natural multidimensional tensor structure into vectors, causing loss of critical spatial relationships. Existing tensor methods, including Tucker decomposition and Tensor Train (TT), provide low-rank approximations but do not extend PCA's variance optimization framework to tensor domains. In this paper, we present Structure-Preserving Tensor Principal Component Analysis (SPTPCA), a dimensionality reduction method based on the generalized tensor $\ast_{\mathcal{L}}$-product framework that addresses this gap. Unlike standard PCA, SPTPCA operates directly on tensor representations, preserving natural structure and spatial-spectral correlations while maintaining variance optimization properties. Experimental validation on the Indian Pines dataset demonstrates MSE reductions of 7.9-50.0% and PSNR improvements of 0.35-2.59 dB across different numbers of components, establishing a mathematically rigorous framework for structure-preserving hyperspectral dimensionality reduction.
{"title":"SPTPCA: Structure-Preserving Tensor Principal Component Analysis for Hyperspectral Dimensionality Reduction","authors":"Alaa El Ichi;Olga Assainova;Nadine Abdallah Saab;Nesma Settouti;Marwa El Bouz;Mohammed El Amine Bechar","doi":"10.1109/LSP.2025.3635010","DOIUrl":"https://doi.org/10.1109/LSP.2025.3635010","url":null,"abstract":"Hyperspectral imaging generates high-dimensional data with complex spatial-spectral correlations that pose significant dimensionality reduction challenges. Principal Component Analysis (PCA) flattens the natural multidimensional tensor structure into vectors, causing loss of critical spatial relationships. Existing tensor methods including Tucker decomposition and Tensor Train (TT) provide low-rank approximations but do not extend PCA’s variance optimization framework to tensor domains. In this paper, we present Structure-Preserving Tensor Principal Component Analysis (SPTPCA), a dimensionality reduction method based on the generalized tensor <inline-formula><tex-math>$ast _{mathcal {L}}$</tex-math></inline-formula> -product framework that addresses this gap. Unlike standard PCA, SPTPCA operates directly on tensor representations, preserving natural structure and spatial-spectral correlations while maintaining variance optimization properties. Experimental validation on the Indian Pines dataset demonstrates MSE reductions of 7.9–50.0% and PSNR improvements of 0.35–2.59 dB across different numbers of components, establishing a mathematically rigorous framework for structure-preserving hyperspectral dimensionality reduction.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"4469-4472"},"PeriodicalIF":3.9,"publicationDate":"2025-11-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145612077","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zero-Shot Interpretable Image Steganalysis for Invertible Image Hiding
Hao Wang; Yiming Yao; Yaguang Xie; Tong Qiao; Zhidong Zhao
Pub Date: 2025-11-17 | DOI: 10.1109/LSP.2025.3633618 | IEEE Signal Processing Letters, vol. 32, pp. 4434-4438
Image steganalysis, which aims at detecting secret information concealed within images, has become a critical countermeasure for assessing the security of steganography methods, especially the emerging invertible image hiding approaches. However, prior studies merely classify input images into two categories (i.e., stego or cover) and typically conduct steganalysis under the constraint that training and testing data follow a similar distribution, thereby hindering their application in real-world scenarios. To overcome these shortcomings, we propose a novel interpretable image steganalysis framework tailored for invertible image hiding schemes under a challenging zero-shot setting. Specifically, we integrate image hiding, revealing, and steganalysis into a unified framework, endowing the steganalysis component with the capability to recover the secret information embedded in stego images. Additionally, we elaborate a simple yet effective residual augmentation strategy for generating stego images to further enhance the generalizability of the steganalyzer in cross-dataset and cross-architecture scenarios. Extensive experiments on benchmark datasets demonstrate that our proposed approach significantly outperforms existing steganalysis techniques for invertible image hiding schemes.
{"title":"Zero-Shot Interpretable Image Steganalysis for Invertible Image Hiding","authors":"Hao Wang;Yiming Yao;Yaguang Xie;Tong Qiao;Zhidong Zhao","doi":"10.1109/LSP.2025.3633618","DOIUrl":"https://doi.org/10.1109/LSP.2025.3633618","url":null,"abstract":"Image steganalysis, which aims at detecting secret information concealed within images, has become a critical countermeasure for assessing the security of steganography methods, especially the emerging invertible image hiding approaches. However, prior studies merely classify input images into two categories (i.e., stego or cover) and typically conduct steganalysis under the constraint that training and testing data must follow similar distribution, thereby hindering their application in real-world scenarios. To overcome these shortcomings, we propose a novel interpretable image steganalysis framework tailored for invertible image hiding schemes under a challenging zero-shot setting. Specifically, we integrate image hiding, revealing, and steganalysis into a unified framework, endowing the steganalysis component with the capability to recover the secret information embedded in stego images. Additionally, we elaborate a simple yet effective residual augmentation strategy for generating stego images to further enhance the generalizability of the steganalyzer in cross-dataset and cross-architecture scenarios. Extensive experiments on benchmark datasets demonstrate that our proposed approach significantly outperforms the existing steganalysis techniques for invertible image hiding schemes.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"4434-4438"},"PeriodicalIF":3.9,"publicationDate":"2025-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145612085","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Velocity2DMs: A Contextual Modeling Approach to Dynamics Marking Prediction in Piano Performance
Hyon Kim; Emmanouil Benetos; Xavier Serra
Pub Date: 2025-11-17 | DOI: 10.1109/LSP.2025.3633579 | IEEE Signal Processing Letters, vol. 32, pp. 4459-4463
Expressive dynamics in music performance are subjective and context-dependent, yet most symbolic models treat Dynamics Markings (DMs) as static with fixed MIDI velocities. This paper proposes a method for predicting DMs in piano performance by combining MusicXML score information with performance MIDI data through a novel tokenization scheme and an adapted RoBERTa-based Masked Language Model (MLM). Our approach focuses on contextual aggregated MIDI velocities and corresponding DMs, accounting for subjective interpretations of pianists. Note-level features are serialized and translated into a sequence of tokens to predict both constant (e.g., mp, ff) and non-constant DMs (e.g., crescendo, fp). Evaluation across three expert performance datasets shows that the model effectively learns dynamics transitions from contextual note blocks and generalizes beyond constant markings. This is the first study to model both constant and non-constant dynamics in a unified framework using contextual sequence learning. The results suggest promising applications for expressive music analysis, performance modeling, and computer-assisted music education.
{"title":"Velocity2DMs: A Contextual Modeling Approach to Dynamics Marking Prediction in Piano Performance","authors":"Hyon Kim;Emmanouil Benetos;Xavier Serra","doi":"10.1109/LSP.2025.3633579","DOIUrl":"https://doi.org/10.1109/LSP.2025.3633579","url":null,"abstract":"Expressive dynamics in music performance are subjective and context-dependent, yet most symbolic models treat Dynamics Markings (DMs) as static with fixed MIDI velocities. This paper proposes a method for predicting DMs in piano performance by combining MusicXML score information with performance MIDI data through a novel tokenization scheme and an adapted RoBERTa-based Masked Language Model (MLM). Our approach focuses on contextual aggregated MIDI velocities and corresponding DMs, accounting for subjective interpretations of pianists. Note-level features are serialized and translated into a sequence of tokens to predict both constant (e.g., <italic>mp</i>, <italic>ff</i>) and non-constant DMs (e.g., <italic>crescendo</i>, <italic>fp</i>). Evaluation across three expert performance datasets shows that the model effectively learns dynamics transitions from contextual note blocks and generalizes beyond constant markings. This is the first study to model both constant and non-constant dynamics in a unified framework using contextual sequence learning. The results suggest promising applications for expressive music analysis, performance modeling, and computer-assisted music education.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"4459-4463"},"PeriodicalIF":3.9,"publicationDate":"2025-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11250595","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145612046","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Compositional Distributed Learning for Multi-View Perception: A Maximal Coding Rate Reduction Perspective
Zhuojun Tian; Mehdi Bennis
Pub Date: 2025-11-17 | DOI: 10.1109/LSP.2025.3633169 | IEEE Signal Processing Letters, vol. 32, pp. 4409-4413
In this letter, we formulate a compositional distributed learning framework for multi-view perception by leveraging the maximal coding rate reduction principle combined with subspace basis fusion. In the proposed algorithm, each agent conducts a periodic singular value decomposition on its learned subspaces and exchanges truncated basis matrices, from which the fused subspaces are obtained. By introducing a projection matrix and minimizing the distance between the outputs and their projection, the learned representations are driven towards the fused subspaces. It is proven that the trace of the coding-rate change is bounded and that the consistency of basis fusion is theoretically guaranteed. Numerical simulations validate that the proposed algorithm achieves high classification accuracy while maintaining the diversity of representations, whereas baselines exhibit correlated subspaces and coupled representations.
{"title":"Compositional Distributed Learning for Multi-View Perception: A Maximal Coding Rate Reduction Perspective","authors":"Zhuojun Tian;Mehdi Bennis","doi":"10.1109/LSP.2025.3633169","DOIUrl":"https://doi.org/10.1109/LSP.2025.3633169","url":null,"abstract":"In this letter, we formulate a compositional distributed learning framework for multi-view perception by leveraging the maximal coding rate reduction principle combined with subspace basis fusion. In the proposed algorithm, each agent conducts a periodic singular value decomposition on its learned subspaces and exchanges truncated basis matrices, based on which the fused subspaces are obtained. By introducing a projection matrix and minimizing the distance between the outputs and its projection, the learned representations are enforced towards the fused subspaces. It is proved that the trace on the coding-rate change is bounded and the consistency of basis fusion is guaranteed theoretically. Numerical simulations validate that the proposed algorithm achieves high classification accuracy while maintaining representations' diversity, compared to baselines showing correlated subspaces and coupled representations.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"4409-4413"},"PeriodicalIF":3.9,"publicationDate":"2025-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145612084","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Improving Noisy Sensor Positions Using Noisy Inter-Sensor AOA Measurements
Yanbin Zou; Binhan Liao
Pub Date: 2025-11-17 | DOI: 10.1109/LSP.2025.3634030 | IEEE Signal Processing Letters, vol. 32, pp. 4449-4453
In this article, we investigate the problem of improving noisy sensor positions using inter-sensor angle-of-arrival (AOA) measurements, a highly non-linear problem. First, we present a Cramér-Rao lower bound (CRLB) analysis showing that incorporating inter-sensor AOA measurements refines the accuracy of sensor positions. Second, we propose two weighted least-squares (WLS) solutions. The first resorts to Tikhonov regularization, since the formulated regressor matrix does not have full column rank; the second (the improved WLS solution), derived from the maximum likelihood estimator, avoids choosing a regularization factor. Finally, simulation results show that the performance of the improved WLS solution is close to the CRLB and better than that of the regularization-based WLS solution irrespective of the choice of regularization factor.
{"title":"Improving Noisy Sensor Positions Using Noisy Inter-Sensor AOA Measurements","authors":"Yanbin Zou;Binhan Liao","doi":"10.1109/LSP.2025.3634030","DOIUrl":"https://doi.org/10.1109/LSP.2025.3634030","url":null,"abstract":"In this article, we investigate the problem of improving noisy sensor positions using inter-sensor angle-of-arrival (AOA) measurements, which is highly non-linear. First, we present the Cram <inline-formula><tex-math>$acute{text{e}}$</tex-math></inline-formula> r-Rao lower bounds (CRLB) analysis to show that the incorporating of inter-sensor AOA measurements refines the accuracy of sensor positions. Second, we proposed two weighted least-squares (WLS) solutions to solve the problem. The one resorts to the Tikhonov regularization method as the formulated regressor is not a column full rank matrix, and the other one (called improved WLS solution) derived from the maximum likelihood estimator, avoids choosing regularization factor. Finally, simulation results show that the performance of the improved WLS solution is close to the CRLB and better than the regularization-based WLS solution irrespective of the choice of regularization factor.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"4449-4453"},"PeriodicalIF":3.9,"publicationDate":"2025-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145612091","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}