
IEEE Signal Processing Letters: Latest Publications

Human-Machine Vision Collaboration Based Rate Control Scheme for VVC
IF 3.9 · CAS Zone 2 (Engineering & Technology) · Q2 ENGINEERING, ELECTRICAL & ELECTRONIC · Pub Date: 2025-11-28 · DOI: 10.1109/LSP.2025.3638597
Zeming Zhao;Xiaohai He;Xiaodong Bi;Hong Yang;Shuhua Xiong
With the widespread adoption of smart terminals, compressed video is increasingly utilized in the receiver for purposes beyond human vision. Conventional video coding standards are optimized primarily for human visual perception and often fail to accommodate the distinct requirements of machine vision. To simultaneously satisfy the perceptual needs and the analytical demands, we propose a novel rate control scheme based on Versatile Video Coding (VVC) for human-machine vision collaborative video coding. Specifically, we employ the You Only Look Once (YOLO) network to extract task-relevant features for machine vision and formulate a detection feature weight based on these features. Leveraging the feature weight and the spatial location information of Coding Tree Units (CTUs), we propose a region classification algorithm that partitions a frame into machine vision-sensitive region (MVSR) and machine vision non-sensitive region (MVNR). Subsequently, we develop an enhanced and refined bit allocation strategy that performs region-level and CTU-level bit allocation, thereby improving the precision and effectiveness of the rate control. Experimental results demonstrate that the scheme improves machine task detection accuracy while preserving perceptual quality for human observers, effectively meeting the dual encoding requirements of human and machine vision.
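The region classification and two-level bit allocation can be pictured with a minimal sketch. The fixed threshold, the MVSR budget share, and the equal per-CTU split inside each region are illustrative assumptions, not the letter's actual formulation (which derives weights from YOLO features and CTU positions):

```python
import numpy as np

def classify_ctus(feature_weight, threshold=0.5):
    """Partition CTUs into machine vision-sensitive (MVSR, True) and
    non-sensitive (MVNR, False) regions by thresholding a per-CTU
    detection-feature weight map (hypothetical stand-in for the
    YOLO-derived weights in the letter)."""
    return feature_weight >= threshold

def allocate_bits(frame_budget, mvsr_mask, mvsr_share=0.7):
    """Two-level allocation sketch: a region-level split of the frame
    budget, then equal CTU-level shares inside each region."""
    n_mvsr = int(mvsr_mask.sum())
    n_mvnr = mvsr_mask.size - n_mvsr
    bits = np.empty(mvsr_mask.shape)
    bits[mvsr_mask] = frame_budget * mvsr_share / max(n_mvsr, 1)
    bits[~mvsr_mask] = frame_budget * (1 - mvsr_share) / max(n_mvnr, 1)
    return bits

weights = np.array([[0.9, 0.2],
                    [0.6, 0.1]])          # toy 2x2 grid of CTU feature weights
mask = classify_ctus(weights)             # MVSR in the first column
bits = allocate_bits(1000.0, mask)        # budget conserved across both regions
```

MVSR CTUs receive the larger share of the frame budget while the total allocation stays equal to the frame budget, which is the invariant any rate-control split must keep.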
IEEE Signal Processing Letters, vol. 33, pp. 126–130.
Citations: 0
Lightweight Attention-Enhanced Multi-Scale Detector for Robust Small Object Detection in UAV
IF 3.9 · CAS Zone 2 (Engineering & Technology) · Q2 ENGINEERING, ELECTRICAL & ELECTRONIC · Pub Date: 2025-11-27 · DOI: 10.1109/LSP.2025.3637728
Haitao Yang;Yingzhuo Xiong;Dongliang Zhang;Xiai Yan;Xuran Hu
Small-object detection in uncrewed aerial vehicle (UAV) imagery remains challenging due to limited resolution, complex backgrounds, scale variation, and strict real-time constraints. Existing lightweight detectors often struggle to retain fine details while ensuring efficiency, reducing robustness in UAV applications. This letter proposes a lightweight multi-scale framework integrating Partial Dilated Convolution (PDC), a Triplet Focus Attention Module (TFAM), a Multi-Scale Feature Fusion (MSFF) branch, and a bidirectional BiFPN. PDC enlarges receptive field diversity while preserving local texture, TFAM jointly enhances spatial, channel, and coordinate attention, and MSFF with BiFPN achieves efficient cross-scale fusion. On VisDrone2019, our model reaches 52.7% mAP50 with 6.01M parameters and 148 FPS, and on HIT-UAV yields 85.2% mAP50 and 155 FPS, surpassing state-of-the-art UAV detectors in accuracy and efficiency. Visualization further verifies robustness under low-light, dense, and scale-varying UAV scenes.
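A minimal numpy sketch of the Partial Dilated Convolution idea: apply a dilated kernel to a subset of channels and pass the rest through untouched, enlarging the receptive field on part of the tensor while leaving local texture intact elsewhere. The 3x3 kernel, the channel split ratio, and sharing one kernel across channels are illustrative assumptions, not the letter's design:

```python
import numpy as np

def dilated_conv2d(x, k, d):
    """'Same'-padded 2-D convolution of one channel x with a 3x3 kernel k
    and dilation d (effective receptive field (2*d+1) x (2*d+1))."""
    h, w = x.shape
    xp = np.pad(x, d)
    out = np.zeros_like(x, dtype=float)
    for i in range(3):
        for j in range(3):
            out += k[i, j] * xp[i * d:i * d + h, j * d:j * d + w]
    return out

def partial_dilated_conv(feat, k, d, ratio=0.5):
    """PDC sketch: convolve only the first `ratio` fraction of channels
    with the dilated kernel; remaining channels are identity-mapped."""
    split = int(feat.shape[0] * ratio)
    out = feat.astype(float).copy()
    for ch in range(split):
        out[ch] = dilated_conv2d(feat[ch], k, d)
    return out

k = np.zeros((3, 3)); k[1, 1] = 1.0               # identity kernel for a sanity check
feat = np.arange(32, dtype=float).reshape(2, 4, 4)
out = partial_dilated_conv(feat, k, d=2)          # identity kernel leaves features unchanged
```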
IEEE Signal Processing Letters, vol. 33, pp. 271–275.
Citations: 0
Optimizing In-Context Learning for Efficient Full Conformal Prediction
IF 3.9 · CAS Zone 2 (Engineering & Technology) · Q2 ENGINEERING, ELECTRICAL & ELECTRONIC · Pub Date: 2025-11-24 · DOI: 10.1109/LSP.2025.3636762
Weicao Deng;Sangwoo Park;Min Li;Osvaldo Simeone
Reliable uncertainty quantification is critical for trustworthy AI. Conformal Prediction (CP) provides prediction sets with distribution-free coverage guarantees, but its two main variants face complementary limitations. Split CP (SCP) suffers from data inefficiency due to dataset partitioning, while full CP (FCP) improves data efficiency at the cost of prohibitive retraining complexity. Recent approaches based on meta-learning or in-context learning (ICL) partially mitigate these drawbacks. However, they rely on training procedures not specifically tailored to CP, which may yield large prediction sets. We introduce an efficient FCP framework, termed enhanced ICL-based FCP (E-ICL+FCP), which employs a permutation-invariant Transformer-based ICL model trained with a CP-aware loss. By simulating the multiple retrained models required by FCP without actual retraining, E-ICL+FCP preserves coverage while markedly reducing both inefficiency and computational overhead. Experiments on synthetic and real tasks demonstrate that E-ICL+FCP attains superior efficiency-coverage trade-offs compared to existing SCP and FCP baselines.
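For context, the Split CP baseline that the letter calls data-inefficient can be written in a few lines: calibrate a nonconformity-score quantile on held-out data, then emit intervals with distribution-free ~(1-alpha) coverage. The toy linear model and data are illustrative; FCP would instead refit the model for each candidate label, which is the retraining cost E-ICL+FCP simulates away:

```python
import numpy as np

rng = np.random.default_rng(0)

def scp_interval(predict, X_cal, y_cal, X_test, alpha=0.1):
    """Split Conformal Prediction for regression: absolute-residual
    scores on a calibration split, finite-sample-corrected quantile,
    symmetric intervals around the point prediction."""
    scores = np.abs(y_cal - predict(X_cal))           # nonconformity scores
    n = len(scores)
    level = np.ceil((n + 1) * (1 - alpha)) / n        # finite-sample correction
    q = np.quantile(scores, level, method="higher")
    mu = predict(X_test)
    return mu - q, mu + q

predict = lambda X: 2.0 * X                           # stand-in for a fitted model
X_cal = rng.normal(size=500)
y_cal = 2.0 * X_cal + rng.normal(scale=0.3, size=500)
X_test = rng.normal(size=2000)
y_test = 2.0 * X_test + rng.normal(scale=0.3, size=2000)

lo, hi = scp_interval(predict, X_cal, y_cal, X_test)
coverage = float(np.mean((y_test >= lo) & (y_test <= hi)))   # ~0.9 by construction
```

The coverage guarantee holds regardless of the data distribution; what SCP sacrifices (and FCP recovers) is the data spent on the calibration split.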
IEEE Signal Processing Letters, vol. 33, pp. 311–315.
Citations: 0
Study on an Intelligent Screening Method for Polycystic Ovary Syndrome Based on Deep Physics-Informed Neural Network
IF 3.9 · CAS Zone 2 (Engineering & Technology) · Q2 ENGINEERING, ELECTRICAL & ELECTRONIC · Pub Date: 2025-11-24 · DOI: 10.1109/LSP.2025.3636719
Yu Gong;Danji Wang;Chao Wu;Man Ni;Shengli Li;Yang Liu;Ziyuan Shen;Zhidong Su;Xiaoxiao Liu;Huiping Zhou;Huijie Zhang
Polycystic ovary syndrome (PCOS) not only causes anovulation in women but also severely affects their physical and mental health. Clinically, diagnostic delays often cause patients to miss optimal treatment windows. As a non-invasive detection technique, Raman spectroscopy has been used for screening this disease. In this letter, the Raman spectra of follicular fluid and plasma from women with PCOS are examined using a deep physics-informed neural network. The results demonstrate that by incorporating physical priors and integrating multi-domain spectral information, the proposed method achieves accuracies of 96.25% in detecting PCOS from plasma samples and 90.00% from follicular fluid samples.
IEEE Signal Processing Letters, vol. 33, pp. 266–270.
Citations: 0
Wide Field-of-View MMW SISO-SAR Image Reconstruction Based on Curved Linear Array
IF 3.9 · CAS Zone 2 (Engineering & Technology) · Q2 ENGINEERING, ELECTRICAL & ELECTRONIC · Pub Date: 2025-11-21 · DOI: 10.1109/LSP.2025.3635004
Hao Wu;Fengjiao Gan;Xu Chen
This letter presents a wide field-of-view (FoV) millimeter-wave array synthetic aperture radar (SAR) imaging system based on curved linear array. The proposed system retains the low-cost advantage of planar scanning array SARs while offering a broader viewing angle. However, the significant disparity in spatial sampling density across different regions of the sampling aperture results in suboptimal imaging performance when employing the classical back-projection algorithm (BPA). To address this issue, we introduce a measurement-fusion imaging algorithm tailored for this system, which involves constructing uniformly sampled sub-apertures and calculating spatial grid weights. This approach significantly enhances image integrity and mitigates artifacts and sidelobes. Experiments demonstrate high-quality imaging with an extended FoV.
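The back-projection baseline under discussion is delay-and-sum with a per-measurement weight. The sketch below uses a straight array, uniform weights, and a single-frequency point-target model purely for illustration; the letter's contribution is deriving non-uniform spatial grid weights from the curved array's sampling density, which this toy does not reproduce:

```python
import numpy as np

def back_project(echoes, ant_pos, grid, wavelength, weights):
    """Weighted delay-and-sum back-projection: phase-compensate each
    antenna's echo to every image pixel and accumulate with a
    per-measurement weight (uniform here)."""
    img = np.zeros(len(grid), dtype=complex)
    for echo, p, w in zip(echoes, ant_pos, weights):
        r = np.linalg.norm(grid - p, axis=1)            # antenna-to-pixel ranges
        img += w * echo * np.exp(1j * 4 * np.pi * r / wavelength)
    return np.abs(img)

lam = 0.005                                             # illustrative MMW wavelength (m)
ants = np.stack([np.linspace(-0.5, 0.5, 201), np.zeros(201)], axis=1)
target = np.array([0.1, 1.0])                           # point scatterer at range 1 m
echoes = np.exp(-1j * 4 * np.pi * np.linalg.norm(ants - target, axis=1) / lam)
grid = np.stack([np.linspace(-0.2, 0.2, 41), np.ones(41)], axis=1)  # image pixels
img = back_project(echoes, ants, grid, lam, np.ones(201) / 201)
```

The reconstructed magnitude peaks at the pixel coinciding with the scatterer (index 30, x = 0.1 m), since the compensation phase exactly cancels the echo phase there.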
IEEE Signal Processing Letters, vol. 32, pp. 4464–4468.
Citations: 0
SPTPCA: Structure-Preserving Tensor Principal Component Analysis for Hyperspectral Dimensionality Reduction
IF 3.9 · CAS Zone 2 (Engineering & Technology) · Q2 ENGINEERING, ELECTRICAL & ELECTRONIC · Pub Date: 2025-11-21 · DOI: 10.1109/LSP.2025.3635010
Alaa El Ichi;Olga Assainova;Nadine Abdallah Saab;Nesma Settouti;Marwa El Bouz;Mohammed El Amine Bechar
Hyperspectral imaging generates high-dimensional data with complex spatial-spectral correlations that pose significant dimensionality reduction challenges. Principal Component Analysis (PCA) flattens the natural multidimensional tensor structure into vectors, causing loss of critical spatial relationships. Existing tensor methods including Tucker decomposition and Tensor Train (TT) provide low-rank approximations but do not extend PCA's variance optimization framework to tensor domains. In this paper, we present Structure-Preserving Tensor Principal Component Analysis (SPTPCA), a dimensionality reduction method based on the generalized tensor $\ast_{\mathcal{L}}$-product framework that addresses this gap. Unlike standard PCA, SPTPCA operates directly on tensor representations, preserving natural structure and spatial-spectral correlations while maintaining variance optimization properties. Experimental validation on the Indian Pines dataset demonstrates MSE reductions of 7.9–50.0% and PSNR improvements of 0.35–2.59 dB across different numbers of components, establishing a mathematically rigorous framework for structure-preserving hyperspectral dimensionality reduction.
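The classical baseline being generalized is flattened PCA: treat each hyperspectral pixel as a vector of band values, project onto the top-k principal directions via SVD, and score the reconstruction. SPTPCA keeps the tensor layout and replaces the matrix product with the generalized $\ast_{\mathcal{L}}$-product; the sketch below, with synthetic low-rank data, shows only the vectorized side being improved upon:

```python
import numpy as np

def pca_reconstruct(X, k):
    """Reconstruct X (samples x features) from its top-k principal
    components: center, SVD, project, and add the mean back."""
    mu = X.mean(axis=0)
    Xc = X - mu
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:k]                                   # top-k principal directions
    return (Xc @ W.T) @ W + mu

mse = lambda A, B: float(np.mean((A - B) ** 2))

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10)) @ rng.normal(size=(10, 30))  # rank-10 "pixels x bands"
err_full = mse(X, pca_reconstruct(X, 10))        # exact: data has rank 10
err_low = mse(X, pca_reconstruct(X, 3))          # lossy low-rank summary
```

With k equal to the data's true rank the reconstruction is exact up to floating point, and the MSE grows monotonically as k shrinks; SPTPCA's reported gains are MSE/PSNR improvements over this vectorized projection at matched component counts.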
IEEE Signal Processing Letters, vol. 32, pp. 4469–4472.
Citations: 0
Zero-Shot Interpretable Image Steganalysis for Invertible Image Hiding
IF 3.9 · CAS Zone 2 (Engineering & Technology) · Q2 ENGINEERING, ELECTRICAL & ELECTRONIC · Pub Date: 2025-11-17 · DOI: 10.1109/LSP.2025.3633618
Hao Wang;Yiming Yao;Yaguang Xie;Tong Qiao;Zhidong Zhao
Image steganalysis, which aims at detecting secret information concealed within images, has become a critical countermeasure for assessing the security of steganography methods, especially the emerging invertible image hiding approaches. However, prior studies merely classify input images into two categories (i.e., stego or cover) and typically conduct steganalysis under the constraint that training and testing data must follow similar distribution, thereby hindering their application in real-world scenarios. To overcome these shortcomings, we propose a novel interpretable image steganalysis framework tailored for invertible image hiding schemes under a challenging zero-shot setting. Specifically, we integrate image hiding, revealing, and steganalysis into a unified framework, endowing the steganalysis component with the capability to recover the secret information embedded in stego images. Additionally, we elaborate a simple yet effective residual augmentation strategy for generating stego images to further enhance the generalizability of the steganalyzer in cross-dataset and cross-architecture scenarios. Extensive experiments on benchmark datasets demonstrate that our proposed approach significantly outperforms the existing steganalysis techniques for invertible image hiding schemes.
IEEE Signal Processing Letters, vol. 32, pp. 4434–4438.
Citations: 0
Velocity2DMs: A Contextual Modeling Approach to Dynamics Marking Prediction in Piano Performance
IF 3.9 · CAS Zone 2 (Engineering & Technology) · Q2 ENGINEERING, ELECTRICAL & ELECTRONIC · Pub Date: 2025-11-17 · DOI: 10.1109/LSP.2025.3633579
Hyon Kim;Emmanouil Benetos;Xavier Serra
Expressive dynamics in music performance are subjective and context-dependent, yet most symbolic models treat Dynamics Markings (DMs) as static with fixed MIDI velocities. This paper proposes a method for predicting DMs in piano performance by combining MusicXML score information with performance MIDI data through a novel tokenization scheme and an adapted RoBERTa-based Masked Language Model (MLM). Our approach focuses on contextual aggregated MIDI velocities and corresponding DMs, accounting for subjective interpretations of pianists. Note-level features are serialized and translated into a sequence of tokens to predict both constant (e.g., mp, ff) and non-constant DMs (e.g., crescendo, fp). Evaluation across three expert performance datasets shows that the model effectively learns dynamics transitions from contextual note blocks and generalizes beyond constant markings. This is the first study to model both constant and non-constant dynamics in a unified framework using contextual sequence learning. The results suggest promising applications for expressive music analysis, performance modeling, and computer-assisted music education.
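The naive static mapping the paper argues against can be written directly: aggregate a note block's MIDI velocities and look the mean up in a fixed table. The block-mean aggregation and the thresholds below are hypothetical; the paper instead learns context-dependent mappings, including non-constant DMs such as crescendo:

```python
# Toy static velocity-to-DM mapping (the baseline assumption the paper
# relaxes): mean MIDI velocity of a note block -> constant dynamics token.
# Threshold values are illustrative, not taken from the paper.
def velocity_to_dm(velocities):
    v = sum(velocities) / len(velocities)
    for bound, mark in [(32, "pp"), (48, "p"), (64, "mp"),
                        (80, "mf"), (96, "f")]:
        if v < bound:
            return mark
    return "ff"
```

Any such fixed table ignores performer subjectivity and context, which is exactly why the same mean velocity can realize different markings in different passages.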
IEEE Signal Processing Letters, vol. 32, pp. 4459–4463 (Open Access).
Citations: 0
Compositional Distributed Learning for Multi-View Perception: A Maximal Coding Rate Reduction Perspective
IF 3.9 · CAS Zone 2 (Engineering & Technology) · Q2 ENGINEERING, ELECTRICAL & ELECTRONIC · Pub Date: 2025-11-17 · DOI: 10.1109/LSP.2025.3633169
Zhuojun Tian;Mehdi Bennis
In this letter, we formulate a compositional distributed learning framework for multi-view perception by leveraging the maximal coding rate reduction principle combined with subspace basis fusion. In the proposed algorithm, each agent conducts a periodic singular value decomposition on its learned subspaces and exchanges truncated basis matrices, based on which the fused subspaces are obtained. By introducing a projection matrix and minimizing the distance between the outputs and its projection, the learned representations are enforced towards the fused subspaces. It is proved that the trace on the coding-rate change is bounded and the consistency of basis fusion is guaranteed theoretically. Numerical simulations validate that the proposed algorithm achieves high classification accuracy while maintaining representations' diversity, compared to baselines showing correlated subspaces and coupled representations.
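The exchange-and-fuse step can be sketched as follows: each agent takes a truncated SVD of its learned subspace matrix, agents exchange the truncated bases, and the fused subspace re-orthonormalizes their column-wise concatenation. Dimensions, truncation rank, and the random stand-in subspaces are illustrative:

```python
import numpy as np

def truncated_basis(Z, k):
    """Orthonormal basis for the top-k left singular subspace of Z."""
    U, _, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :k]

def fuse_bases(bases, k):
    """Fuse exchanged truncated bases: concatenate columns and
    re-orthonormalize via another truncated SVD."""
    return truncated_basis(np.concatenate(bases, axis=1), k)

rng = np.random.default_rng(2)
B1 = truncated_basis(rng.normal(size=(8, 8)), 2)   # agent 1's truncated basis
B2 = truncated_basis(rng.normal(size=(8, 8)), 2)   # agent 2's truncated basis
F = fuse_bases([B1, B2], 4)                        # fused orthonormal basis
```

When the fused rank equals the sum of the truncated ranks (as here), the fused subspace contains each agent's truncated subspace exactly, which is the consistency property the letter proves in general.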
IEEE Signal Processing Letters, vol. 32, pp. 4409–4413.
Citations: 0
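The basis-fusion step described in the abstract above (periodic SVD of each agent's learned representations, exchange of truncated basis matrices, and projection of outputs onto the fused subspace) can be sketched in a few lines of NumPy. This is a minimal illustration under assumed shapes (representations as d×n matrices, columns as samples), not the authors' implementation; the function names are hypothetical.

```python
import numpy as np

def truncated_basis(Z, k):
    """Top-k left singular vectors of the d x n representation matrix Z,
    i.e. an orthonormal basis of the agent's learned subspace."""
    U, _, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :k]

def fuse_bases(bases, k):
    """Fuse exchanged bases by stacking them and re-orthogonalizing
    via SVD, keeping the k principal directions of the union."""
    stacked = np.concatenate(bases, axis=1)
    U, _, _ = np.linalg.svd(stacked, full_matrices=False)
    return U[:, :k]

def project_onto(U, Z):
    """Apply the projection matrix P = U U^T; minimizing ||Z - P Z||
    enforces the representations toward the fused subspace."""
    return U @ (U.T @ Z)
```

Because `U` has orthonormal columns, `P = U @ U.T` is idempotent, so projecting an already-projected representation leaves it unchanged; in training one would instead penalize the distance between `Z` and `project_onto(U, Z)`.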
Improving Noisy Sensor Positions Using Noisy Inter-Sensor AOA Measurements
IF 3.9 Q2 (CAS Tier 2, Engineering & Technology) ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2025-11-17 DOI: 10.1109/LSP.2025.3634030
Yanbin Zou;Binhan Liao
In this article, we investigate the problem of improving noisy sensor positions using inter-sensor angle-of-arrival (AOA) measurements, which is highly non-linear. First, we present a Cramér-Rao lower bound (CRLB) analysis to show that incorporating inter-sensor AOA measurements refines the accuracy of the sensor positions. Second, we propose two weighted least-squares (WLS) solutions. The first resorts to the Tikhonov regularization method, as the formulated regressor is not a column-full-rank matrix; the second (called the improved WLS solution), derived from the maximum likelihood estimator, avoids choosing a regularization factor. Finally, simulation results show that the performance of the improved WLS solution is close to the CRLB and better than that of the regularization-based WLS solution irrespective of the choice of regularization factor.
{"title":"Improving Noisy Sensor Positions Using Noisy Inter-Sensor AOA Measurements","authors":"Yanbin Zou;Binhan Liao","doi":"10.1109/LSP.2025.3634030","DOIUrl":"https://doi.org/10.1109/LSP.2025.3634030","url":null,"abstract":"In this article, we investigate the problem of improving noisy sensor positions using inter-sensor angle-of-arrival (AOA) measurements, which is highly non-linear. First, we present the Cram <inline-formula><tex-math>$acute{text{e}}$</tex-math></inline-formula> r-Rao lower bounds (CRLB) analysis to show that the incorporating of inter-sensor AOA measurements refines the accuracy of sensor positions. Second, we proposed two weighted least-squares (WLS) solutions to solve the problem. The one resorts to the Tikhonov regularization method as the formulated regressor is not a column full rank matrix, and the other one (called improved WLS solution) derived from the maximum likelihood estimator, avoids choosing regularization factor. Finally, simulation results show that the performance of the improved WLS solution is close to the CRLB and better than the regularization-based WLS solution irrespective of the choice of regularization factor.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"4449-4453"},"PeriodicalIF":3.9,"publicationDate":"2025-11-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145612091","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
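The Tikhonov-regularized WLS idea in the abstract above can be sketched as follows: each inter-sensor bearing θ from sensor i to sensor j yields a homogeneous linear constraint −sinθ·(x_j−x_i) + cosθ·(y_j−y_i) = 0 on the 2-D positions, and because the AOA-only regressor is rank deficient (the constraints are invariant to translation and scaling of the whole network), the solution is regularized toward the noisy prior positions. This is a minimal NumPy sketch under assumed conventions (uniform weighting, a single scalar λ, stacked [x, y] unknowns), not the authors' exact estimator.

```python
import numpy as np

def refine_positions(noisy_pos, pairs, thetas, lam=1.0):
    """Tikhonov-regularized least-squares refinement of 2-D sensor
    positions from inter-sensor AOA measurements (illustrative sketch).

    noisy_pos : (n, 2) array of noisy prior positions
    pairs     : list of (i, j) sensor index pairs
    thetas    : AOA from sensor i to sensor j for each pair
    """
    n = noisy_pos.shape[0]
    x0 = noisy_pos.reshape(-1)            # stacked [x1, y1, x2, y2, ...]
    A = np.zeros((len(pairs), 2 * n))
    for r, ((i, j), th) in enumerate(zip(pairs, thetas)):
        s, c = np.sin(th), np.cos(th)
        # bearing constraint: -s*(xj - xi) + c*(yj - yi) = 0
        A[r, 2 * i] += s
        A[r, 2 * i + 1] -= c
        A[r, 2 * j] -= s
        A[r, 2 * j + 1] += c
    # minimize ||A x||^2 + lam * ||x - x0||^2; the Tikhonov term makes
    # the otherwise rank-deficient normal matrix invertible
    x = np.linalg.solve(A.T @ A + lam * np.eye(2 * n), lam * x0)
    return x.reshape(n, 2)
```

A useful sanity check: if the angles are exact and the prior equals the true geometry, then A·x0 = 0, so the regularized solution returns the prior unchanged; with a noisy prior, the refined positions trade bearing-constraint residual against distance from the prior.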