
IEEE Signal Processing Letters: Latest Publications

MFPD: Mamba-Driven Feature Pyramid Decoding for Underwater Object Detection
IF 3.9 CAS Region 2 (Engineering & Technology) Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2025-12-02 DOI: 10.1109/LSP.2025.3639347
Yiteng Guo;Junpeng Xu;Jiali Wang;Wenyi Zhao;Weidong Zhang
Underwater object detection suffers from limited long-range dependency modeling, fine-grained feature representation, and noise suppression, resulting in blurred boundaries, frequent missed detections, and reduced robustness. To address these challenges, we propose the Mamba-Driven Feature Pyramid Decoding framework, which employs a parallel Feature Pyramid Network and Path Aggregation Network collaborative pathway to enhance semantic and geometric features. A lightweight Mamba Block models long-range dependencies, while an Adaptive Sparse Self-Attention module highlights discriminative targets and suppresses noise. Together, these components improve feature representation and robustness. Experiments on two publicly available underwater datasets demonstrate that MFPD significantly outperforms existing methods, validating its effectiveness in complex underwater environments. The code is publicly available at: https://github.com/YitengGuo/MFPD
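As a rough illustration of the parallel FPN/PAN decoding idea summarized above, the sketch below fuses three backbone feature maps through a top-down and a bottom-up pathway. It is not the authors' MFPD code; the channel sizes and fusion rule are assumptions, and the Mamba Block and Adaptive Sparse Self-Attention module are omitted.

```python
# Minimal sketch (not the authors' code): a parallel FPN/PAN-style decoder that
# fuses three backbone feature maps. Channel sizes and the fusion rule are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyPyramidDecoder(nn.Module):
    def __init__(self, in_chs=(128, 256, 512), out_ch=128):
        super().__init__()
        self.lateral = nn.ModuleList(nn.Conv2d(c, out_ch, 1) for c in in_chs)
        self.downsample = nn.Conv2d(out_ch, out_ch, 3, stride=2, padding=1)

    def forward(self, c3, c4, c5):
        p3, p4, p5 = (l(x) for l, x in zip(self.lateral, (c3, c4, c5)))
        # FPN pathway: top-down, add upsampled coarse features to finer ones.
        p4 = p4 + F.interpolate(p5, size=p4.shape[-2:], mode="nearest")
        p3 = p3 + F.interpolate(p4, size=p3.shape[-2:], mode="nearest")
        # PAN pathway: bottom-up, push refined fine features back to coarse levels.
        n4 = p4 + self.downsample(p3)
        n5 = p5 + self.downsample(n4)
        return p3, n4, n5

feats = [torch.randn(1, c, s, s) for c, s in ((128, 80), (256, 40), (512, 20))]
outs = TinyPyramidDecoder()(*feats)
print([o.shape for o in outs])
```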
Citations: 0
Addressing Missing Data in Thermal Power Plant Monitoring With Hybrid Attention Time Series Imputation
IF 3.9 CAS Region 2 (Engineering & Technology) Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2025-12-02 DOI: 10.1109/LSP.2025.3639375
Liusong Huang;Adam Amril bin Jaharadak;Nor Izzati Ahmad;Jie Wang;Dalin Zhang
Thermal power plants rely on extensive sensor networks to monitor key operational parameters, yet harsh industrial environments often lead to incomplete data characterized by significant noise, complex physical dependencies, and abrupt state transitions. This impedes accurate monitoring and predictive analyses. To address these domain-specific challenges, we propose a novel Hybrid Multi-Head Attention (HybridMHA) model for time series imputation. The core novelty of our approach lies in the synergistic combination of diagonally-masked self-attention and dynamic sparse attention. Specifically, the diagonally-masked component strictly preserves temporal causality to model the sequential evolution of plant states, while the dynamic sparse component selectively identifies critical cross-variable dependencies, effectively filtering out sensor noise. This tailored design enables the model to robustly capture sparse physical inter-dependencies even during abrupt operational shifts. Using a real-world dataset from a thermal power plant, our model demonstrates statistically significant improvements, outperforming existing methods by 10%–20% on key metrics. Further validation on a public benchmark dataset confirms its generalizability. These findings highlight the model's potential for robust real-time monitoring in complex industrial applications.
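The two attention ingredients named in the abstract can be sketched in a toy single-head form: one head masks the diagonal of the score matrix, the other keeps only the top-k scores per query. Shapes, the value of k, and how HybridMHA actually parameterizes and combines the two heads are all assumptions here.

```python
# Minimal sketch (assumptions throughout): a diagonal-masked attention head plus
# a top-k "sparse" head over a multivariate time series.
import torch

def diag_masked_attention(x):
    # x: (T, d). Each time step attends to every other step but not to itself.
    scores = x @ x.T / x.shape[-1] ** 0.5
    scores.fill_diagonal_(float("-inf"))
    return torch.softmax(scores, dim=-1) @ x

def topk_sparse_attention(x, k=4):
    # Keep only the k largest scores per query; the rest are masked out.
    scores = x @ x.T / x.shape[-1] ** 0.5
    thresh = scores.topk(k, dim=-1).values[..., -1:]
    scores = scores.masked_fill(scores < thresh, float("-inf"))
    return torch.softmax(scores, dim=-1) @ x

x = torch.randn(16, 8)           # 16 time steps, 8 sensor channels
fused = 0.5 * (diag_masked_attention(x) + topk_sparse_attention(x))
print(fused.shape)               # torch.Size([16, 8])
```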
Citations: 0
Continual Deepfake Detection Based on Multi-Perspective Sample Selection Mechanism
IF 3.9 CAS Region 2 (Engineering & Technology) Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2025-11-28 DOI: 10.1109/LSP.2025.3638634
Yu Lian;Xinshan Zhu;Di He;Biao Sun;Ruyi Zhang
The rapid development and malicious use of deepfakes pose a significant crisis of trust. To cope with the evolving deepfake technologies, an increasing number of detection methods adopt the continual learning paradigm, but they often suffer from catastrophic forgetting. Although replay-based methods mitigate this issue by storing a portion of samples from historical tasks, their sample selection strategies usually rely on a single metric, which may lead to the omission of critical samples and consequently hinder the construction of a robust instance memory bank. In this letter, we propose a novel Multi-perspective Sample Selection Mechanism (MSSM) for continual deepfake detection, which jointly evaluates prediction error, temporal instability, and sample diversity to preserve informative and challenging samples in the instance memory bank. Furthermore, we design a Hierarchical Prototype Generation Mechanism (HPGM) that constructs prototypes at both the category and task levels, which are stored in the prototype memory bank. Extensive experiments under two evaluation protocols demonstrate that the proposed method achieves state-of-the-art performance.
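A hedged sketch of a multi-criteria replay-selection loop in the spirit of MSSM follows. The metrics (prediction error, instability across training rounds, feature-space diversity), their weights, and the greedy selection are illustrative assumptions, not the paper's exact mechanism.

```python
# Minimal sketch (not the paper's exact scoring): rank replay candidates by a
# weighted sum of prediction error, instability, and diversity (distance to
# already-selected samples), then greedily fill the memory-bank budget.
import numpy as np

def select_replay_samples(feats, errors, instability, budget=8, w=(1.0, 1.0, 1.0)):
    norm = lambda v: (v - v.min()) / (v.max() - v.min() + 1e-8)
    base = w[0] * norm(errors) + w[1] * norm(instability)
    chosen = []
    for _ in range(budget):
        if chosen:
            d = np.linalg.norm(feats[:, None] - feats[chosen][None], axis=-1).min(axis=1)
        else:
            d = np.ones(len(feats))
        score = base + w[2] * norm(d)
        score[chosen] = -np.inf                 # never pick the same sample twice
        chosen.append(int(score.argmax()))
    return chosen

rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 32))
idx = select_replay_samples(feats, rng.random(100), rng.random(100))
print(idx)
```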
Citations: 0
A Deployment-Oriented Simulation Framework for Deep Learning-Based Lane Change Prediction
IF 3.9 CAS Region 2 (Engineering & Technology) Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2025-11-28 DOI: 10.1109/LSP.2025.3638676
Luca Forneris;Riccardo Berta;Matteo Fresta;Luca Lazzaroni;Hadise Rojhan;Changjae Oh;Alessandro Pighetti;Hadi Ballout;Fabio Tango;Francesco Bellotti
Advanced driving simulations are increasingly used in automated driving research, yet freely available data and tools remain limited. We present a new open-source framework for synthetic data generation for lane change (LC) intention recognition in highways. Built on the CARLA simulator, it advances the state-of-the-art by providing a 50-driver dataset, a large-scale 3D map, and code for reproducibility and new data creation. The 60 km highway map includes varying curvature radii and straight segments. The codebase supports simulation enhancements (traffic management, vehicle cockpit, engine noise) and Machine Learning (ML) model training and evaluation, including CARLA log post-processing into time series. The dataset contains over 3,400 annotated LC maneuvers with synchronized ego dynamics, road geometry, and traffic context. From an automotive industry perspective, we also assess leading-edge ML models on STM32 microcontrollers using deployability metrics. Unlike prior infrastructure-based works, we estimate time-to-LC from ego-centric data. Results show that a Transformer model yields the lowest regression error, while XGBoost offers the best trade-offs on extremely resource-constrained devices. The entire framework is publicly released to support advancement in automated driving research.
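To make the time-to-LC regression target concrete, here is a minimal windowing routine over a hypothetical per-frame ego log; the field layout, window length, and frame rate are assumptions, and the actual post-processing code in the released framework is more elaborate.

```python
# Minimal sketch (hypothetical log schema): turn a per-frame ego log into
# fixed-length windows labelled with time-to-lane-change, the regression
# target described in the letter.
import numpy as np

def build_windows(log, lc_frame, window=25, hz=10):
    """log: (T, F) array of ego features; lc_frame: frame index of the LC start."""
    X, y = [], []
    for end in range(window, lc_frame + 1):
        X.append(log[end - window:end])
        y.append((lc_frame - end) / hz)      # seconds until the lane change
    return np.stack(X), np.asarray(y)

rng = np.random.default_rng(1)
log = rng.normal(size=(300, 6))              # e.g. speed, accel, heading, offsets
X, y = build_windows(log, lc_frame=200)
print(X.shape, y.shape, y[:3])               # (176, 25, 6) (176,) [17.5 17.4 17.3]
```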
Citations: 0
Cross-Modal Attention Guided Enhanced Fusion Network for RGB-T Tracking
IF 3.9 CAS Region 2 (Engineering & Technology) Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2025-11-28 DOI: 10.1109/LSP.2025.3638688
Jun Liu;Wei Ke;Shuai Wang;Da Yang;Hao Sheng
Visual tracking that combines RGB and thermal infrared modalities (RGB-T) aims to utilize the useful information of each modality to achieve more robust object localization. Most existing tracking methods based on convolutional neural networks (CNNs) and Transformers emphasize integrating multi-modal features through cross-modal attention, but overlook how the complementary information learned by cross-modal attention can itself be exploited to enhance modal features. In this paper, we propose a novel hierarchical progressive fusion network based on cross-modal attention guided enhancement for RGB-T tracking. Specifically, the complementary information generated by cross-modal attention implicitly reflects the regions of interest that the two modalities consistently identify as important, and is used to enhance modal features in a targeted manner. In addition, a modal feature refinement module and a fusion module are designed based on dynamic routing to perform noise suppression and adaptive integration on the enhanced multi-modal features. Extensive experiments on GTOT, RGBT234, LasHeR and VTUAV show that our method has competitive performance compared with recent state-of-the-art methods.
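The basic cross-modal attention step described above, where the complementary information from one modality is fed back to enhance the other, can be sketched as follows; token shapes and the residual fusion rule are assumptions, and the paper's dynamic-routing refinement and fusion modules are omitted.

```python
# Minimal sketch (not the tracker itself): each modality queries the other, and
# the returned context is added back as a residual enhancement.
import torch
import torch.nn as nn

class CrossModalEnhance(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.rgb_from_tir = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.tir_from_rgb = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, rgb, tir):
        # rgb, tir: (B, N, dim) token sequences from the two backbones.
        rgb_ctx, _ = self.rgb_from_tir(rgb, tir, tir)   # RGB queries thermal
        tir_ctx, _ = self.tir_from_rgb(tir, rgb, rgb)   # thermal queries RGB
        return rgb + rgb_ctx, tir + tir_ctx             # residual enhancement

rgb, tir = torch.randn(2, 64, 256), torch.randn(2, 64, 256)
r, t = CrossModalEnhance()(rgb, tir)
print(r.shape, t.shape)
```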
Citations: 0
Human-Machine Vision Collaboration Based Rate Control Scheme for VVC
IF 3.9 CAS Region 2 (Engineering & Technology) Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2025-11-28 DOI: 10.1109/LSP.2025.3638597
Zeming Zhao;Xiaohai He;Xiaodong Bi;Hong Yang;Shuhua Xiong
With the widespread adoption of smart terminals, compressed video is increasingly utilized in the receiver for purposes beyond human vision. Conventional video coding standards are optimized primarily for human visual perception and often fail to accommodate the distinct requirements of machine vision. To simultaneously satisfy the perceptual needs and the analytical demands, we propose a novel rate control scheme based on Versatile Video Coding (VVC) for human-machine vision collaborative video coding. Specifically, we employ the You Only Look Once (YOLO) network to extract task-relevant features for machine vision and formulate a detection feature weight based on these features. Leveraging the feature weight and the spatial location information of Coding Tree Units (CTUs), we propose a region classification algorithm that partitions a frame into machine vision-sensitive region (MVSR) and machine vision non-sensitive region (MVNR). Subsequently, we develop an enhanced and refined bit allocation strategy that performs region-level and CTU-level bit allocation, thereby improving the precision and effectiveness of the rate control. Experimental results demonstrate that the scheme improves machine task detection accuracy while preserving perceptual quality for human observers, effectively meeting the dual encoding requirements of human and machine vision.
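A toy version of the CTU-level region classification and bit allocation is sketched below, assuming 128x128 CTUs and a simple box-overlap weighting; the YOLO-based detection feature weights and the VVC integration described in the letter are far more involved.

```python
# Minimal sketch: weight each CTU by detector-box coverage, split CTUs into
# machine-vision-sensitive (MVSR) / non-sensitive (MVNR), and allocate the
# frame bit budget proportionally to the weights.
import numpy as np

def ctu_bit_allocation(frame_hw, boxes, frame_bits, ctu=128, boost=2.0):
    H, W = frame_hw
    rows, cols = (H + ctu - 1) // ctu, (W + ctu - 1) // ctu
    weight = np.ones((rows, cols))
    for x0, y0, x1, y1 in boxes:                       # detector boxes in pixels
        r0, r1 = int(y0) // ctu, int(y1) // ctu
        c0, c1 = int(x0) // ctu, int(x1) // ctu
        weight[r0:r1 + 1, c0:c1 + 1] += boost          # sensitive CTUs get more weight
    sensitive = weight > 1.0                           # MVSR / MVNR split
    bits = frame_bits * weight / weight.sum()
    return sensitive, bits

sens, bits = ctu_bit_allocation((1080, 1920), [(300, 200, 700, 600)], frame_bits=1.2e6)
print(sens.sum(), "sensitive CTUs;", int(bits[sens].sum()), "bits assigned to them")
```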
Citations: 0
Lightweight Attention-Enhanced Multi-Scale Detector for Robust Small Object Detection in UAV
IF 3.9 CAS Region 2 (Engineering & Technology) Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2025-11-27 DOI: 10.1109/LSP.2025.3637728
Haitao Yang;Yingzhuo Xiong;Dongliang Zhang;Xiai Yan;Xuran Hu
Small-object detection in uncrewed aerial vehicle (UAV) imagery remains challenging due to limited resolution, complex backgrounds, scale variation, and strict real-time constraints. Existing lightweight detectors often struggle to retain fine details while ensuring efficiency, reducing robustness in UAV applications. This letter proposes a lightweight multi-scale framework integrating Partial Dilated Convolution (PDC), a Triplet Focus Attention Module (TFAM), a Multi-Scale Feature Fusion (MSFF) branch, and a bidirectional BiFPN. PDC enlarges receptive field diversity while preserving local texture, TFAM jointly enhances spatial, channel, and coordinate attention, and MSFF with BiFPN achieves efficient cross-scale fusion. On VisDrone2019, our model reaches 52.7% mAP50 with 6.01M parameters and 148 FPS, and on HIT-UAV yields 85.2% mAP50 and 155 FPS, surpassing state-of-the-art UAV detectors in accuracy and efficiency. Visualization further verifies robustness under low-light, dense, and scale-varying UAV scenes.
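As one interpretation of the Partial Dilated Convolution idea (an assumption about the design, not the published module), the sketch below applies a dilated 3x3 convolution to only a fraction of the channels and passes the rest through unchanged, which is how partial convolutions typically keep FLOPs low while enlarging the receptive field.

```python
# Minimal sketch of a "partial dilated convolution": only the first quarter of
# the channels goes through a dilated 3x3 conv; the remaining channels are
# concatenated back untouched, so the output shape matches the input.
import torch
import torch.nn as nn

class PartialDilatedConv(nn.Module):
    def __init__(self, channels, ratio=0.25, dilation=2):
        super().__init__()
        self.part = int(channels * ratio)
        self.conv = nn.Conv2d(self.part, self.part, 3,
                              padding=dilation, dilation=dilation)

    def forward(self, x):
        a, b = x[:, :self.part], x[:, self.part:]
        return torch.cat([self.conv(a), b], dim=1)

x = torch.randn(1, 64, 80, 80)
print(PartialDilatedConv(64)(x).shape)                 # torch.Size([1, 64, 80, 80])
```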
Citations: 0
Optimizing In-Context Learning for Efficient Full Conformal Prediction
IF 3.9 CAS Region 2 (Engineering & Technology) Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2025-11-24 DOI: 10.1109/LSP.2025.3636762
Weicao Deng;Sangwoo Park;Min Li;Osvaldo Simeone
Reliable uncertainty quantification is critical for trustworthy AI. Conformal Prediction (CP) provides prediction sets with distribution-free coverage guarantees, but its two main variants face complementary limitations. Split CP (SCP) suffers from data inefficiency due to dataset partitioning, while full CP (FCP) improves data efficiency at the cost of prohibitive retraining complexity. Recent approaches based on meta-learning or in-context learning (ICL) partially mitigate these drawbacks. However, they rely on training procedures not specifically tailored to CP, which may yield large prediction sets. We introduce an efficient FCP framework, termed enhanced ICL-based FCP (E-ICL+FCP), which employs a permutation-invariant Transformer-based ICL model trained with a CP-aware loss. By simulating the multiple retrained models required by FCP without actual retraining, E-ICL+FCP preserves coverage while markedly reducing both inefficiency and computational overhead. Experiments on synthetic and real tasks demonstrate that E-ICL+FCP attains superior efficiency-coverage trade-offs compared to existing SCP and FCP baselines.
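To see why full CP is costly without the proposed ICL surrogate, the sketch below runs naive full conformal prediction with a nearest-centroid classifier, refitting once per candidate label; the E-ICL+FCP model itself is not reproduced, and the classifier and nonconformity score are illustrative choices.

```python
# Minimal sketch of naive full conformal prediction: for each candidate label,
# augment the training set, refit the (nearest-centroid) model, and keep the
# label if the candidate point's nonconformity rank gives a p-value above alpha.
import numpy as np

def full_cp_set(X, y, x_new, labels, alpha=0.1):
    pred_set = []
    for cand in labels:
        Xa, ya = np.vstack([X, x_new]), np.append(y, cand)
        centroids = {c: Xa[ya == c].mean(axis=0) for c in labels}
        scores = np.array([np.linalg.norm(xi - centroids[yi]) for xi, yi in zip(Xa, ya)])
        p_value = (scores >= scores[-1]).mean()        # rank of the candidate point
        if p_value > alpha:
            pred_set.append(cand)
    return pred_set

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(4, 1, (30, 2))])
y = np.array([0] * 30 + [1] * 30)
print(full_cp_set(X, y, x_new=np.array([0.2, -0.1]), labels=[0, 1]))
```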
Citations: 0
Study on an Intelligent Screening Method for Polycystic Ovary Syndrome Based on Deep Physics-Informed Neural Network
IF 3.9 CAS Region 2 (Engineering & Technology) Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2025-11-24 DOI: 10.1109/LSP.2025.3636719
Yu Gong;Danji Wang;Chao Wu;Man Ni;Shengli Li;Yang Liu;Ziyuan Shen;Zhidong Su;Xiaoxiao Liu;Huiping Zhou;Huijie Zhang
Polycystic ovary syndrome (PCOS) not only causes anovulation in women but also severely affects their physical and mental health. Clinically, diagnostic delays often cause patients to miss optimal treatment windows. As a non-invasive detection technique, Raman spectroscopy has been used for screening this disease. In this letter, the Raman spectra of follicular fluid and plasma from women with PCOS are examined using a deep physics-informed neural network. The results demonstrate that by incorporating physical priors and integrating multi-domain spectral information, the proposed method achieves accuracies of 96.25% in detecting PCOS from plasma samples and 90.00% from follicular fluid samples.
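The abstract does not spell out its physical prior, so the sketch below uses a generic stand-in: a small 1D-CNN spectrum classifier whose loss adds a smoothness penalty on an auxiliary reconstructed spectrum. The architecture, loss weight, and spectrum length are all assumptions, not the letter's model.

```python
# Minimal sketch (the letter's actual physical prior is not reproduced):
# classification head plus a reconstruction head; a second-difference penalty
# on the reconstruction stands in for a physics-informed regularizer.
import torch
import torch.nn as nn

class SpectrumNet(nn.Module):
    def __init__(self, n_points=800, n_classes=2):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(1, 16, 9, padding=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(64), nn.Flatten())
        self.classify = nn.Linear(16 * 64, n_classes)
        self.reconstruct = nn.Linear(16 * 64, n_points)

    def forward(self, spectra):                  # spectra: (B, n_points)
        h = self.encoder(spectra.unsqueeze(1))
        return self.classify(h), self.reconstruct(h)

x, labels = torch.randn(8, 800), torch.randint(0, 2, (8,))
logits, recon = SpectrumNet()(x)
physics_penalty = (recon[:, 2:] - 2 * recon[:, 1:-1] + recon[:, :-2]).pow(2).mean()
loss = nn.functional.cross_entropy(logits, labels) + 1e-3 * physics_penalty
print(float(loss))
```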
Citations: 0
Wide Field-of-View MMW SISO-SAR Image Reconstruction Based on Curved Linear Array
IF 3.9 CAS Region 2 (Engineering & Technology) Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date: 2025-11-21 DOI: 10.1109/LSP.2025.3635004
Hao Wu;Fengjiao Gan;Xu Chen
This letter presents a wide field-of-view (FoV) millimeter-wave array synthetic aperture radar (SAR) imaging system based on curved linear array. The proposed system retains the low-cost advantage of planar scanning array SARs while offering a broader viewing angle. However, the significant disparity in spatial sampling density across different regions of the sampling aperture results in suboptimal imaging performance when employing the classical back-projection algorithm (BPA). To address this issue, we introduce a measurement-fusion imaging algorithm tailored for this system, which involves constructing uniformly sampled sub-apertures and calculating spatial grid weights. This approach significantly enhances image integrity and mitigates artifacts and sidelobes. Experiments demonstrate high-quality imaging with an extended FoV.
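A single-frequency delay-and-sum back-projection over a curved array, with one simulated point target, illustrates the classical BPA baseline referred to above; the geometry, frequency, and grid are made-up values, and the letter's uniformly sampled sub-apertures and spatial grid weights are not reproduced.

```python
# Minimal sketch (single frequency, one point target): delay-and-sum
# back-projection over an arc-shaped array; each pixel sums the echoes after
# compensating the two-way phase to that pixel.
import numpy as np

c, f = 3e8, 77e9                                  # propagation speed, frequency
k = 2 * np.pi * f / c
theta = np.linspace(-np.pi / 3, np.pi / 3, 201)   # curved array on a 0.5 m arc
elems = np.stack([0.5 * np.sin(theta), 0.5 - 0.5 * np.cos(theta)], axis=1)

target = np.array([0.05, 0.6])
echo = np.exp(-2j * k * np.linalg.norm(elems - target, axis=1))   # two-way phase

xs = np.linspace(-0.2, 0.2, 121)
ys = np.linspace(0.4, 0.8, 121)
image = np.zeros((len(ys), len(xs)), dtype=complex)
for i, yy in enumerate(ys):
    for j, xx in enumerate(xs):
        r = np.linalg.norm(elems - np.array([xx, yy]), axis=1)
        image[i, j] = np.sum(echo * np.exp(2j * k * r))           # phase compensation

peak = np.unravel_index(np.abs(image).argmax(), image.shape)
print("peak at x=%.3f m, y=%.3f m" % (xs[peak[1]], ys[peak[0]]))  # near the target
```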
Citations: 0