
Latest Publications in IEEE Signal Processing Letters

TPEech: Target Speaker Extraction and Noise Suppression With Historical Dialogue Text Cues
IF 3.9 | CAS Region 2 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-12-04 | DOI: 10.1109/LSP.2025.3640519 | Vol. 33, pp. 351-355
Ziyang Jiang;Xueyan Chen;Shuai Wang;Xinyuan Qian;Haizhou Li
In complex multi-speaker scenarios with significant speaker overlap and background noise, extracting the target speaker's speech remains a major challenge. This capability is crucial for dialogue-based applications such as AI speech assistants, where downstream tasks like speech recognition depend on clean speech. A potential solution to address these challenges is Target Speaker Extraction (TSE), which leverages auxiliary information to extract target speech from mixed and noisy speech, thus overcoming the limitations of Speech Separation (SS) and Speech Enhancement (SE). In particular, we propose a multi-modal TSE network, namely Text Prompt Extractor with echo cue block (TPEech), which uses historical dialogue text as cues for extraction and incorporates the echo cue block (ECB) to further exploit this cue and enhance TSE performance. Experiments demonstrate the strong extraction and denoising capabilities of the proposed network. TPEech achieves an SI-SDRi of 9.632 dB, an SDR of 13.045 dB, a PESQ of 2.814, and an STOI of 0.885, outperforming competitive baselines. Additionally, we experimentally verify that TPEech is robust against semantically incomplete textual prompts.
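For context on the reported numbers, SI-SDR is a standard scale-invariant distortion metric. A minimal sketch of its textbook definition follows (an editorial illustration, not the authors' evaluation code; `si_sdr` and the test signals are illustrative):

```python
import numpy as np

def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio, in dB."""
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    # Optimal scaling: project the estimate onto the reference.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    noise = estimate - target
    return 10 * np.log10((np.dot(target, target) + eps) / (np.dot(noise, noise) + eps))

# Any pure rescaling of the reference scores (near-)perfectly,
# which is exactly the scale-invariance property.
ref = np.sin(np.linspace(0, 8 * np.pi, 16000))
clean_score = si_sdr(0.5 * ref, ref)
```

SI-SDRi, as reported in the abstract, is simply the improvement of this quantity over the unprocessed mixture.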
Citations: 0
Enhanced Multi-Scale PoseNet for Self-Supervised Monocular Depth Estimation
IF 3.9 | CAS Region 2 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-12-02 | DOI: 10.1109/LSP.2025.3639361 | Vol. 33, pp. 316-320
Chao Zhang;Tian Tian;Cheng Han;Tiancheng Shao;Mi Zhou;Shichao Zhao
Monocular depth estimation is essential for 3D perception in applications such as autonomous driving and robotics. Self-supervised methods avoid depth labels but often rely on shallow pose networks with weak temporal modeling, leading to unstable predictions. We propose EMSP-Net, an Enhanced Multi-Scale PoseNet for self-supervised monocular depth estimation. It introduces a hierarchical feature fusion encoder, a temporal attention-context decoder, and a pose consistency loss to jointly improve feature extraction, temporal stability, and geometric constraints. On the KITTI dataset, EMSP-Net achieved an absolute relative error of 0.105 and a squared relative error of 0.708. In the Make3D cross-domain test, its strong robustness was further demonstrated.
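The two reported KITTI metrics have standard definitions; a minimal sketch (editorial illustration, not the paper's code):

```python
import numpy as np

def depth_errors(pred, gt):
    """Standard monocular-depth metrics: absolute and squared relative error."""
    pred, gt = np.asarray(pred, dtype=float), np.asarray(gt, dtype=float)
    abs_rel = np.mean(np.abs(pred - gt) / gt)   # AbsRel
    sq_rel = np.mean((pred - gt) ** 2 / gt)     # SqRel
    return abs_rel, sq_rel

# Toy ground-truth and predicted depths (meters).
abs_rel, sq_rel = depth_errors([2.2, 3.6, 11.0], [2.0, 4.0, 10.0])
```

Lower is better for both; EMSP-Net's 0.105 AbsRel is computed this way over all valid KITTI pixels.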
Citations: 0
Text-Driven Medical Image Segmentation With LLM Semantic Bridge and LLM Prompt Bridge
IF 3.9 | CAS Region 2 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-12-02 | DOI: 10.1109/LSP.2025.3639352 | Vol. 33, pp. 146-150
Zhengyi Liu;Jiali Wu;Xianyong Fang;Linbo Wang
Text-driven medical image segmentation aims to accurately segment pathological regions in medical images based on textual descriptions. Existing methods face two major challenges: (a) The significant modality heterogeneity between textual and visual features leads to inefficient cross-modal feature alignment; (b) The insufficient utilization of medical shared knowledge restricts semantic understanding. To address these challenges, two large language model (LLM) bridges are constructed. The LLM semantic bridge leverages the sequential modeling capability of a frozen LLM to reorganize visual features into semantically coherent units that possess linguistic logic, thereby effectively bridging vision and language. The LLM prompt bridge appends learnable prompts, which encode medical shared knowledge from the LLM, to text embeddings, thereby effectively bridging case-specificity and medical consensus knowledge. Experimental results demonstrate superior performance attributable to the LLM bridges.
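The prompt-bridge idea of appending learnable prompt vectors to case-specific text embeddings can be sketched as follows (a toy illustration with random arrays standing in for trained embeddings; all shapes and names are assumptions, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_tokens, n_prompts = 64, 12, 4

# Case-specific text embeddings (random stand-ins for a real text encoder's output).
text_emb = rng.normal(size=(n_tokens, d_model))
# Learnable prompts that would be optimized to encode shared medical knowledge.
prompt_emb = rng.normal(size=(n_prompts, d_model))

# The prompt bridge appends the prompts so downstream attention layers can
# mix case-specific and consensus information within one token sequence.
bridged = np.concatenate([text_emb, prompt_emb], axis=0)
```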
Citations: 0
Adaptive Experiment Design for Nonlinear System Identification With Operational Constraints
IF 3.9 | CAS Region 2 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-12-02 | DOI: 10.1109/LSP.2025.3639512 | Vol. 33, pp. 151-155
Jingwei Hu;Dave Zachariah;Torbjörn Wigren;Petre Stoica
We consider the joint problem of online experiment design and parameter estimation for identifying nonlinear system models, while adhering to system constraints. We utilize a receding horizon approach and propose a new adaptive input design criterion, which is tailored to continuously updated parameter estimates, along with a new sequential estimator. We demonstrate the ability of the method to design informative experiments online, while steering the system within operational constraints.
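The abstract does not reproduce the proposed estimator; as a point of reference, recursive least squares is the classic example of a sequential estimator that is continuously updated from streaming data (a generic sketch, not the authors' method):

```python
import numpy as np

def rls_step(theta, P, x, y, lam=1.0):
    """One recursive-least-squares update (a classic sequential estimator)."""
    x = x.reshape(-1, 1)
    gain = P @ x / (lam + x.T @ P @ x)     # Kalman-style gain
    innovation = y - (x.T @ theta).item()  # prediction error on the new sample
    theta = theta + gain * innovation
    P = (P - gain @ x.T @ P) / lam
    return theta, P

# Identify y = 2*x1 - x2 from a stream of noisy observations.
rng = np.random.default_rng(1)
true_w = np.array([[2.0], [-1.0]])
theta, P = np.zeros((2, 1)), 100.0 * np.eye(2)
for _ in range(200):
    x = rng.normal(size=2)
    y = x @ true_w.ravel() + 0.01 * rng.normal()
    theta, P = rls_step(theta, P, x, y)
```

In adaptive input design, the inputs `x` would additionally be chosen online to maximize an information criterion subject to operational constraints, rather than drawn at random as here.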
Citations: 0
MFPD: Mamba-Driven Feature Pyramid Decoding for Underwater Object Detection
IF 3.9 | CAS Region 2 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-12-02 | DOI: 10.1109/LSP.2025.3639347 | Vol. 33, pp. 141-145
Yiteng Guo;Junpeng Xu;Jiali Wang;Wenyi Zhao;Weidong Zhang
Underwater object detection suffers from limited long-range dependency modeling, fine-grained feature representation, and noise suppression, resulting in blurred boundaries, frequent missed detections, and reduced robustness. To address these challenges, we propose the Mamba-Driven Feature Pyramid Decoding framework, which employs a parallel Feature Pyramid Network and Path Aggregation Network collaborative pathway to enhance semantic and geometric features. A lightweight Mamba Block models long-range dependencies, while an Adaptive Sparse Self-Attention module highlights discriminative targets and suppresses noise. Together, these components improve feature representation and robustness. Experiments on two publicly available underwater datasets demonstrate that MFPD significantly outperforms existing methods, validating its effectiveness in complex underwater environments. The code is publicly available at: https://github.com/YitengGuo/MFPD
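The exact Adaptive Sparse Self-Attention mechanism is not detailed in this abstract; a generic top-k sparse attention, in which each query keeps only its strongest keys and the rest are masked out, conveys the idea (the top-k rule and all names here are assumptions):

```python
import numpy as np

def topk_sparse_attention(q, k, v, keep=2):
    """Each query attends only to its `keep` highest-scoring keys;
    all other attention logits are masked to -inf before the softmax."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    thresh = np.sort(scores, axis=-1)[:, -keep][:, None]  # per-row k-th largest
    masked = np.where(scores >= thresh, scores, -np.inf)
    weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(n, 8)) for n in (3, 5, 5))
out, w = topk_sparse_attention(q, k, v, keep=2)
```

Zeroing the weak attention links is what gives sparse attention its noise-suppression effect: low-scoring (likely noisy) keys contribute nothing to the output.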
Citations: 0
Addressing Missing Data in Thermal Power Plant Monitoring With Hybrid Attention Time Series Imputation
IF 3.9 | CAS Region 2 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-12-02 | DOI: 10.1109/LSP.2025.3639375 | Vol. 33, pp. 536-540
Liusong Huang;Adam Amril bin Jaharadak;Nor Izzati Ahmad;Jie Wang;Dalin Zhang
Thermal power plants rely on extensive sensor networks to monitor key operational parameters, yet harsh industrial environments often lead to incomplete data characterized by significant noise, complex physical dependencies, and abrupt state transitions. This impedes accurate monitoring and predictive analyses. To address these domain-specific challenges, we propose a novel Hybrid Multi-Head Attention (HybridMHA) model for time series imputation. The core novelty of our approach lies in the synergistic combination of diagonally-masked self-attention and dynamic sparse attention. Specifically, the diagonally-masked component strictly preserves temporal causality to model the sequential evolution of plant states, while the dynamic sparse component selectively identifies critical cross-variable dependencies, effectively filtering out sensor noise. This tailored design enables the model to robustly capture sparse physical inter-dependencies even during abrupt operational shifts. Using a real-world dataset from a thermal power plant, our model demonstrates statistically significant improvements, outperforming existing methods by 10%–20% on key metrics. Further validation on a public benchmark dataset confirms its generalizability. These findings highlight the model's potential for robust real-time monitoring in complex industrial applications.
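The abstract says the diagonally-masked component "strictly preserves temporal causality"; one common realization of that goal is future-masked (causal) self-attention, sketched here as an assumption-laden illustration rather than the paper's exact mask:

```python
import numpy as np

def causal_self_attention(q, k, v):
    """Self-attention where step t attends only to steps <= t, so no
    imputed value ever depends on future sensor readings."""
    t, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    future = np.triu(np.ones((t, t), dtype=bool), k=1)  # strictly upper triangle
    scores = np.where(future, -np.inf, scores)          # block attention to the future
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v, w

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 4))  # 6 time steps, 4 features
out, w = causal_self_attention(x, x, x)
```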
Citations: 0
Continual Deepfake Detection Based on Multi-Perspective Sample Selection Mechanism
IF 3.9 | CAS Region 2 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-11-28 | DOI: 10.1109/LSP.2025.3638634 | Vol. 33, pp. 131-135
Yu Lian;Xinshan Zhu;Di He;Biao Sun;Ruyi Zhang
The rapid development and malicious use of deepfakes pose a significant crisis of trust. To cope with the evolving deepfake technologies, an increasing number of detection methods adopt the continual learning paradigm, but they often suffer from catastrophic forgetting. Although replay-based methods mitigate this issue by storing a portion of samples from historical tasks, their sample selection strategies usually rely on a single metric, which may lead to the omission of critical samples and consequently hinder the construction of a robust instance memory bank. In this letter, we propose a novel Multi-perspective Sample Selection Mechanism (MSSM) for continual deepfake detection, which jointly evaluates prediction error, temporal instability, and sample diversity to preserve informative and challenging samples in the instance memory bank. Furthermore, we design a Hierarchical Prototype Generation Mechanism (HPGM) that constructs prototypes at both the category and task levels, which are stored in the prototype memory bank. Extensive experiments under two evaluation protocols demonstrate that the proposed method achieves state-of-the-art performance.
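The combination rule for the three selection criteria is not specified in the abstract; a simple sketch that min-max normalizes and sums them before ranking replay candidates (the actual weighting in MSSM may well differ):

```python
import numpy as np

def select_samples(pred_error, instability, diversity, m=3):
    """Rank replay candidates by a combined multi-perspective score;
    here the three signals are min-max normalized and summed."""
    def norm(a):
        a = np.asarray(a, dtype=float)
        span = a.max() - a.min()
        return (a - a.min()) / span if span > 0 else np.zeros_like(a)
    score = norm(pred_error) + norm(instability) + norm(diversity)
    return np.argsort(score)[::-1][:m]  # indices of the m most informative samples

keep = select_samples(pred_error=[0.9, 0.1, 0.5, 0.7],
                      instability=[0.8, 0.2, 0.4, 0.9],
                      diversity=[0.7, 0.1, 0.9, 0.6], m=2)
```

The point of using several perspectives is visible even in this toy: sample 3 is not the worst on any single metric, yet its combined score places it in the memory bank, whereas a single-metric rule could drop it.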
Citations: 0
A Deployment-Oriented Simulation Framework for Deep Learning-Based Lane Change Prediction
IF 3.9 | CAS Region 2 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-11-28 | DOI: 10.1109/LSP.2025.3638676 | Vol. 33, pp. 136-140 | Open Access
Luca Forneris;Riccardo Berta;Matteo Fresta;Luca Lazzaroni;Hadise Rojhan;Changjae Oh;Alessandro Pighetti;Hadi Ballout;Fabio Tango;Francesco Bellotti
Advanced driving simulations are increasingly used in automated driving research, yet freely available data and tools remain limited. We present a new open-source framework for synthetic data generation for lane change (LC) intention recognition on highways. Built on the CARLA simulator, it advances the state-of-the-art by providing a 50-driver dataset, a large-scale 3D map, and code for reproducibility and new data creation. The 60 km highway map includes varying curvature radii and straight segments. The codebase supports simulation enhancements (traffic management, vehicle cockpit, engine noise) and Machine Learning (ML) model training and evaluation, including CARLA log post-processing into time series. The dataset contains over 3,400 annotated LC maneuvers with synchronized ego dynamics, road geometry, and traffic context. From an automotive industry perspective, we also assess leading-edge ML models on STM32 microcontrollers using deployability metrics. Unlike prior infrastructure-based works, we estimate time-to-LC from ego-centric data. Results show that a Transformer model yields the lowest regression error, while XGBoost offers the best trade-offs on extremely resource-constrained devices. The entire framework is publicly released to support advancement in automated driving research.
Citations: 0
Cross-Modal Attention Guided Enhanced Fusion Network for RGB-T Tracking
IF 3.9 | CAS Region 2 (Engineering & Technology) | Q2 ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-11-28 | DOI: 10.1109/LSP.2025.3638688 | Vol. 33, pp. 276-280
Jun Liu;Wei Ke;Shuai Wang;Da Yang;Hao Sheng
Visual tracking that combines RGB and thermal infrared modalities (RGB-T) aims to utilize the useful information of each modality to achieve more robust object localization. Most existing tracking methods based on convolutional neural networks (CNNs) and Transformers emphasize integrating multi-modal features through cross-modal attention, but overlook the potential of the complementary information learned by cross-modal attention to enhance modal features. In this paper, we propose a novel hierarchical progressive fusion network based on cross-modal attention guided enhancement for RGB-T tracking. Specifically, the complementary information generated by cross-modal attention implicitly reflects the consistent regions of interest of important information between different modalities, which is used to enhance modal features in a targeted manner. In addition, a modal feature refinement module and a fusion module are designed based on dynamic routing to perform noise suppression and adaptive integration on the enhanced multi-modal features. Extensive experiments on GTOT, RGBT234, LasHeR and VTUAV show that our method has competitive performance compared with recent state-of-the-art methods.
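A generic residual form of cross-modal attention guided enhancement, where RGB tokens query thermal tokens and the attended complementary features are added back to the RGB stream, can be sketched as follows (the letter's gating and dynamic-routing details may differ; names are illustrative):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_enhance(rgb, thermal):
    """RGB tokens query thermal tokens; the attended complementary
    features are added back to the RGB features (residual enhancement)."""
    attn = softmax(rgb @ thermal.T / np.sqrt(rgb.shape[-1]))
    complementary = attn @ thermal  # consistent regions of interest across modalities
    return rgb + complementary

rng = np.random.default_rng(0)
rgb, thermal = rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
enhanced = cross_modal_enhance(rgb, thermal)
```

Swapping the roles of `rgb` and `thermal` gives the symmetric enhancement of the thermal branch.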
Citations: 0
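The cross-modal attention guided enhancement described in the RGB-T tracking abstract above can be sketched as follows. This is a minimal illustration, not the authors' architecture: the single-head dot-product attention, the token shapes, and the plain residual fusion are all assumptions made for the sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_enhance(rgb, tir):
    """Cross-modal attention with RGB queries and thermal keys/values:
    the attention map implicitly marks regions where the two modalities
    agree on important information, and the gathered complementary
    features enhance the RGB tokens via a residual connection."""
    d = rgb.shape[-1]
    attn = softmax(rgb @ tir.T / np.sqrt(d))   # (N_rgb, N_tir) attention weights
    complementary = attn @ tir                 # thermal info aligned to RGB tokens
    return rgb + complementary                 # targeted feature enhancement

rng = np.random.default_rng(0)
rgb_feat = rng.standard_normal((16, 64))   # 16 RGB tokens, 64-dim (illustrative sizes)
tir_feat = rng.standard_normal((16, 64))   # 16 thermal tokens, 64-dim
enhanced = cross_modal_enhance(rgb_feat, tir_feat)
print(enhanced.shape)   # (16, 64)
```

A symmetric call with thermal queries and RGB keys/values would enhance the thermal branch the same way; the paper's refinement and dynamic-routing fusion modules are not modeled here.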
Generative Model for 2.5D-Assisted Future Urban Remote Sensing Image Synthesis
IF 3.9 Zone 2 (Engineering & Technology) Q2 ENGINEERING, ELECTRICAL & ELECTRONIC Pub Date : 2025-11-28 DOI: 10.1109/LSP.2025.3638666
Yuhan Zhang;Jie Zhou;Weihang Peng;Xiaode Liu;Yuanpei Chen
Generating realistic future urban remote sensing imagery is critical for visualizing potential urban changes and supporting related technical analysis within urban planning. Traditional 2D-assisted methods are inherently limited in synthesizing vertical development and infrastructure evolution, as they rely on binary planning maps. To address these limitations, we propose a novel 2.5D-assisted future urban remote sensing image synthesis method, aimed at generating future urban layouts based on existing urban structures and 2.5D planning maps. Specifically, the 2.5D map is divided into construction and demolition components, which are then integrated with the existing layout images and the corresponding text embeddings as conditions for our generative model. We further design two trainable cascaded gated attention layers that process these two conditions separately and embed them into the latent diffusion model (LDM). This approach allows our model to dynamically comprehend the planning and design requirements for key areas, making adjustments to accommodate diverse demands. Compared to existing state-of-the-art (SoTA) methods, our approach effectively targets design requirements, enabling flexible modifications that involve new constructions and demolitions in relevant urban areas. Experimental results on the 3DCD dataset demonstrate that the images generated by our method retain high fidelity and exhibit strong consistency with the 2.5D planning map.
{"title":"Generative Model for 2.5D-Assisted Future Urban Remote Sensing Image Synthesis","authors":"Yuhan Zhang;Jie Zhou;Weihang Peng;Xiaode Liu;Yuanpei Chen","doi":"10.1109/LSP.2025.3638666","DOIUrl":"https://doi.org/10.1109/LSP.2025.3638666","abstract":"Generating realistic future urban remote sensing imagery is critical for visualizing potential urban changes and supporting related technical analysis within urban planning. Traditional 2D-assisted methods are inherently limited in synthesizing vertical development and infrastructure evolution, as they rely on binary planning maps. To address these limitations, we propose a novel 2.5D-assisted future urban remote sensing image synthesis method, aimed at generating future urban layouts based on existing urban structures and 2.5D planning maps. Specifically, the 2.5D map is divided into construction and demolition components, which are then integrated with the existing layout images and the embedding of the corresponding text as conditions for our generative model. We further design two trainable cascaded gated attention layers that process these two conditions separately and embed them into the latent diffusion model (LDM). This approach allows our model to dynamically comprehend the planning design requirements for key areas, making adjustments to accommodate diverse demands. Compared to existing state-of-the-art (SoTA) methods, our approach effectively targets design requirements, enabling flexible modifications that involve new constructions and demolitions in relevant urban areas. 
Experimental results on the 3DCD dataset demonstrate that the images generated by our method retain high fidelity and exhibit strong consistency with the 2.5D planning map.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"33 ","pages":"878-882"},"PeriodicalIF":3.9,"publicationDate":"2025-11-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146223708","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
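The two cascaded gated attention layers that inject the construction and demolition conditions into the latent diffusion model can be sketched as below. This is an illustrative sketch only: the zero-initializable tanh gate, single-head attention, and the token/embedding sizes are assumptions, not the paper's exact LDM blocks.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gated_cross_attention(latent, cond, gate):
    """One gated attention layer: latent tokens attend to a condition
    sequence, and a tanh gate scales how much conditioning is injected.
    With gate = 0 the layer is an identity, a common trick for safely
    adding new conditioning paths to a pretrained diffusion model."""
    d = latent.shape[-1]
    attn = softmax(latent @ cond.T / np.sqrt(d))   # (N_latent, N_cond)
    return latent + np.tanh(gate) * (attn @ cond)

rng = np.random.default_rng(1)
latent = rng.standard_normal((32, 64))        # latent tokens inside the LDM
construction = rng.standard_normal((8, 64))   # construction-map + text embedding (assumed)
demolition = rng.standard_normal((8, 64))     # demolition-map + text embedding (assumed)

# cascade: each condition is processed by its own gated layer
h = gated_cross_attention(latent, construction, gate=0.5)
h = gated_cross_attention(h, demolition, gate=0.5)
print(h.shape)  # (32, 64)
```

In a trained model the gates would be learnable scalars (or vectors) initialized at zero, so the pretrained LDM's behavior is unchanged at the start of fine-tuning.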