
IEEE Signal Processing Letters: Latest Articles

List of Reviewers
IF 3.9 | Zone 2, Engineering & Technology | Q2, ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-12-08 | DOI: 10.1109/LSP.2025.3634660
IEEE Signal Processing Letters, vol. 32, pp. 4473-4484, Journal Article. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11284689
Citations: 0
LVMF3D: Large Vision Model Boosting Multimodal Fusion for Indoor 3D Object Detection
IF 3.9 | Zone 2, Engineering & Technology | Q2, ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-12-08 | DOI: 10.1109/LSP.2025.3641506
Yichen Shi;Wenming Yang;Nan Su;Guijin Wang
3D object detection plays an important role in how intelligent systems perceive the world. Although many studies have been conducted to address this task, the detection accuracy is still limited by the network’s learning capability. Therefore, we propose LVMF3D, a Large Vision Model (LVM) boosted multimodal fusion framework for indoor 3D object detection, consisting of two branches. The pre-trained LVM is used as the RGB branch to better extract image texture features. The point branch is used to encode spatial geometric features. Furthermore, a Point Fusion Module (PFM) and a Multi-Scale Attention Fusion Module (MS-AFM) are specially designed in the 2D and 3D spaces, respectively, to realize more comprehensive and effective information fusion between the two branches. We conduct experiments on the indoor 3D object detection dataset SUN RGB-D and achieve state-of-the-art results compared to other 3D object detection methods.
IEEE Signal Processing Letters, vol. 33, pp. 356-360, Journal Article.
Citations: 0
Superdirective Beamforming Method Based on Spherical Harmonic Expansion in the Waveguide Environment
IF 3.9 | Zone 2, Engineering & Technology | Q2, ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-12-05 | DOI: 10.1109/LSP.2025.3640094
Junyuan Guo;Mingqian Han
Superdirective beamforming methods based on spherical harmonic expansion can achieve higher array gain compared to conventional beamforming methods when the array aperture is very small. However, in the waveguide environment, the array gain of beamforming methods based on spherical harmonic expansion may degrade significantly due to the influence of multipath effects. To address the issue, this letter proposes an improved beamforming method for compact planar acoustic vector sensor arrays to mitigate the negative impact of multipath effects on array gain. First, the form of the steering vector model in the direct arrival zone of the waveguide environment is reasonably simplified. Second, a closed-form beamformer is constructed by utilizing the information of signals’ arriving directions. Subsequently, the theoretical derivation demonstrates the advantages of the proposed beamforming method in the waveguide environment. Finally, simulation analysis substantiates the rationality and feasibility of the proposed method.
IEEE Signal Processing Letters, vol. 33, pp. 416-420, Journal Article.
Citations: 0
Optimal Binary Hypothesis Testing Based on the Behavioral Kullback–Leibler Divergence Criterion
IF 3.9 | Zone 2, Engineering & Technology | Q2, ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-12-04 | DOI: 10.1109/LSP.2025.3640513
Alperen Berber;Berkan Dulek
Kullback–Leibler (KL) divergence plays a central role in hypothesis testing. It gives a measure of the statistical distance between two probability distributions. In the distributed detection problem, it is used as a design criterion in the absence of the information regarding the fusion center's (FC) decision rule: The local sensor decision rules are designed to maximize the KL divergence between the distributions of quantized messages sent to the FC under alternative and null hypotheses. In decision making tasks involving humans, subjective perception of probability values due to behavioral biases needs to be taken into account. In this letter, the notion of behavioral KL divergence is proposed. The statistical distance between two distributions is computed based on the perceived values of the probabilities, which are obtained from the actual probabilities using the probability weighting function employed in prospect theory. It is proved that the behavioral KL divergence between the distributions of the quantized decision at the output of a detector under both hypotheses is maximized by either the Neyman-Pearson (NP) rule or flipped Neyman-Pearson (FNP) rule for any fixed false alarm probability. Based on this result, it is also established that under a constraint on the average perceived false alarm probability, the average behavioral KL divergence is maximized by time-sharing between at most two single-threshold likelihood-ratio tests, each of which is either an NP or an FNP rule. The theoretical results are supported by numerical examples.
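As a rough illustration of the idea, and not the letter's exact definition, the sketch below passes each probability through the Prelec weighting function w(p) = exp(-(-ln p)^alpha), a common choice in prospect theory, and evaluates a KL-style sum on the perceived values for a binary detector output. The function names, the choice of Prelec's form, the parameter `alpha`, and the way the perceived values enter the sum are all assumptions made here.

```python
import numpy as np

def prelec_weight(p, alpha=0.65):
    """Prelec probability weighting w(p) = exp(-(-ln p)^alpha); alpha = 1 gives w(p) = p."""
    # Clip away from zero so the logarithm stays finite.
    p = np.clip(np.asarray(p, dtype=float), 1e-12, 1.0)
    return np.exp(-(-np.log(p)) ** alpha)

def behavioral_kl_binary(p_d, p_fa, alpha=0.65):
    """KL-style sum over the two detector outputs using perceived probabilities.

    p_d  : detection probability (decide 1 under the alternative hypothesis)
    p_fa : false alarm probability (decide 1 under the null hypothesis)
    The form of the sum is an assumption, not taken from the letter.
    """
    p1 = np.array([p_d, 1.0 - p_d])    # decision distribution under H1
    p0 = np.array([p_fa, 1.0 - p_fa])  # decision distribution under H0
    w1 = prelec_weight(p1, alpha)
    w0 = prelec_weight(p0, alpha)
    return float(np.sum(w1 * np.log(w1 / w0)))
```

With `alpha=1.0` the weighting is the identity, so the function reduces to the ordinary KL divergence between the two binary decision distributions; other values of `alpha` distort small and large probabilities before the sum is taken.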
IEEE Signal Processing Letters, vol. 33, pp. 161-165, Journal Article.
Citations: 0
Discrete-Phase Waveform Design for Desired Ambiguity Functions in Pulse-Doppler MIMO Radar
IF 3.9 | Zone 2, Engineering & Technology | Q2, ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-12-04 | DOI: 10.1109/LSP.2025.3640530
Hezhe Jia;Hua Wang;Jun Liu;Kai Zhong;Jinfeng Hu
Unimodular waveform design plays a crucial role in MIMO radar systems. Previous studies have mainly focused on continuous- and discrete-phase coding for single-pulse MIMO radar waveforms, as well as continuous-phase coding for pulse-Doppler MIMO radar waveforms. Although multi-pulse discrete-phase waveforms provide both high resolution and hardware simplicity, their design remains a challenging optimization problem. In this work, we go beyond prior approaches by investigating the design of pulse-Doppler MIMO waveforms under discrete phase constraints. We formulate the problem as optimizing the waveform phase matrix to minimize the weighted integrated sidelobe level (WISL) of the joint ambiguity function. The non-convexity of WISL and the discrete phase constraints make the problem particularly challenging. Noting that the Adam optimizer incorporates both adaptive learning rate and momentum mechanisms, making it suitable for solving non-convex optimization problems, and that nonlinear functions can be used to approximate quantization in a continuously differentiable form, we propose a soft quantization Adam optimization (SQAO) method to solve this problem. Simulations show that SQAO outperforms existing methods.
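The abstract does not say which nonlinear function SQAO uses. As one hypothetical instance of "approximating quantization in a continuously differentiable form", the sketch below blends the allowed phase levels with softmax weights; a small temperature pushes the result toward the nearest level while keeping the map smooth. The function names, the softmax form, and the `temperature` parameter are assumptions for illustration only, and no Adam loop or WISL objective is included.

```python
import numpy as np

def hard_quantize_phase(phase, num_levels):
    """Round each phase to the nearest of num_levels equally spaced levels in [0, 2*pi)."""
    step = 2.0 * np.pi / num_levels
    return (np.round(np.asarray(phase, dtype=float) / step) % num_levels) * step

def soft_quantize_phase(phase, num_levels, temperature):
    """Smooth surrogate of hard_quantize_phase (assumed form, not the letter's)."""
    levels = 2.0 * np.pi * np.arange(num_levels) / num_levels
    # Cosine similarity between every phase and every level handles wrap-around.
    similarity = np.cos(np.asarray(phase, dtype=float)[..., None] - levels)
    logits = similarity / temperature
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Blend the unit phasors of the levels and read off the resulting angle.
    blended = (weights * np.exp(1j * levels)).sum(axis=-1)
    return np.angle(blended) % (2.0 * np.pi)
```

Because the output is a smooth function of the input phase, a gradient-based optimizer can be run on the continuous phases while the temperature is lowered; whether SQAO anneals in this way is not stated in the abstract.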
IEEE Signal Processing Letters, vol. 33, pp. 421-425, Journal Article.
Citations: 0
Occam’s Razor in Pooling of Probability Densities
IF 3.9 | Zone 2, Engineering & Technology | Q2, ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-12-04 | DOI: 10.1109/LSP.2025.3640068
Miroslav Kárný
Geometric and linear poolings often serve for the fusion of the knowledge contained in a finite set of probability densities. Their pros and cons are relatively well understood. Many other ways have also been studied. A recent insightful survey letter by Koliander et al. inspects a range of pooling ways based on various axioms, optimisation and supra-Bayesian handling. The gained extensive option set makes the proper choice of the pooling function harder. This letter reduces the extent of unjustified options. It provides the optimisation-based selection among available options. Its steps are justified by well-established, axiomatically supported, minimum relative entropy and approximation principles. The text applies Occam’s razor to its theoretical tools, too. It simplifies the user’s choice of the pooling function and its weights. This weakens the possibility of a bad choice and opens the way to a range of applications.
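For readers unfamiliar with the two baseline rules named at the start of the abstract, the sketch below implements the standard linear pool (a weighted arithmetic mean) and geometric pool (a normalized weighted geometric mean) on discretized densities. It does not reproduce the letter's selection procedure; the function names and the example numbers are illustrative assumptions.

```python
import numpy as np

def linear_pool(densities, weights):
    """Weighted arithmetic mean of probability vectors (rows of densities)."""
    densities = np.asarray(densities, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return weights @ densities

def geometric_pool(densities, weights):
    """Normalized weighted geometric mean; assumes strictly positive entries."""
    densities = np.asarray(densities, dtype=float)
    weights = np.asarray(weights, dtype=float)
    log_pool = weights @ np.log(densities)
    pooled = np.exp(log_pool - log_pool.max())  # shift for numerical stability
    return pooled / pooled.sum()

# Two sources over a two-point support, equally weighted.
sources = [[0.5, 0.5], [0.9, 0.1]]
equal_weights = [0.5, 0.5]
print(linear_pool(sources, equal_weights))     # approximately [0.7, 0.3]
print(geometric_pool(sources, equal_weights))  # approximately [0.75, 0.25]
```

The geometric pool leans further toward the confident source than the linear pool does, which is one of the well-known behavioral differences between the two rules.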
IEEE Signal Processing Letters, vol. 33, pp. 156-160, Journal Article.
Citations: 0
TPEech: Target Speaker Extraction and Noise Suppression With Historical Dialogue Text Cues
IF 3.9 | Zone 2, Engineering & Technology | Q2, ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-12-04 | DOI: 10.1109/LSP.2025.3640519
Ziyang Jiang;Xueyan Chen;Shuai Wang;Xinyuan Qian;Haizhou Li
In complex multi-speaker scenarios with significant speaker overlap and background noise, extracting the target speaker's speech remains a major challenge. This capability is crucial for dialogue-based applications such as AI speech assistants, where downstream tasks such as speech recognition depend on clean speech. A potential solution to address these challenges is Target Speaker Extraction (TSE), which leverages auxiliary information to extract target speech from mixed and noisy speech, thus overcoming the limitations of Speech Separation (SS) and Speech Enhancement (SE). In particular, we propose a multi-modal TSE network, namely Text Prompt Extractor with echo cue block (TPEech), which uses historical dialogue text as cues for extraction and incorporates the echo cue block (ECB) to further exploit this cue and enhance TSE performance. The experiments show the excellent extraction and denoising capabilities of our proposed network. TPEech achieves an SI-SDRi of 9.632 dB, an SDR of 13.045 dB, a PESQ of 2.814, and a STOI of 0.885, outperforming competitive baselines. Additionally, we experimentally verify that TPEech is robust against semantically incomplete textual prompts.
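The SI-SDRi figure quoted above is the improvement in scale-invariant signal-to-distortion ratio of the extracted speech over the unprocessed mixture. The sketch below follows the commonly used definition of SI-SDR; the function names are illustrative, mean removal is omitted for brevity, and the authors' evaluation code may differ in such details.

```python
import numpy as np

def si_sdr(estimate, reference):
    """Scale-invariant SDR in dB between an estimate and a clean reference."""
    estimate = np.asarray(estimate, dtype=float)
    reference = np.asarray(reference, dtype=float)
    # Project the estimate onto the reference to get the scaled target component.
    scale = np.dot(estimate, reference) / np.dot(reference, reference)
    target = scale * reference
    residual = estimate - target
    return 10.0 * np.log10(np.dot(target, target) / np.dot(residual, residual))

def si_sdr_improvement(estimate, mixture, reference):
    """SI-SDRi: SI-SDR of the estimate minus SI-SDR of the unprocessed mixture."""
    return si_sdr(estimate, reference) - si_sdr(mixture, reference)
```

Rescaling the estimate leaves the score unchanged, which is why the metric is preferred over plain SDR when the output level of a model is arbitrary.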
IEEE Signal Processing Letters, vol. 33, pp. 351-355, Journal Article.
Citations: 0
Enhanced Multi-Scale PoseNet for Self-Supervised Monocular Depth Estimation
IF 3.9 | Zone 2, Engineering & Technology | Q2, ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-12-02 | DOI: 10.1109/LSP.2025.3639361
Chao Zhang;Tian Tian;Cheng Han;Tiancheng Shao;Mi Zhou;Shichao Zhao
Monocular depth estimation is essential for 3D perception in applications such as autonomous driving and robotics. Self-supervised methods avoid depth labels but often rely on shallow pose networks with weak temporal modeling, leading to unstable predictions. We propose EMSP-Net, an Enhanced Multi-Scale PoseNet for self-supervised monocular depth estimation. It introduces a hierarchical feature fusion encoder, a temporal attention-context decoder, and a pose consistency loss to jointly improve feature extraction, temporal stability, and geometric constraints. On the KITTI dataset, EMSP-Net achieved an absolute relative error of 0.105 and a squared relative error of 0.708. In the Make3D cross-domain test, its strong robustness was further demonstrated.
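The two KITTI figures quoted above are the absolute relative error and squared relative error widely used in depth-estimation benchmarks. A minimal sketch of how these are usually computed follows; the function name and the validity mask (ground truth greater than zero) are assumptions, and depth capping or median scaling that an evaluation protocol may apply is left out.

```python
import numpy as np

def depth_errors(pred, gt):
    """Return (abs_rel, sq_rel) over pixels with valid ground-truth depth."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    valid = gt > 0                      # pixels without ground truth are skipped
    pred, gt = pred[valid], gt[valid]
    abs_rel = np.mean(np.abs(pred - gt) / gt)
    sq_rel = np.mean((pred - gt) ** 2 / gt)
    return float(abs_rel), float(sq_rel)
```

Both metrics divide by the ground-truth depth, so an error of one metre counts for more on a nearby object than on a distant one.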
IEEE Signal Processing Letters, vol. 33, pp. 316-320, Journal Article.
Citations: 0
Text-Driven Medical Image Segmentation With LLM Semantic Bridge and LLM Prompt Bridge
IF 3.9 | Zone 2, Engineering & Technology | Q2, ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-12-02 | DOI: 10.1109/LSP.2025.3639352
Zhengyi Liu;Jiali Wu;Xianyong Fang;Linbo Wang
Text-driven medical image segmentation aims to accurately segment pathological regions in medical images based on textual descriptions. Existing methods face two major challenges: (a) the significant modality heterogeneity between textual and visual features leads to inefficient cross-modal feature alignment; (b) the insufficient utilization of shared medical knowledge restricts semantic understanding. To address these challenges, two large language model (LLM) bridges are constructed. The LLM semantic bridge leverages the sequential modeling capability of a frozen LLM to reorganize visual features into semantically coherent units that possess linguistic logic, thereby effectively bridging vision and language. The LLM prompt bridge appends learnable prompts, which encode shared medical knowledge from the LLM, to text embeddings, thereby effectively bridging case-specific knowledge and medical consensus knowledge. Experimental results show superior performance attributable to LLM participation.
IEEE Signal Processing Letters, vol. 33, pp. 146-150, Journal Article.
Citations: 0
Adaptive Experiment Design for Nonlinear System Identification With Operational Constraints
IF 3.9 | Zone 2, Engineering & Technology | Q2, ENGINEERING, ELECTRICAL & ELECTRONIC | Pub Date: 2025-12-02 | DOI: 10.1109/LSP.2025.3639512
Jingwei Hu;Dave Zachariah;Torbjörn Wigren;Petre Stoica
We consider the joint problem of online experiment design and parameter estimation for identifying nonlinear system models, while adhering to system constraints. We utilize a receding horizon approach and propose a new adaptive input design criterion, which is tailored to continuously updated parameter estimates, along with a new sequential estimator. We demonstrate the ability of the method to design informative experiments online, while steering the system within operational constraints.
IEEE Signal Processing Letters, vol. 33, pp. 151-155, Journal Article.
Citations: 0