Pub Date: 2026-01-01. Epub Date: 2025-12-05. DOI: 10.1109/lsp.2025.3640510
Christopher K Kovach, Stephen V Gliske, Erin M Radcliffe, Sam Shipley, John A Thompson, Aviva Abosch
The fourth-order time-invariant spectrum, or trispectrum, has a simple derivation as the cross-spectrum among frequency bands in the Wigner-Ville distribution (WVD). Viewed this way, the trispectrum gains intuitive meaning as a measure of the linear dependence of power across frequencies, which yields some insight into its structure and interpretation. We highlight, in particular, a two-dimensional subdomain as useful for identifying modulated oscillations when the modulating envelope is non-negative or lowpass. Spectral characteristics of the carrier and modulating signals are revealed along separate axes of a two-dimensional representation of this domain. The application of this framework, combined with a previously described additive decomposition technique for higher-order spectra, is demonstrated by the blind identification and separation of sleep spindles and beta bursts in EEG.
"Interpreting the Trispectrum as the Cross-Spectrum of the Wigner-Ville Distribution." IEEE Signal Processing Letters, vol. 33, pp. 221–225. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12829976/pdf/
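The abstract's central derivation, reading the trispectrum as a cross-spectrum among frequency bands of the Wigner-Ville distribution, can be illustrated numerically. The sketch below is ours, not the authors' implementation: `wigner_ville` is a crude discrete WVD built from the instantaneous autocorrelation, and `band_cross_spectrum` correlates the time courses of two of its frequency bands.

```python
import numpy as np

def wigner_ville(x):
    """Crude discrete Wigner-Ville distribution of x.
    Rows index frequency bands, columns index time."""
    n = len(x)
    W = np.zeros((n, n), dtype=complex)
    for t in range(n):
        tmax = min(t, n - 1 - t)
        tau = np.arange(-tmax, tmax + 1)
        acf = np.zeros(n, dtype=complex)
        acf[tau % n] = x[t + tau] * np.conj(x[t - tau])  # instantaneous autocorrelation
        W[:, t] = np.fft.fft(acf)                        # FFT over the lag variable
    return W

def band_cross_spectrum(W, i, j):
    """Cross-spectrum between the time courses of two WVD frequency bands,
    i.e. the linear dependence of power across those frequencies."""
    a = np.fft.fft(W[i] - W[i].mean())
    b = np.fft.fft(W[j] - W[j].mean())
    return a * np.conj(b)
```

For a real signal the WVD is real-valued, and structure in the band cross-spectrum along the "modulation" axis reflects the envelope spectrum, in the spirit of the two-dimensional subdomain the letter highlights.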
Pub Date: 2025-12-31. DOI: 10.1109/LSP.2025.3649590
Ying Zeng;Jialong Zhu
Medical image segmentation is fundamental to clinical diagnosis and treatment planning, yet existing models are constrained by the scarcity of annotated data, which are costly and labor-intensive to obtain. Semi-supervised learning (SSL) mitigates this issue by leveraging large volumes of unlabeled data, but most SSL methods rely solely on visual cues and often fail to capture subtle structures or low-contrast regions common in medical imaging. To address this limitation, we present LanDy, a Language-Prompted Dynamic Learning framework for semi-supervised medical image segmentation. LanDy introduces textual semantics from medical descriptions to enrich visual representations and reduce the ambiguity of pseudo-labels. Concretely, textual embeddings dynamically modulate convolutional filters to provide context-aware feature extraction, while a text-guided refinement mechanism improves the reliability of pseudo-labels on unlabeled data. Extensive experiments on benchmark datasets demonstrate that LanDy consistently outperforms state-of-the-art SSL methods, delivering more accurate and robust segmentation under annotation-efficient settings.
"Language-Prompted Dynamic Learning for Semi-Supervised Medical Image Segmentation." IEEE Signal Processing Letters, vol. 33, pp. 668–672.
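The idea that "textual embeddings dynamically modulate convolutional filters" can be sketched as a FiLM-style gated 1x1 convolution. All names and the gating form below are our assumptions for illustration, not LanDy's actual design:

```python
import numpy as np

def text_modulated_conv(feat, base_w, text_emb, proj):
    """feat: (C, H, W) visual features; base_w: (C_out, C) 1x1 conv weights;
    text_emb: (D,) embedding of a medical description; proj: (C_out, D).
    The text embedding gates each output filter, making extraction context-aware."""
    gamma = 1.0 + np.tanh(proj @ text_emb)      # per-filter modulation in (0, 2)
    w = base_w * gamma[:, None]                 # dynamically modulated filters
    return np.einsum('oc,chw->ohw', w, feat)    # 1x1 convolution
```

With a zero text embedding the gate is the identity and the layer reduces to a plain 1x1 convolution, so the text prompt only perturbs, never replaces, the visual pathway.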
Pub Date: 2025-12-31. DOI: 10.1109/LSP.2025.3649602
Changyong Xu;Bo Chen;Rusheng Wang;Zheming Wang
This letter investigates the distributed fusion estimation problem for uncertain systems, where noise statistics are unavailable. A scenario optimization framework is employed to handle model uncertainties, in which sampled uncertainty realizations are transformed into linear matrix inequality (LMI) constraints. By solving the resulting convex problems, local estimator gains are obtained, ensuring bounded mean-square error. Furthermore, an explicit upper bound for the fusion error is derived, and optimal fusion weights are determined through an LMI-based criterion. Finally, target tracking examples are provided to demonstrate the advantages and effectiveness of the proposed methods. The influence of the violation and confidence parameters on estimation accuracy and computational complexity is further analyzed.
"Scenario-Based Distributed Fusion Estimation for Uncertain Systems With Bounded Noise." IEEE Signal Processing Letters, vol. 33, pp. 569–573.
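The scenario idea, replacing unknown uncertainty statistics with sampled realizations and requiring a performance bound to hold on every sample, can be sketched on a scalar toy system. The grid search below stands in for the letter's LMI-based convex programs, and the model, names, and variance bound are our illustrative assumptions:

```python
import numpy as np

def scenario_gain(scenarios, k_grid):
    """Pick an estimator gain k minimizing the worst-case steady-state error
    variance over sampled realizations (a, c) of x+ = a*x + w, y = c*x + v
    with unit-variance noises. Each scenario contributes one constraint."""
    best_k, best_val = None, np.inf
    for k in k_grid:
        worst = 0.0
        for a, c in scenarios:
            f = a * (1.0 - k * c)           # error dynamics under this scenario
            if abs(f) >= 1.0:               # estimator unstable here: reject k
                worst = np.inf
                break
            # steady-state error variance: P = f^2 P + 1 + k^2
            worst = max(worst, (1.0 + k * k) / (1.0 - f * f))
        if worst < best_val:
            best_k, best_val = k, worst
    return best_k, best_val
```

Adding more scenarios tightens the guarantee at the cost of computation, which mirrors the violation/confidence trade-off the letter analyzes.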
Pub Date: 2025-12-26. DOI: 10.1109/LSP.2025.3648967
Haoyi Zhao;Zeyu Xiao;Zihan Qi;Yang Zhao;Wei Jia
Atmospheric turbulence induces coupled spatio-temporal distortions, including blur, geometric deformation, and temporal jitter, which severely degrade image quality. We propose EvTurM, a practical framework that exploits event camera data for dynamic turbulence mitigation, providing precise motion cues and stable temporal modeling. Leveraging the high temporal resolution and dynamic range of events, EvTurM achieves robust restoration under diverse turbulence conditions. EvTurM comprises two key modules: (1) the event-aware modality enhancement module, which uses event-derived motion to enrich RGB features and recover structural details, and (2) the bidirectional modality calibration module, which jointly aligns RGB and event features in forward and backward propagation to reduce misalignment and enhance temporal consistency. Extensive experiments show EvTurM consistently surpasses existing methods and achieves superior performance.
"Event-Based Dynamic Turbulence Mitigation." IEEE Signal Processing Letters, vol. 33, pp. 564–568.
Pub Date: 2025-12-26. DOI: 10.1109/LSP.2025.3648910
Xiaoqiang Long;Haiquan Zhao;Xinyan Hou
Traditional single-kernel or fixed-center multi-kernel collaborative correntropies fundamentally assume that errors primarily cluster around a central point (typically zero). However, in real-world complex noise environments—such as those generated by mixed interference sources with diverse mechanisms—errors may exhibit multi-modal or highly asymmetric statistical characteristics. In such cases, a single central point, or multiple kernels fixed at the origin, cannot effectively capture the true shape of the error distribution. To address these problems, this letter proposes a novel robust learning algorithm by introducing variable-center multi-kernel correntropy into an asymmetric correntropy framework, where the kernel centers can be positioned at arbitrary locations. Compared with the maximum asymmetric correntropy criterion (MACC) algorithm, the proposed approach offers a more generalized formulation that enhances its capability to handle more complex error distributions, thereby improving algorithm performance. Notably, existing literature has not yet provided theoretical analysis for such variable-center multi-kernel asymmetric correntropy robust algorithms. Therefore, the main contributions of this work are conducting the first theoretical analysis of the proposed algorithm and validating the effectiveness of the analytical methodology.
"Multi-Kernel Maximum Asymmetric Correntropy Criterion: Foundation and Analysis." IEEE Signal Processing Letters, vol. 33, pp. 411–415.
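The cost such a criterion maximizes can be written down directly: a weighted mixture of Gaussian kernels with arbitrary centers, with sign-dependent widths supplying the asymmetry. The specific parameterization below is an illustrative assumption, not the paper's exact criterion:

```python
import numpy as np

def mk_asym_correntropy(errors, centers, sig_pos, sig_neg, weights):
    """Variable-center multi-kernel asymmetric correntropy of an error sample.
    Kernel m sits at centers[m]; errors above/below a center use different
    widths (sig_pos / sig_neg), so the cost can fit skewed, multi-modal errors."""
    d = np.asarray(errors, float)[:, None] - np.asarray(centers, float)
    sig = np.where(d >= 0, sig_pos, sig_neg)    # asymmetric kernel widths
    k = np.exp(-d**2 / (2.0 * sig**2))          # Gaussian kernel per center
    return float(np.mean(k @ np.asarray(weights, float)))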
Pub Date : 2025-12-26DOI: 10.1109/LSP.2025.3648640
Zhen Gao;Yiping Jiang;Rong Yang;Xingqun Zhan
Recovering signals from noisy observations remains challenging due to the ill-posedness of inverse problems. While non-convex regularization methods like the standard Cauchy penalty improve estimation accuracy, it lacks adaptability across diverse scenarios. In response, this letter proposes a fractional-order Cauchy (q-Cauchy) penalty inspired by the Lq maximum likelihood estimation. By introducing the parameter $q$, the q-Cauchy penalty achieves greater adaptability in diverse scenarios. Specifically, we also derive sufficient convexity conditions for its proximal operator and propose a forward-backward solver. Simulation results demonstrate that the q-Cauchy with the appropriate $q$ outperforms the baseline methods in both 1D signal denoising and 2D image deblurring tasks.
{"title":"A Fractional-Order Cauchy Penalty With Enhanced Adaptability for Signal Recovery","authors":"Zhen Gao;Yiping Jiang;Rong Yang;Xingqun Zhan","doi":"10.1109/LSP.2025.3648640","DOIUrl":"https://doi.org/10.1109/LSP.2025.3648640","url":null,"abstract":"Recovering signals from noisy observations remains challenging due to the ill-posedness of inverse problems. While non-convex regularization methods like the standard Cauchy penalty improve estimation accuracy, it lacks adaptability across diverse scenarios. In response, this letter proposes a fractional-order Cauchy (q-Cauchy) penalty inspired by the Lq maximum likelihood estimation. By introducing the parameter <inline-formula><tex-math>$q$</tex-math></inline-formula>, the q-Cauchy penalty achieves greater adaptability in diverse scenarios. Specifically, we also derive sufficient convexity conditions for its proximal operator and propose a forward-backward solver. Simulation results demonstrate that the q-Cauchy with the appropriate <inline-formula><tex-math>$q$</tex-math></inline-formula> outperforms the baseline methods in both 1D signal denoising and 2D image deblurring tasks.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"33 ","pages":"456-460"},"PeriodicalIF":3.9,"publicationDate":"2025-12-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145929583","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-12-25DOI: 10.1109/LSP.2025.3648249
Menghui Lei;Xiangyang Zeng;Mingmin Zeng;Anqi Jin
The attention mechanism improves underwater acoustic target recognition (UATR) by suppressing irrelevant features. However, due to the uncertainty and scarcity of underwater acoustic target (UWAT) signals, complicated deterministic attention modules increase the risk of model overfitting, resulting in limited improvement or even degradation in the performance of UATR. This letter proposes a Bayesian Hybrid Attention Module (BHAM) that enhances UATR based on time–frequency (T–F) features. BHAM models attention weights as random variables following Beta and Dirichlet distributions to capture uncertainty of UWAT signals and mitigate overfitting, while strengthening T–F feature representation via Bayesian channel attention and Bayesian T–F attention. By learning attention distributions in a Bayesian manner, BHAM effectively models complex dependencies in UWAT signals. Experiments on the DeepShip dataset demonstrate that BHAM alleviates overfitting and generalizes well across different network backbones.
{"title":"A Bayesian Hybrid Attention Module for Underwater Acoustic Target Recognition","authors":"Menghui Lei;Xiangyang Zeng;Mingmin Zeng;Anqi Jin","doi":"10.1109/LSP.2025.3648249","DOIUrl":"https://doi.org/10.1109/LSP.2025.3648249","url":null,"abstract":"The attention mechanism improves underwater acoustic target recognition (UATR) by suppressing irrelevant features. However, due to the uncertainty and scarcity of underwater acoustic target (UWAT) signals, complicated deterministic attention modules increase the risk of model overfitting, resulting in limited improvement or even degradation in the performance of UATR. This letter proposes a Bayesian Hybrid Attention Module (BHAM) that enhances UATR based on time–frequency (T–F) features. BHAM models attention weights as random variables following Beta and Dirichlet distributions to capture uncertainty of UWAT signals and mitigate overfitting, while strengthening T–F feature representation via Bayesian channel attention and Bayesian T–F attention. By learning attention distributions in a Bayesian manner, BHAM effectively models complex dependencies in UWAT signals. Experiments on the DeepShip dataset demonstrate that BHAM alleviates overfitting and generalizes well across different network backbones.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"33 ","pages":"441-445"},"PeriodicalIF":3.9,"publicationDate":"2025-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145929434","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-12-25DOI: 10.1109/LSP.2025.3648638
Fangqing Xiao;Dirk Slock
The Cramér–Rao bound (CRB) quantifies the variance lower bound for unbiased estimators, but it is intractable to evaluate in linear hierarchical Bayesian models with non-Gaussian priors due to the intractable marginal likelihood. Existing methods, including variational Bayes and Markov chain Monte Carlo (MCMC)-based approaches, often have high computational cost and slow convergence. We propose an efficient framework to approximate the Fisher information matrix (FIM) and the CRB by expressing the gradient of the log marginal likelihood as a posterior expectation. Expectation propagation (EP) is used to approximate the posterior as a Gaussian, enabling accurate moment estimation compared to pure sampling-based methods. Numerical experiments on small-scale sparse models show that the EP-based CRB approximation achieves lower average normalized mean squared error (NMSE) and faster convergence than classical baselines in non-Gaussian settings.
{"title":"Efficient CRB Estimation for Linear Models via Expectation Propagation and Monte Carlo Sampling","authors":"Fangqing Xiao;Dirk Slock","doi":"10.1109/LSP.2025.3648638","DOIUrl":"https://doi.org/10.1109/LSP.2025.3648638","url":null,"abstract":"The Cramér–Rao bound (CRB) quantifies the variance lower bound for unbiased estimators, but it is intractable to evaluate in linear hierarchical Bayesian models with non-Gaussian priors due to the intractable marginal likelihood. Existing methods, including variational Bayes and Markov chain Monte Carlo (MCMC)-based approaches, often have high computational cost and slow convergence. We propose an efficient framework to approximate the Fisher information matrix (FIM) and the CRB by expressing the gradient of the log marginal likelihood as a posterior expectation. Expectation propagation (EP) is used to approximate the posterior as a Gaussian, enabling accurate moment estimation compared to pure sampling-based methods. Numerical experiments on small-scale sparse models show that the EP-based CRB approximation achieves lower average normalized mean squared error (NMSE) and faster convergence than classical baselines in non-Gaussian settings.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"33 ","pages":"451-455"},"PeriodicalIF":3.9,"publicationDate":"2025-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145929514","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
As an emerging technology, Video Face Super-Resolution (VFSR) aims to reconstruct high-resolution facial images from low-quality video sequences while maintaining identity consistency, which makes it applicable to scenarios such as surveillance, video conferencing, and film restoration. Compared with image-based face restoration and general video super-resolution, VFSR is more challenging because it requires accurate facial detail reconstruction, strict identity preservation, and computational efficiency under varying poses and expressions. To address these challenges, we propose a High-Precision Identity Preserving VFSR framework (HPIP), which integrates a Multi-Scale Prediction Module (MPM) and an Identity Preservation Module (IPM). The MPM focuses on identity-critical facial regions (e.g., eyes, nose, and mouth) and leverages multi-scale feature prediction to improve reconstruction accuracy and robustness while maintaining computational efficiency. The IPM further projects features into a latent representation space, generating temporally consistent dictionary features and enhancing temporal coherence. Extensive experiments demonstrate that HPIP achieves superior performance in both qualitative and quantitative evaluations, producing visually pleasing facial details while maintaining an efficient and lightweight design.
视频人脸超分辨率(Video Face Super-Resolution, VFSR)是一项新兴技术,旨在从低质量的视频序列中重建高分辨率的人脸图像,同时保持身份一致性,适用于监控、视频会议和电影修复等场景。与基于图像的人脸恢复和一般的视频超分辨率相比,VFSR需要精确的人脸细节重建,严格的身份保持,以及不同姿态和表情下的计算效率,更具挑战性。为了解决这些问题,我们提出了一个高精度身份保持VFSR框架(HPIP),该框架集成了一个多尺度预测模块(MPM)和一个身份保持模块(IPM)。MPM专注于身份关键面部区域(例如,眼睛,鼻子和嘴巴),并利用多尺度特征预测来提高重建精度和鲁棒性,同时保持计算效率。IPM进一步将特征投射到潜在的表示空间中,生成时间一致的字典特征并增强时间一致性。大量的实验表明,HPIP在定性和定量评估中都取得了卓越的性能,在保持高效和轻量化设计的同时,产生了视觉上令人愉悦的面部细节。
{"title":"Video Face Super-Resolution With High-Precision Identity Preservation","authors":"Chaoliang Wu;Ting Zhang;Xianbin Zhang;Nian He;Yiwen Xu","doi":"10.1109/LSP.2025.3648639","DOIUrl":"https://doi.org/10.1109/LSP.2025.3648639","url":null,"abstract":"As an emerging technology, Video Face Super-Resolution (VFSR) aims to reconstruct high-resolution facial images from low-quality video sequences while maintaining identity consistency, which makes it applicable to scenarios such as surveillance, video conferencing, and film restoration. Compared with image-based face restoration and general video super-resolution, VFSR is more challenging because it requires accurate facial detail reconstruction, strict identity preservation, and computational efficiency under varying poses and expressions. To address these challenges, we propose a High-Precision Identity Preserving VFSR framework (HPIP), which integrates a Multi-Scale Prediction Module (MPM) and an Identity Preservation Module (IPM). The MPM focuses on identity-critical facial regions (<italic>e.g.</i>, eyes, nose, and mouth) and leverages multi-scale feature prediction to improve reconstruction accuracy and robustness while maintaining computational efficiency. The IPM further projects features into a latent representation space, generating temporally consistent dictionary features and enhancing temporal coherence. 
Extensive experiments demonstrate that HPIP achieves superior performance in both qualitative and quantitative evaluations, producing visually pleasing facial details while maintaining an efficient and lightweight design.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"33 ","pages":"406-410"},"PeriodicalIF":3.9,"publicationDate":"2025-12-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145929369","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Generative steganography has recently attracted considerable attention due to its superior security properties. However, most existing approaches suffer from limited hiding capacity. To address this issue, this paper proposes a high-capacity image steganography framework that integrates an encoder–decoder architecture with a latent diffusion model. Specifically, a message encoder is designed to transform binary secret messages into latent-space representations through a series of ResDense modules, enabling efficient hiding of large-scale information. The encoded latent features are then guided by the latent diffusion model to synthesize visually realistic stego images. During message extraction, the stego image undergoes iterative noise addition within the diffusion process to reconstruct the latent representation, from which a message decoder accurately recovers the hidden message. Extensive experimental results demonstrate that the proposed method achieves a high hiding capacity of over 30,000 bits, outperforming state-of-the-art methods while ensuring reliable message recovery under common image storage formats such as JPEG and PNG.
{"title":"High-Capacity Image Steganography via Latent Diffusion Models","authors":"Ruijie Du;Na Wang;Cheng Xiong;Chuan Qin;Xinpeng Zhang","doi":"10.1109/LSP.2025.3647567","DOIUrl":"https://doi.org/10.1109/LSP.2025.3647567","url":null,"abstract":"Generative steganography has recently attracted considerable attention due to its superior security properties. However, most existing approaches suffer from limited hiding capacity. To address this issue, this paper proposes a high-capacity image steganography framework that integrates an encoder–decoder architecture with a latent diffusion model. Specifically, a message encoder is designed to transform binary secret messages into latent-space representations through a series of ResDense modules, enabling efficient hiding of large-scale information. The encoded latent features are then guided by the latent diffusion model to synthesize visually realistic stego images. During message extraction, the stego image undergoes iterative noise addition within the diffusion process to reconstruct the latent representation, from which a message decoder accurately recovers the hidden message. Extensive experimental results demonstrate that the proposed method achieves a high hiding capacity of over 30,000 bits, outperforming state-of-the-art methods while ensuring reliable message recovery under common image storage formats such as JPEG and PNG.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"33 ","pages":"401-405"},"PeriodicalIF":3.9,"publicationDate":"2025-12-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145929556","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}