Pub Date: 2024-11-12 | DOI: 10.1109/LSP.2024.3496579
Shanyu Dong;Jin Liu;Jianxin Wang
Handwriting images are commonly used to diagnose Parkinson's disease because they are intuitive and easy to acquire. However, existing methods have not explored the potential of fusing different handwriting image sources for diagnosis. To address this issue, this study proposes a hybrid fusion approach that exploits the visual information derived from different handwriting images and handwriting templates, significantly enhancing the performance in diagnosing Parkinson's disease. The proposed method involves several key steps. First, the preprocessed handwriting images undergo pixel-level fusion using a Laplacian transformation. Next, the fused and original images are fed separately into a pre-trained CNN to extract visual features. Finally, feature-level fusion is performed by concatenating the feature vectors taken from the flatten layer, and the fused feature vectors are input into an SVM to obtain the classification results. Our experimental results show that the proposed method achieves excellent performance using only visual features from images, reaching 95.45% accuracy on NewHandPD. Furthermore, the results obtained on our own dataset verify the strong generalizability of the proposed approach.
{"title":"Diagnosis of Parkinson's Disease Based on Hybrid Fusion Approach of Offline Handwriting Images","authors":"Shanyu Dong;Jin Liu;Jianxin Wang","doi":"10.1109/LSP.2024.3496579","DOIUrl":"https://doi.org/10.1109/LSP.2024.3496579","url":null,"abstract":"Handwriting images are commonly used to diagnose Parkinson's disease due to their intuitive nature and easy accessibility. However, existing methods have not explored the potential of the fusion of different handwriting image sources for diagnosis. To address this issue, this study proposes a hybrid fusion approach that makes use of the visual information derived from different handwriting images and handwriting templates, significantly enhancing the performance in diagnosing Parkinson's disease. The proposed method involves several key steps. Initially, different preprocessed handwriting images undergo pixel-level fusion using Laplacian transformation. Subsequently, the fused and original images are fed into a pre-trained CNN separately to extract visual features. Finally, feature-level fusion is performed by concatenating the feature vectors extracted from the flatten layer, and the fused feature vectors are input into SVM to obtain classification results. Our experimental results validate that the proposed method achieves excellent performance by only utilizing visual features from images, with 95.45% accuracy on the NewHandPD. Furthermore, the results obtained on our dataset verify the strong generalizability of the proposed approach.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"31 ","pages":"3179-3183"},"PeriodicalIF":3.2,"publicationDate":"2024-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142671993","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cross-domain slot filling is a widely explored problem in spoken language understanding (SLU) that requires a model to transfer between domains under data-sparse conditions. Dominant two-step hierarchical models first extract slot entities and then compute the similarity between slot description-based prototypes and the last hidden layer of the slot entity, selecting the closest prototype as the predicted slot type. However, these models use only slot descriptions as prototypes, which limits robustness. Moreover, they pay little attention to the inherent knowledge in the slot entity embedding and therefore suffer from overfitting. In this letter, we propose a Robust Multi-prototypes Aware Integration (RMAI) method for zero-shot cross-domain slot filling. RMAI uses more robust slot entity-based prototypes together with the inherent knowledge in the slot entity embedding to improve classification performance and reduce the risk of overfitting. Furthermore, a multi-prototypes aware integration approach is proposed to effectively combine the proposed slot entity-based prototypes with the slot description-based prototypes. Experimental results on the SNIPS dataset demonstrate the strong performance of RMAI.
{"title":"Robust Multi-Prototypes Aware Integration for Zero-Shot Cross-Domain Slot Filling","authors":"Shaoshen Chen;Peijie Huang;Zhanbiao Zhu;Yexing Zhang;Yuhong Xu","doi":"10.1109/LSP.2024.3495561","DOIUrl":"https://doi.org/10.1109/LSP.2024.3495561","url":null,"abstract":"Cross-domain slot filling is a widely explored problem in spoken language understanding (SLU), which requires the model to transfer between different domains under data sparsity conditions. Dominant two-step hierarchical models first extract slot entities and then calculate the similarity score between slot description-based prototypes and the last hidden layer of the slot entity, selecting the closest prototype as the predicted slot type. However, these models only use slot descriptions as prototypes, which lacks robustness. Moreover, these approaches have less regard for the inherent knowledge in the slot entity embedding to suffer from the issue of overfitting. In this letter, we propose a Robust Multi-prototypes Aware Integration (RMAI) method for zero-shot cross-domain slot filling. In RMAI, more robust slot entity-based prototypes and inherent knowledge in the slot entity embedding are utilized to improve the classification performance and alleviate the risk of overfitting. Furthermore, a multi-prototypes aware integration approach is proposed to effectively integrate both our proposed slot entity-based prototypes and the slot description-based prototypes. Experimental results on the SNIPS dataset demonstrate the well performance of RMAI.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"31 ","pages":"3169-3173"},"PeriodicalIF":3.2,"publicationDate":"2024-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142672126","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Few-shot adaptation of Generative Adversarial Networks (GANs) under distributional shift is generally achieved via regularized retraining or latent-space adaptation. While the former methods offer fast inference, the latter generate more diverse images. This work aims to achieve the best of both regimes in a principled manner via a Bayesian reformulation of the GAN objective. We highlight a hidden expectation term over GAN parameters that is often overlooked but is critical in few-shot settings. This observation justifies prepending a latent adapter network (LAN) to a pre-trained GAN, and we propose a sampling procedure over the parameters of the LAN (called SoLAD) to compute the usually ignored hidden expectation. SoLAD enables fast generation of high-quality samples from multiple few-shot target domains using a GAN pre-trained on a single source domain.
{"title":"SoLAD: Sampling Over Latent Adapter for Few Shot Generation","authors":"Arnab Kumar Mondal;Piyush Tiwary;Parag Singla;Prathosh A.P.","doi":"10.1109/LSP.2024.3496822","DOIUrl":"https://doi.org/10.1109/LSP.2024.3496822","url":null,"abstract":"Few-shot adaptation of Generative Adversarial Networks (GANs) under distributional shift is generally achieved via regularized retraining or latent space adaptation. While the former methods offer fast inference, the latter generate diverse images. This work aims to solve these issues and achieve the best of both regimes in a principled manner via Bayesian reformulation of the GAN objective. We highlight a hidden expectation term over GAN parameters, that is often overlooked but is critical in few-shot settings. This observation helps us justify prepending a latent adapter network (LAN) before a pre-trained GAN and propose a sampling procedure over the parameters of LAN (called SoLAD) to compute the usually-ignored hidden expectation. SoLAD enables fast generation of quality samples from multiple few-shot target domains using a GAN pre-trained on a single source domain.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"31 ","pages":"3174-3178"},"PeriodicalIF":3.2,"publicationDate":"2024-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142672064","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-11-11 | DOI: 10.1109/LSP.2024.3495578
Jaeuk Lee;Yoonsoo Shin;Joon-Hyuk Chang
Most non-autoregressive text-to-speech (TTS) models acquire the target phoneme duration (target duration) from internal or external aligners, transforming the speech-phoneme alignment produced by the aligner into the target duration. Since this transformation is not differentiable, the gradient of the loss function that maximizes the TTS model's likelihood of the speech (e.g., mel spectrogram or waveform) cannot be propagated to the target duration. In other words, the target duration is produced without regard to the TTS model's likelihood of the speech. Hence, we introduce a differentiable duration refinement that produces a learnable target duration which maximizes the likelihood of the speech. The proposed method locates each phoneme boundary by internal division, with the boundary determined so as to improve the performance of the TTS model. Additionally, we propose a duration distribution loss to enhance the performance of the duration predictor. We take JETS, a representative end-to-end TTS model, as our baseline and apply the proposed methods to it. Experimental results show that the proposed method outperforms the baseline in terms of subjective naturalness and character error rate.
{"title":"Differentiable Duration Refinement Using Internal Division for Non-Autoregressive Text-to-Speech","authors":"Jaeuk Lee;Yoonsoo Shin;Joon-Hyuk Chang","doi":"10.1109/LSP.2024.3495578","DOIUrl":"https://doi.org/10.1109/LSP.2024.3495578","url":null,"abstract":"Most non-autoregressive text-to-speech (TTS) models acquire target phoneme duration (target duration) from internal or external aligners. They transform the speech-phoneme alignment produced by the aligner into the target duration. Since this transformation is not differentiable, the gradient of the loss function that maximizes the TTS model's likelihood of speech (e.g., mel spectrogram or waveform) cannot be propagated to the target duration. In other words, the target duration is produced regardless of the TTS model's likelihood of speech. Hence, we introduce a differentiable duration refinement that produces a learnable target duration for maximizing the likelihood of speech. The proposed method uses an internal division to locate the phoneme boundary, which is determined to improve the performance of the TTS model. Additionally, we propose a duration distribution loss to enhance the performance of the duration predictor. Our baseline model is JETS, a representative end-to-end TTS model, and we apply the proposed methods to the baseline model. Experimental results show that the proposed method outperforms the baseline model in terms of subjective naturalness and character error rate.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"31 ","pages":"3154-3158"},"PeriodicalIF":3.2,"publicationDate":"2024-11-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142671988","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-11-07 | DOI: 10.1109/LSP.2024.3493799
Zhengyi Liu;Longzhen Wang;Xianyong Fang;Zhengzheng Tu;Linbo Wang
A light field camera can reconstruct 3D scenes from captured multi-focus images that contain rich spatial geometric information, enhancing applications in stereoscopic photography, virtual reality, and robotic vision. In this work, a state-of-the-art salient object detection model for multi-focus light field images, called LFSamba, is introduced; it is built on four main insights: (a) Efficient feature extraction, where SAM is used to extract modality-aware discriminative features; (b) Inter-slice relation modeling, leveraging Mamba to capture long-range dependencies across multiple focal slices and thus extract implicit depth cues; (c) Inter-modal relation modeling, utilizing Mamba to integrate all-focus and multi-focus images, enabling mutual enhancement; (d) Weakly supervised learning capability, developing a scribble annotation dataset from an existing pixel-level mask dataset and establishing the first scribble-supervised baseline for light field salient object detection.
{"title":"LFSamba: Marry SAM With Mamba for Light Field Salient Object Detection","authors":"Zhengyi Liu;Longzhen Wang;Xianyong Fang;Zhengzheng Tu;Linbo Wang","doi":"10.1109/LSP.2024.3493799","DOIUrl":"https://doi.org/10.1109/LSP.2024.3493799","url":null,"abstract":"A light field camera can reconstruct 3D scenes using captured multi-focus images that contain rich spatial geometric information, enhancing applications in stereoscopic photography, virtual reality, and robotic vision. In this work, a state-of-the-art salient object detection model for multi-focus light field images, called LFSamba, is introduced to emphasize four main insights: (a) Efficient feature extraction, where SAM is used to extract modality-aware discriminative features; (b) Inter-slice relation modeling, leveraging Mamba to capture long-range dependencies across multiple focal slices, thus extracting implicit depth cues; (c) Inter-modal relation modeling, utilizing Mamba to integrate all-focus and multi-focus images, enabling mutual enhancement; (d) Weakly supervised learning capability, developing a scribble annotation dataset from an existing pixel-level mask dataset, establishing the first scribble-supervised baseline for light field salient object detection.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"31 ","pages":"3144-3148"},"PeriodicalIF":3.2,"publicationDate":"2024-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142671986","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-11-07 | DOI: 10.1109/LSP.2024.3493793
Xiangjie Ding;Zhi Zhao;Ying Yang
For binary offset carrier (BOC) signal tracking, the two-dimensional (2D) tracking method that independently tracks the code and the subcarrier has garnered significant attention. The double estimator (DE) and the double phase estimator (DPE) are prominent approaches. However, the performance of the DE suffers under limited front-end bandwidths and sampling rates, while the DPE, which treats the subcarrier as a sine wave, neglects the side lobes, leading to performance degradation. This letter introduces the Binomial Harmonic Approximation DPE (BH-DPE), which uses two phase-locked loops to track the first and third harmonics of the subcarrier. By applying a weighted combination of correlation values, the BH-DPE effectively reduces the coherent output signal-to-noise ratio (SNR) loss, and it enhances ranging accuracy by combining the delay estimates from both harmonics. Theoretical analysis and simulations show that the BH-DPE outperforms both the DE and the DPE in terms of SNR loss and ranging accuracy under constrained front-end bandwidths and sampling rates, and it approaches the DE while exceeding the DPE under wide front-end bandwidths.
{"title":"Binomial Harmonic Approximation Double-Phase Estimator Tracking for BOC Modulated Signals","authors":"Xiangjie Ding;Zhi Zhao;Ying Yang","doi":"10.1109/LSP.2024.3493793","DOIUrl":"https://doi.org/10.1109/LSP.2024.3493793","url":null,"abstract":"For binary offset carrier (BOC) signal tracking, the Two-Dimensional (2D) tracking method that independently tracks the code and subcarrier has garnered significant attention. The double estimator (DE) and the double phase estimator (DPE) are prominent approaches. However, the performance of the DE suffers under limited front-end bandwidths and sampling rates. The DPE, which treats the subcarrier as a sine wave, neglects side lobes, leading to performance degradation. This letter introduces the Binomial Harmonic Approximation DPE (BH-DPE), which uses two phase lock loops to track the first and third harmonics of the subcarrier. By applying a weighted combination of correlation values, the BH-DPE effectively reduces coherent output signal-to-noise ratio (SNR) loss and enhances ranging accuracy through combined delay estimations from both the harmonics. Theoretical analysis and simulations show that the BH-DPE outperforms both the DE and the DPE in terms of SNR loss and ranging accuracy under constrained front-end bandwidths and sampling rates, and approaches the DE while exceeds the DPE under wide front-end bandwidths.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"31 ","pages":"3139-3143"},"PeriodicalIF":3.2,"publicationDate":"2024-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142672065","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-11-07 | DOI: 10.1109/LSP.2024.3493801
Ionut Schiopu;Radu Ciprian Bilcu
The letter proposes an efficient context-adaptive lossless compression method for encoding event frame sequences. A first contribution is the use of a deep ternary tree, indexed by the current pixel position's context, as the context-tree model selector. The arithmetic codec encodes each ternary symbol using the probability distribution of the associated context-tree-leaf model. A second contribution is a novel context design based on several frames, where the context order controls the codec's complexity. A third contribution is a model search procedure that replaces the context-tree prune-and-encode strategy by searching for the closest "mature" context model among lower-order context-tree models. The experimental evaluation shows that the proposed method provides an improved coding performance of 34.34% and a runtime up to $5.18\times$ smaller.
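A sketch of the context-modeling side only, under assumed thresholds: adaptive ternary-symbol counts are kept per context, and when the full-order context has seen too few symbols the model falls back to the nearest lower-order "mature" context, in the spirit of the model search described above. No arithmetic coder is included, and the context definition is a placeholder.

```python
# Illustrative sketch of context-adaptive ternary-symbol modeling with fallback to
# the nearest lower-order "mature" context; thresholds are assumptions.
from collections import defaultdict

SYMBOLS = 3                      # e.g., {no event, positive event, negative event}
MATURITY = 20                    # minimum symbol count before a context model is trusted

counts = defaultdict(lambda: [1] * SYMBOLS)     # Laplace-initialised counts per context

def model_for(context: tuple) -> list:
    """Return the counts of the highest-order suffix of `context` that is mature."""
    for order in range(len(context), -1, -1):
        c = counts[context[len(context) - order:]]
        if sum(c) >= SYMBOLS + MATURITY or order == 0:
            return c
    return counts[()]

def probabilities(context: tuple) -> list:
    """Symbol probabilities that would be fed to the arithmetic coder."""
    c = model_for(context)
    total = sum(c)
    return [x / total for x in c]

def update(context: tuple, symbol: int) -> None:
    """Update the models of all context orders after coding `symbol`."""
    for order in range(len(context) + 1):
        counts[context[order:]][symbol] += 1

# Toy usage over a short ternary stream with a 2-symbol context:
stream = [0, 0, 1, 0, 2, 0, 0, 1, 0, 0]
ctx = (0, 0)
for s in stream:
    p = probabilities(ctx)                      # probabilities used to encode `s`
    update(ctx, s)
    ctx = (ctx[1], s)
```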