Progressive Skip Connection Improves Consistency of Diffusion-Based Speech Enhancement
Pub Date : 2025-04-14 DOI: 10.1109/LSP.2025.3560622
Yue Lei;Xucheng Luo;Wenxin Tai;Fan Zhou
Recent advancements in generative modeling have successfully integrated denoising diffusion probabilistic models (DDPMs) into the domain of speech enhancement (SE). Despite their considerable advantages in generalizability, ensuring semantic consistency of the generated samples with the condition signal remains a formidable challenge. Inspired by techniques addressing posterior collapse in variational autoencoders, we explore skip connections within diffusion-based SE models to improve consistency with condition signals. However, experiments reveal that simply adding skip connections is ineffective and even counterproductive. We argue that the independence between the predictive target and the condition signal causes this failure. To address this, we modify the training objective from predicting random Gaussian noise to predicting clean speech and propose a progressive skip connection strategy to mitigate the decrease in mutual information between the layer's output and the condition signal as network depth increases. Experiments on two standard datasets demonstrate the effectiveness of our approach in both seen and unseen scenarios.
{"title":"Progressive Skip Connection Improves Consistency of Diffusion-Based Speech Enhancement","authors":"Yue Lei;Xucheng Luo;Wenxin Tai;Fan Zhou","doi":"10.1109/LSP.2025.3560622","DOIUrl":"https://doi.org/10.1109/LSP.2025.3560622","url":null,"abstract":"Recent advancements in generative modeling have successfully integrated denoising diffusion probabilistic models (DDPMs) into the domain of speech enhancement (SE). Despite their considerable advantages in generalizability, ensuring semantic consistency of the generated samples with the condition signal remains a formidable challenge. Inspired by techniques addressing posterior collapse in variational autoencoders, we explore skip connections within diffusion-based SE models to improve consistency with condition signals. However, experiments reveal that simply adding skip connections is ineffective and even counterproductive. We argue that the independence between the predictive target and the condition signal causes this failure. To address this, we modify the training objective from predicting random Gaussian noise to predicting clean speech and propose a progressive skip connection strategy to mitigate the decrease in mutual information between the layer's output and the condition signal as network depth increases. Experiments on two standard datasets demonstrate the effectiveness of our approach in both seen and unseen scenarios.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"1650-1654"},"PeriodicalIF":3.2,"publicationDate":"2025-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143865337","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Streamable Neural Audio Codec With Residual Scalar-Vector Quantization for Real-Time Communication
Pub Date : 2025-04-11 DOI: 10.1109/LSP.2025.3560172
Xiao-Hang Jiang;Yang Ai;Rui-Chen Zheng;Zhen-Hua Ling
This paper proposes StreamCodec, a streamable neural audio codec designed for real-time communication. StreamCodec adopts a fully causal, symmetric encoder-decoder structure and operates in the modified discrete cosine transform (MDCT) domain, aiming for low-latency inference and real-time efficient generation. To improve codebook utilization efficiency and compensate for the audio quality loss caused by structural causality, StreamCodec introduces a novel residual scalar-vector quantizer (RSVQ). The RSVQ sequentially connects scalar quantizers and improved vector quantizers in a residual manner, constructing coarse audio contours and refining acoustic details, respectively. Experimental results confirm that the proposed StreamCodec achieves decoded audio quality comparable to advanced non-streamable neural audio codecs. Specifically, on the 16 kHz LibriTTS dataset, StreamCodec attains a ViSQOL score of 4.30 at 1.5 kbps. It has a fixed latency of only 20 ms and achieves a generation speed nearly 20 times real-time on a CPU, with a lightweight model size of just 7 M parameters, making it highly suitable for real-time communication applications.
{"title":"A Streamable Neural Audio Codec With Residual Scalar-Vector Quantization for Real-Time Communication","authors":"Xiao-Hang Jiang;Yang Ai;Rui-Chen Zheng;Zhen-Hua Ling","doi":"10.1109/LSP.2025.3560172","DOIUrl":"https://doi.org/10.1109/LSP.2025.3560172","url":null,"abstract":"This paper proposes StreamCodec, a streamable neural audio codec designed for real-time communication. StreamCodec adopts a fully causal, symmetric encoder-decoder structure and operates in the modified discrete cosine transform (MDCT) domain, aiming for low-latency inference and real-time efficient generation. To improve codebook utilization efficiency and compensate for the audio quality loss caused by structural causality, StreamCodec introduces a novel residual scalar-vector quantizer (RSVQ). The RSVQ sequentially connects scalar quantizers and improved vector quantizers in a residual manner, constructing coarse audio contours and refining acoustic details, respectively. Experimental results confirm that the proposed StreamCodec achieves decoded audio quality comparable to advanced non-streamable neural audio codecs. Specifically, on the 16 kHz LibriTTS dataset, StreamCodec attains a ViSQOL score of 4.30 at 1.5 kbps. It has a fixed latency of only 20 ms and achieves a generation speed nearly 20 times real-time on a CPU, with a lightweight model size of just 7 M parameters, making it highly suitable for real-time communication applications.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"1645-1649"},"PeriodicalIF":3.2,"publicationDate":"2025-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143865336","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Image Forgery Localization With State Space Models
Pub Date : 2025-04-09 DOI: 10.1109/LSP.2025.3559429
Zijie Lou;Gang Cao;Kun Guo;Shaowei Weng;Lifang Yu
Pixel dependency modeling from tampered images is pivotal for image forgery localization. Current approaches predominantly rely on Convolutional Neural Networks (CNNs) or Transformer-based models, which often either lack sufficient receptive fields or entail significant computational overheads. Recently, State Space Models (SSMs), exemplified by Mamba, have emerged as a promising alternative: they excel at modeling long-range interactions while maintaining linear computational complexity. In this paper, we propose LoMa, a novel image forgery localization method that leverages selective SSMs. Specifically, LoMa first employs an atrous selective scan to traverse the spatial domain and convert the tampered image into ordered patch sequences, and subsequently applies multi-directional state space modeling. In addition, an auxiliary convolutional branch is introduced to enhance local feature extraction. Extensive experimental results validate the superiority of LoMa over state-of-the-art CNN-based and Transformer-based methods. To the best of our knowledge, this is the first image forgery localization model built on SSMs. We aim to establish a baseline and provide valuable insights for the future development of more efficient and effective SSM-based forgery localization models.
{"title":"Image Forgery Localization With State Space Models","authors":"Zijie Lou;Gang Cao;Kun Guo;Shaowei Weng;Lifang Yu","doi":"10.1109/LSP.2025.3559429","DOIUrl":"https://doi.org/10.1109/LSP.2025.3559429","url":null,"abstract":"Pixel dependency modeling from tampered images is pivotal for image forgery localization. Current approaches predominantly rely on Convolutional Neural Networks (CNNs) or Transformer-based models, which often either lack sufficient receptive fields or entail significant computational overheads. Recently, State Space Models (SSMs), exemplified by Mamba, have emerged as a promising approach. They not only excel in modeling long-range interactions but also maintain a linear computational complexity. In this paper, we propose LoMa, a novel image forgery localization method that leverages the selective SSMs. Specifically, LoMa initially employs atrous selective scan to traverse the spatial domain and convert the tampered image into ordered patch sequences, and subsequently applies multi-directional state space modeling. In addition, an auxiliary convolutional branch is introduced to enhance local feature extraction. Extensive experimental results validate the superiority of LoMa over CNN-based and Transformer-based state-of-the-arts. To our best knowledge, this is the first image forgery localization model constructed based on the SSM-based model. We aim to establish a baseline and provide valuable insights for the future development of more efficient and effective SSM-based forgery localization models.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"1590-1594"},"PeriodicalIF":3.2,"publicationDate":"2025-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143856330","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Self-Weighted Multi-View Fuzzy Clustering With Multiple Graph Learning
Pub Date : 2025-04-08 DOI: 10.1109/LSP.2025.3558161
Chaodie Liu;Cheng Chang;Feiping Nie
Graph-based multi-view clustering has garnered considerable attention owing to its effectiveness. Nevertheless, despite the promising performance achieved by previous studies, several limitations remain. Most graph-based models employ a two-stage strategy of relaxation followed by discretization to derive clustering results, which may deviate from the original problem. Moreover, graph-based methods do not adequately address the challenges of overlapping clusters or ambiguous cluster membership. Additionally, assigning appropriate weights according to the importance of each view is crucial. To address these problems, we propose a self-weighted multi-view fuzzy clustering algorithm that incorporates multiple graph learning. Specifically, we automatically assign a weight to each view and construct a fused similarity graph matrix. We then approximate this matrix by the scaled product of fuzzy membership matrices to directly derive clustering assignments. An iterative optimization algorithm is designed to solve the proposed model. Experimental evaluations on benchmark datasets show that the proposed method outperforms several leading multi-view clustering approaches.
{"title":"Self-Weighted Multi-View Fuzzy Clustering With Multiple Graph Learning","authors":"Chaodie Liu;Cheng Chang;Feiping Nie","doi":"10.1109/LSP.2025.3558161","DOIUrl":"https://doi.org/10.1109/LSP.2025.3558161","url":null,"abstract":"Graph-based multi-view clustering has garnered considerable attention owing to its effectiveness. Nevertheless, despite the promising performance achieved by previous studies, several limitations remain to be addressed. Most graph-based models employ a two-stage strategy involving relaxation and discretization to derive clustering results, which may lead to deviation from the original problem. Moreover, graph-based methods do not adequately address the challenges of overlapping clusters or ambiguous cluster membership. Additionally, assigning appropriate weights based on the importance of each view is crucial. To address these problems, we propose a self-weighted multi-view fuzzy clustering algorithm that incorporates multiple graph learning. Specifically, we automatically allocate weights corresponding to each view to construct a fused similarity graph matrix. Subsequently, we approximate it as the scaled product of fuzzy membership matrices to directly derive clustering assignments. An iterative optimization algorithm is designed for solving the proposed model. Experiment evaluations conducted on benchmark datasets illustrate that the proposed method outperforms several leading multi-view clustering approaches.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"1585-1589"},"PeriodicalIF":3.2,"publicationDate":"2025-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143856264","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Clustering-Based Adaptive Query Generation for Semantic Segmentation
Pub Date : 2025-04-08 DOI: 10.1109/LSP.2025.3558160
Yeong Woo Kim;Wonjun Kim
Semantic segmentation is one of the crucial tasks in computer vision, aiming to label each pixel according to its class. Recently, several semantic segmentation methods that adopt a transformer decoder with learnable queries have achieved impressive improvements. However, since learnable queries are primarily determined by the distribution of training samples, the discriminative characteristics of the input image are often disregarded. In this letter, we propose a novel clustering-based query generation method for semantic segmentation. The key idea is to adaptively generate queries through a clustering scheme that leverages semantic affinities in the latent space. By aggregating latent features that represent the same class in a given input, the semantic information of each class can be efficiently encoded into its query. Furthermore, we apply an auxiliary loss function that predicts the segmentation result at a coarse scale during query generation, which enables each query to capture the spatial extent of the target object in the given image. Experimental results on various benchmarks show that the proposed method effectively improves semantic segmentation performance.
{"title":"Clustering-Based Adaptive Query Generation for Semantic Segmentation","authors":"Yeong Woo Kim;Wonjun Kim","doi":"10.1109/LSP.2025.3558160","DOIUrl":"https://doi.org/10.1109/LSP.2025.3558160","url":null,"abstract":"Semantic segmentation is one of the crucial tasks in the field of computer vision, aiming to label each pixel according to its class. Most recently, several semantic segmentation methods, which adopt the transformer decoder with learnable queries, have achieved the impressive improvement. However, since learnable queries are primarily determined by the distribution of training samples, discriminative characteristics of the input image often have been disregarded. In this letter, we propose a novel clustering-based query generation method for semantic segmentation. The key idea of the proposed method is to adaptively generate queries based on the clustering scheme, which leverages semantic affinities in the latent space. By aggregating latent features that represent the same class in a given input, the semantic information of each class can be efficiently encoded into the query. Furthermore, we propose to apply the auxiliary loss function to predict the segmentation result in a coarse scale during the process of query generation. This enables each query to grasp spatial information of the target object in a given image. Experimental results on various benchmarks show that the proposed method effectively improves the performance of semantic segmentation.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"1580-1584"},"PeriodicalIF":3.2,"publicationDate":"2025-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143856346","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Optimization for Paralyzing G2A Communication Network: A DRL-Based Joint Path Planning and Jamming Power Allocation Approach
Pub Date : 2025-04-04 DOI: 10.1109/LSP.2025.3558123
Xiang Peng;Hua Xu;Zisen Qi;Dan Wang;Yiqiong Pang
This letter investigates the joint jammer path planning and jamming power allocation problem during airborne deterrence operations (ADO) in highly dynamic environments. In response to airborne threats posed by enemy aircraft formations, jammers must rely on perceptual information to plan trajectories and emit jamming signals that paralyze ground-to-air (G2A) communication networks. Unlike traditional static scenarios, the high mobility of both sides presents significant challenges: most existing works study jamming solutions only for static ground targets or a single airborne target and fail to address multiple airborne targets. We propose a joint path planning and jamming power allocation approach based on deep reinforcement learning (JPPJPA-DRL). The approach accounts for the impact of flight paths on receiving antenna gain, models the ADO as a Markov Decision Process (MDP), and uses the proximal policy optimization (PPO) algorithm to generate optimized path points and jamming power allocation schemes. In addition, a carefully designed reward function guides the learning process, and a visual communication countermeasure simulation platform is developed. The results show that the proposed approach efficiently paralyzes G2A communication networks, outperforming the baseline.
{"title":"Optimization for Paralyzing G2A Communication Network: A DRL-Based Joint Path Planning and Jamming Power Allocation Approach","authors":"Xiang Peng;Hua Xu;Zisen Qi;Dan Wang;Yiqiong Pang","doi":"10.1109/LSP.2025.3558123","DOIUrl":"https://doi.org/10.1109/LSP.2025.3558123","url":null,"abstract":"This letter investigates the jammer path planning and jamming power allocation problem during airborne deterrence operation (ADO) in highly dynamic environments. In response to airborne threats posed by enemy aircraft formations, jammers must rely on perceptual information to plan trajectories and emit jamming signals to paralyze the ground-to-air (G2A) communication networks. Unlike traditional static scenarios, the high mobility of both sides presents significant challenges. Most works only study jamming solutions for static ground or single airborne targets, failing to address multiple airborne targets. We propose a joint path planning and jamming power allocation approach based on deep reinforcement learning (JPPJPA-DRL). This approach considers the impact of flight paths on receiving antenna gain, models the ADO as a Markov Decision Process (MDP), and uses the proximal policy optimization (PPO) algorithm to generate optimized path points and jamming power allocation schemes. In addition, a scientific reward function is designed to guide the learning process, and a visual communication countermeasure simulation platform is developed. The results show that the proposed approach can efficiently paralyze G2A communication networks, outperforming the baseline.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"1640-1644"},"PeriodicalIF":3.2,"publicationDate":"2025-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143865269","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Real-Time Self-Supervised Ultrasound Image Enhancement Using Test-Time Adaptation for Sophisticated Rotator Cuff Tear Diagnosis
Pub Date : 2025-04-03 DOI: 10.1109/LSP.2025.3557754
Haeyun Lee;Kyungsu Lee;Jong Pil Yoon;Jihun Kim;Jun-Young Kim
Medical ultrasound imaging is a key diagnostic tool across various fields, with computer-aided diagnosis systems benefiting from advances in deep learning. However, its low resolution and artifacts pose challenges, particularly for non-specialists. The simultaneous acquisition of degraded and high-quality images is infeasible, limiting supervised learning approaches; moreover, self-supervised and zero-shot methods require extensive processing time, conflicting with the real-time demands of ultrasound imaging. To address these issues, we propose real-time ultrasound image enhancement via self-supervised learning and test-time adaptation for sophisticated rotator cuff tear (RCT) diagnosis. The proposed approach learns from image datasets in other domains and performs self-supervised learning on the ultrasound image itself during inference. Our approach not only demonstrated superior ultrasound image enhancement compared to other state-of-the-art methods but also achieved an 18% improvement in RCT segmentation performance.
{"title":"Real-Time Self-Supervised Ultrasound Image Enhancement Using Test-Time Adaptation for Sophisticated Rotator Cuff Tear Diagnosis","authors":"Haeyun Lee;Kyungsu Lee;Jong Pil Yoon;Jihun Kim;Jun-Young Kim","doi":"10.1109/LSP.2025.3557754","DOIUrl":"https://doi.org/10.1109/LSP.2025.3557754","url":null,"abstract":"Medical ultrasound imaging is a key diagnostic tool across various fields, with computer-aided diagnosis systems benefiting from advances in deep learning. However, its lower resolution and artifacts pose challenges, particularly for non-specialists. The simultaneous acquisition of degraded and high-quality images is infeasible, limiting supervised learning approaches. Additionally, self-supervised and zero-shot methods require extensive processing time, conflicting with the real-time demands of ultrasound imaging. Therefore, to address the aforementioned issues, we propose real-time ultrasound image enhancement via a self-supervised learning technique and a test-time adaptation for sophisticated rotational cuff tear diagnosis. The proposed approach learns from other domain image datasets and performs self-supervised learning on an ultrasound image during inference for enhancement. Our approach not only demonstrated superior ultrasound image enhancement performance compared to other state-of-the-art methods but also achieved an 18% improvement in the RCT segmentation performance.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"1635-1639"},"PeriodicalIF":3.2,"publicationDate":"2025-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143865252","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bayesian Language Model Adaptation for Personalized Speech Recognition
Pub Date : 2025-04-02 DOI: 10.1109/LSP.2025.3556787
Mun-Hak Lee;Ji-Hwan Mo;Ji-Hun Kang;Jin-Young Son;Joon-Hyuk Chang
In deployment environments for speech recognition models, diverse proper nouns such as personal names, song titles, and application names are frequently uttered. These proper nouns are often sparsely distributed within the training dataset, leading to performance degradation and limiting the practical utility of the models. Personalization strategies that leverage user-specific information, such as contact lists or search histories, have proven effective in mitigating performance degradation caused by rare words. In this study, we propose a novel personalization method for combining the scores of a general language model (LM) and a personal LM within a probabilistic framework. The proposed method entails low computational costs, storage requirements, and latency. Through experiments using a real-world dataset collected from the vehicle environment, we demonstrate that the proposed method effectively overcomes the out-of-vocabulary problem and improves recognition performance for rare words.
{"title":"Bayesian Language Model Adaptation for Personalized Speech Recognition","authors":"Mun-Hak Lee;Ji-Hwan Mo;Ji-Hun Kang;Jin-Young Son;Joon-Hyuk Chang","doi":"10.1109/LSP.2025.3556787","DOIUrl":"https://doi.org/10.1109/LSP.2025.3556787","url":null,"abstract":"In deployment environments for speech recognition models, diverse proper nouns such as personal names, song titles, and application names are frequently uttered. These proper nouns are often sparsely distributed within the training dataset, leading to performance degradation and limiting the practical utility of the models. Personalization strategies that leverage user-specific information, such as contact lists or search histories, have proven effective in mitigating performance degradation caused by rare words. In this study, we propose a novel personalization method for combining the scores of a general language model (LM) and a personal LM within a probabilistic framework. The proposed method entails low computational costs, storage requirements, and latency. Through experiments using a real-world dataset collected from the vehicle environment, we demonstrate that the proposed method effectively overcomes the out-of-vocabulary problem and improves recognition performance for rare words.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"1620-1624"},"PeriodicalIF":3.2,"publicationDate":"2025-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143871000","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Multi-Party Reversible Data Hiding in Ciphertext Binary Images Based on Visual Cryptography
Pub Date : 2025-04-02 DOI: 10.1109/LSP.2025.3557273
Bing Chen;Jingkun Yu;Bingwen Feng;Wei Lu;Jun Cai
Existing methods for reversible data hiding in ciphertext binary images involve only one data hider to perform data embedding; if that data hider is attacked, the original binary image cannot be perfectly reconstructed. To this end, this letter proposes multi-party reversible data hiding in ciphertext binary images, in which multiple data hiders take part in data embedding. We use visual cryptography to encrypt a binary image into multiple ciphertext binary images and transmit them to different data hiders. Each data hider can embed data into a ciphertext binary image and generate a marked ciphertext binary image. The original binary image is perfectly reconstructed by collecting a portion of the marked ciphertext binary images from the unattacked data hiders. Compared with existing solutions, the proposed solution enhances the recoverability of the original binary image and maintains a stable embedding capacity across different categories of images.
{"title":"Multi-Party Reversible Data Hiding in Ciphertext Binary Images Based on Visual Cryptography","authors":"Bing Chen;Jingkun Yu;Bingwen Feng;Wei Lu;Jun Cai","doi":"10.1109/LSP.2025.3557273","DOIUrl":"https://doi.org/10.1109/LSP.2025.3557273","url":null,"abstract":"Existing methods for reversible data hiding in ciphertext binary images only involve one data hider to perform data embedding. When the data hider is attacked, the original binary image cannot be perfectly reconstructed. To this end, this letter proposes multi-party reversible data hiding in ciphertext binary images, where multiple data hiders are involved in data embedding. In this solution, we use visual cryptography technology to encrypt a binary image into multiple ciphertext binary images, and transmit the ciphertext binary images to different data hiders. Each data hider can embed data into a ciphertext binary image and generate a marked ciphertext binary image. The original binary image is perfectly reconstructed by collecting a portion of marked ciphertext binary images from the unattacked data hiders. Compared with existing solutions, the proposed solution enhances the recoverability of the original binary image. Besides, the proposed solution maintains a stable embedding capacity for different categories of images.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"1560-1564"},"PeriodicalIF":3.2,"publicationDate":"2025-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143845568","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Viewport-Independent Blind Quality Assessment of AI-Generated Omnidirectional Images via Vision-Language Correspondence
Pub Date : 2025-04-02 DOI: 10.1109/LSP.2025.3556791
Xuelin Liu;Jiebin Yan;Chenyi Lai;Yang Li;Yuming Fang
The advancement of deep generation technology has significantly accelerated the growth of artificial intelligence-generated content (AIGC). Among these, AI-generated omnidirectional images (AGOIs) hold considerable promise for applications in virtual reality (VR). However, the quality of AGOIs varies widely, and there has been limited research on their quality assessment. In this letter, inspired by the characteristics of the human visual system, we propose a novel viewport-independent blind quality assessment method for AGOIs, termed VI-AGOIQA, which leverages vision-language correspondence. Specifically, to avoid the computational burden of viewport-based prediction methods for omnidirectional image quality assessment, a set of image patches is first extracted from AGOIs in equirectangular projection (ERP) format. The correspondence between visual and textual inputs is then learned using the pre-trained image and text encoders of the Contrastive Language-Image Pre-training (CLIP) model. Finally, a multimodal feature fusion module predicts human visual preferences based on the learned vision-language consistency. Extensive experiments on a publicly available database demonstrate the promising performance of the proposed method.
{"title":"Viewport-Independent Blind Quality Assessment of AI-Generated Omnidirectional Images via Vision-Language Correspondence","authors":"Xuelin Liu;Jiebin Yan;Chenyi Lai;Yang Li;Yuming Fang","doi":"10.1109/LSP.2025.3556791","DOIUrl":"https://doi.org/10.1109/LSP.2025.3556791","url":null,"abstract":"The advancement of deep generation technology has significantly enhanced the growth of artificial intelligence-generated content (AIGC). Among these, AI-generated omnidirectional images (AGOIs), hold considerable promise for applications in virtual reality (VR). However, the quality of AGOIs varies widely, and there has been limited research focused on their quality assessment. In this letter, inspired by the characteristics of the human visual system, we propose a novel viewport-independent blindquality assessment method for AGOIs, termed VI-AGOIQA, which leverages vision-language correspondence. Specifically, to minimize the computational burden associated with viewport-based prediction methods for omnidirectional image quality assessment, a set of image patches are first extracted from AGOIs in Equirectangular Projection (ERP) format. Then, the correspondence between visual and textual inputs is effectively learned by utilizing the pre-trained image and text encoders of the Contrastive Language-Image Pre-training (CLIP) model. Finally, a multimodal feature fusion module is applied to predict human visual preferences based on the learned knowledge of visual-language consistency. Extensive experiments conducted on publicly available database demonstrate the promising performance of the proposed method.","PeriodicalId":13154,"journal":{"name":"IEEE Signal Processing Letters","volume":"32 ","pages":"1630-1634"},"PeriodicalIF":3.2,"publicationDate":"2025-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143865253","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}