Rate-Distortion-Perception Controllable Joint Source-Channel Coding for High-Fidelity Generative Semantic Communications

IEEE Transactions on Cognitive Communications and Networking (IF 7; CAS Zone 1, Computer Science; JCR Q1, Telecommunications). Pub Date: 2024-12-05. DOI: 10.1109/TCCN.2024.3511960

Kailin Tan;Jincheng Dai;Zhenyu Liu;Sixian Wang;Xiaoqi Qin;Wenjun Xu;Kai Niu;Ping Zhang

Volume 11, Issue 2, pp. 672-686. Full text: https://ieeexplore.ieee.org/document/10778256/

Citations: 0

Abstract

End-to-end image transmission has recently become a crucial trend in intelligent wireless communications, driven by the increasing demand for high bandwidth efficiency. However, existing methods primarily optimize the trade-off between bandwidth cost and objective distortion, and often fail to deliver visually pleasing results aligned with human perception. In this paper, we propose a novel rate-distortion-perception (RDP) jointly optimized joint source-channel coding (JSCC) framework to enhance perceptual quality in human communications. Our RDP-JSCC framework integrates a flexible plug-in conditional generative adversarial network (GAN) to provide detailed and realistic image reconstructions at the receiver, overcoming the limitations of traditional rate-distortion optimized solutions, which typically produce blurry or poorly textured images. Based on this framework, we introduce a distortion-perception controllable transmission (DPCT) model, which addresses the variation in the perception-distortion trade-off. DPCT uses a lightweight spatial realism embedding module (SREM) to condition the generator on a realism map, so that the appearance realism of each image region can be customized at the receiver from a single transmission. Furthermore, for scenarios with scarce bandwidth, we propose an interest-oriented content-controllable transmission (CCT) model. CCT prioritizes the transmission of regions that attract user attention and generates the other regions from an instance label map, ensuring both content consistency and appearance realism for all regions while proportionally reducing channel bandwidth costs. Comprehensive experiments demonstrate the superiority of our RDP-optimized image transmission framework over state-of-the-art engineered image transmission systems and advanced perceptual methods.
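The abstract does not state the training objective, so the following is only an illustrative sketch of what a rate-distortion-perception trade-off and a per-region realism weighting could look like. The function names, the weights `rate_weight` and `perception_weight`, and the linear blending by a realism value in [0, 1] are all assumptions made for illustration, not the paper's formulation.

```python
def rdp_objective(rate, distortion, perception,
                  rate_weight=0.5, perception_weight=0.25):
    """Hypothetical scalar objective: weighted sum of a rate term,
    a distortion term and a perception term (weights are assumed)."""
    return rate_weight * rate + distortion + perception_weight * perception


def region_objective(distortions, perceptions, realism_map):
    """Hypothetical per-region blend. A realism value of 0 keeps only the
    distortion term for that region; a value of 1 keeps only the
    perception term. Returns the mean over all regions."""
    if not (len(distortions) == len(perceptions) == len(realism_map)):
        raise ValueError("all inputs must have one entry per region")
    total = 0.0
    for d, p, r in zip(distortions, perceptions, realism_map):
        if not 0.0 <= r <= 1.0:
            raise ValueError("realism values must lie in [0, 1]")
        total += (1.0 - r) * d + r * p
    return total / len(realism_map)


if __name__ == "__main__":
    # Two regions: the first favours fidelity, the second favours realism.
    print(rdp_objective(rate=2.0, distortion=0.5, perception=4.0))
    print(region_objective([4.0, 4.0], [2.0, 2.0], [0.0, 1.0]))
```

In this toy form, raising a region's realism value shifts its contribution from the distortion term toward the perception term, which mirrors the per-region control the abstract attributes to the realism map without claiming to reproduce it.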




Source journal

IEEE Transactions on Cognitive Communications and Networking (Computer Science: Artificial Intelligence)

CiteScore: 15.50

Self-citation rate: 7.00%

Articles published: 108

Journal description: The IEEE Transactions on Cognitive Communications and Networking (TCCN) aims to publish high-quality manuscripts that push the boundaries of cognitive communications and networking research. Cognitive, in this context, refers to the application of perception, learning, reasoning, memory, and adaptive approaches in communication system design. The journal welcomes submissions that explore various aspects of cognitive communications and networks, focusing on innovative and holistic approaches to complex system design. Key topics include architecture, protocols, cross-layer design, and cognition cycle design for cognitive networks. Research on machine learning, artificial intelligence, end-to-end and distributed intelligence, software-defined networking, cognitive radios, spectrum sharing, and security and privacy issues in cognitive networks is also of interest. The publication also encourages papers addressing novel services and applications enabled by these cognitive concepts.