Pub Date: 2025-04-11 | DOI: 10.1109/JSAC.2025.3560013
Shehan Edirimannage;Ibrahim Khalil;Charitha Elvitigala;Wathsara Daluwatta;Primal Wijesekera;Albert Y. Zomaya
With the rapid expansion of next-generation networking, Internet of Things (IoT) devices have become central components of federated learning (FL) networks. FL offers a paradigm for distributed training of machine learning models while preserving user data privacy. However, existing network security measures often struggle to distinguish legitimate contributors from opportunistic free riders within these networks. The Free Rider (FR) problem arises when participants seek to benefit from the FL process without contributing. Free riders may reside either inside or outside the network, and external free riders are especially hard to identify. The Zero Trust model proposes an environment where no entity, including the network itself, is inherently trusted, providing a foundation to counter external threats seeking to exploit the network. This study proposes a novel framework strengthened by the Zero Trust model to identify external free riders in FL networks. Leveraging a Deep Autoencoding Gaussian Mixture Model (DAGMM)-based technique for internal free rider detection, our framework demonstrates superior performance in identifying free riders across various FR scenarios compared to current state-of-the-art solutions. Through our proposed framework and the principles of Zero Trust, we establish a robust security guarantee for FL networks, ensuring the integrity of the learning process.
Title: ZeTFRi—A Zero Trust-Based Free Rider Detection Framework for Next Generation Federated Learning Networks
IEEE Journal on Selected Areas in Communications, vol. 43, no. 6, pp. 1938-1953.
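The DAGMM-style detection idea, scoring each client's update by autoencoder reconstruction error and an energy over those errors, can be sketched as follows. This is an illustrative stand-in, not the paper's implementation: the "autoencoder" is a PCA projection, the data sizes and the 3-sigma threshold are invented, and the benign population is assumed known for fitting.

```python
import numpy as np

# Hypothetical sketch of DAGMM-style free-rider detection: score each client's
# model update by reconstruction error under a low-rank (PCA) stand-in for an
# autoencoder, then compute a Gaussian energy over the errors and flag outliers.
# All sizes and thresholds here are illustrative, not from the paper.

rng = np.random.default_rng(0)
honest = rng.normal(0.0, 1.0, size=(50, 16))       # genuine gradient updates
free_rider = rng.normal(0.0, 0.05, size=(5, 16))   # near-zero "fake" updates
updates = np.vstack([honest, free_rider])

# "Encoder": project onto the top-k principal directions of the benign population
# (assumed known here purely for illustration).
k = 4
_, _, Vt = np.linalg.svd(honest - honest.mean(0), full_matrices=False)

def recon_error(X):
    Z = (X - honest.mean(0)) @ Vt[:k].T        # encode
    Xh = Z @ Vt[:k] + honest.mean(0)           # decode
    return np.linalg.norm(X - Xh, axis=1)      # per-client reconstruction error

err = recon_error(updates)
# Energy: squared z-score of the error under a Gaussian fit to benign errors.
mu, sd = err[:50].mean(), err[:50].std()
energy = 0.5 * ((err - mu) / sd) ** 2
flagged = energy > 4.5                          # ~3-sigma threshold (two-sided)
```

Free riders that upload near-zero updates reconstruct far more easily than genuine updates, so their energy is anomalously high and they are flagged.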
Pub Date: 2025-04-10 | DOI: 10.1109/JSAC.2025.3559149
Huiqiang Xie;Zhijin Qin;Zhu Han;Khaled B. Letaief
Digital and analog semantic communications (SemCom) face inherent limitations such as data security concerns in analog SemCom, as well as leveling-off and cliff-edge effects in digital SemCom. To overcome these challenges, we propose a novel SemCom framework and a corresponding system called HDA-DeepSC, which leverages a hybrid digital-analog approach for multimedia transmission. This is achieved through the introduction of analog-digital allocation and fusion modules. To strike a balance between data rate and distortion, we design new loss functions that take into account long-distance dependencies in the semantic distortion constraint, essential information recovery in the channel distortion constraint, and optimal bit stream generation in the rate constraint. Additionally, we propose denoising diffusion-based signal detection techniques, which involve carefully designed variance schedules and sampling algorithms to refine transmitted signals. Through extensive numerical experiments, we demonstrate that HDA-DeepSC exhibits robustness to channel variations and is capable of supporting various communication scenarios. Our proposed framework outperforms existing benchmarks in terms of peak signal-to-noise ratio and multi-scale structural similarity, showcasing its superiority in semantic communication quality.
Title: Hybrid Digital-Analog Semantic Communications
IEEE Journal on Selected Areas in Communications, vol. 43, no. 7, pp. 2478-2492. Open-access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10960639
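The diffusion-based refinement can be illustrated with a standard DDPM-style variance schedule and one reverse step. This is a generic sketch, not HDA-DeepSC's trained detector: the linear schedule is a common textbook choice, and the reverse step uses an oracle noise estimate where a trained network would predict the noise.

```python
import numpy as np

# Illustrative DDPM-style schedule: forward noising of a clean signal and one
# reverse (denoising) step. The linear beta schedule and the oracle noise
# estimate are illustrative assumptions, not the paper's trained components.

T = 100
betas = np.linspace(1e-4, 0.02, T)          # linear variance schedule
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)              # \bar{alpha}_t, monotonically decreasing

rng = np.random.default_rng(1)
x0 = rng.normal(size=8)                      # clean (analog) signal
t = 60
# Forward noising: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps
eps = rng.normal(size=8)
xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1 - alpha_bar[t]) * eps

# One reverse step (a trained network would supply the estimate of eps):
mean = (xt - betas[t] / np.sqrt(1 - alpha_bar[t]) * eps) / np.sqrt(alphas[t])

# The residual noise around the signal shrinks after the reverse step.
noise_before = np.linalg.norm(xt - np.sqrt(alpha_bar[t]) * x0)
noise_after = np.linalg.norm(mean - np.sqrt(alpha_bar[t - 1]) * x0)
```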
Pub Date: 2025-04-10 | DOI: 10.1109/JSAC.2025.3559136
Guangfeng Yan;Tan Li;Kui Wu;Linqi Song
In distributed learning systems, efficient communication and privacy protection are two significant challenges. Although several existing works have attempted to address these challenges simultaneously, they often overlook essential learning-oriented features such as dynamic gradient and communication characteristics. In this paper, we propose a communication-efficient and privacy-preserving distributed SGD algorithm. Our proposed algorithm employs a layered randomized quantizer (LRQ) to reduce communication overhead, which also ensures that quantization errors follow an exact Gaussian distribution, thus achieving client-level differential privacy. We analyze the trade-off between convergence error, communication, and privacy under non-IID data distributions. In addition, we make the algorithm training-adaptive by adjusting the per-round privacy budget allocation in response to (i) dynamic gradient features and (ii) real-time changes in the number of communication rounds. Both closed-form solutions are derived by solving the minimization problem of convergence error subject to the privacy budget constraint. Finally, we evaluate the effectiveness of our approach through extensive experiments on various datasets, including MNIST, CIFAR-10, and CIFAR-100, demonstrating its superiority in terms of communication cost, privacy protection, and model performance compared to state-of-the-art methods.
Title: Layered Randomized Quantization for Communication-Efficient and Privacy-Preserving Distributed Learning
IEEE Journal on Selected Areas in Communications, vol. 43, no. 7, pp. 2684-2699.
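The general idea of combining low-bit quantization with randomization for a Gaussian-mechanism privacy account can be sketched as below. This is a simplified stand-in, not the paper's LRQ construction: here an unbiased stochastic quantizer is followed by explicitly added Gaussian noise, whereas the LRQ shapes the quantization error itself; the bit budget, clip value, and noise scale are invented.

```python
import numpy as np

# Simplified stand-in for layered randomized quantization: stochastically
# quantize a clipped gradient to 4 bits (unbiased randomized rounding), then
# add calibrated Gaussian noise so the server-side perturbation is
# Gaussian-dominated. All parameters are illustrative.

def stochastic_quantize(g, levels=16, clip=1.0):
    g = np.clip(g, -clip, clip)
    scaled = (g + clip) / (2 * clip) * (levels - 1)   # map to [0, levels-1]
    low = np.floor(scaled)
    p_up = scaled - low                               # unbiased randomized rounding
    q = low + (np.random.default_rng(0).random(g.shape) < p_up)
    return q / (levels - 1) * 2 * clip - clip

rng = np.random.default_rng(2)
grad = rng.normal(0, 0.3, size=1000)
noise_sigma = 0.1                                     # would be set by the privacy budget
noisy_q = stochastic_quantize(grad) + rng.normal(0, noise_sigma, size=grad.shape)

# Unbiasedness: the expected output equals the clipped gradient.
bias = np.mean(noisy_q - np.clip(grad, -1, 1))
```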
Pub Date: 2025-04-10 | DOI: 10.1109/JSAC.2025.3559138
Yoon Huh;Hyowoon Seo;Wan Choi
From the perspective of joint source-channel coding (JSCC), there has been significant research on utilizing semantic communication, which inherently possesses analog characteristics, within digital device environments. However, a single-model approach that operates modulation-agnostically across various digital modulation orders has not yet been established. This article presents the first attempt at such an approach by proposing a universal joint source-channel coding (uJSCC) system that utilizes a single-model encoder-decoder pair and trained vector quantization (VQ) codebooks. To support various modulation orders within a single model, the operation of every neural network (NN)-based module in the uJSCC system requires the selection of modulation orders according to signal-to-noise ratio (SNR) boundaries. To address the challenge of unequal output statistics from shared parameters across NN layers, we integrate multiple batch normalization (BN) layers, selected based on modulation order, after each NN layer. This integration occurs with minimal impact on the overall model size. Through a comprehensive series of experiments, we validate that the modulation-agnostic semantic communication framework demonstrates superiority over existing digital semantic communication approaches in terms of model complexity, communication efficiency, and task effectiveness.
Title: Universal Joint Source-Channel Coding for Modulation-Agnostic Semantic Communication
IEEE Journal on Selected Areas in Communications, vol. 43, no. 7, pp. 2560-2574.
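The per-modulation-order batch-normalization idea can be sketched as below: shared layer weights, but a separate set of normalization statistics selected at run time from the SNR. This is a minimal sketch; the layer size, the SNR-to-order boundaries, and the class names are invented, and a real system would learn these statistics during training.

```python
import numpy as np

# Minimal sketch of order-switched batch normalization: one shared layer, but a
# separate (gamma, beta, running mean/var) set per supported modulation order,
# selected from the SNR. Sizes and SNR boundaries are illustrative assumptions.

class OrderSwitchedBN:
    def __init__(self, dim, orders=(2, 4, 16, 64)):
        self.stats = {m: {"mean": np.zeros(dim), "var": np.ones(dim),
                          "gamma": np.ones(dim), "beta": np.zeros(dim)}
                      for m in orders}

    def __call__(self, x, order):
        s = self.stats[order]                 # pick the branch for this order
        xn = (x - s["mean"]) / np.sqrt(s["var"] + 1e-5)
        return s["gamma"] * xn + s["beta"]

def order_from_snr(snr_db):
    # illustrative SNR boundaries (dB) for choosing the modulation order
    for bound, m in [(5, 2), (10, 4), (18, 16)]:
        if snr_db < bound:
            return m
    return 64

bn = OrderSwitchedBN(dim=8)
x = np.ones(8)
y = bn(x, order_from_snr(12.0))   # 12 dB falls in the 16-QAM branch
```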
Pub Date: 2025-04-10 | DOI: 10.1109/JSAC.2025.3559140
Qiang Hu;Qihan He;Houqiang Zhong;Guo Lu;Xiaoyun Zhang;Guangtao Zhai;Yanfeng Wang
Free-view video (FVV) allows users to explore immersive video content from multiple views. However, delivering FVV poses significant challenges due to the uncertainty in view switching, combined with the substantial bandwidth and computational resources required to transmit and decode multiple video streams, which may result in frequent playback interruptions. Existing approaches, either client-based or cloud-based, struggle to meet high Quality of Experience (QoE) requirements under limited bandwidth and computational resources. To address these issues, we propose VARFVV, a bandwidth- and computationally-efficient system that enables real-time interactive FVV streaming with high QoE and low switching delay. Specifically, VARFVV introduces a low-complexity FVV generation scheme that reassembles multiview video frames at the edge server based on user-selected view tracks, eliminating the need for transcoding and significantly reducing computational overhead. This design makes it well-suited for large-scale, mobile-based UHD FVV experiences. Furthermore, we present a popularity-adaptive bit allocation method, leveraging a graph neural network, that predicts view popularity and dynamically adjusts bit allocation to maximize QoE within bandwidth constraints. We also construct an FVV dataset comprising 330 videos from 10 scenes, including basketball, opera, etc. Extensive experiments show that VARFVV surpasses existing methods in video quality, switching latency, computational efficiency, and bandwidth usage, supporting over 500 users on a single edge server with a switching delay of 71.5 ms. Our code and dataset are available at https://github.com/qianghu-huber/VARFVV.
Title: VARFVV: View-Adaptive Real-Time Interactive Free-View Video Streaming With Edge Computing
IEEE Journal on Selected Areas in Communications, vol. 43, no. 7, pp. 2620-2634.
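The transcoding-free reassembly idea reduces to chunk selection when views are pre-encoded with aligned groups of pictures: the server serves a user's view track by picking, per group, the already-encoded chunk of the currently selected view. The sketch below illustrates only this selection step; the chunk naming and track format are invented, not VARFVV's actual formats.

```python
# Illustrative sketch of transcoding-free FVV assembly: with aligned GOPs across
# views, serving a view track is pure selection of pre-encoded chunks.
# Chunk names and the track representation are invented for illustration.

def assemble_track(view_track, chunks_per_view):
    """view_track[g] = view chosen for GOP g; chunks_per_view[v][g] = chunk name."""
    return [chunks_per_view[v][g] for g, v in enumerate(view_track)]

chunks = {v: [f"view{v}_gop{g}.bin" for g in range(4)] for v in range(3)}
stream = assemble_track([0, 0, 2, 1], chunks)
# stream == ["view0_gop0.bin", "view0_gop1.bin", "view2_gop2.bin", "view1_gop3.bin"]
```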
Pub Date: 2025-04-10 | DOI: 10.1109/JSAC.2025.3559154
Seong-Joon Park;Hee-Youl Kwak;Sang-Hyo Kim;Sunghwan Kim;Yongjune Kim;Jong-Seon No
With the broadening applications of deep learning, neural decoders have emerged as a key research focus, specifically aimed at improving the decoding performance of conventional decoding algorithms. In particular, error correction code transformer (ECCT), which utilizes the transformer architecture, has achieved state-of-the-art performance among neural network-based decoders. We present three technical contributions to significantly enhance the performance of ECCT. First, we propose a novel transformer architecture of ECCT, termed the multiple-masks ECCT (MM ECCT). We employ multiple masked self-attention blocks with different mask matrices in a parallel manner to learn diverse relationships among the codeword bits. Second, we discover that constructing mask matrices based on systematic parity check matrices (PCMs) can make the attention maps sparse, which not only enhances the decoding performance but also reduces computational complexity. Finally, we propose using complementary mask matrices derived from cyclic permutations of the systematic PCM. These complementary mask matrices are specifically designed to enhance the decoding of cyclic codes. Our extensive simulation results show that the proposed MM ECCT architecture with carefully designed mask matrices outperforms the original ECCT by a large margin, achieving state-of-the-art decoding performance among neural decoders. The source code is available at https://github.com/iil-postech/mm-ecct.
Title: Multiple-Masks Error Correction Code Transformer for Short Block Codes
IEEE Journal on Selected Areas in Communications, vol. 43, no. 7, pp. 2518-2529.
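The PCM-based masking idea can be sketched directly: two bit positions may attend to each other iff some parity check involves both bits. The sketch below uses the standard Hamming(7,4) parity-check matrix as an example (not the paper's codes), and the clique-per-check construction is a generic reading of ECCT-style masking rather than the exact MM ECCT rule.

```python
import numpy as np

# Sketch of building a self-attention mask from a parity-check matrix:
# positions (i, j) may attend to each other iff some check involves both bits.
# The Hamming(7,4) systematic PCM below is a standard example.

H = np.array([[1, 1, 0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]])

def attention_mask(H):
    n = H.shape[1]
    mask = np.eye(n, dtype=int)            # always attend to self
    for row in H:
        idx = np.flatnonzero(row)
        for i in idx:                      # connect all bit pairs within the check
            mask[i, idx] = 1
    return mask

M = attention_mask(H)
sparsity = 1 - M.mean()                    # fraction of masked-out position pairs
```

Sparser parity-check matrices directly yield sparser attention maps, which is the mechanism behind the complexity reduction described above.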
Pub Date: 2025-04-10 | DOI: 10.1109/JSAC.2025.3559150
Yuval Ben-Hur;Yuval Cassuto
As machine-learning models grow in size, their implementation requirements cannot be met by a single computer system. This observation motivates distributed settings, in which intermediate computations are performed across a network of processing units, while the central node only aggregates their outputs. However, distributing inference tasks across low-precision or faulty edge devices, operating over a network of noisy communication channels, gives rise to serious reliability challenges. We study the problem of an ensemble of devices, implementing regression algorithms, that communicate through additive noisy channels in order to collaboratively perform a joint regression task. We define the problem formally, and develop methods for optimizing the aggregation coefficients for the parameters of the noise in the channels, which can potentially be correlated. Our results apply to the leading state-of-the-art ensemble regression methods: bagging and gradient boosting. We demonstrate the effectiveness of our algorithms on both synthetic and real-world datasets.
Title: Robust Regression With Ensembles Communicating Over Noisy Channels
IEEE Journal on Selected Areas in Communications, vol. 43, no. 7, pp. 2714-2727.
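For one simple instance of this setting, the optimal aggregation has a classical closed form: with unbiased regressors reaching the aggregator through additive noise of covariance Sigma, the variance-minimizing weights summing to one are proportional to Sigma^{-1} times the all-ones vector. The sketch below illustrates this baseline under invented noise parameters; the paper's methods cover more general bagging and boosting ensembles.

```python
import numpy as np

# Minimum-variance aggregation of unbiased ensemble outputs received over
# additive noisy channels with (possibly correlated) covariance Sigma:
# among weights summing to one, w ∝ Sigma^{-1} 1 minimizes the aggregate
# noise variance. Sigma below is an invented example.

def optimal_weights(Sigma):
    ones = np.ones(Sigma.shape[0])
    w = np.linalg.solve(Sigma, ones)
    return w / w.sum()

# Correlated channel noise: channel 0 is much noisier than the others.
Sigma = np.array([[4.0, 0.5, 0.0],
                  [0.5, 1.0, 0.2],
                  [0.0, 0.2, 1.0]])
w = optimal_weights(Sigma)

# Monte Carlo sanity check: the optimal weights beat uniform averaging.
rng = np.random.default_rng(3)
noise = rng.multivariate_normal(np.zeros(3), Sigma, size=20000)
mse_opt = np.mean((noise @ w) ** 2)
mse_uni = np.mean((noise @ (np.ones(3) / 3)) ** 2)
```

Note how the noisy channel 0 is automatically down-weighted relative to the cleaner channels.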
Pub Date: 2025-04-09 | DOI: 10.1109/JSAC.2025.3559118
Kuo-Yu Liao;Cheng-Shang Chang;Y.-W. Peter Hong
Recent advances in Large Language Models (LLMs) have demonstrated the emergence of capabilities (learned skills) when the number of system parameters and the size of training data surpass certain thresholds. The exact mechanisms behind such phenomena are not fully understood and remain a topic of active research. Inspired by the skill-text bipartite graph model proposed by Arora and Goyal for modeling semantic languages, we develop a mathematical theory to explain the emergence of learned skills, taking the learning (or training) process into account. Our approach models the learning process for skills in the skill-text bipartite graph as an iterative decoding process in Low-Density Parity Check (LDPC) codes and Irregular Repetition Slotted ALOHA (IRSA). Using density evolution analysis, we demonstrate the emergence of learned skills when the ratio of the number of training texts to the number of skills exceeds a certain threshold. Our analysis also yields a scaling law for testing errors relative to this ratio. Upon completion of the training, the association of learned skills can also be acquired to form a skill association graph. We use site percolation analysis to derive the conditions for the existence of a giant component in the skill association graph. Our analysis can also be extended to the setting with a hierarchy of skills, where a fine-tuned model is built upon a foundation model. It is also applicable to the setting with multiple classes of skills and texts. As an important application, we propose a method for semantic compression and discuss its connections to semantic communication.
Title: A Mathematical Theory for Learning Semantic Languages by Abstract Learners
IEEE Journal on Selected Areas in Communications, vol. 43, no. 7, pp. 2700-2713.
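The density-evolution machinery the paper adapts can be illustrated with its classical LDPC instance: for a (3,6)-regular ensemble on the binary erasure channel, the residual erasure fraction evolves as x <- eps * (1 - (1 - x)^5)^2, and decoding succeeds (x -> 0) iff eps is below a threshold of about 0.4294. This sketch shows only that standard recursion, not the paper's skill-text analysis.

```python
# Classical density evolution for a (3,6)-regular LDPC code on the binary
# erasure channel: the erasure fraction at iteration t+1 is
# x <- eps * (1 - (1 - x)^5)^2, with a decoding threshold near eps ~ 0.4294.
# The same thresholding behavior underlies the "emergence" analysis above.

def density_evolution(eps, iters=2000):
    x = eps
    for _ in range(iters):
        x = eps * (1.0 - (1.0 - x) ** 5) ** 2
    return x

below = density_evolution(0.40)   # below threshold: erasures die out
above = density_evolution(0.45)   # above threshold: erasures persist
```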
Pub Date: 2025-04-09 | DOI: 10.1109/JSAC.2025.3559117
Anju Anand;Emrah Akyol
This paper studies a quantization problem between an encoder and a decoder with misaligned objectives, where the quantization indices are transmitted over a noisy channel. Building on the prior results on the non-strategic counterpart of this problem, we characterize the encoding and decoding strategies and expected encoder and decoder distortions at the Stackelberg equilibrium, where the encoder is the leader, and the decoder is the follower. On the design side, we extend the gradient-descent-based solution framework developed for the noiseless setting to this noisy communication scenario, combined with uniformly randomized index mapping. We finally present numerical simulation results to demonstrate the efficacy of the proposed approach. The MATLAB codes associated with the design and evaluation of the proposed algorithm are provided at: https://github.com/strategic-quantization/channel-optimized-strategic-quantizer.
{"title":"Channel-Optimized Strategic Quantization","authors":"Anju Anand;Emrah Akyol","doi":"10.1109/JSAC.2025.3559117","DOIUrl":"10.1109/JSAC.2025.3559117","url":null,"abstract":"This paper studies a quantization problem between an encoder and a decoder with misaligned objectives, where the quantization indices are transmitted over a noisy channel. Building on the prior results on the non-strategic counterpart of this problem, we characterize the encoding and decoding strategies and expected encoder and decoder distortions at the Stackelberg equilibrium, where the encoder is the leader, and the decoder is the follower. On the design side, we extend the gradient-descent-based solution framework developed for the noiseless setting to this noisy communication scenario, combined with uniformly randomized index mapping. We finally present numerical simulation results to demonstrate the efficacy of the proposed approach. The MATLAB codes associated with the design and evaluation of the proposed algorithm are provided at: <uri>https://github.com/strategic-quantization/channel-optimized-strategic-quantizer</uri>.","PeriodicalId":73294,"journal":{"name":"IEEE journal on selected areas in communications : a publication of the IEEE Communications Society","volume":"43 7","pages":"2530-2542"},"PeriodicalIF":0.0,"publicationDate":"2025-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143813546","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
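A toy version of the design loop described in the abstract can be sketched as follows. This is not the authors' MATLAB implementation: the encoder bias b (modeling the misaligned objective), the symmetric index channel standing in for the randomized index mapping, and all constants are illustrative assumptions. The Stackelberg structure appears directly: the decoder (follower) best-responds with a conditional-mean reconstruction, and the encoder (leader) descends a numerical gradient of its own distortion while anticipating that response.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=10000)       # standard Gaussian source (illustrative)
b = 0.3                          # hypothetical encoder bias: encoder targets x + b
M, eps = 4, 0.05                 # number of indices, index error probability

# Symmetric noisy index channel: transmitted index i is received as j
# with probability 1 - eps if j == i, else uniformly over the other M - 1.
P = np.full((M, M), eps / (M - 1))
np.fill_diagonal(P, 1 - eps)

def cells(t):
    return np.searchsorted(t, x)             # quantization cell of each sample

def decoder_best_response(t):
    # Follower: reconstruct y_j = E[x | received index j] under the channel.
    idx = cells(t)
    mass = np.array([(idx == i).mean() for i in range(M)])
    mean = np.array([x[idx == i].mean() if (idx == i).any() else 0.0
                     for i in range(M)])
    pj = P.T @ mass                          # probability of receiving j
    return (P.T @ (mass * mean)) / np.maximum(pj, 1e-12)

def encoder_cost(t):
    # Leader's objective: E[(x + b - y_j)^2], anticipating the follower.
    y = decoder_best_response(t)
    idx = cells(t)
    return float((((x[:, None] + b - y[None, :]) ** 2) * P[idx]).sum(axis=1).mean())

t = np.linspace(-1.0, 1.0, M - 1)
best_c, best_t = encoder_cost(t), t
lr, h = 0.02, 1e-3
for _ in range(200):                         # numerical gradient descent on thresholds
    g = np.zeros_like(t)
    for k in range(M - 1):
        tp, tm = t.copy(), t.copy()
        tp[k] += h
        tm[k] -= h
        g[k] = (encoder_cost(np.sort(tp)) - encoder_cost(np.sort(tm))) / (2 * h)
    t = np.sort(t - lr * g)
    c = encoder_cost(t)
    if c < best_c:                           # keep the best iterate seen so far
        best_c, best_t = c, t
```

The empirical gradient here is noisy (threshold moves reassign finitely many samples), which is why the sketch keeps the best iterate rather than the last one.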
By extracting task-relevant information while maximally compressing the input, the information bottleneck (IB) principle has provided a guideline for learning effective and robust representations of the target inference. However, extending the idea to the multi-task learning scenario with joint consideration of generative tasks and traditional reconstruction tasks remains unexplored. This paper addresses this gap by reconsidering the lossy compression problem with diverse constraints on data reconstruction, perceptual quality, and classification accuracy. Firstly, we study two ternary relationships, namely, the rate-distortion-classification (RDC) and rate-perception-classification (RPC). For both RDC and RPC functions, we derive the closed-form expressions of the optimal rate for binary and Gaussian sources. These new results complement the IB principle and provide insights into effectively extracting task-oriented information to fulfill diverse objectives. Secondly, unlike prior research demonstrating a tradeoff between classification and perception in signal restoration problems, we prove that such a tradeoff does not exist in the RPC function and reveal that the source noise plays a decisive role in the classification-perception tradeoff. Finally, we implement a deep-learning-based image compression framework, incorporating multiple tasks related to distortion, perception, and classification. The experimental results coincide with the theoretical analysis and verify the effectiveness of our generalized IB in balancing various task objectives.
{"title":"Task-Oriented Lossy Compression With Data, Perception, and Classification Constraints","authors":"Yuhan Wang;Youlong Wu;Shuai Ma;Ying-Jun Angela Zhang","doi":"10.1109/JSAC.2025.3559164","DOIUrl":"10.1109/JSAC.2025.3559164","url":null,"abstract":"By extracting task-relevant information while maximally compressing the input, the information bottleneck (IB) principle has provided a guideline for learning effective and robust representations of the target inference. However, extending the idea to the multi-task learning scenario with joint consideration of generative tasks and traditional reconstruction tasks remains unexplored. This paper addresses this gap by reconsidering the lossy compression problem with diverse constraints on data reconstruction, perceptual quality, and classification accuracy. Firstly, we study two ternary relationships, namely, the <i>rate-distortion-classification (RDC)</i> and <i>rate-perception-classification (RPC)</i>. For both RDC and RPC functions, we derive the <i>closed-form expressions</i> of the optimal rate for binary and Gaussian sources. These new results complement the IB principle and provide insights into effectively extracting task-oriented information to fulfill diverse objectives. Secondly, unlike prior research demonstrating a tradeoff between classification and perception in signal restoration problems, we prove that such a tradeoff does not exist in the RPC function and reveal that the source noise plays a decisive role in the classification-perception tradeoff. Finally, we implement a deep-learning-based image compression framework, incorporating multiple tasks related to distortion, perception, and classification. 
The experimental results coincide with the theoretical analysis and verify the effectiveness of our generalized IB in balancing various task objectives.","PeriodicalId":73294,"journal":{"name":"IEEE journal on selected areas in communications : a publication of the IEEE Communications Society","volume":"43 7","pages":"2635-2650"},"PeriodicalIF":0.0,"publicationDate":"2025-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143813481","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
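The closed-form RDC/RPC expressions are the paper's contribution and are not reproduced here. As a point of reference, the distortion-only corner of these ternary tradeoffs is the classical rate-distortion function, which for a Bernoulli(1/2) source under Hamming distortion satisfies R(D) = 1 − h(D) and can be traced numerically with Blahut–Arimoto:

```python
import numpy as np

# Classical Blahut-Arimoto for the rate-distortion function of a
# Bernoulli(1/2) source under Hamming distortion, where R(D) = 1 - h(D).
# This is only the distortion-constrained baseline; the paper's RDC/RPC
# functions add classification and perception constraints on top of it.
p = np.array([0.5, 0.5])                   # source distribution
d = np.array([[0.0, 1.0], [1.0, 0.0]])     # Hamming distortion matrix

def blahut_arimoto(s, iters=500):
    """One (R, D) point for Lagrange multiplier s >= 0 (rate in bits)."""
    q = np.array([0.5, 0.5])               # reproduction marginal, start uniform
    for _ in range(iters):
        w = q[None, :] * np.exp(-s * d)    # unnormalized test channel
        r = w / w.sum(axis=1, keepdims=True)
        q = p @ r                          # update reproduction marginal
    D = float(np.sum(p[:, None] * r * d))
    R = float(np.sum(p[:, None] * r * np.log2(r / q[None, :])))
    return R, D
```

For this source the parametric solution is D = 1/(1 + e^s), so s = ln 9 should land on the point (R, D) = (1 − h(0.1), 0.1) ≈ (0.531, 0.1), which the iteration recovers.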