Pub Date: 2025-01-10 | DOI: 10.1109/JSTSP.2024.3516382
Yuhui Song;Zijun Gong;Yuanzhu Chen;Cheng Li
Massive machine-type communications (mMTC), an essential fifth-generation (5G) usage scenario, aims to provide services for a large number of users that intermittently transmit small data packets in smart cities, manufacturing, and agriculture. Massive random access (MRA) emerges as a promising candidate for multiple access in mMTC, which is characterized by sporadic data traffic. Despite the use of massive multiple-input multiple-output (mMIMO) in MRA to achieve spatial division multiple access and mitigate small-scale fading, existing research efforts overlook the near-far effect of large-scale fading by assuming perfect power control. In this paper, we present a cost-efficient, effective, and fully distributed solution for MRA to combat large-scale fading, wherein distributed access points (APs) cooperatively detect and serve active users. Each AP is equipped with low-resolution analog-to-digital converters (ADCs) for energy-efficient system implementation. Specifically, we derive a rigorous closed-form expression for the uplink achievable rate, considering the impact of non-orthogonal pilots and low-resolution ADCs. We also propose a scalable distributed algorithm for user activity detection under flat-fading channels, and further adapt it to handle frequency-selective fading in popular orthogonal frequency division multiplexing (OFDM) systems. The proposed solution is fully distributed, since most processing tasks, such as activity detection, channel estimation, and data detection, are localized at each AP. Simulation results demonstrate the significant advantage of distributed systems over co-located systems in accommodating more users while achieving higher activity detection accuracy, and quantify the performance loss resulting from the use of low-resolution ADCs.
{"title":"Distributed Massive MIMO With Low Resolution ADCs for Massive Random Access","authors":"Yuhui Song;Zijun Gong;Yuanzhu Chen;Cheng Li","doi":"10.1109/JSTSP.2024.3516382","DOIUrl":"https://doi.org/10.1109/JSTSP.2024.3516382","url":null,"abstract":"Massive machine-type communications (mMTC), an essential fifth-generation (5G) usage scenario, aims to provide services for a large number of users that intermittently transmit small data packets in smart cities, manufacturing, and agriculture. Massive random access (MRA) emerges as a promising candidate for multiple access in mMTC characterized by the sporadic data traffic. Despite the use of massive multiple-input multiple-output (mMIMO) in MRA to achieve spatial division multiple access and mitigate small-scale fading, existing research endeavors overlook the near-far effect of large-scale fading by assuming perfect power control. In this paper, we present a cost-efficient, effective, and fully distributed solution for MRA to combat large-scale fading, wherein distributed access points (APs) cooperatively detect and serve active users. Each AP is equipped with low resolution analog-to-digital converters (ADCs) for energy-efficient system implementation. Specifically, we derive a rigorous closed-form expression for the uplink achievable rate, considering the impact of non-orthogonal pilots and low resolution ADCs. We also propose a scalable distributed algorithm for user activity detection under flat fading channels, and further adapt it to handle frequency-selective fading in popular orthogonal frequency division multiplexing (OFDM) systems. The proposed solution is fully distributed, since most processing tasks, such as activity detection, channel estimation, and data detection, are localized at each AP. Simulation results demonstrate the significant advantage of distributed systems over co-located systems in accommodating more users while achieving higher activity detection accuracy, and quantify performance loss resulting from the use of low resolution ADCs.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"18 7","pages":"1381-1395"},"PeriodicalIF":8.7,"publicationDate":"2025-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142993328","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-07 | DOI: 10.1109/JSTSP.2024.3511064
{"title":"IEEE Signal Processing Society Information","authors":"","doi":"10.1109/JSTSP.2024.3511064","DOIUrl":"https://doi.org/10.1109/JSTSP.2024.3511064","url":null,"abstract":"","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"18 5","pages":"C2-C2"},"PeriodicalIF":8.7,"publicationDate":"2025-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10832404","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142938099","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-07 | DOI: 10.1109/JSTSP.2024.3511060
{"title":"IEEE Signal Processing Society Information","authors":"","doi":"10.1109/JSTSP.2024.3511060","DOIUrl":"https://doi.org/10.1109/JSTSP.2024.3511060","url":null,"abstract":"","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"18 5","pages":"C3-C3"},"PeriodicalIF":8.7,"publicationDate":"2025-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10832440","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142938100","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-07 | DOI: 10.1109/JSTSP.2024.3524022
Behnam Gholami;Mostafa El-Khamy;Kee-Bong Song
Traditional speech enhancement methods often rely on complex signal processing algorithms, which may not be efficient for real-time applications or may suffer from limitations in handling various types of noise. Deploying complex deep neural network (DNN) models in resource-constrained environments can be challenging due to their high computational requirements. In this paper, we propose a knowledge distillation (KD) method for speech enhancement that leverages the information stored in the intermediate latent features of a very complex DNN (teacher) model to train a smaller, more efficient (student) model. Experimental results on two benchmark speech enhancement datasets demonstrate the effectiveness of the proposed KD method for speech enhancement. The student model trained with knowledge distillation outperforms state-of-the-art speech enhancement methods and achieves performance comparable to the teacher model. Furthermore, our method achieves significant reductions in computational complexity, making it suitable for deployment in resource-constrained environments such as embedded systems and mobile devices.
{"title":"Latent Mixup Knowledge Distillation for Single Channel Speech Enhancement","authors":"Behnam Gholami;Mostafa El-Khamy;Kee-Bong Song","doi":"10.1109/JSTSP.2024.3524022","DOIUrl":"https://doi.org/10.1109/JSTSP.2024.3524022","url":null,"abstract":"Traditional speech enhancement methods often rely on complex signal processing algorithms, which may not be efficient for real-time applications or may suffer from limitations in handling various types of noise. Deploying complex Deep Neural Network (DNN) models in resource-constrained environments can be challenging due to their high computational requirements. In this paper, we propose a Knowledge Distillation (KD) method for speech enhancement leveraging the information stored in the intermediate latent features of a very complex DNN (teacher) model to train a smaller, more efficient (student) model. Experimental results on a two benchmark speech enhancement datasets demonstrate the effectiveness of the proposed KD method for speech enhancement. The student model trained with knowledge distillation outperforms SOTA speech enhancement methods and achieves comparable performance to the teacher model. Furthermore, our method achieves significant reductions in computational complexity, making it suitable for deployment in resource-constrained environments such as embedded systems and mobile devices.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"18 8","pages":"1544-1556"},"PeriodicalIF":8.7,"publicationDate":"2025-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143184465","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-07 | DOI: 10.1109/JSTSP.2024.3522437
Kumar Vijay Mishra;M. R. Bhavani Shankar;Nuria González-Prelcic;Mikko Valkama;Wei Yu;Björn Ottersten
Signal processing techniques have played a pivotal role in the early development of joint sensing and communication systems [1]. These efforts were driven by the need to address spectrum scarcity and to reduce hardware size and cost. Initially focused on dual-function radar-communication systems, this field has since evolved into the broader paradigm of Integrated Sensing and Communication (ISAC). ISAC encompasses a wide range of interactions between sensing and communication, incorporating not just radar but also other sensors, and leveraging their capabilities for applications such as autonomous driving, drone-based services, radio-frequency identification, and weather monitoring. With wireless networks now operating at higher frequencies, their dual role as communication networks and environmental sensors has become increasingly significant, providing critical information for both user needs and network operations [2].
{"title":"Editorial Introduction to the Special Issue on Learning-Based Signal Processing for Integrated Sensing and Communications","authors":"Kumar Vijay Mishra;M. R. Bhavani Shankar;Nuria González-Prelcic;Mikko Valkama;Wei Yu;Björn Ottersten","doi":"10.1109/JSTSP.2024.3522437","DOIUrl":"https://doi.org/10.1109/JSTSP.2024.3522437","url":null,"abstract":"Signal processing techniques have played a pivotal role in the early development of joint sensing and communication systems [1]. These efforts were driven by the need to address spectrum scarcity and to reduce hardware size and cost. Initially focused on dual-function radar-communication systems, this field has since evolved into the broader paradigm of Integrated Sensing and Communication (ISAC). ISAC encompasses a wide range of interactions between sensing and communication, incorporating not just radar but also other sensors, and leveraging their capabilities for applications such as autonomous driving, drone-based services, radio-frequency identification, and weather monitoring. With wireless networks now operating at higher frequencies, their dual role as communication networks and environmental sensors has become increasingly significant, providing critical information for both user needs and network operations [2].","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"18 5","pages":"731-736"},"PeriodicalIF":8.7,"publicationDate":"2025-01-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10832414","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142938096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-06 | DOI: 10.1109/JSTSP.2024.3521277
Mohammad Bokaei;Jesper Jensen;Simon Doclo;Jan Østergaard
Low-latency configurable speech transmission presents significant challenges in modern communication systems. Traditional methods rely on separate source and channel coding, which often degrades performance under low-latency constraints. Moreover, non-configurable systems require separate training for each condition, limiting their adaptability in resource-constrained scenarios. This paper proposes a configurable low-latency deep joint source-channel coding (JSCC) system for speech transmission. The system can be configured for varying signal-to-noise ratios (SNR), wireless channel conditions, or bandwidths. A joint source-channel encoder based on deep neural networks (DNN) is used to compress and transmit analog-coded information, while a configurable decoder reconstructs speech from the noisy compressed signals. The system latency adapts to the input speech length, achieving a minimum latency of 2 ms with a lightweight architecture of 25 k parameters, significantly fewer than state-of-the-art systems. The simulation results demonstrate that the proposed system outperforms conventional separate source-channel coding systems in terms of speech quality and intelligibility, particularly in low-latency and noisy channel conditions. It also shows robustness in fixed-configuration scenarios, although higher-latency conditions and better channel environments favor traditional coding systems.
{"title":"Low-Latency Deep Analog Speech Transmission Using Joint Source Channel Coding","authors":"Mohammad Bokaei;Jesper Jensen;Simon Doclo;Jan Østergaard","doi":"10.1109/JSTSP.2024.3521277","DOIUrl":"https://doi.org/10.1109/JSTSP.2024.3521277","url":null,"abstract":"Low-latency configurable speech transmission presents significant challenges in modern communication systems. Traditional methods rely on separate source and channel coding, which often degrades performance under low-latency constraints. Moreover, non-configurable systems require separate training for each condition, limiting their adaptability in resource-constrained scenarios. This paper proposes a configurable low-latency deep Joint Source-Channel Coding (JSCC) system for speech transmission. The system can be configured for varying signal-to-noise ratios (SNR), wireless channel conditions, or bandwidths. A joint source-channel encoder based on deep neural networks (DNN) is used to compress and transmit analog-coded information, while a configurable decoder reconstructs speech from noisy compressed signals. The system latency is adaptable based on the input speech length, achieving a minimum latency of 2 ms, with a lightweight architecture of 25 k parameters, significantly fewer than state-of-the-art systems. The simulation results demonstrate that the proposed system outperforms conventional separate source-channel coding systems in terms of speech quality and intelligibility, particularly in low-latency and noisy channel conditions. It also shows robustness in fixed configured scenarios, though higher latency conditions and better channel environments favor traditional coding systems.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"18 8","pages":"1401-1413"},"PeriodicalIF":8.7,"publicationDate":"2025-01-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143184263","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-01-03 | DOI: 10.1109/JSTSP.2025.3526081
Boyue Li;Yuejie Chi
Achieving communication efficiency in decentralized machine learning has attracted significant attention, with communication compression recognized as an effective technique in algorithm design. This paper takes a first step toward understanding the role of gradient clipping, a popular strategy in practice, in decentralized nonconvex optimization with communication compression. We propose PORTER, which considers two variants of gradient clipping applied before or after taking a mini-batch of stochastic gradients: the former variant, PORTER-DP, allows local differential privacy analysis with additional Gaussian perturbation, and the latter variant, PORTER-GC, helps to stabilize training. We develop a novel analysis framework that establishes their convergence guarantees without assuming the stringent bounded gradient assumption. To the best of our knowledge, our work provides the first convergence analysis for decentralized nonconvex optimization with gradient clipping and communication compression, highlighting the trade-offs between convergence rate, compression ratio, network connectivity, and privacy.
{"title":"Convergence and Privacy of Decentralized Nonconvex Optimization With Gradient Clipping and Communication Compression","authors":"Boyue Li;Yuejie Chi","doi":"10.1109/JSTSP.2025.3526081","DOIUrl":"https://doi.org/10.1109/JSTSP.2025.3526081","url":null,"abstract":"Achieving communication efficiency in decentralized machine learning has been attracting significant attention, with communication compression recognized as an effective technique in algorithm design. This paper takes a first step to understand the role of gradient clipping, a popular strategy in practice, in decentralized nonconvex optimization with communication compression. We propose <monospace><b>PORTER</b></monospace>, which considers two variants of gradient clipping added before or after taking a mini-batch of stochastic gradients, where the former variant <monospace><b>PORTER-DP</b></monospace> allows local differential privacy analysis with additional Gaussian perturbation, and the latter variant <monospace><b>PORTER-GC</b></monospace> helps to stabilize training. We develop a novel analysis framework that establishes their convergence guarantees without assuming the stringent bounded gradient assumption. To the best of our knowledge, our work provides the first convergence analysis for decentralized nonconvex optimization with gradient clipping and communication compression, highlighting the trade-offs between convergence rate, compression ratio, network connectivity, and privacy.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"19 1","pages":"273-282"},"PeriodicalIF":8.7,"publicationDate":"2025-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143512922","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-12-26 | DOI: 10.1109/JSTSP.2024.3523233
Maryam Talebi Rostami;Seyed Ahmad Motamedi
Due to significant advancements in medical imaging, the application of artificial intelligence for early disease diagnosis has greatly contributed to reducing mortality caused by heart disease. Segmentation of cardiac MRI into the left ventricle (LV), right ventricle (RV), and myocardium (MYO) is a key step towards the early diagnosis and treatment of heart disease. The proposed method consists of two main components: a localization stage that extracts the heart region from the rest of the image, and a GAN model for segmentation. In the generator part of the generative adversarial network (GAN) model, a customized U-Net is employed to segment the cardiac image; a combination of inception and residual blocks together with an attention mechanism yields an improved version of U-Net. The discriminator part of the GAN model is designed to accurately distinguish between the ground-truth image and the segmented image generated by the generator. Our method is evaluated on two cardiac MRI datasets. The evaluation results on the ACDC 2017 challenge dataset show mean Dice scores of 0.947 for LV, 0.919 for RV, and 0.907 for MYO in cardiac MRI segmentation. The experimental results highlight that our proposed IRAU-Net method outperforms other state-of-the-art methods in terms of accuracy while significantly reducing computational costs.
{"title":"IRAU-Net: Inception Residual Attention U-Net in Adversarial Network for Cardiac MRI Segmentation","authors":"Maryam Talebi Rostami;Seyed Ahmad Motamedi","doi":"10.1109/JSTSP.2024.3523233","DOIUrl":"https://doi.org/10.1109/JSTSP.2024.3523233","url":null,"abstract":"Due to the significant advancements in medical imaging, the application of artificial intelligence for early disease diagnosis has greatly contributed to reducing mortality caused by heart diseases. Cardiac MRI segmentation into the left ventricle (LV), right ventricle (RV) and myocardium (MYO) is a key step towards the early treatment and diagnosis of heart diseases. The proposed method consists of two main components: a localization part to extract the heart organ from other parts of the image, and a GAN model for segmentation. In the generator part of the generative adversarial network (GAN) model, a customized U-Net network is employed to segment the cardiac image. A combination of inception and residual blocks and utilizing an attention mechanism yield an improved version of U-Net. On the other hand, the discriminator part of the GAN model is designed to accurately distinguish between the ground truth image and the segmented image generated by the generator part. Our method is evaluated on two cardiac MRI datasets. The evaluation results on the ACDC 2017 challenge dataset show mean Dice scores of 0.947 for LV, 0.919 for RV, and 0.907 for MYO in cardiac MRI segmentation. The experimental results highlight that our proposed IRAU-Net method outperforms other state-of-the-art methods in terms of accuracy while significantly reducing computational costs.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"19 1","pages":"260-272"},"PeriodicalIF":8.7,"publicationDate":"2024-12-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143512840","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-12-17 | DOI: 10.1109/JSTSP.2024.3513692
Kaidi Xu;Shenglong Zhou;Geoffrey Ye Li
Resource allocation significantly impacts the performance of vehicle-to-everything (V2X) networks in next-generation multiple access (NGMA). Most existing algorithms for resource allocation are based on optimization or machine learning (e.g., reinforcement learning). In this paper, we explore resource allocation in an NGMA V2X network under the framework of federated reinforcement learning (FRL). On one hand, the use of RL overcomes many challenges of model-based optimization schemes. On the other hand, federated learning (FL) enables agents to deal with a number of practical issues, such as privacy, communication overhead, distributed learning, and exploration efficiency. The FRL framework is then implemented via the inexact alternating direction method of multipliers (ADMM), where subproblems are solved approximately using policy gradients and accelerated by an adaptive step size calculated from their second moments. The developed algorithm, PASM, is proven to be convergent under mild conditions and shows strong numerical performance compared with baseline methods for solving resource allocation problems in an NGMA V2X network.
{"title":"Federated Reinforcement Learning for Resource Allocation in V2X Networks","authors":"Kaidi Xu;Shenglong Zhou;Geoffrey Ye Li","doi":"10.1109/JSTSP.2024.3513692","DOIUrl":"https://doi.org/10.1109/JSTSP.2024.3513692","url":null,"abstract":"Resource allocation significantly impacts the performance of vehicle-to-everything (V2X) networks in next generation multiple access (NGMA). Most existing algorithms for resource allocation are based on optimization or machine learning (e.g., reinforcement learning). In this paper, we explore resource allocation in a NGMA V2X network under the framework of federated reinforcement learning (FRL). On one hand, the usage of RL overcomes many challenges from the model-based optimization schemes. On the other hand, federated learning (FL) enables agents to deal with a number of practical issues, such as privacy, communication overhead, distributed learning, and exploration efficiency. The framework of FRL is then implemented by the inexact alternative direction method of multipliers (ADMM), where subproblems are solved approximately using policy gradients and accelerated by an adaptive step size calculated from their second moments. The developed algorithm, PASM, is proven to be convergent under mild conditions and has a nice numerical performance compared with some baseline methods for solving the resource allocation problems in a NGMA V2X network.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"18 7","pages":"1210-1221"},"PeriodicalIF":8.7,"publicationDate":"2024-12-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142993249","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-12-16 | DOI: 10.1109/JSTSP.2024.3518257
Wenhao Yang;Jianguo Wei;Wenhuan Lu;Lei Li;Xugang Lu
Recent research in speaker verification has increasingly focused on achieving robust and reliable recognition under challenging channel conditions and noisy environments. Identifying speakers in radio communications is particularly difficult due to inherent limitations such as constrained bandwidth and pervasive noise interference. To address this issue, we present a Channel Robust Speaker Learning (CRSL) framework that enhances the robustness of the current speaker verification pipeline, considering data source, data augmentation, and the efficiency of model transfer processes. Our framework introduces an augmentation module that mitigates bandwidth variations in radio speech datasets by manipulating the bandwidth of training inputs. It also addresses unknown noise by introducing noise within the manifold space. Additionally, we propose an efficient fine-tuning method that reduces the need for extensive additional training time and large amounts of data. Moreover, we develop a toolkit for assembling a large-scale radio speech corpus and establish a benchmark specifically tailored for radio scenario speaker verification studies. Experimental results demonstrate that our proposed methodology effectively enhances performance and mitigates degradation caused by radio transmission in speaker verification tasks.
{"title":"Robust Channel Learning for Large-Scale Radio Speaker Verification","authors":"Wenhao Yang;Jianguo Wei;Wenhuan Lu;Lei Li;Xugang Lu","doi":"10.1109/JSTSP.2024.3518257","DOIUrl":"https://doi.org/10.1109/JSTSP.2024.3518257","url":null,"abstract":"Recent research in speaker verification has increasingly focused on achieving robust and reliable recognition under challenging channel conditions and noisy environments. Identifying speakers in radio communications is particularly difficult due to inherent limitations such as constrained bandwidth and pervasive noise interference. To address this issue, we present a Channel Robust Speaker Learning (CRSL) framework that enhances the robustness of the current speaker verification pipeline, considering data source, data augmentation, and the efficiency of model transfer processes. Our framework introduces an augmentation module that mitigates bandwidth variations in radio speech datasets by manipulating the bandwidth of training inputs. It also addresses unknown noise by introducing noise within the manifold space. Additionally, we propose an efficient fine-tuning method that reduces the need for extensive additional training time and large amounts of data. Moreover, we develop a toolkit for assembling a large-scale radio speech corpus and establish a benchmark specifically tailored for radio scenario speaker verification studies. Experimental results demonstrate that our proposed methodology effectively enhances performance and mitigates degradation caused by radio transmission in speaker verification tasks.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"19 1","pages":"248-259"},"PeriodicalIF":8.7,"publicationDate":"2024-12-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143512843","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}