Automatic modulation classification (AMC) is one of the crucial technologies for designing intelligent and efficient transceivers for future wireless communications. However, channel interference can destabilize traditional signal representations, such as the in-phase and quadrature (I/Q) sequence and the constellation diagram, leading to poor generalization and significant classification performance degradation in new channel environments. Retraining the classifier to restore robust and effective performance in such cases requires a large number of re-collected samples and vast computational resources, which makes it costly and difficult to apply in practice. To solve this problem, we propose grayscale spectral quotient constellation matrix (GSQCM)-based AMC methods using deep learning (DL) in orthogonal frequency division multiplexing (OFDM) systems, which require neither retraining the classifier nor performing equalization, even for unseen channel cases. Specifically, we first propose a novel method, named bidirectional and multi-step spectral cyclic division (BMSSCD), to generate channel-robust spectral quotient signals in a length-extension manner. Then, we convert these generated signals into dimension-specific GSQCMs. Finally, the GSQCMs are used as input to train classifiers based on several classical DL models, such as AlexNet, VGGNet, GoogLeNet, and ResNet. Notably, all of the DL-based classifiers are trained under an additive white Gaussian noise (AWGN) channel but tested under Rician and Rayleigh multipath fading channels.
Extensive simulations show that (i) the novel signal representation, i.e., the GSQCM, is well suited as network input for DL-based AMC methods, yielding reliable classifiers and avoiding overfitting to a dataset collected under a specific channel condition, and (ii) the proposed GSQCM-DL methods exhibit strong generalization, achieving robust and superior performance compared with several existing methods when unseen propagation scenarios are considered.
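The channel-cancellation idea behind the spectral quotient signals can be illustrated in a few lines. This is a minimal NumPy sketch of the quotient principle only; the function name, the step choices, and the flat-channel test below are our illustrative assumptions, not the paper's BMSSCD specification:

```python
import numpy as np

def spectral_cyclic_quotients(y, steps=(1, 2)):
    """Illustrative bidirectional, multi-step spectral cyclic division.

    For each step s, divide the spectrum element-wise by its cyclic shift
    in both directions. Since a received spectrum is Y[k] = H[k] X[k],
    the quotient Y[k] / Y[k+s] ≈ X[k] / X[k+s] whenever the channel
    response varies slowly (H[k] ≈ H[k+s]), so the channel cancels.
    Concatenating quotients for several steps extends the signal length.
    """
    Y = np.fft.fft(np.asarray(y, dtype=complex))
    out = []
    for s in steps:
        out.append(Y / np.roll(Y, s))    # forward division
        out.append(Y / np.roll(Y, -s))   # backward division
    return np.concatenate(out)           # length-extended quotient signal
```

For a flat (single-tap) channel, i.e., a scalar complex gain, the cancellation is exact, which is the degenerate case of the robustness the abstract claims for frequency-selective channels.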
{"title":"Reliable Automatic Modulation Classification via Grayscale Spectral Quotient Constellation Matrix and Deep Learning Models","authors":"Jiashuo He;Yuting Chen;Shanchuan Ying;Shuo Chang;Sai Huang;Zhiyong Feng","doi":"10.1109/JSTSP.2025.3547223","DOIUrl":"https://doi.org/10.1109/JSTSP.2025.3547223","url":null,"abstract":"Automatic modulation classification (AMC) is one of the crucial technologies for designing an intelligent and efficient transceiver for future wireless communications. However, the channel interferences can cause instability in traditional signal representations, such as inphase and quadrature (I/Q) sequence, and constellations, leading to poor generalization and significant classification performance degradation in new channel environments. Retraining the classifier to achieve robust and effective performance in such cases requires a large number of re-collected samples and consumes vast computational resources, which makes it costly and difficult to apply in practice. To solve this problem, we propose the grayscale spectral quotient constellation matrix (GSQCM)-based AMC methods using deep learning (DL) in orthogonal frequency division multiplexing (OFDM) systems, which do not require retraining the classifier or performing equalization even for the unseen channel cases. Specifically, we first propose a novel method, named bidirectional and multi-step spectral cyclic division (BMSSCD), to generate the channel-robust spectral quotient signals in a length-extension manner. Then, we convert these generated signals into dimension-specific GSQCMs. Finally, the GSQCMs are used as the input to train our classifiers based on several classical DL models, such as AlexNet, VGGNet, GoogLeNet, and ResNet. It is noted that all of the DL-based classifiers are trained under additive white Gaussian noise (AWGN) channel but tested under Rician and Rayleigh multipath fading channels. 
Extensive simulations show that (i) the novel signal representation, i.e., GSQCM, is well suited as network input for the DL-based AMC methods to train the reliable classifiers, avoiding the model overfitting on the dataset collected under a specific channel condition, (ii) the proposed GSQCM-DL methods exhibit strong generalization, achieving robust and superior performance in comparison to some existing methods when the unseen propagation scenarios are considered.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"19 3","pages":"583-594"},"PeriodicalIF":8.7,"publicationDate":"2025-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144073331","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-03-09 | DOI: 10.1109/JSTSP.2025.3568585
Jianhui Lv;Wadii Boulila;Shalli Rani;Huamao Jiang
Communication in healthcare settings is often degraded by ambient noise, which can lead to misunderstanding of essential information. We introduce the healthcare audio-visual deep fusion (HAV-DF) model, an innovative method that improves speech comprehension in clinical environments by intelligently merging acoustic and visual data. The HAV-DF model has three key advancements. First, it utilizes a medical video interface that collects nuanced visual signals pertinent to medical communication. Second, it employs an advanced multimodal fusion method that adaptively modifies the integration of auditory and visual data in response to noisy situations. Finally, it employs an innovative loss function that integrates healthcare-specific indicators to improve voice optimization for medical applications. Experimental findings on the MedDialog and MedVidQA datasets illustrate the efficacy of the proposed model under diverse noise conditions. In low-SNR situations (−5 dB), HAV-DF attains a PESQ score of 2.45, a 25% improvement over leading approaches. The model achieves a medical-term preservation rate of 93.18% under difficult acoustic settings, markedly surpassing current methodologies. These enhancements enable more dependable communication across many clinical contexts, from emergency departments to telemedicine consultations.
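The noise-adaptive integration of the two modalities can be pictured with a simple gating rule: as the estimated SNR drops, the fusion leans more heavily on the visual stream. The sigmoid gate, its parameters, and the convex combination below are our illustrative assumptions, not the paper's actual fusion mechanism:

```python
import numpy as np

def noise_adaptive_gate(snr_db, midpoint=0.0, slope=0.5):
    """Illustrative gate in [0, 1]: weight placed on the visual stream.

    Low SNR -> gate near 1 (trust lips/video); high SNR -> gate near 0
    (trust audio). Sigmoid shape and parameters are assumptions.
    """
    return 1.0 / (1.0 + np.exp(slope * (snr_db - midpoint)))

def fuse(audio_feat, visual_feat, snr_db):
    """Convex combination of time-aligned, same-dimension features,
    weighted by the noise-adaptive gate (a simplification of HAV-DF)."""
    a = noise_adaptive_gate(snr_db)
    return (1.0 - a) * np.asarray(audio_feat) + a * np.asarray(visual_feat)
```

Under this toy rule, a −5 dB frame weighs the visual features far more heavily than a +20 dB frame, which matches the abstract's claim that the fusion adapts to noisy situations.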
{"title":"Enhanced Multimodal Speech Processing for Healthcare Applications: A Deep Fusion Approach","authors":"Jianhui Lv;Wadii Boulila;Shalli Rani;Huamao Jiang","doi":"10.1109/JSTSP.2025.3568585","DOIUrl":"https://doi.org/10.1109/JSTSP.2025.3568585","url":null,"abstract":"Communication in healthcare settings is sometimes affected by ambient noise, resulting in possible misunderstanding of essential information. We introduce the healthcare audio-visual deep fusion (HAV-DF) model, an innovative method that improves speech comprehension in clinical environments by intelligently merging acoustic and visual data. The HAV-DF model has three key advancements. First, it utilizes a medical video interface that collects nuanced visual signals pertinent to medical communication. Then, it employs an advanced multimodal fusion method that adaptively modifies the integration of auditory and visual data in response to noisy situations. Finally, it employs an innovative loss function that integrates healthcare-specific indicators to increase voice optimization for medical applications. Experimental findings on the MedDialog and MedVidQA datasets illustrate the efficacy of the proposed model efficacy under diverse noise situations. In low SNR situations (−5dB), HAV-DF attains a PESQ score of 2.45, indicating a 25% enhancement compared to leading approaches. The model achieves a medical term preservation rate of 93.18% under difficult acoustic settings, markedly surpassing current methodologies. 
These enhancements provide more dependable communication across many therapeutic contexts, from emergency departments to telemedicine consultations.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"19 4","pages":"600-612"},"PeriodicalIF":8.7,"publicationDate":"2025-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144501943","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-03-07 | DOI: 10.1109/JSTSP.2025.3567838
Vahid Ahmadi Kalkhorani;Cheng Yu;Anurag Kumar;Ke Tan;Buye Xu;DeLiang Wang
Adding visual cues to audio-based speech separation can improve separation performance. This paper introduces AV-CrossNet, an audiovisual (AV) system for speech enhancement, target speaker extraction, and multi-talker speaker separation. AV-CrossNet extends the recently proposed TF-CrossNet architecture, which performs complex spectral mapping for speech separation by leveraging global attention and positional encoding. To effectively utilize visual cues, the proposed system incorporates pre-extracted visual embeddings and employs a visual encoder comprising temporal convolutional layers. Audio and visual features are fused in an early fusion layer before being fed to the AV-CrossNet blocks. We evaluate AV-CrossNet on multiple datasets, including LRS, VoxCeleb, TCD-TIMIT, and the COG-MHEAR challenge, in terms of PESQ, STOI, SNR, and SDR. Evaluation results demonstrate that AV-CrossNet advances the state of the art in all audiovisual tasks, even on untrained and mismatched datasets.
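Early fusion of pre-extracted visual embeddings with audio frames typically requires aligning the (slower) video frame rate to the audio frame rate before concatenating along the feature axis. This is a minimal NumPy sketch of that alignment step under assumed frame rates; the nearest-frame upsampling and function name are ours, not AV-CrossNet's documented implementation:

```python
import numpy as np

def early_fusion(audio_feats, visual_embeds):
    """Illustrative early fusion of audio frames and visual embeddings.

    audio_feats:   (T_a, D_a) spectral frames, e.g. ~100 frames/s.
    visual_embeds: (T_v, D_v) pre-extracted lip embeddings, e.g. ~25 frames/s.
    The visual track is upsampled to the audio frame rate by nearest-frame
    repetition, then both are concatenated along the feature dimension.
    """
    T_a = audio_feats.shape[0]
    T_v = visual_embeds.shape[0]
    # Map each audio frame index to the nearest (floor) visual frame index.
    idx = np.minimum((np.arange(T_a) * T_v) // T_a, T_v - 1)
    return np.concatenate([audio_feats, visual_embeds[idx]], axis=1)
```

The fused (T_a, D_a + D_v) tensor is what an early-fusion layer would pass to the downstream separator blocks.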
{"title":"AV-CrossNet: An Audiovisual Complex Spectral Mapping Network for Speech Separation by Leveraging Narrow- and Cross-Band Modeling","authors":"Vahid Ahmadi Kalkhorani;Cheng Yu;Anurag Kumar;Ke Tan;Buye Xu;DeLiang Wang","doi":"10.1109/JSTSP.2025.3567838","DOIUrl":"https://doi.org/10.1109/JSTSP.2025.3567838","url":null,"abstract":"Adding visual cues to audio-based speech separation can improve separation performance. This paper introduces AV-CrossNet, an audiovisual (AV) system for speech enhancement, target speaker extraction, and multi-talker speaker separation. AV-CrossNet is extended from the TF-CrossNet architecture, which is a recently proposed network that performs complex spectral mapping for speech separation by leveraging global attention and positional encoding. To effectively utilize visual cues, the proposed system incorporates pre-extracted visual embeddings and employs a visual encoder comprising temporal convolutional layers. Audio and visual features are fused in an early fusion layer before feeding to AV-CrossNet blocks. We evaluate AV-CrossNet on multiple datasets, including LRS, VoxCeleb, TCD-TIMIT, and COG-MHEAR challenge, in terms of the performance metrics of PESQ, STOI, SNR and SDR. 
Evaluation results demonstrate that AV-CrossNet advances the state-of-the-art performance in all audiovisual tasks, even on untrained and mismatched datasets.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"19 4","pages":"685-694"},"PeriodicalIF":8.7,"publicationDate":"2025-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144501031","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-03-02 | DOI: 10.1109/JSTSP.2025.3562641
{"title":"IEEE Signal Processing Society Publication Information","authors":"","doi":"10.1109/JSTSP.2025.3562641","DOIUrl":"https://doi.org/10.1109/JSTSP.2025.3562641","url":null,"abstract":"","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"19 2","pages":"C2-C2"},"PeriodicalIF":8.7,"publicationDate":"2025-03-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10982379","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143900552","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-03-02 | DOI: 10.1109/JSTSP.2025.3542164
Tsung-Hui Chang;Eduard A. Jorswieck;Erik G. Larsson;Xiao Li;A. Lee Swindlehurst
{"title":"Guest Editorial Distributed Signal Processing for Extremely Large-Scale Antenna Array Systems","authors":"Tsung-Hui Chang;Eduard A. Jorswieck;Erik G. Larsson;Xiao Li;A. Lee Swindlehurst","doi":"10.1109/JSTSP.2025.3542164","DOIUrl":"https://doi.org/10.1109/JSTSP.2025.3542164","url":null,"abstract":"","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"19 2","pages":"298-303"},"PeriodicalIF":8.7,"publicationDate":"2025-03-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10982380","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143900472","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-03-02 | DOI: 10.1109/JSTSP.2025.3562646
{"title":"IEEE Signal Processing Society Information","authors":"","doi":"10.1109/JSTSP.2025.3562646","DOIUrl":"https://doi.org/10.1109/JSTSP.2025.3562646","url":null,"abstract":"","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"19 2","pages":"C3-C3"},"PeriodicalIF":8.7,"publicationDate":"2025-03-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10982378","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143900540","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-02-28 | DOI: 10.1109/JSTSP.2025.3546083
Hansung Choi;Daewon Seo
The concept of a minimax classifier is well established in statistical decision theory, but its implementation via neural networks remains challenging, particularly for imbalanced training data with only a limited number of samples for minority classes. To address this issue, we propose a novel minimax learning algorithm designed to minimize the risk of the worst-performing classes. Our algorithm iterates through two steps: a minimization step that trains the model based on a selected target prior, and a maximization step that updates the target prior towards the adversarial prior for the trained model. In the minimization, we introduce a targeted logit-adjustment loss function that efficiently identifies optimal decision boundaries under the target prior. Moreover, based on a new prior-dependent generalization bound that we obtain, we theoretically prove that our loss function has better generalization capability than existing loss functions. During the maximization, we refine the target prior by shifting it towards the adversarial prior, based on the worst-performing classes rather than on per-class risk estimates. Our maximization method is particularly robust in the small-sample regime. Additionally, to adapt to overparameterized neural networks, we partition the entire training dataset into two subsets: one for model training during the minimization step and the other for updating the target prior during the maximization step.
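The two alternating steps can be sketched compactly. The point-mass adversarial prior, the fixed step size, and the log-ratio logit adjustment below are our illustrative simplifications of the procedure the abstract describes, not the paper's exact update rules:

```python
import numpy as np

def update_target_prior(prior, class_risks, step=0.1):
    """Maximization step (sketch): shift the target prior toward an
    adversarial prior, here taken as a point mass on the class with the
    largest risk, so the worst-performing class gains prior weight."""
    worst = int(np.argmax(class_risks))
    adversarial = np.zeros_like(prior)
    adversarial[worst] = 1.0
    new_prior = (1.0 - step) * prior + step * adversarial
    return new_prior / new_prior.sum()

def logit_adjusted_logits(logits, target_prior, train_prior):
    """Minimization step ingredient (sketch): bias the logits by the
    log-ratio of target to empirical training prior, so the argmax
    decision matches the Bayes rule under the target prior."""
    return logits + np.log(target_prior) - np.log(train_prior)
```

Alternating these two steps drives the model toward the minimax solution: the prior chases the classes the model currently serves worst, and the adjusted loss retrains the model under that prior.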
{"title":"Deep Minimax Classifiers for Imbalanced Datasets With a Small Number of Minority Samples","authors":"Hansung Choi;Daewon Seo","doi":"10.1109/JSTSP.2025.3546083","DOIUrl":"https://doi.org/10.1109/JSTSP.2025.3546083","url":null,"abstract":"The concept of a minimax classifier is well-established in statistical decision theory, but its implementation via neural networks remains challenging, particularly in scenarios with imbalanced training data having a limited number of samples for minority classes. To address this issue, we propose a novel minimax learning algorithm designed to minimize the risk of worst-performing classes. Our algorithm iterates through two steps: a minimization step that trains the model based on a selected target prior, and a maximization step that updates the target prior towards the adversarial prior for the trained model. In the minimization, we introduce a targeted logit-adjustment loss function that efficiently identifies optimal decision boundaries under the target prior. Moreover, based on a new prior-dependent generalization bound that we obtained, we theoretically prove that our loss function has a better generalization capability than existing loss functions. During the maximization, we refine the target prior by shifting it towards the adversarial prior, depending on the worst-performing classes rather than on per-class risk estimates. Our maximization method is particularly robust in the regime of a small number of samples. Additionally, to adapt to overparameterized neural networks, we partition the entire training dataset into two subsets: one for model training during the minimization step and the other for updating the target prior during the maximization step. 
Our proposed algorithm has a provable convergence property, and empirical results indicate that our algorithm performs better than or is comparable to existing methods.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"19 3","pages":"491-506"},"PeriodicalIF":8.7,"publicationDate":"2025-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144073337","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-02-27 | DOI: 10.1109/JSTSP.2025.3539494
{"title":"IEEE Signal Processing Society Information","authors":"","doi":"10.1109/JSTSP.2025.3539494","DOIUrl":"https://doi.org/10.1109/JSTSP.2025.3539494","url":null,"abstract":"","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"19 1","pages":"C3-C3"},"PeriodicalIF":8.7,"publicationDate":"2025-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10906681","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143521551","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-02-27 | DOI: 10.1109/JSTSP.2025.3539490
{"title":"IEEE Signal Processing Society Publication Information","authors":"","doi":"10.1109/JSTSP.2025.3539490","DOIUrl":"https://doi.org/10.1109/JSTSP.2025.3539490","url":null,"abstract":"","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"19 1","pages":"C2-C2"},"PeriodicalIF":8.7,"publicationDate":"2025-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10906684","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143512768","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-02-20 | DOI: 10.1109/JSTSP.2025.3544024
Basarbatu Can;Soner Ozgun Pelvan;Huseyin Ozkan
We consider the statistical anomaly detection problem with regard to false alarm rate (or false positive rate, FPR) controllability, nonlinear modeling, and computational efficiency for real-time processing. A decision-theoretic solution can be formulated as Neyman-Pearson (NP) hypothesis testing (binary classification: anomaly/nominal). In this framework, we propose an ensemble NP classifier (Tree OLNP) that is based on a binary partitioning tree. Tree OLNP generates an ensemble of sample space partitions. Each partition corresponds to an online piecewise linear (hence nonlinear) expert classifier formed as a union of online linear NP classifiers (union of OLNPs). While maintaining precise control over the FPR, Tree OLNP generates its overall prediction as a performance-driven and time-varying weighted combination of the experts. This provides dynamic nonlinear modeling power in the sense that simpler (more powerful) experts receive larger weights early (late) in the data stream, which manages the bias-variance trade-off and mitigates overfitting/underfitting issues. We mathematically prove that, for any stream, Tree OLNP asymptotically performs at least as well as the best expert in terms of the NP performance, with a regret diminishing in the order $O(1/\sqrt{t})$ ($t$: data size). Our algorithm is computationally highly efficient since it is online and its complexity scales linearly with respect to both the data size and tree depth, and doubly logarithmically with respect to the number of experts.
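The performance-driven weighted combination of experts can be sketched with the classic exponential-weights rule from prediction with expert advice, which is the standard mechanism behind $O(1/\sqrt{t})$-type regret guarantees. The learning rate and the loss bookkeeping below are illustrative assumptions, not Tree OLNP's exact weighting scheme:

```python
import numpy as np

def combine_experts(expert_scores, cum_losses, eta=0.5):
    """Illustrative exponential-weights combination.

    expert_scores: per-expert scores for the current sample.
    cum_losses:    each expert's accumulated loss so far; experts with
                   lower loss receive exponentially larger weights, so
                   the ensemble tracks the best-performing expert.
    Returns the combined score and the weight vector.
    """
    w = np.exp(-eta * np.asarray(cum_losses, dtype=float))
    w = w / w.sum()  # normalize to a probability distribution
    combined = float(w @ np.asarray(expert_scores, dtype=float))
    return combined, w
```

Early in the stream all weights are near-uniform (simple experts dominate by performing well); as losses accumulate, the weights concentrate on the expert best suited to the data, matching the time-varying behavior the abstract describes.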
{"title":"Online Neyman-Pearson Classification With Hierarchically Represented Models","authors":"Basarbatu Can;Soner Ozgun Pelvan;Huseyin Ozkan","doi":"10.1109/JSTSP.2025.3544024","DOIUrl":"https://doi.org/10.1109/JSTSP.2025.3544024","url":null,"abstract":"We consider the statistical anomaly detection problem with regard to false alarm rate (or false positive rate, FPR) controllability, nonlinear modeling and computational efficiency for real-time processing. A decision theoretical solution can be formulated as Neyman-Pearson (NP) hypothesis testing (binary classification: anomaly/nominal). In this framework, we propose an ensemble NP classifier (Tree OLNP) that is based on a binary partitioning tree. Tree OLNP generates an ensemble of sample space partitions. Each partition corresponds to an online piecewise linear (hence nonlinear) expert classifier as a union of online linear NP classifiers (union of OLNPs). While maintaining a precise control over the FPR, Tree OLNP generates its overall prediction as a performance driven and time varying weighted combination of the experts. This provides a dynamical nonlinear modeling power in the sense that simpler (more powerful) experts receive larger weights early (late) in the data stream, which manages the bias-variance trade-off and mitigates overfitting/underfitting issues. We mathematically prove that, for any stream, Tree OLNP asymptotically performs at least as well as of the best expert in terms of the NP performance with a regret diminishing in the order <inline-formula><tex-math>$O(1/sqrt{t})$</tex-math></inline-formula> (<inline-formula><tex-math>$t:$</tex-math></inline-formula> data size). Our algorithm is computationally highly efficient since it is online and its complexity scales linearly with respect to both the data size and tree depth, and scales twice-logarithmic with respect to the number of experts. 
We experimentally show that Tree OLNP strongly outperforms the state-of-the-art alternative techniques.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"19 3","pages":"478-490"},"PeriodicalIF":8.7,"publicationDate":"2025-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144073107","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}