Pub Date : 2024-09-09 DOI: 10.1109/JSTSP.2024.3454957
Di You;Pier Luigi Dragotti
Generative diffusion models are becoming one of the most popular priors in image restoration (IR) tasks due to their remarkable ability to generate realistic natural images. Despite achieving satisfactory results, IR methods based on diffusion models present several limitations. First, most non-blind approaches require an analytical expression of the degradation model to guide the sampling process. Second, most existing blind approaches rely on families of pre-defined degradation models for training their deep networks. These issues limit the flexibility of these approaches and hence their ability to handle real-world degradation tasks. In this paper, we propose a novel INN-guided probabilistic diffusion algorithm for non-blind and blind image restoration, namely INDIGO and BlindINDIGO, which combines the perfect reconstruction property of invertible neural networks (INN) with the strong generative capabilities of pre-trained diffusion models. Specifically, we train the forward process of the INN to simulate an arbitrary degradation process and use the inverse to obtain an intermediate image that guides the reverse diffusion sampling process through a gradient step. We also introduce an initialization strategy to further improve the performance and inference speed of our algorithm. Experiments demonstrate that our algorithm obtains competitive results compared with recent leading methods, both quantitatively and visually, on synthetic and real-world low-quality images.
{"title":"INDIGO+: A Unified INN-Guided Probabilistic Diffusion Algorithm for Blind and Non-Blind Image Restoration","authors":"Di You;Pier Luigi Dragotti","doi":"10.1109/JSTSP.2024.3454957","DOIUrl":"https://doi.org/10.1109/JSTSP.2024.3454957","url":null,"PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"18 6","pages":"1108-1122"},"PeriodicalIF":8.7,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10670023","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143106603","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
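The guidance idea in the INDIGO abstract (run an invertible map forward, use its inverse to build an intermediate image, then take a gradient step toward it) can be pictured with a toy one-dimensional sketch. Everything here is an assumption made for illustration: a fixed Haar-like split stands in for the trained INN, the coarse half is simply replaced by the measurement `y`, and no diffusion model is involved. It is not the authors' implementation.

```python
# Toy illustration only: a fixed Haar-like split stands in for a trained INN.

def forward(x):
    """Invertible split of an even-length signal into coarse and detail parts."""
    coarse = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / 2 for i in range(0, len(x), 2)]
    return coarse, detail

def inverse(coarse, detail):
    """Exact inverse of forward(), i.e. perfect reconstruction."""
    x = []
    for c, d in zip(coarse, detail):
        x.extend([c + d, c - d])
    return x

def guided_step(x_est, y, step=0.5):
    """Move the current estimate toward an intermediate signal built from y."""
    _, detail = forward(x_est)
    # Assumption: the coarse part is replaced by the measurement y.
    x_inn = inverse(y, detail)
    # Gradient of 0.5 * ||x_est - x_inn||^2 w.r.t. x_est, holding x_inn fixed.
    grad = [a - b for a, b in zip(x_est, x_inn)]
    return [a - step * g for a, g in zip(x_est, grad)]

x_est = [1.0, 3.0, 2.0, 6.0]   # hypothetical current clean-signal estimate
y = [4.0, 5.0]                 # hypothetical low-resolution measurement
x_new = guided_step(x_est, y)
print(x_new)
```

With `step=0.5` the coarse part of the updated estimate lands halfway between its old value and the measurement, while the detail part is left unchanged. In a real sampler this step would be interleaved with the reverse diffusion updates.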
Pub Date : 2024-09-09 DOI: 10.1109/JSTSP.2024.3454948
Shuai Gao;Fan Xu;Qingjiang Shi
This paper studies the channel acquisition problem in multi-input-multi-output orthogonal frequency division multiplexing networks based on channel statistical information, aiming at mitigating the interference caused by users sharing the same resource blocks and the same pilot signal in massive access. A novel feature domain is established for wireless channels by approximating the channel as a linear combination of statistical subchannels, so as to reduce the number of parameters to be estimated and enhance the accuracy of channel acquisition. To estimate the multipliers of the subchannels in the linear combination, a zero-forcing-based and a minimum-mean-square-error-based iterative algorithm are proposed to optimize the transceiver matrices for feature-domain channel acquisition. Simulation results show that the proposed schemes acquire the channels more accurately than existing channel acquisition methods when a considerable number of users share the same resource blocks, demonstrating the effectiveness of the proposed feature-domain channel acquisition methods for massive access.
{"title":"A Feature-Domain Channel Acquisition Scheme for MIMO-OFDM","authors":"Shuai Gao;Fan Xu;Qingjiang Shi","doi":"10.1109/JSTSP.2024.3454948","DOIUrl":"https://doi.org/10.1109/JSTSP.2024.3454948","url":null,"PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"18 7","pages":"1351-1365"},"PeriodicalIF":8.7,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142993330","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
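The core idea of the feature-domain abstract, writing a channel as a linear combination of a few known subchannels and estimating only the multipliers, can be sketched with ordinary least squares. This is a simplified stand-in, not the zero-forcing or MMSE iterative algorithms of the paper: there is no noise, no pilot contamination, and no transceiver optimization, and the subchannels and multipliers below are hypothetical values.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * p for a, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def inner(u, v):
    """Complex inner product u^H v."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def estimate_multipliers(subchannels, h_obs):
    """Least-squares fit of h_obs ~ sum_k c[k] * subchannels[k].

    Solves the normal equations (B^H B) c = B^H h.
    """
    gram = [[inner(bi, bj) for bj in subchannels] for bi in subchannels]
    rhs = [inner(bi, h_obs) for bi in subchannels]
    return solve(gram, rhs)

# Hypothetical example: 2 subchannels, each with 4 channel coefficients.
B = [[1 + 0j, 1j, -1 + 0j, -1j],
     [1 + 0j, 1 + 0j, 1 + 0j, 1 + 0j]]
c_true = [0.5 - 0.2j, 1.0 + 0.3j]
h = [sum(c * b[n] for c, b in zip(c_true, B)) for n in range(4)]

c_hat = estimate_multipliers(B, h)
print(c_hat)  # close to c_true, up to floating-point rounding
```

Only two multipliers are estimated here instead of four channel coefficients, which is the parameter reduction the abstract refers to.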
Pub Date : 2024-09-09 DOI: 10.1109/JSTSP.2024.3455102
Konstantinos D. Katsanos;Paolo Di Lorenzo;George C. Alexandropoulos
The plethora of wirelessly connected devices, whose deployment density is expected to increase substantially in the upcoming sixth Generation (6G) of wireless networks, will naturally necessitate substantial advances in multiple access schemes. Reconfigurable Intelligent Surfaces (RISs) constitute a candidate 6G technology capable of offering dynamic over-the-air signal propagation programmability, which can be optimized for efficient non-orthogonal access of a multitude of devices. In this paper, we study the downlink of a wideband communication system comprising multiple multi-antenna Base Stations (BSs), each wishing to serve an associated single-antenna user with the assistance of a Beyond Diagonal (BD) and frequency-selective RIS. Under the assumption that each BS performs Orthogonal Frequency Division Multiplexing (OFDM) transmissions and exclusively controls a distinct RIS, we focus on the sum-rate maximization problem and present a distributed joint design of the linear precoders at the BSs as well as the tunable capacitances and the switch selection matrices at the multiple BD RISs. The formulated non-convex design optimization problem is solved via successive concave approximation, which requires only minimal cooperation among the BSs. Our extensive simulation results showcase the performance superiority of the proposed cooperative scheme over non-cooperative benchmarks, indicating the performance gains of BD RISs under the presented optimized frequency-selective operation in various scenarios.
{"title":"Multi-RIS-Empowered Multiple Access: A Distributed Sum-Rate Maximization Approach","authors":"Konstantinos D. Katsanos;Paolo Di Lorenzo;George C. Alexandropoulos","doi":"10.1109/JSTSP.2024.3455102","DOIUrl":"https://doi.org/10.1109/JSTSP.2024.3455102","url":null,"PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"18 7","pages":"1324-1338"},"PeriodicalIF":8.7,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142993326","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Accurate localization of mobile terminals is a pivotal aspect of integrated sensing and communication systems. Traditional fingerprint-based localization methods, which infer coordinates from channel information within pre-set rectangular areas, often face challenges due to the heterogeneous distribution of fingerprints inherent in non-line-of-sight (NLOS) scenarios, particularly within orthogonal frequency division multiplexing systems. To overcome this limitation, we develop a novel multi-source information fusion learning framework referred to as the Autosync Multi-Domains NLOS Localization (AMDNLoc). Specifically, AMDNLoc employs a two-stage matched filter fused with a target tracking algorithm and iterative centroid-based clustering to automatically and irregularly segment NLOS regions, ensuring a uniform distribution of channel state information across the frequency, power, and time-delay domains. Additionally, the framework utilizes a segment-specific linear classifier array, coupled with deep residual network-based feature extraction and fusion, to establish the correlation function between fingerprint features and coordinates within these regions. Simulation results reveal that AMDNLoc achieves an NLOS localization accuracy of 1.46 meters on typical wireless artificial intelligence research datasets and demonstrates significant improvements in interpretability, adaptability, and scalability.
{"title":"Multi-Sources Fusion Learning for Multi-Points NLOS Localization in OFDM System","authors":"Bohao Wang;Zitao Shuai;Chongwen Huang;Qianqian Yang;Zhaohui Yang;Richeng Jin;Ahmed Al Hammadi;Zhaoyang Zhang;Chau Yuen;Mérouane Debbah","doi":"10.1109/JSTSP.2024.3453548","DOIUrl":"https://doi.org/10.1109/JSTSP.2024.3453548","url":null,"PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"18 7","pages":"1339-1350"},"PeriodicalIF":8.7,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142993329","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-09-05 DOI: 10.1109/JSTSP.2024.3454980
Anton Obukhov;Mikhail Usvyatsov;Christos Sakaridis;Konrad Schindler;Luc Van Gool
Learning neural fields has been an active topic in deep learning research, focusing, among other issues, on finding more compact and easy-to-fit representations. In this paper, we introduce a novel low-rank representation termed Tensor Train Neural Fields (TT-NF) for learning neural fields on dense regular grids and efficient methods for sampling from them. Our representation is a TT parameterization of the neural field, trained with backpropagation to minimize a non-convex objective. We analyze the effect of low-rank compression on the downstream task quality metrics in two settings. First, we demonstrate the efficiency of our method in a sandbox task of tensor denoising, which admits comparison with SVD-based schemes designed to minimize reconstruction error. Furthermore, we apply the proposed approach to Neural Radiance Fields, where the low-rank structure of the field corresponding to the best quality can be discovered only through learning.
{"title":"TT-NF: Tensor Train Neural Fields","authors":"Anton Obukhov;Mikhail Usvyatsov;Christos Sakaridis;Konrad Schindler;Luc Van Gool","doi":"10.1109/JSTSP.2024.3454980","DOIUrl":"https://doi.org/10.1109/JSTSP.2024.3454980","url":null,"PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"18 6","pages":"1024-1035"},"PeriodicalIF":8.7,"publicationDate":"2024-09-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143106521","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
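The TT-NF abstract refers to a Tensor Train (TT) parameterization of a field on a regular grid and to sampling from it. As a minimal sketch of what sampling a single grid value means in the standard TT format, the entry at a given index is a product of small matrices, one selected from each core. The core values and ranks below are hypothetical, and the sketch leaves out training and the batched sampling methods the paper proposes.

```python
def tt_entry(cores, index):
    """Evaluate one tensor entry from TT cores.

    cores[k][i] is the r_{k-1} x r_k matrix (a list of rows) for mode k and
    index value i. The boundary ranks r_0 and r_d are 1, so the full product
    is a 1 x 1 matrix whose single element is the tensor entry.
    """
    row = cores[0][index[0]][0]  # 1 x r_1 row vector
    for core, i in zip(cores[1:], index[1:]):
        mat = core[i]
        # Row vector times matrix.
        row = [sum(row[a] * mat[a][b] for a in range(len(row)))
               for b in range(len(mat[0]))]
    return row[0]

# Hypothetical 2 x 2 x 2 tensor with TT ranks (1, 2, 2, 1).
cores = [
    [[[1.0, 0.0]], [[0.0, 1.0]]],                          # two 1x2 matrices
    [[[1.0, 2.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]],  # two 2x2 matrices
    [[[1.0], [3.0]], [[2.0], [0.5]]],                      # two 2x1 matrices
]

value = tt_entry(cores, (1, 0, 1))
print(value)  # 0.5
```

In general the cores store the sum over k of r_{k-1} * n_k * r_k numbers, which for large grids and small ranks is far fewer than the full tensor. This tiny example is too small to show any saving.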
Pub Date : 2024-09-04 DOI: 10.1109/JSTSP.2024.3424083
{"title":"IEEE Signal Processing Society Information","authors":"","doi":"10.1109/JSTSP.2024.3424083","DOIUrl":"https://doi.org/10.1109/JSTSP.2024.3424083","url":null,"abstract":"","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"18 3","pages":"C3-C3"},"PeriodicalIF":8.7,"publicationDate":"2024-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10665749","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142137526","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-09-04 DOI: 10.1109/JSTSP.2024.3424079
{"title":"IEEE Signal Processing Society Information","authors":"","doi":"10.1109/JSTSP.2024.3424079","DOIUrl":"https://doi.org/10.1109/JSTSP.2024.3424079","url":null,"abstract":"","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"18 3","pages":"C2-C2"},"PeriodicalIF":8.7,"publicationDate":"2024-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10665923","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142137558","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-09-04 DOI: 10.1109/JSTSP.2024.3445048
Wenbo Ding
{"title":"Editorial Introduction for the Special Issue on Intelligent Robotics: Sensing, Signal Processing and Interaction","authors":"Wenbo Ding","doi":"10.1109/JSTSP.2024.3445048","DOIUrl":"https://doi.org/10.1109/JSTSP.2024.3445048","url":null,"abstract":"","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"18 3","pages":"263-266"},"PeriodicalIF":8.7,"publicationDate":"2024-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10665936","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142137522","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The development of deep learning technology has injected new vitality into the task of automatic modulation recognition (AMR). Despite promising progress, existing models tend to lose recognition capability in low-quality communication environments because they neglect the latent distributions within the data, i.e., they classify samples in a single feature space, which results in unsatisfactory performance. Motivated by this observation, this paper rethinks modulation signal classification from the perspective of the latent data distribution. We propose a novel efficient divide-and-conquer domain adapter (EDDA) for AMR tasks, which significantly enhances an existing model's performance in challenging scenarios, irrespective of its architecture. Specifically, we first follow a divide-and-conquer approach to divide the raw data into multiple sub-domain spaces by signal-to-noise ratio (SNR), and then encourage the domain adapter to estimate the latent distributions and learn domain internally-invariant feature projections. Subsequently, we introduce a dynamic strategy for updating domain labels to overcome the limitations of the initial domain label partition by SNR. Finally, we provide theoretical support for EDDA and validate its effectiveness on two widely used benchmark datasets, RadioML2016.10a and RadioML2016.10b. Experimental results show that EDDA achieves average accuracy improvements of 11.63% and 2.32% on the respective datasets. Theoretical and experimental results demonstrate the superiority and versatility of EDDA.
{"title":"EDDA:An Efficient Divide-and-Conquer Domain Adapter for Automatics Modulation Recognition","authors":"Xiangrong Zhang;Yifan Chen;Guanchun Wang;Yifang Zhang;Licheng Jiao","doi":"10.1109/JSTSP.2024.3453559","DOIUrl":"https://doi.org/10.1109/JSTSP.2024.3453559","url":null,"PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"19 1","pages":"140-153"},"PeriodicalIF":8.7,"publicationDate":"2024-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143512842","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
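The first step described in the EDDA abstract, dividing the raw data into sub-domains by SNR, can be sketched as a simple binning pass. The thresholds, sample tuples, and function name below are hypothetical choices for illustration. The abstract does not state how many sub-domains are used or where the boundaries lie, and the later dynamic relabeling step is not shown.

```python
from collections import defaultdict

def split_by_snr(samples, edges):
    """Group (signal, snr_db, label) samples into sub-domains by SNR.

    edges are ascending thresholds in dB. A sample is assigned to domain d
    when exactly d thresholds are less than or equal to its SNR.
    """
    domains = defaultdict(list)
    for signal, snr_db, label in samples:
        domain_id = sum(snr_db >= e for e in edges)
        domains[domain_id].append((signal, label))
    return dict(domains)

# Hypothetical samples: (signal placeholder, SNR in dB, modulation label).
samples = [
    ("s0", -18, "BPSK"),
    ("s1", -4, "QPSK"),
    ("s2", 6, "BPSK"),
    ("s3", 18, "8PSK"),
]

domains = split_by_snr(samples, edges=[-6, 6])
print(domains)
```

These initial domain ids would serve only as a starting partition. Per the abstract, the method then updates the domain labels dynamically rather than keeping the SNR bins fixed.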
Pub Date : 2024-09-03 DOI: 10.1109/JSTSP.2024.3434498
Jingge Wang;Liyan Xie;Yao Xie;Shao-Lun Huang;Yang Li
Domain generalization aims at learning a universal model that performs well on unseen target domains by incorporating knowledge from multiple source domains. In this research, we consider the scenario where different domain shifts occur among conditional distributions of different classes across domains. When labeled samples in the source domains are limited, existing approaches are not sufficiently robust. To address this problem, we propose a novel domain generalization framework called Wasserstein Distributionally Robust Domain Generalization (WDRDG), inspired by the concept of distributionally robust optimization. We encourage robustness over conditional distributions within class-specific Wasserstein uncertainty sets and optimize the worst-case performance of a classifier over these uncertainty sets. We further develop a test-time adaptation module, leveraging optimal transport to quantify the relationship between the unseen target domain and source domains to make adaptive inferences for target data. Experiments on the Rotated MNIST, PACS, and VLCS datasets demonstrate that our method can effectively balance robustness and discriminability in challenging generalization scenarios.
{"title":"Generalizing to Unseen Domains With Wasserstein Distributional Robustness Under Limited Source Knowledge","authors":"Jingge Wang;Liyan Xie;Yao Xie;Shao-Lun Huang;Yang Li","doi":"10.1109/JSTSP.2024.3434498","DOIUrl":"https://doi.org/10.1109/JSTSP.2024.3434498","url":null,"PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"19 1","pages":"103-114"},"PeriodicalIF":8.7,"publicationDate":"2024-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143512886","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
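The WDRDG abstract uses optimal transport to quantify how the unseen target domain relates to each source domain. A heavily simplified picture of that idea, assuming one-dimensional features and equal sample sizes, is the Wasserstein-1 distance between empirical samples, which in this special case reduces to comparing sorted values. The domain names and numbers are hypothetical, and the paper's actual adaptation module works on richer distributions than this sketch.

```python
def wasserstein_1d(xs, ys):
    """Wasserstein-1 distance between two equal-size empirical samples in 1-D.

    For equal-size samples on the real line, the optimal transport plan
    matches sorted values in order.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

# Hypothetical 1-D features from two source domains and one target domain.
sources = {
    "domain_a": [0.0, 1.0, 2.0, 3.0],
    "domain_b": [4.0, 5.0, 6.0, 7.0],
}
target = [0.5, 1.5, 2.5, 3.5]

distances = {name: wasserstein_1d(xs, target) for name, xs in sources.items()}
closest = min(distances, key=distances.get)
print(distances, closest)
```

A smaller distance suggests that the corresponding source domain is more relevant to the target data, which is the kind of relationship a test-time adaptation step could use when weighting source-specific predictions.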