Spurious Local Minima Provably Exist for Deep CNNs: Theory and Application
Bo Liu, Keyi Fu, Tongtong Yuan, Shen Geng
Pub Date: 2025-12-17 | DOI: 10.1109/tnnls.2025.3640733
AFoCo: Ambiguous Focus and Correction for Semi-Supervised Medical Image Segmentation
Gang Hu, Feng Zhao, Essam H. Houssein
Pub Date: 2025-12-17 | DOI: 10.1109/tnnls.2025.3642162
Improved Knowledge Distillation Based on Global Latent Workspace With Multimodal Knowledge Fusion for Understanding Topological Guidance on Wearable Sensor Data
Jinyung Hong, Eun Som Jeon, Matthew P. Buman, Pavan Turaga, Theodore P. Pavlic
Pub Date: 2025-12-12 | DOI: 10.1109/tnnls.2025.3640274
Beyond Implicit Mapping: Advancing Generative Models Through Smoothed Optimal Transport
Shenghao Li, Lianbao Jin, Zhanpeng Wang, Zebin Xu, Na Lei, Zhongxuan Luo
Pub Date: 2025-12-11 | DOI: 10.1109/tnnls.2025.3638632
Optimal transport (OT) has gained significant attention in deep learning as a powerful mathematical tool for transforming distributions. Specifically, in deep generative models, the incorporation of OT helps address issues such as training instability, vanishing gradients, and mode collapse. However, in these models, most of the OT mappings learned by neural networks are typically implicit, making it difficult to explicitly model the relationship between the source and target domains. This limitation reduces the interpretability of the model and hinders its applicability in conditional generation tasks. To address this issue, we introduce Nesterov's smoothing technique to smooth the Brenier potential, enabling the derivation of an explicit OT mapping that serves as the foundation for constructing an advanced generative model. The proposed model offers the following advantages. First, it explicitly captures the mapping between the source and target domains, thereby enhancing the interpretability of the generative process and enabling a novel pathway for conditional sample generation based on a smoothed approximation of OT mapping. Second, the model can generate new samples directly through an explicit OT mapping, eliminating the need for interpolation and rejection sampling commonly seen in traditional methods, thereby improving generation efficiency. Moreover, extensive experiments show that our proposed model achieves superior performance in both unconditional and conditional generation tasks.
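For intuition about the construction the abstract describes: in the semi-discrete setting the Brenier potential is a maximum of affine functions, u(x) = max_i (<x, y_i> + h_i), and replacing the max with a log-sum-exp (Nesterov) smoothing makes the potential differentiable, so its gradient gives an explicit map onto the target support. The NumPy sketch below illustrates that generic idea under assumed names (y for the target support, h for the potential heights, gamma for the smoothing temperature); it is not the authors' implementation.

```python
# Minimal sketch (not the paper's code): log-sum-exp (Nesterov) smoothing of a
# semi-discrete Brenier potential u(x) = max_i (<x, y_i> + h_i). The gradient of
# the smoothed potential is an explicit, differentiable OT-style map.
import numpy as np

def smoothed_brenier_map(x, y, h, gamma=0.1):
    """Map source samples x to the target support y via the smoothed potential.

    x: (n, d) source samples; y: (m, d) target support points;
    h: (m,) potential heights; gamma: smoothing temperature (assumed names).
    """
    scores = x @ y.T + h                           # affine scores <x, y_i> + h_i, shape (n, m)
    scores = scores / gamma
    scores -= scores.max(axis=1, keepdims=True)    # numerical stability for exp
    w = np.exp(scores)
    w /= w.sum(axis=1, keepdims=True)              # softmax weights = grad of log-sum-exp
    return w @ y                                   # explicit map: convex combination of targets

# Toy usage: push 2-D Gaussian samples toward three target points.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 2))
y = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
h = np.zeros(3)
print(smoothed_brenier_map(x, y, h, gamma=0.05))
```

As gamma approaches zero, the softmax weights concentrate on the best-scoring cell, so the smoothed map recovers the piecewise-constant semi-discrete Brenier map in the limit.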
{"title":"Beyond Implicit Mapping: Advancing Generative Models Through Smoothed Optimal Transport.","authors":"Shenghao Li,Lianbao Jin,Zhanpeng Wang,Zebin Xu,Na Lei,Zhongxuan Luo","doi":"10.1109/tnnls.2025.3638632","DOIUrl":"https://doi.org/10.1109/tnnls.2025.3638632","url":null,"abstract":"Optimal transport (OT) has gained significant attention in deep learning as a powerful mathematical tool for transforming distributions. Specifically, in deep generative models, the incorporation of OT helps address issues such as training instability, vanishing gradients, and mode collapse. However, in these models, most of the OT mappings learned by neural networks are typically implicit, making it difficult to explicitly model the relationship between the source and target domains. This limitation reduces the interpretability of the model and hinders its applicability in conditional generation tasks. To address this issue, we introduce Nesterov's smoothing technique to smooth the Brenier potential, enabling the derivation of an explicit OT mapping that serves as the foundation for constructing an advanced generative model. The proposed model offers the following advantages. First, it explicitly captures the mapping between the source and target domains, thereby enhancing the interpretability of the generative process and enabling a novel pathway for conditional sample generation based on a smoothed approximation of OT mapping. Second, the model can generate new samples directly through an explicit OT mapping, eliminating the need for interpolation and rejection sampling commonly seen in traditional methods, thereby improving generation efficiency. Moreover, extensive experiments show that our proposed model achieves superior performance in both unconditional and conditional generation tasks.","PeriodicalId":13303,"journal":{"name":"IEEE transactions on neural networks and learning systems","volume":"7 1","pages":""},"PeriodicalIF":10.4,"publicationDate":"2025-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145728614","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DuaDiff: Dual-Conditional Diffusion Model for Guided Thermal Image Super-Resolution
Linrui Shi, Gaochang Wu, Yingqian Wang, Yebin Liu, Tianyou Chai
Pub Date: 2025-12-11 | DOI: 10.1109/tnnls.2025.3640168
Thermal imaging offers valuable properties but suffers from inherently low spatial resolution, which can be enhanced using a high-resolution (HR) visible image as guidance. However, the substantial modality differences between thermal and visible images, coupled with significant resolution gaps, pose challenges to existing guided super-resolution (SR) approaches. In this article, we present dual-conditional diffusion (DuaDiff), an innovative diffusion model featuring a dual-conditioning mechanism to enhance guided thermal image SR. Unlike typical conditional diffusion models, DuaDiff integrates a learnable Laplacian pyramid to extract high-frequency details from the visible image, serving as one of the conditioning inputs. By capturing multiscale high-frequency components, DuaDiff effectively focuses on intricate textures and edges in the HR visible images, significantly enhancing thermal image fidelity. Furthermore, we project both thermal and visible images into a semantic latent space, constructing another conditioning input. Leveraging these complementary conditions, DuaDiff employs a multimodal latent feature cross-attention module to facilitate effective interaction between noise, thermal, and visible latent representations. Extensive experiments on the FLIR-ADAS and CATS datasets for $4\times$ and $8\times$ guided SR demonstrate that combining learnable Laplacian conditioning with semantic latent conditioning enables DuaDiff to surpass state-of-the-art methods in both visual quality and metric evaluation, particularly in scenarios with a large resolution gap. In addition, applications to downstream tasks further confirm the capability of DuaDiff to recover high-fidelity semantic information. The code will be released.
Robust Traffic Forecasting With Disentangled Spatiotemporal Graph Neural Networks
Ting Wang, Rui Luo, Daqian Shi, Hao Deng, Shengjie Zhao
Pub Date: 2025-12-11 | DOI: 10.1109/tnnls.2025.3635636
Traffic prediction is a cornerstone of intelligent transportation systems (ITSs). The effectiveness of existing spatiotemporal graph neural networks (STGNNs) relies heavily on the independent and identically distributed (i.i.d.) assumption of traffic data, which is frequently violated in practice because of distribution shifts caused by exogenous factors. While learning features that remain stable across all environments is promising for building robust frameworks, the fundamental challenge lies in decomposing invariant features from the dynamic nature of spatiotemporal dependencies. In this article, we propose disentangled spatiotemporal (DIST) graph neural networks, a novel framework for robust traffic forecasting under distribution shifts. In DIST, latent invariant variables are explicitly decoupled from dynamically evolving spatiotemporal dependencies, enabling the learning of topology-agnostic representations resilient to distribution shifts. Specifically, we formulate a causality-driven learning objective that guides the separation of invariant variables from various exogenous factors. We then propose a spatiotemporal graph modeling module that can adaptively capture spatiotemporal dependencies in evolving traffic systems. Furthermore, we present a graph perturbation module that simulates topology variations during training, thereby encouraging the model to identify perturbation-sensitive dependencies and infer invariant and variant features for prediction and intervention tasks. Our learning strategy minimizes the prediction risk and its variance over multiple interventional distributions, allowing the model to identify invariant features and thus improving its robustness. The results of comprehensive real-world experiments demonstrate the superiority of our approach. The source code is available at https://github.com/tingwang25/DIST.