Pub Date: 2025-12-03  DOI: 10.1109/JSTSP.2025.3607336
{"title":"IEEE Signal Processing Society Information","authors":"","doi":"10.1109/JSTSP.2025.3607336","DOIUrl":"https://doi.org/10.1109/JSTSP.2025.3607336","url":null,"abstract":"","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"19 5","pages":"C3-C3"},"PeriodicalIF":13.7,"publicationDate":"2025-12-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11275990","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145659187","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Fetal brain MRI has become indispensable in prenatal diagnosis, offering unique soft tissue contrast to evaluate cortical development and detect neurological abnormalities. While high-resolution 3D imaging can provide valuable anatomical information, clinical acquisitions are often inadequate for reliable volumetric reconstruction due to unpredictable fetal motion and thick-slice protocols. Conventional iterative reconstruction methods, though effective, are computationally intensive and struggle with severe motion artifacts, limiting their feasibility in fast-paced clinical workflows. At the same time, the scarcity of authentic 3D fetal brain volumes prevents supervised learning approaches from developing generalizable reconstruction models. To overcome these limitations, we introduce SAFFRON, a physics-informed, label-free self-supervised framework for efficient and high-fidelity 3D fetal brain MRI reconstruction. SAFFRON eliminates the need for ground-truth 3D volumes by combining physics-driven slice acquisition modeling with data-driven deep learning, thereby bridging model-based and learning-based paradigms. The reconstruction task is decomposed into two modules: (1) multi-stack motion estimation via an SVR network that aligns slices into a canonical space, and (2) 3D volume reconstruction via an SRR network. Reconstruction quality is further enhanced by two targeted constraints: a stack-level contextual consistency loss to guide more accurate alignment and a slice-level adversarial loss to promote anatomically realistic structures. Extensive experiments on simulated and clinical datasets demonstrate that SAFFRON substantially outperforms state-of-the-art methods, achieving superior reconstruction accuracy while delivering up to a 60× acceleration in processing speed.
{"title":"SAFFRON: A Physics-Informed, Label-Free Self-Supervised Deep Learning Framework for Fast and Accurate 3D Fetal Brain MRI Reconstruction","authors":"Jiangjie Wu;Taotao Sun;Lihui Wang;Yuyao Zhang;Hongjiang Wei","doi":"10.1109/JSTSP.2025.3637544","DOIUrl":"https://doi.org/10.1109/JSTSP.2025.3637544","url":null,"abstract":"Fetal brain MRI has become indispensable in prenatal diagnosis, offering unique soft tissue contrast to evaluate cortical development and detect neurological abnormalities. While high-resolution 3D imaging can provide valuable anatomical information, clinical acquisitions are often inadequate for reliable volumetric reconstruction due to unpredictable fetal motion and thick-slice protocols. Conventional iterative reconstruction methods, though effective, are computationally intensive and struggle with severe motion artifacts, limiting their feasibility in fast-paced clinical workflows. At the same time, the scarcity of authentic 3D fetal brain volumes prevents supervised learning approaches from developing generalizable reconstruction models. To overcome these limitations, we introduce SAFFRON, a physics-informed, label-free self-supervised framework for efficient and high-fidelity 3D fetal brain MRI reconstruction. SAFFRON eliminates the need for ground-truth 3D volumes by combining physics-driven slice acquisition modeling with data-driven deep learning, thereby bridging model-based and learning-based paradigms. The reconstruction task is decomposed into two modules: (1) multi-stack motion estimation via an SVR network that aligns slices into a canonical space, and (2) 3D volume reconstruction via an SRR network. Reconstruction quality is further enhanced by two targeted constraints: a stack-level contextual consistency loss to guide more accurate alignment and a slice-level adversarial loss to promote anatomically realistic structures. Extensive experiments on simulated and clinical datasets demonstrate that SAFFRON substantially outperforms state-of-the-art methods, achieving superior reconstruction accuracy while delivering up to a 60× acceleration in processing speed.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"19 8","pages":"1955-1966"},"PeriodicalIF":13.7,"publicationDate":"2025-11-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146122774","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-11-18  DOI: 10.1109/JSTSP.2025.3634288
Authors: Gang Qu; Siming Zheng; Mengjie Qin; Xin Yuan
Image compression is an important technique for reducing bandwidth requirements, especially for large-scale information transmission and resource-limited platforms such as drones and robots. In these settings, limited resources call for a low-cost encoder, and the Block Modulation Video Compression (BMVC) encoder is a potential solution. In this paper, we propose BMVC$+$, an enhanced version of BMVC that improves both the encoder and the decoder of the previous design, making it more efficient and easier to deploy in real-world applications. Rather than partitioning the large-scale image into non-overlapping blocks and then summing all blocks together, as in the previous BMVC encoder, the proposed BMVC$+$ encoder treats the large-scale image as a group of low-resolution patches; this can be seen as a scanning strategy that compresses each patch into one pixel, yielding a low-resolution compressed measurement. For reconstruction, the patch sequences are extracted and reconstructed by the BMVC$+$ decoder, for which we propose a deep unfolding network that combines a multi-model unrolling Mamba block with a channel-wise self-attention block for local and global feature calibration and long-distance correlation reconstruction. Extensive experiments on the simulation dataset demonstrate the performance of the proposed method, which shows great potential for real-world, low-complexity image compression systems.
{"title":"BMVC$+$: An Enhanced Block Modulation Video Compression Codec for Large-Scale Image Compression","authors":"Gang Qu;Siming Zheng;Mengjie Qin;Xin Yuan","doi":"10.1109/JSTSP.2025.3634288","DOIUrl":"https://doi.org/10.1109/JSTSP.2025.3634288","url":null,"abstract":"Image compression is an important technique to reduce the requirement of bandwidth, especially for large-scale information transmission and resource-limited platforms, like drones and robotics. In these cases, due to the limit resource, a low-cost encoder is desired. Toward this end, Block Modulation Video Compression (BMVC) encoder is a potential solution for this task. In this paper, we propose BMVC<inline-formula><tex-math>$+$</tex-math></inline-formula>, an enhanced version of BMVC, which promotes both the encoder and decoder of previous design, making it more efficient and experiment-friendly in real-world applications. Rather than separating the large-scale image into non-overlapping partitioned blocks and then summing all blocks together in previous BMVC encoder, the proposed BMVC<inline-formula><tex-math>$+$</tex-math></inline-formula> encoder treats the large-scale image as a group of low-resolution patches, which can be seen as scanning strategy to compress patch into one pixel, leading to a low resolution compressed measurement. As for reconstruction, the patch sequences are specially extracted and reconstructed by BMVC<inline-formula><tex-math>$+$</tex-math></inline-formula> decoder, for which we propose a deep unfolding network with a combination of multi-model unrolling Mamba block and channel-wise self-attention block for both local and global feature calibration and long-distance correlation reconstruction. Extensive experiments on the simulation dataset demonstrate the performance of the proposed method, which shows great potential to be applied in real-world system for low-complexity image compression.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"19 8","pages":"1943-1954"},"PeriodicalIF":13.7,"publicationDate":"2025-11-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146122762","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-11-14  DOI: 10.1109/JSTSP.2025.3632537
Authors: Mahmut Esat Demirhan; Seniha Esen Yuksel; Erkut Erdem; Aykut Erdem; Anna-Maria Raita-Hakola; Ilkka Pölönen
Low-light hyperspectral images (HSIs) suffer from reduced visibility, amplified noise, and distorted spectral signatures, which degrade critical downstream tasks in surveillance, environmental monitoring, and remote sensing. Because collecting paired normal/low-light HSIs is often impractical, we introduce SS-HSLIE, the first self-supervised framework for low-light HSI enhancement. Guided by Retinex theory, our cascaded network (i) decomposes an input HSI into reflectance and illumination maps and (ii) refines the illumination with a Transformer module that models global spatial context. Two physics-aware losses further steer learning: a Fourier spectrum loss that removes noise while protecting high-frequency details, and a spectral smoothness loss that preserves inter-band consistency. Trained solely on unpaired low-light data, SS-HSLIE substantially outperforms recent unsupervised baselines on both an indoor benchmark and a challenging new real-world outdoor dataset, delivering brighter, cleaner HSIs while faithfully preserving material-specific spectra. Code, pretrained models, and our new outdoor HSI dataset will be released.
{"title":"Self-Supervised Low-Light Hyperspectral Image Enhancement via Fourier-Based Transformer Network","authors":"Mahmut Esat Demirhan;Seniha Esen Yuksel;Erkut Erdem;Aykut Erdem;Anna-Maria Raita-Hakola;Ilkka Pölönen","doi":"10.1109/JSTSP.2025.3632537","DOIUrl":"https://doi.org/10.1109/JSTSP.2025.3632537","url":null,"abstract":"Low-light hyperspectral images (HSIs) suffer from reduced visibility, amplified noise, and distorted spectral signatures, which degrade critical downstream tasks in surveillance, environmental monitoring, and remote sensing. Because collecting paired normal/low-light HSIs is often impractical, we introduce SS-HSLIE, the first self-supervised framework for low-light HSI enhancement. Guided by Retinex theory, our cascaded network (i) decomposes an input HSI into reflectance and illumination maps and (ii) refines the illumination with a Transformer module that models global spatial context. Two physics-aware losses further steer learning: a Fourier spectrum loss that removes noise while protecting high-frequency details, and a spectral smoothness loss that preserves inter-band consistency. Trained solely on unpaired low-light data, SS-HSLIE substantially outperforms recent unsupervised baselines on both an indoor benchmark and a challenging new real-world outdoor dataset, delivering brighter, cleaner HSIs while faithfully preserving material-specific spectra. Code, pretrained models, and our new outdoor HSI dataset will be released.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"19 8","pages":"1905-1919"},"PeriodicalIF":13.7,"publicationDate":"2025-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146122733","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-11-14  DOI: 10.1109/JSTSP.2025.3632533
Authors: Junqi Shi; Qirui Zhang; Ming Lu; Zhan Ma
Hyperspectral image (HSI) signals are high-dimensional, continuous, and sensor-variant, posing significant challenges for compression and restoration—especially under the scarcity of labeled data and the limitations of discretized, supervised models. In this work, we propose HINER++, a unified, self-supervised framework based on implicit neural representations (INRs) for continuous hyperspectral signal modeling. HINER++ formulates each HSI sample as a wavelength-to-intensity mapping, parameterized by a compact neural network. Thus, HSI signal modeling becomes equivalent to a continuous neural function regression problem, where compression is achieved by encoding the parameters of the neural function. Building on the principles of Deep Image Prior (DIP), we further reinterpret the inductive bias of the INR architecture as a natural prior for HSI restoration, enabling tasks such as denoising, inpainting, and super-resolution without labeled supervision. Finally, we investigate the impact of lossy INR-based compression on downstream perception tasks, using classification as a proxy, and demonstrate that such implicit modeling helps preserve task performance under extreme compression via proposed task-aware components—Adaptive Spectral Weighting (ASW) and Implicit Spectral Interpolation (ISI). Extensive experiments on real-world HSI benchmarks demonstrate that HINER++ achieves superior performance across multiple compression and restoration tasks, preserving classification accuracy under extreme compression ratios (up to 100×). This work offers a new perspective: compression is not merely data reduction, but a self-supervised process of signal disentanglement and recovery.
{"title":"Compression as Restoration: A Unified Implicit Approach to Self-Supervised Hyperspectral Image Representation","authors":"Junqi Shi;Qirui Zhang;Ming Lu;Zhan Ma","doi":"10.1109/JSTSP.2025.3632533","DOIUrl":"https://doi.org/10.1109/JSTSP.2025.3632533","url":null,"abstract":"Hyperspectral image (HSI) signals are high-dimensional, continuous, and sensor-variant, posing significant challenges for compression and restoration—especially under the scarcity of labeled data and the limitations of discretized, supervised models. In this work, we propose <bold>HINER++</b>, a unified, self-supervised framework based on implicit neural representations (INRs) for continuous hyperspectral signal modeling. HINER++ formulates each HSI sample as a wavelength-to-intensity mapping, parameterized by a compact neural network. Thus, HSI signal modeling becomes equivalent to a continuous neural function regression problem, where compression is achieved by encoding the parameters of the neural function. Building on the principles of Deep Image Prior (DIP), we further reinterpret the inductive bias of the INR architecture as a natural prior for HSI restoration, enabling tasks such as denoising, inpainting, and super-resolution without labeled supervision. Finally, we investigate the impact of lossy INR-based compression on downstream perception tasks, using classification as a proxy, and demonstrate that such implicit modeling helps preserve task performance under extreme compression via proposed task-aware components—Adaptive Spectral Weighting (ASW) and Implicit Spectral Interpolation (ISI). Extensive experiments on real-world HSI benchmarks demonstrate that HINER++ achieves superior performance across multiple compression and restoration tasks, preserving classification accuracy under extreme compression ratios (up to 100×). This work offers a new perspective: compression is not merely data reduction, but a self-supervised process of signal disentanglement and recovery.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"19 8","pages":"1890-1904"},"PeriodicalIF":13.7,"publicationDate":"2025-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146122784","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep equilibrium models (DEMs) are emerging as an efficient alternative to deep unrolling networks (DUNs): they recast iterative optimization algorithms for image restoration (IR) as a single fixed-point iteration trained with constant memory usage, and can in principle run unlimited iterations until convergence to a fixed point. However, existing DEMs for IR suffer from architectural constraints in exploiting skip connections and iteration-adaptive parameters to simultaneously benefit feature learning and achieve stable reconstruction with theoretical convergence guarantees. To address these challenges, we formulate a novel IR optimization problem with a deep neural network (DNN)-based energy and propose DenseDiff, a DEM based on proximal gradient descent with weighted densely-connected iteration differences, to solve it. Specifically, at each iteration, we combine densely connected iteration differences from preceding iterations with adaptive weights and step sizes to facilitate feature learning, and we reformulate the multi-variable fixed-point computation as a two-variable problem to preserve the effective parameter space for optimization. Furthermore, we thoroughly explore the properties of common DNN components to provide theoretical guidance for constructing the energy in practice, and we demonstrate that DenseDiff converges to a critical point of the objective function at a sublinear or faster rate. Comprehensive evaluations on image deblurring, inpainting, and CS-MRI reveal that DenseDiff achieves state-of-the-art reconstruction performance with guaranteed convergence, suggesting a promising paradigm for designing interpretable DNNs for IR tasks.
{"title":"Deep Equilibrium Model With Weighted Densely-Connected Iteration Differences for Convergent Image Restoration","authors":"Ziyang Zheng;Wenrui Dai;Jingwei Liang;Xinyu Peng;Duoduo Xue;Hongkai Xiong","doi":"10.1109/JSTSP.2025.3632532","DOIUrl":"https://doi.org/10.1109/JSTSP.2025.3632532","url":null,"abstract":"Deep equilibrium models (DEMs) are emerging as an efficientalternative to train deep unrolling networks (DUNs) realized by recasting iterative optimization algorithms for image restoration (IR) in a single iteration using constant memory usage and are potential to allow infinite iterations until convergence to fixed points. However, existing DEMs for IR suffer from architectural constraints in exploiting skip connections and adaptive parameters varying with iterations to simultaneously benefit feature learning and achieve stable reconstruction with theoretical convergence guarantees. To address these challenges, we formulate a novel IR optimization problem with a deep neural network (DNN)-based energy and propose <italic>DenseDiff</i>, a DEM based on proximal gradient descent with weighted densely-connected iteration differences, to solve it. Specifically, for each iteration, we combine densely connected iteration differences from preceding iterations with adaptive weights and step sizes to facilitate feature learning, and reformulate multi-variable fixed-point computation as a two-variable problem to preserve the effective parameter space for optimization. Furthermore, we thoroughly explore the properties of common DNN components for theoretical guidance for constituting the energy in practice, and demonstrate that <italic>DenseDiff</i> converges to a critical point of the objective function with at least a sublinear rate. Comprehensive evaluations on image deblurring, inpainting, and CS-MRI reveal that <italic>DenseDiff</i> achieves state-of-the-art reconstruction performance with guaranteed convergence, suggesting a promising paradigm for designing interpretable DNNs for IR tasks.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"19 8","pages":"1873-1889"},"PeriodicalIF":13.7,"publicationDate":"2025-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146122765","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Low-rank based deep unrolling networks have emerged as a powerful technology for Compressed Sensing Magnetic Resonance Imaging (CS-MRI), owing to their superior performance and considerable interpretability. However, most existing methods still face two main challenges: i) inefficient information transmission between stages and ii) high time cost for low-rank prior optimization. To alleviate these limitations, in this paper, we propose a Stage-Aware Deep Unrolling with Neural Tensor Decomposition (STAND-Net) for accelerated MRI, which enhances information flow and enables faster learning of the low-rank prior. Specifically, we introduce a novel Cross-Stage Attention Module (CSAM) to bridge earlier stages with the current one, thereby improving the network’s information flow and representational capability. Additionally, to reduce computational time, we propose a deep matrix factorization-guided module to approximate canonical-polyadic (CP) decomposition, termed the Deep Tensor Decomposition Module (DTDM). This module first generates a series of discriminative rank-one tensors from MRI data, which are then aggregated to form the low-rank representation of the data. Extensive experiments demonstrate that STAND-Net outperforms state-of-the-art methods in both quantitative and qualitative evaluations, achieving faster reconstruction than traditional SVD-based methods.
{"title":"STAND-Net: Lightning-Fast MRI Reconstruction via Stage-Aware Deep Unrolling With Neural Tensor Decomposition","authors":"Maosong Ran;Ziyuan Yang;Tao Wang;Zhiwen Wang;Jingfeng Lu;Yi Zhang","doi":"10.1109/JSTSP.2025.3632569","DOIUrl":"https://doi.org/10.1109/JSTSP.2025.3632569","url":null,"abstract":"Low-rank based deep unrolling networks have emerged as a powerful technology for Compressed Sensing Magnetic Resonance Imaging (CS-MRI), owing to their superior performance and considerable interpretability. However, most existing methods still face two main challenges: <italic>i)</i> inefficient information transmission between stages and <italic>ii)</i> high time cost for low-rank prior optimization. To alleviate these limitations, in this paper, we propose a Stage-Aware Deep Unrolling with Neural Tensor Decomposition (<bold>STAND-Net</b>) for accelerated MRI, which enhances information flow and enables faster learning of the low-rank prior. Specifically, we introduce a novel Cross-Stage Attention Module (<bold>CSAM</b>) to bridge earlier stages with the current one, thereby improving the network’s information flow and representational capability. Additionally, to reduce computational time, we propose a deep matrix factorization-guided module to approximate canonical-polyadic (CP) decomposition, termed the Deep Tensor Decomposition Module (<bold>DTDM</b>). This module first generates a series of discriminative rank-one tensors from MRI data, which are then aggregated to form the low-rank representation of the data. Extensive experiments demonstrate that STAND-Net outperforms state-of-the-art methods in both quantitative and qualitative evaluations, achieving faster reconstruction than traditional SVD-based methods.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"19 8","pages":"1930-1942"},"PeriodicalIF":13.7,"publicationDate":"2025-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146122771","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tractography fiber clustering using diffusion MRI (dMRI) is a crucial method for white matter (WM) parcellation to enable analysis of the brain's structural connectivity in health and disease. Current fiber clustering strategies primarily use the fiber geometric characteristics (i.e., the spatial trajectories) to group similar fibers into clusters, while neglecting the functional and microstructural information of the fiber tracts. There is increasing evidence that neural activity in the WM can be measured using functional MRI (fMRI), providing potentially valuable multimodal information for fiber clustering to enhance its functional coherence. Furthermore, microstructural features such as fractional anisotropy (FA) can be computed from dMRI as additional information to ensure the anatomical coherence of the clusters. In this paper, we develop a novel deep learning fiber clustering framework, namely Deep Multi-view Fiber Clustering (DMVFC), which uses joint multi-modal dMRI and fMRI data to enable functionally consistent WM parcellation. DMVFC can effectively integrate the geometric and microstructural characteristics of the WM fibers with the fMRI BOLD signals along the fiber tracts. DMVFC includes two major components: (1) a multi-view pretraining module to compute embedding features from each source of information separately, including fiber geometry, microstructure measures, and functional signals, and (2) a collaborative fine-tuning module to simultaneously refine the differences among the embeddings. In the experiments, we compare DMVFC with two state-of-the-art fiber clustering methods and demonstrate superior performance in achieving functionally meaningful and consistent WM parcellation results.
{"title":"DMVFC: Deep Learning Based Functionally Consistent Tractography Fiber Clustering Using Multimodal Diffusion MRI and Functional MRI","authors":"Bocheng Guo;Jin Wang;Yijie Li;Junyi Wang;Mingyu Gao;Puming Feng;Yuqian Chen;Jarrett Rushmore;Nikos Makris;Yogesh Rathi;Lauren J. O’Donnell;Fan Zhang","doi":"10.1109/JSTSP.2025.3632542","DOIUrl":"https://doi.org/10.1109/JSTSP.2025.3632542","url":null,"abstract":"Tractography fiber clustering using diffusion MRI (dMRI) is a crucial method for white matter (WM) parcellation to enable analysis of brain’s structural connectivity in health and disease. Current fiber clustering strategies primarily use the fiber geometric characteristics (i.e., the spatial trajectories) to group similar fibers into clusters, while neglecting the functional and microstructural information of the fiber tracts. There is increasing evidence that neural activity in the WM can be measured using functional MRI (fMRI), providing potentially valuable multimodal information for fiber clustering to enhance its functional coherence. Furthermore, microstructural features such as fractional anisotropy (FA) can be computed from dMRI as additional information to ensure the anatomical coherence of the clusters. In this paper, we develop a novel deep learning fiber clustering framework, namely <italic>Deep Multi-view Fiber Clustering (DMVFC)</i>, which uses joint multi-modal dMRI and fMRI data to enable functionally consistent WM parcellation. DMVFC can effectively integrate the geometric and microstructural characteristics of the WM fibers with the fMRI BOLD signals along the fiber tracts. DMVFC includes two major components: (1) a multi-view pretraining module to compute embedding features from each source of information separately, including fiber geometry, microstructure measures, and functional signals, and (2) a collaborative fine-tuning module to simultaneously refine the differences of embeddings. In the experiments, we compare DMVFC with two state-of-the-art fiber clustering methods and demonstrate superior performance in achieving functionally meaningful and consistent WM parcellation results.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"19 8","pages":"1920-1929"},"PeriodicalIF":13.7,"publicationDate":"2025-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146122734","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-11-10  DOI: 10.1109/JSTSP.2025.3631405
Authors: Ye Ma; Songnan Lin; Bihan Wen
Reconstructing hyperspectral images (HSIs) from RGB measurements is a fundamentally ill-posed problem that requires an effective HSI prior. Recent deep learning-based approaches address this by leveraging data-driven priors. However, their performance is constrained by the limited availability of large-scale datasets, which restricts their practical applicability in scenarios that demand high levels of trust. Furthermore, these models often fail silently when exposed to unseen spectral data or objects, offering no indication of potential reconstruction errors. To overcome these limitations, we propose a novel RGB-to-HSI reconstruction framework that exploits the sparse priors of both HSI and RGB features, by associating them through shared sparse codes. The shared sparse representation is obtained by unrolling extragradient-based ISTA with a trained neural network, interpreted as weights of unique spectral bases to capture the intrinsic spectral structure. Moreover, in the proposed framework, the sparse modeling error naturally emerges as an empirical uncertainty score, effectively signaling the presence of unseen spectra. This score can be visualized and utilized to guide selective model refinement. Experimental results confirm that the proposed method not only achieves strong performance for RGB-to-HSI reconstruction, but also demonstrates the utility of the uncertainty score for detecting novel spectral content and informing data selection during model updates.
{"title":"Uncertainty-Aware Hyperspectral Image Reconstruction From RGB Measurements Using Unrolled Sparse Coding","authors":"Ye Ma;Songnan Lin;Bihan Wen","doi":"10.1109/JSTSP.2025.3631405","DOIUrl":"https://doi.org/10.1109/JSTSP.2025.3631405","url":null,"abstract":"Reconstructing hyperspectral images (HSIs) from RGB measurements is a fundamentally ill-posed problem that requires an effective HSI prior. Recent deep learning-based approaches address this by leveraging data-driven priors. However, their performance is constrained by the limited availability of large-scale datasets, which restricts their practical applicability in scenarios that demand high levels of trust. Furthermore, these models often fail silently when exposed to unseen spectral data or objects, offering no indication of potential reconstruction errors. To overcome these limitations, we propose a novel RGB-to-HSI reconstruction framework that exploits the sparse priors of both HSI and RGB features, by associating them through shared sparse codes. The shared sparse representation is obtained by unrolling extragradient-based ISTA with a trained neural network, interpreted as weights of unique spectral bases to capture the intrinsic spectral structure. Moreover, in the proposed framework, the sparse modeling error naturally emerges as an empirical uncertainty score, effectively signaling the presence of unseen spectra. This score can be visualized and utilized to guide selective model refinement. Experimental results confirm that the proposed method not only achieves strong performance for RGB-to-HSI reconstruction, but also demonstrates the utility of the uncertainty score for detecting novel spectral content and informing data selection during model updates.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"19 8","pages":"1861-1872"},"PeriodicalIF":13.7,"publicationDate":"2025-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"146122790","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Neuropsychiatric disorders, such as Alzheimer's disease (AD), depression, and autism spectrum disorder (ASD), are characterized by linguistic and acoustic abnormalities, offering potential biomarkers for early detection. Despite the promise of multi-modal approaches, challenges like multi-lingual generalization and the absence of a unified evaluation framework persist. To address these gaps, we propose FEND (Foundation model-based Evaluation of Neuropsychiatric Disorders), a comprehensive multi-modal framework integrating speech and text modalities for detecting AD, depression, and ASD across the lifespan. Leveraging 13 multi-lingual datasets spanning English, Chinese, Greek, French, and Dutch, we systematically evaluate multi-modal fusion performance. Our results show that multi-modal fusion excels in AD and depression detection but underperforms in ASD due to dataset heterogeneity. We also identify modality imbalance as a prevalent issue, where multi-modal fusion fails to surpass the best mono-modal models. Cross-corpus experiments reveal robust performance in task- and language-consistent scenarios but noticeable degradation in multi-lingual and task-heterogeneous settings. By providing extensive benchmarks and a detailed analysis of performance-influencing factors, FEND advances the field of automated, lifespan-inclusive, and multi-lingual neuropsychiatric disorder assessment. We encourage researchers to adopt the FEND framework for fair comparisons and reproducible research.
{"title":"Foundation Model-Based Evaluation of Neuropsychiatric Disorders: A Lifespan-Inclusive, Multi-Modal, and Multi-Lingual Study","authors":"Zhongren Dong;Haotian Guo;Weixiang Xu;Huan Zhao;Zixing Zhang","doi":"10.1109/JSTSP.2025.3622051","DOIUrl":"https://doi.org/10.1109/JSTSP.2025.3622051","url":null,"abstract":"Neuropsychiatric disorders, such as Alzheimer's disease (AD), depression, and autism spectrum disorder (ASD), are characterized by linguistic and acoustic abnormalities, offering potential biomarkers for early detection. Despite the promise of multi-modal approaches, challenges like multi-lingual generalization and the absence of a unified evaluation framework persist. To address these gaps, we propose FEND (Foundation model-based Evaluation of Neuropsychiatric Disorders), a comprehensive multi-modal framework integrating speech and text modalities for detecting AD, depression, and ASD across the lifespan. Leveraging 13 multi-lingual datasets spanning English, Chinese, Greek, French, and Dutch, we systematically evaluate multi-modal fusion performance. Our results show that multi-modal fusion excels in AD and depression detection but underperforms in ASD due to dataset heterogeneity. We also identify modality imbalance as a prevalent issue, where multi-modal fusion fails to surpass the best mono-modal models. Cross-corpus experiments reveal robust performance in task- and language-consistent scenarios but noticeable degradation in multi-lingual and task-heterogeneous settings. By providing extensive benchmarks and a detailed analysis of performance-influencing factors, FEND advances the field of automated, lifespan-inclusive, and multi-lingual neuropsychiatric disorder assessment. We encourage researchers to adopt the FEND framework for fair comparisons and reproducible research.","PeriodicalId":13038,"journal":{"name":"IEEE Journal of Selected Topics in Signal Processing","volume":"19 5","pages":"796-809"},"PeriodicalIF":13.7,"publicationDate":"2025-10-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145659177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}