Romeo Lanzino, Federico Fontana, Luigi Cinque, Francesco Scarcello, Atsuto Maki
This paper introduces the Neural Transcoding Vision Transformer (NT-ViT), a generative model designed to estimate high-resolution functional Magnetic Resonance Imaging (fMRI) samples from simultaneous Electroencephalography (EEG) data. A key feature of NT-ViT is its Domain Matching (DM) sub-module, which aligns the latent EEG representations with those of fMRI volumes, enhancing the model's accuracy and reliability. Unlike previous methods, which tend to struggle with fidelity and reproducibility of images, NT-ViT addresses these challenges through methodological integrity and higher-quality reconstructions, which we showcase through extensive evaluation on two benchmark datasets; NT-ViT outperforms the current state of the art by a significant margin in both cases, e.g. achieving a 10× reduction in RMSE and a 3.14× increase in SSIM on the Oddball dataset. An ablation study also provides insights into the contribution of each component to the model's overall effectiveness. This development offers a new approach to lessening the time and financial constraints typically linked with high-resolution brain imaging, thereby aiding the swift and precise diagnosis of neurological disorders. While not a replacement for actual fMRI, but rather a step towards making such imaging more accessible, we believe NT-ViT represents a pivotal advancement for clinical practice and neuroscience research. Code is available at https://github.com/rom42pla/ntvit.
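The reported gains are in RMSE and SSIM between synthesized and ground-truth volumes. As a rough sketch of what those metrics measure, the snippet below computes RMSE and a simplified global-statistics SSIM (published evaluations typically use a windowed SSIM, e.g. scikit-image's `structural_similarity`); the array shapes are illustrative, not real fMRI dimensions.

```python
import numpy as np

# Sketch of the two reported metrics, comparing a synthesized volume against
# ground truth. RMSE is standard; this SSIM uses global image statistics
# instead of the usual sliding window, shown only to make the quantities
# concrete.

def rmse(pred, gt):
    """Root mean squared error between two arrays."""
    return np.sqrt(np.mean((pred - gt) ** 2))

def global_ssim(x, y, data_range=1.0):
    """SSIM computed from global image statistics (no sliding window)."""
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

rng = np.random.default_rng(0)
gt = rng.random((8, 8, 8))       # stand-in for a ground-truth fMRI volume
r0 = rmse(gt, gt)                # 0.0 for a perfect reconstruction
s0 = global_ssim(gt, gt)         # 1.0 for a perfect reconstruction
```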
{"title":"NT-ViT: Neural Transcoding Vision Transformers for EEG-to-fMRI Synthesis","authors":"Romeo Lanzino, Federico Fontana, Luigi Cinque, Francesco Scarcello, Atsuto Maki","doi":"arxiv-2409.11836","DOIUrl":"https://doi.org/arxiv-2409.11836","url":null,"abstract":"This paper introduces the Neural Transcoding Vision Transformer (modelname),\u0000a generative model designed to estimate high-resolution functional Magnetic\u0000Resonance Imaging (fMRI) samples from simultaneous Electroencephalography (EEG)\u0000data. A key feature of modelname is its Domain Matching (DM) sub-module which\u0000effectively aligns the latent EEG representations with those of fMRI volumes,\u0000enhancing the model's accuracy and reliability. Unlike previous methods that\u0000tend to struggle with fidelity and reproducibility of images, modelname\u0000addresses these challenges by ensuring methodological integrity and\u0000higher-quality reconstructions which we showcase through extensive evaluation\u0000on two benchmark datasets; modelname outperforms the current state-of-the-art\u0000by a significant margin in both cases, e.g. achieving a $10times$ reduction in\u0000RMSE and a $3.14times$ increase in SSIM on the Oddball dataset. An ablation\u0000study also provides insights into the contribution of each component to the\u0000model's overall effectiveness. This development is critical in offering a new\u0000approach to lessen the time and financial constraints typically linked with\u0000high-resolution brain imaging, thereby aiding in the swift and precise\u0000diagnosis of neurological disorders. Although it is not a replacement for\u0000actual fMRI but rather a step towards making such imaging more accessible, we\u0000believe that it represents a pivotal advancement in clinical practice and\u0000neuroscience research. 
Code is available at\u0000url{https://github.com/rom42pla/ntvit}.","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142269817","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rudolf L. M. van Herten, Ioannis Lagogiannis, Jelmer M. Wolterink, Steffen Bruns, Eva R. Meulendijks, Damini Dey, Joris R. de Groot, José P. Henriques, R. Nils Planken, Simone Saitta, Ivana Išgum
Deep learning-based medical image segmentation and surface mesh generation typically involve a sequential pipeline from image to segmentation to mesh, often requiring large training datasets while making limited use of prior geometric knowledge. This may lead to topological inconsistencies and suboptimal performance in low-data regimes. To address these challenges, we propose a data-efficient deep learning method for direct 3D anatomical object surface meshing using geometric priors. Our approach employs a multi-resolution graph neural network that operates on a prior geometric template, which is deformed to fit the object boundaries of interest. We show how different templates may be used for different surface meshing targets, and introduce a novel masked autoencoder pretraining strategy for 3D spherical data. The proposed method outperforms nnUNet in a one-shot setting for segmentation of the pericardium, the left ventricle (LV) cavity, and the LV myocardium. Similarly, the method outperforms other lumen segmentation methods operating on multi-planar reformatted images. Results further indicate that mesh quality is on par with or improves upon marching cubes post-processing of voxel mask predictions, while remaining flexible in the choice of mesh triangulation prior, thus paving the way for more accurate and topologically consistent 3D medical object surface meshing.
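The core idea of deforming a prior geometric template onto an object boundary can be illustrated with a toy example. The sketch below moves vertices of a spherical template radially onto a hypothetical ellipsoidal boundary; it is our simplified illustration of template deformation, not the paper's multi-resolution graph neural network.

```python
import numpy as np

# Toy template deformation: spherical template vertices are pushed along
# their radial directions until they lie on a target boundary, here a
# hypothetical ellipsoid with semi-axes (1.0, 0.8, 0.6).

def fibonacci_sphere(n):
    """Roughly uniform template vertices on the unit sphere."""
    i = np.arange(n)
    phi = np.arccos(1.0 - 2.0 * (i + 0.5) / n)   # polar angle
    theta = np.pi * (1.0 + 5.0 ** 0.5) * i       # golden-angle azimuth
    return np.stack([np.sin(phi) * np.cos(theta),
                     np.sin(phi) * np.sin(theta),
                     np.cos(phi)], axis=1)

def deform_template(verts, target_radius):
    """Push each vertex radially onto the boundary given by target_radius."""
    dirs = verts / np.linalg.norm(verts, axis=1, keepdims=True)
    return dirs * target_radius(dirs)[:, None]

def ellipsoid_radius(dirs, axes=(1.0, 0.8, 0.6)):
    """Distance from origin to the ellipsoid surface along each direction."""
    a = np.asarray(axes)
    return 1.0 / np.sqrt(((dirs / a) ** 2).sum(axis=1))

template = fibonacci_sphere(256)
fitted = deform_template(template, ellipsoid_radius)  # (256, 3) surface points
```

In the paper's setting, the learned network predicts the deformation from image features instead of an analytic radius function, but the template's connectivity (and hence topology) is preserved in the same way.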
{"title":"World of Forms: Deformable Geometric Templates for One-Shot Surface Meshing in Coronary CT Angiography","authors":"Rudolf L. M. van Herten, Ioannis Lagogiannis, Jelmer M. Wolterink, Steffen Bruns, Eva R. Meulendijks, Damini Dey, Joris R. de Groot, José P. Henriques, R. Nils Planken, Simone Saitta, Ivana Išgum","doi":"arxiv-2409.11837","DOIUrl":"https://doi.org/arxiv-2409.11837","url":null,"abstract":"Deep learning-based medical image segmentation and surface mesh generation\u0000typically involve a sequential pipeline from image to segmentation to meshes,\u0000often requiring large training datasets while making limited use of prior\u0000geometric knowledge. This may lead to topological inconsistencies and\u0000suboptimal performance in low-data regimes. To address these challenges, we\u0000propose a data-efficient deep learning method for direct 3D anatomical object\u0000surface meshing using geometric priors. Our approach employs a multi-resolution\u0000graph neural network that operates on a prior geometric template which is\u0000deformed to fit object boundaries of interest. We show how different templates\u0000may be used for the different surface meshing targets, and introduce a novel\u0000masked autoencoder pretraining strategy for 3D spherical data. The proposed\u0000method outperforms nnUNet in a one-shot setting for segmentation of the\u0000pericardium, left ventricle (LV) cavity and the LV myocardium. Similarly, the\u0000method outperforms other lumen segmentation operating on multi-planar\u0000reformatted images. 
Results further indicate that mesh quality is on par with\u0000or improves upon marching cubes post-processing of voxel mask predictions,\u0000while remaining flexible in the choice of mesh triangulation prior, thus paving\u0000the way for more accurate and topologically consistent 3D medical object\u0000surface meshing.","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142263131","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hamza Kalisch, Fabian Hörst, Ken Herrmann, Jens Kleesiek, Constantin Seibold
Lesion segmentation in PET/CT imaging is essential for precise tumor characterization, which supports personalized treatment planning and enhances diagnostic precision in oncology. However, accurate manual segmentation of lesions is time-consuming and prone to inter-observer variability. Given the rising demand and clinical use of PET/CT, automated segmentation methods, particularly deep-learning-based approaches, have become increasingly relevant. The autoPET III Challenge focuses on advancing automated segmentation of tumor lesions in PET/CT images in a multitracer, multicenter setting, addressing the clinical need for quantitative, robust, and generalizable solutions. Building on previous challenges, the third iteration of the autoPET challenge introduces a more diverse dataset featuring two different tracers (FDG and PSMA) from two clinical centers. To this end, we developed a classifier that identifies the tracer of a given PET/CT scan based on the Maximum Intensity Projection of the PET volume. We trained two individual nnUNet ensembles, one for each tracer, where anatomical labels are included as a multi-label task to enhance the model's performance. Our final submission achieves cross-validation Dice scores of 76.90% and 61.33% for the publicly available FDG and PSMA datasets, respectively. The code is available at https://github.com/hakal104/autoPETIII/ .
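The Maximum Intensity Projection that feeds the tracer classifier is itself a one-line operation: for each pixel of the projection plane, keep the maximum voxel value along the projection axis. A minimal sketch (array shapes are illustrative, not real PET dimensions):

```python
import numpy as np

# Maximum Intensity Projection (MIP) of a 3D volume: collapse one axis by
# taking the maximum, so bright structures (e.g. tracer-avid lesions)
# remain visible in the 2D projection.

def mip(volume, axis=0):
    """Maximum Intensity Projection of a 3D volume along one axis."""
    return volume.max(axis=axis)

pet = np.zeros((4, 5, 6))
pet[2, 3, 1] = 7.0            # a single bright "lesion" voxel
proj = mip(pet, axis=0)       # (5, 6) projection retaining the bright voxel
```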
{"title":"Autopet III challenge: Incorporating anatomical knowledge into nnUNet for lesion segmentation in PET/CT","authors":"Hamza Kalisch, Fabian Hörst, Ken Herrmann, Jens Kleesiek, Constantin Seibold","doi":"arxiv-2409.12155","DOIUrl":"https://doi.org/arxiv-2409.12155","url":null,"abstract":"Lesion segmentation in PET/CT imaging is essential for precise tumor\u0000characterization, which supports personalized treatment planning and enhances\u0000diagnostic precision in oncology. However, accurate manual segmentation of\u0000lesions is time-consuming and prone to inter-observer variability. Given the\u0000rising demand and clinical use of PET/CT, automated segmentation methods,\u0000particularly deep-learning-based approaches, have become increasingly more\u0000relevant. The autoPET III Challenge focuses on advancing automated segmentation\u0000of tumor lesions in PET/CT images in a multitracer multicenter setting,\u0000addressing the clinical need for quantitative, robust, and generalizable\u0000solutions. Building on previous challenges, the third iteration of the autoPET\u0000challenge introduces a more diverse dataset featuring two different tracers\u0000(FDG and PSMA) from two clinical centers. To this extent, we developed a\u0000classifier that identifies the tracer of the given PET/CT based on the Maximum\u0000Intensity Projection of the PET scan. We trained two individual\u0000nnUNet-ensembles for each tracer where anatomical labels are included as a\u0000multi-label task to enhance the model's performance. Our final submission\u0000achieves cross-validation Dice scores of 76.90% and 61.33% for the publicly\u0000available FDG and PSMA datasets, respectively. 
The code is available at\u0000https://github.com/hakal104/autoPETIII/ .","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142263127","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. A. G. Yogi Pramana, Faiz Ihza Permana, Muhammad Fazil Maulana, Dzikri Rahadian Fudholi
Tuberculosis (TB) is caused by the bacterium Mycobacterium tuberculosis, primarily affecting the lungs. Early detection is crucial for improving treatment effectiveness and reducing transmission risk. Artificial intelligence (AI), particularly through image classification of chest X-rays, can assist in TB detection. However, class imbalance in TB chest X-ray datasets presents a challenge for accurate classification. In this paper, we propose a few-shot learning (FSL) approach using the Prototypical Network algorithm to address this issue. We compare the performance of ResNet-18, ResNet-50, and VGG16 in feature extraction from the TBX11K Chest X-ray dataset. Experimental results demonstrate classification accuracies of 98.93% for ResNet-18, 98.60% for ResNet-50, and 33.33% for VGG16. These findings indicate that the proposed method outperforms others in mitigating data imbalance, which is particularly beneficial for disease classification applications.
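The Prototypical Network decision rule at the heart of the approach can be sketched as follows: each class prototype is the mean of its support-set embeddings, and a query is assigned to the class with the nearest prototype. The 2D vectors below are stand-ins for the CNN feature embeddings (ResNet-18/50 or VGG16 in the paper).

```python
import numpy as np

# Prototypical Network classification: prototypes are per-class mean
# embeddings; queries are labeled by the nearest prototype under squared
# Euclidean distance.

def prototypes(support, labels):
    """Per-class mean embeddings; support is (n, d), labels is (n,)."""
    classes = np.unique(labels)
    protos = np.stack([support[labels == c].mean(axis=0) for c in classes])
    return classes, protos

def classify(query, classes, protos):
    """Label of the nearest prototype under squared Euclidean distance."""
    dists = ((protos - query[None, :]) ** 2).sum(axis=1)
    return classes[np.argmin(dists)]

support = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
labels = np.array([0, 0, 1, 1])
classes, protos = prototypes(support, labels)
pred = classify(np.array([4.5, 5.2]), classes, protos)   # near class 1
```

Because prototypes are class means rather than learned per-class weights, the rule adapts naturally to few-shot and imbalanced settings: a minority class needs only a handful of support examples to define its prototype.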
{"title":"Few-Shot Learning Approach on Tuberculosis Classification Based on Chest X-Ray Images","authors":"A. A. G. Yogi Pramana, Faiz Ihza Permana, Muhammad Fazil Maulana, Dzikri Rahadian Fudholi","doi":"arxiv-2409.11644","DOIUrl":"https://doi.org/arxiv-2409.11644","url":null,"abstract":"Tuberculosis (TB) is caused by the bacterium Mycobacterium tuberculosis,\u0000primarily affecting the lungs. Early detection is crucial for improving\u0000treatment effectiveness and reducing transmission risk. Artificial intelligence\u0000(AI), particularly through image classification of chest X-rays, can assist in\u0000TB detection. However, class imbalance in TB chest X-ray datasets presents a\u0000challenge for accurate classification. In this paper, we propose a few-shot\u0000learning (FSL) approach using the Prototypical Network algorithm to address\u0000this issue. We compare the performance of ResNet-18, ResNet-50, and VGG16 in\u0000feature extraction from the TBX11K Chest X-ray dataset. Experimental results\u0000demonstrate classification accuracies of 98.93% for ResNet-18, 98.60% for\u0000ResNet-50, and 33.33% for VGG16. These findings indicate that the proposed\u0000method outperforms others in mitigating data imbalance, which is particularly\u0000beneficial for disease classification applications.","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142269816","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In the fast-evolving field of Generative AI, platforms like MidJourney, DALL-E, and Stable Diffusion have transformed Text-to-Image (T2I) Generation. However, despite their impressive ability to create high-quality images, they often struggle to generate accurate text within these images. Theoretically, if we could achieve accurate text generation in AI images in a ``zero-shot'' manner, it would not only make AI-generated images more meaningful but also democratize the graphic design industry. The first step towards this goal is to create a robust scoring matrix for evaluating text accuracy in AI-generated images. Although there are existing benchmarking methods like CLIP SCORE and T2I-CompBench++, there is still a gap in systematically evaluating text and typography in AI-generated images, especially with diffusion-based methods. In this paper, we introduce a novel evaluation matrix designed explicitly for quantifying the performance of text and typography generation within AI-generated images. We use a letter-by-letter matching strategy to compute exact matching scores between the reference text and the AI-generated text. Our approach to calculating the score handles multiple redundancies such as repetition of words, case sensitivity, mixing of words, and irregular incorporation of letters. Moreover, we developed a novel method, named brevity adjustment, to handle excess text. In addition, we performed a quantitative analysis of frequent errors arising from frequently used and less frequently used words.
{"title":"ABHINAW: A method for Automatic Evaluation of Typography within AI-Generated Images","authors":"Abhinaw Jagtap, Nachiket Tapas, R. G. Brajesh","doi":"arxiv-2409.11874","DOIUrl":"https://doi.org/arxiv-2409.11874","url":null,"abstract":"In the fast-evolving field of Generative AI, platforms like MidJourney,\u0000DALL-E, and Stable Diffusion have transformed Text-to-Image (T2I) Generation.\u0000However, despite their impressive ability to create high-quality images, they\u0000often struggle to generate accurate text within these images. Theoretically, if\u0000we could achieve accurate text generation in AI images in a ``zero-shot''\u0000manner, it would not only make AI-generated images more meaningful but also\u0000democratize the graphic design industry. The first step towards this goal is to\u0000create a robust scoring matrix for evaluating text accuracy in AI-generated\u0000images. Although there are existing bench-marking methods like CLIP SCORE and\u0000T2I-CompBench++, there's still a gap in systematically evaluating text and\u0000typography in AI-generated images, especially with diffusion-based methods. In\u0000this paper, we introduce a novel evaluation matrix designed explicitly for\u0000quantifying the performance of text and typography generation within\u0000AI-generated images. We have used letter by letter matching strategy to compute\u0000the exact matching scores from the reference text to the AI generated text. Our\u0000novel approach to calculate the score takes care of multiple redundancies such\u0000as repetition of words, case sensitivity, mixing of words, irregular\u0000incorporation of letters etc. Moreover, we have developed a Novel method named\u0000as brevity adjustment to handle excess text. In addition we have also done a\u0000quantitative analysis of frequent errors arise due to frequently used words and\u0000less frequently used words. 
Project page is available at:\u0000https://github.com/Abhinaw3906/ABHINAW-MATRIX.","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142269823","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
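A heavily simplified sketch of the letter-by-letter matching idea: case-insensitive, order-agnostic letter counts scored against the reference. The published metric additionally handles word repetition, word mixing, irregular letters, and brevity adjustment, so this toy scorer is our illustrative assumption, not the paper's formula.

```python
from collections import Counter

# Toy letter-level matching: count how many reference letters the generated
# text reproduces, ignoring case, spacing, and letter order.

def letter_match_score(reference, generated):
    """Fraction of reference letters matched by the generated text."""
    ref = Counter(c for c in reference.lower() if c.isalpha())
    gen = Counter(c for c in generated.lower() if c.isalpha())
    matched = sum(min(count, gen[c]) for c, count in ref.items())
    total = sum(ref.values())
    return matched / total if total else 1.0

score = letter_match_score("HELLO WORLD", "helo wrold")  # 9 of 10 letters
```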
Codruţ-Andrei Diaconu, Konrad Heidler, Jonathan L. Bamber, Harry Zekollari
The more than 200,000 glaciers outside the ice sheets play a crucial role in our society by influencing sea-level rise, water resource management, natural hazards, biodiversity, and tourism. However, only a fraction of these glaciers benefit from consistent and detailed in-situ observations that allow for assessing their status and changes over time. This limitation can, in part, be overcome by relying on satellite-based Earth Observation techniques. Satellite-based glacier mapping applications have historically mainly relied on manual and semi-automatic detection methods, while recently, a fast and notable transition to deep learning techniques has started. This chapter reviews how combining multi-sensor remote sensing data and deep learning allows us to better delineate (i.e. map) glaciers and detect their temporal changes. We explain how relying on deep learning multi-sensor frameworks to map glaciers benefits from the extensive availability of regional and global glacier inventories. We also analyse the rationale behind glacier mapping, the benefits of deep learning methodologies, and the inherent challenges in integrating multi-sensor Earth Observation data with deep learning algorithms. While our review aims to provide a broad overview of glacier mapping efforts, we highlight a few setups where deep learning multi-sensor remote sensing applications offer considerable potential added value. This includes applications for debris-covered and rock glaciers, which are visually difficult to distinguish from their surroundings, and for calving glaciers that are in contact with the ocean. These specific cases are illustrated through a series of visual imagery examples, highlighting some significant advantages and challenges when detecting glacier changes, including dealing with seasonal snow cover, changing debris coverage, and distinguishing glacier fronts from the surrounding sea ice.
{"title":"Multi-Sensor Deep Learning for Glacier Mapping","authors":"Codruţ-Andrei Diaconu, Konrad Heidler, Jonathan L. Bamber, Harry Zekollari","doi":"arxiv-2409.12034","DOIUrl":"https://doi.org/arxiv-2409.12034","url":null,"abstract":"The more than 200,000 glaciers outside the ice sheets play a crucial role in\u0000our society by influencing sea-level rise, water resource management, natural\u0000hazards, biodiversity, and tourism. However, only a fraction of these glaciers\u0000benefit from consistent and detailed in-situ observations that allow for\u0000assessing their status and changes over time. This limitation can, in part, be\u0000overcome by relying on satellite-based Earth Observation techniques.\u0000Satellite-based glacier mapping applications have historically mainly relied on\u0000manual and semi-automatic detection methods, while recently, a fast and notable\u0000transition to deep learning techniques has started. This chapter reviews how combining multi-sensor remote sensing data and deep\u0000learning allows us to better delineate (i.e. map) glaciers and detect their\u0000temporal changes. We explain how relying on deep learning multi-sensor\u0000frameworks to map glaciers benefits from the extensive availability of regional\u0000and global glacier inventories. We also analyse the rationale behind glacier\u0000mapping, the benefits of deep learning methodologies, and the inherent\u0000challenges in integrating multi-sensor earth observation data with deep\u0000learning algorithms. While our review aims to provide a broad overview of glacier mapping efforts,\u0000we highlight a few setups where deep learning multi-sensor remote sensing\u0000applications have a considerable potential added value. This includes\u0000applications for debris-covered and rock glaciers that are visually difficult\u0000to distinguish from surroundings and for calving glaciers that are in contact\u0000with the ocean. 
These specific cases are illustrated through a series of visual\u0000imageries, highlighting some significant advantages and challenges when\u0000detecting glacier changes, including dealing with seasonal snow cover, changing\u0000debris coverage, and distinguishing glacier fronts from the surrounding sea\u0000ice.","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142269821","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Light-Field (LF) images are an emerging form of 4D data capturing light rays, capable of realistically presenting the spatial and angular information of a 3D scene. However, the large data volume of LF images is the most challenging issue for real-time processing, transmission, and storage. In this paper, we propose an end-to-end deep LF Image Compression method Using Disentangled Representation and Asymmetrical Strip Convolution (LFIC-DRASC) to improve coding efficiency. Firstly, we formulate the LF image compression problem as learning a disentangled LF representation network and an image encoding-decoding network. Secondly, we propose two novel feature extractors that leverage the structural prior of LF data by integrating features across different dimensions. Meanwhile, a disentangled LF representation network is proposed to enhance LF feature disentangling and decoupling. Thirdly, we propose LFIC-DRASC for LF image compression, in which two Asymmetrical Strip Convolution (ASC) operators, horizontal and vertical, are proposed to capture long-range correlations in the LF feature space. These two ASC operators can be combined with square convolutions to further decouple LF features, enhancing the model's ability to represent intricate spatial relationships. Experimental results demonstrate that the proposed LFIC-DRASC achieves an average of 20.5% bit rate reduction compared with the state-of-the-art methods.
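The Asymmetrical Strip Convolution idea, i.e. 1×k horizontal and k×1 vertical kernels, can be sketched outside any learned network as plain "valid" cross-correlations. This is a minimal illustration of the kernel shapes only; the paper's ASC operators are learned layers inside the compression network.

```python
import numpy as np

# Strip-shaped kernels aggregate along one spatial axis at a time: a (1, k)
# kernel sums along rows (horizontal), a (k, 1) kernel along columns
# (vertical), capturing long-range correlation in one direction.

def strip_conv(x, kernel):
    """Valid 2D cross-correlation with a strip-shaped kernel."""
    kh, kw = kernel.shape
    h, w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = (x[i:i + kh, j:j + kw] * kernel).sum()
    return out

x = np.arange(16, dtype=float).reshape(4, 4)
horiz = strip_conv(x, np.ones((1, 3)))   # shape (4, 2): sums along rows
vert = strip_conv(x, np.ones((3, 1)))    # shape (2, 4): sums along columns
```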
{"title":"LFIC-DRASC: Deep Light Field Image Compression Using Disentangled Representation and Asymmetrical Strip Convolution","authors":"Shiyu Feng, Yun Zhang, Linwei Zhu, Sam Kwong","doi":"arxiv-2409.11711","DOIUrl":"https://doi.org/arxiv-2409.11711","url":null,"abstract":"Light-Field (LF) image is emerging 4D data of light rays that is capable of\u0000realistically presenting spatial and angular information of 3D scene. However,\u0000the large data volume of LF images becomes the most challenging issue in\u0000real-time processing, transmission, and storage. In this paper, we propose an\u0000end-to-end deep LF Image Compression method Using Disentangled Representation\u0000and Asymmetrical Strip Convolution (LFIC-DRASC) to improve coding efficiency.\u0000Firstly, we formulate the LF image compression problem as learning a\u0000disentangled LF representation network and an image encoding-decoding network.\u0000Secondly, we propose two novel feature extractors that leverage the structural\u0000prior of LF data by integrating features across different dimensions.\u0000Meanwhile, disentangled LF representation network is proposed to enhance the LF\u0000feature disentangling and decoupling. Thirdly, we propose the LFIC-DRASC for LF\u0000image compression, where two Asymmetrical Strip Convolution (ASC) operators,\u0000i.e. horizontal and vertical, are proposed to capture long-range correlation in\u0000LF feature space. These two ASC operators can be combined with the square\u0000convolution to further decouple LF features, which enhances the model ability\u0000in representing intricate spatial relationships. 
Experimental results\u0000demonstrate that the proposed LFIC-DRASC achieves an average of 20.5% bit rate\u0000reductions comparing with the state-of-the-art methods.","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142263132","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hongjun Zhu, Jiaohang Huang, Kuo Chen, Xuehui Ying, Ying Qian
Brain Tumor Segmentation (BraTS) plays a critical role in clinical diagnosis, treatment planning, and monitoring the progression of brain tumors. However, due to the variability in tumor appearance, size, and intensity across different MRI modalities, automated segmentation remains a challenging task. In this study, we propose a novel Transformer-based framework, multiPI-TransBTS, which integrates multi-physical information to enhance segmentation accuracy. The model leverages spatial information, semantic information, and multi-modal imaging data, addressing the inherent heterogeneity in brain tumor characteristics. The multiPI-TransBTS framework consists of an encoder, an Adaptive Feature Fusion (AFF) module, and a multi-source, multi-scale feature decoder. The encoder incorporates a multi-branch architecture to separately extract modality-specific features from different MRI sequences. The AFF module fuses information from multiple sources using channel-wise and element-wise attention, ensuring effective feature recalibration. The decoder combines both common and task-specific features through a Task-Specific Feature Introduction (TSFI) strategy, producing accurate segmentation outputs for Whole Tumor (WT), Tumor Core (TC), and Enhancing Tumor (ET) regions. Comprehensive evaluations on the BraTS2019 and BraTS2020 datasets demonstrate the superiority of multiPI-TransBTS over the state-of-the-art methods. The model consistently achieves better Dice coefficients, Hausdorff distances, and Sensitivity scores, highlighting its effectiveness in addressing the BraTS challenges. Our results also indicate the need for further exploration of the balance between precision and recall in the ET segmentation task. The proposed framework represents a significant advancement in BraTS, with potential implications for improving clinical outcomes for brain tumor patients.
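The Dice coefficient reported for the WT, TC, and ET regions is twice the overlap between the predicted and reference masks divided by the sum of their sizes; a minimal sketch:

```python
import numpy as np

# Dice coefficient between two binary segmentation masks: 1.0 for perfect
# overlap, 0.0 for disjoint masks. A small eps avoids division by zero when
# both masks are empty.

def dice(pred, gt, eps=1e-7):
    """Dice coefficient between two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum() + eps)

a = np.array([[1, 1, 0], [0, 0, 0]])
b = np.array([[1, 0, 0], [0, 1, 0]])
score = dice(a, b)   # 1 overlapping voxel, masks of size 2 and 2
```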
{"title":"multiPI-TransBTS: A Multi-Path Learning Framework for Brain Tumor Image Segmentation Based on Multi-Physical Information","authors":"Hongjun Zhu, Jiaohang Huang, Kuo Chen, Xuehui Ying, Ying Qian","doi":"arxiv-2409.12167","DOIUrl":"https://doi.org/arxiv-2409.12167","journal":"arXiv - EE - Image and Video Processing","publicationDate":"2024-09-18"}
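The channel-wise plus element-wise attention fusion performed by the AFF module can be pictured with a small NumPy sketch. This is a toy illustration only, not the authors' implementation: the function name `fuse_features`, the softmax channel weighting, and the sigmoid gate are all assumed details chosen to make the two attention stages concrete.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fuse_features(f_a, f_b):
    """Fuse two modality feature maps of shape (C, H, W) with
    channel-wise, then element-wise, attention (illustrative only)."""
    # channel-wise attention: weight each channel by a softmax over
    # the globally averaged responses of both modalities
    gap = np.concatenate([f_a.mean(axis=(1, 2)), f_b.mean(axis=(1, 2))])
    w = np.exp(gap) / np.exp(gap).sum()          # softmax over 2C channels
    c = f_a.shape[0]
    f_a_w = f_a * w[:c, None, None]
    f_b_w = f_b * w[c:, None, None]
    # element-wise attention: a sigmoid gate decides, per spatial
    # position, how much each recalibrated stream contributes
    gate = sigmoid(f_a_w - f_b_w)
    return gate * f_a_w + (1.0 - gate) * f_b_w
```

The fused map keeps the per-modality shape, so a decoder can consume it exactly like a single-modality feature map.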
Pamela Osuna-Vargas, Maren H. Wehrheim, Lucas Zinz, Johanna Rahm, Ashwin Balakrishnan, Alexandra Kaminer, Mike Heilemann, Matthias Kaschube
Advances in microscopy imaging enable researchers to visualize structures at the nanoscale, thereby unraveling intricate details of biological organization. However, challenges such as image noise, photobleaching of fluorophores, and the low tolerance of biological samples to high light doses remain, restricting temporal resolution and experiment duration. Reduced laser doses enable longer measurements at the cost of lower resolution and increased noise, which hinders accurate downstream analyses. Here we train a denoising diffusion probabilistic model (DDPM) to predict high-resolution images by conditioning the model on low-resolution information. Additionally, the probabilistic nature of the DDPM allows repeated generation of images, which tends to further increase the signal-to-noise ratio. We show that our model achieves performance better than or comparable to the previously best-performing methods across four highly diverse datasets. Importantly, while each of the previous methods shows competitive performance on some, but not all, of the datasets, our method consistently achieves high performance across all four, suggesting high generalizability.
{"title":"Denoising diffusion models for high-resolution microscopy image restoration","authors":"Pamela Osuna-Vargas, Maren H. Wehrheim, Lucas Zinz, Johanna Rahm, Ashwin Balakrishnan, Alexandra Kaminer, Mike Heilemann, Matthias Kaschube","doi":"arxiv-2409.12078","DOIUrl":"https://doi.org/arxiv-2409.12078","journal":"arXiv - EE - Image and Video Processing","publicationDate":"2024-09-18"}
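The claim that repeated stochastic generation tends to raise the signal-to-noise ratio can be sanity-checked with a minimal NumPy sketch. Here a DDPM sample is stood in for by the clean image plus independent Gaussian noise, purely as an assumed proxy for illustration; no actual diffusion model is involved.

```python
import numpy as np

rng = np.random.default_rng(0)
clean = rng.random((32, 32))  # stand-in for the true high-resolution image

def generate(clean, sigma=0.2):
    """Stand-in for one stochastic DDPM sample conditioned on the
    low-resolution input: the clean image plus independent noise."""
    return clean + rng.normal(0.0, sigma, clean.shape)

def rmse(a, b):
    return float(np.sqrt(((a - b) ** 2).mean()))

single = generate(clean)
averaged = np.mean([generate(clean) for _ in range(16)], axis=0)
# averaging 16 independent samples shrinks the noise std by roughly 4x,
# so the averaged estimate is closer to the clean image
assert rmse(averaged, clean) < rmse(single, clean)
```

The same averaging-over-samples logic applies to any generative restorer whose errors are (approximately) independent across draws.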
Leron Julian, Haejoon Lee, Soummya Kar, Aswin C. Sankaranarayanan
The occlusion of the sun by clouds is one of the primary sources of uncertainty in solar power generation, and is a factor limiting the widespread use of solar power as a primary energy source. Real-time forecasting of cloud movement and, as a result, solar irradiance is necessary to schedule and allocate energy across grid-connected photovoltaic systems. Previous works monitored cloud movement using wide-angle imagery of the sky. However, such images have poor resolution for clouds that appear near the horizon, which reduces their effectiveness for long-term prediction of solar occlusion. Specifically, to predict occlusion of the sun over long time periods, clouds near the horizon need to be detected and their velocities estimated precisely. To enable such a system, we design and deploy a catadioptric system that delivers wide-angle imagery with uniform spatial resolution of the sky over its field of view. To enable prediction over a longer time horizon, we design an algorithm that extracts carefully selected spatio-temporal slices of the imagery, using estimated wind direction and velocity as inputs. Using ray-tracing simulations as well as a real testbed deployed outdoors, we show that the system can predict solar occlusion as well as irradiance tens of minutes into the future, an order-of-magnitude improvement over prior work.
{"title":"Computational Imaging for Long-Term Prediction of Solar Irradiance","authors":"Leron Julian, Haejoon Lee, Soummya Kar, Aswin C. Sankaranarayanan","doi":"arxiv-2409.12016","DOIUrl":"https://doi.org/arxiv-2409.12016","journal":"arXiv - EE - Image and Video Processing","publicationDate":"2024-09-18"}
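One way to picture the spatio-temporal slicing is to sample each sky frame along the ray through the sun oriented by the estimated wind direction, stacking the samples over time. The NumPy sketch below is a hypothetical rendering of that idea; the function name, its arguments, and the nearest-pixel sampling are assumptions for illustration, not the paper's algorithm.

```python
import numpy as np

def spatio_temporal_slice(frames, sun_xy, wind_dir_deg, length=50):
    """Sample each sky frame of shape (T, H, W) along the ray from the
    sun position in the estimated wind direction, stacking the samples
    into a (time, position) slice (illustrative only)."""
    t = np.arange(length)
    theta = np.deg2rad(wind_dir_deg)
    # nearest-pixel coordinates along the ray, clipped to the frame
    xs = np.clip((sun_xy[0] + t * np.cos(theta)).astype(int), 0, frames.shape[2] - 1)
    ys = np.clip((sun_xy[1] + t * np.sin(theta)).astype(int), 0, frames.shape[1] - 1)
    return frames[:, ys, xs]   # shape: (num_frames, length)
```

A cloud front advected by the wind then appears as a tilted streak in the (time, position) slice, and its slope indicates how soon the cloud will reach the sun.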