André F. R. Guarda (Instituto de Telecomunicações, Lisbon, Portugal), Nuno M. M. Rodrigues (Instituto de Telecomunicações, Lisbon, Portugal; ESTG, Politécnico de Leiria, Leiria, Portugal), Fernando Pereira (Instituto de Telecomunicações, Lisbon, Portugal; Instituto Superior Técnico - Universidade de Lisboa, Lisbon, Portugal)
Efficient point cloud coding has become increasingly critical for applications such as virtual reality, autonomous driving, and digital twin systems, where rich and interactive 3D data representations can make a functional difference. Deep learning has emerged as a powerful tool in this domain, offering techniques that compress point clouds more efficiently than conventional coding methods while also enabling effective computer vision tasks in the compressed domain, thus making available, for the first time, a common compressed visual representation that serves both man and machine. Taking advantage of this potential, JPEG has recently finalized the JPEG Pleno Learning-based Point Cloud Coding (PCC) standard, which offers efficient lossy coding of static point clouds and targets both human visualization and machine processing by leveraging deep learning models for geometry and color coding. The geometry is processed directly in its original 3D form using sparse convolutional neural networks, while the color data is projected onto 2D images and encoded with the likewise learning-based JPEG AI standard. The goal of this paper is to provide a complete technical description of the JPEG PCC standard, along with a thorough benchmarking of its performance against the state of the art, while highlighting its main strengths and weaknesses. In terms of compression performance, JPEG PCC outperforms the conventional MPEG PCC standards, especially in geometry coding, where it achieves significant rate reductions. Its color compression performance is less competitive, but this is offset by the benefits of a fully learning-based coding framework for both geometry and color and the effective compressed-domain processing it enables.
The JPEG Pleno Learning-based Point Cloud Coding Standard: Serving Man and Machine. DOI: https://doi.org/arxiv-2409.08130 (arXiv - EE - Image and Video Processing, 2024-09-12).
Alzheimer's Disease (AD) is a non-curable progressive neurodegenerative disorder that affects the human brain, leading to a decline in memory, cognitive abilities, and eventually, the ability to carry out daily tasks. Manual diagnosis of Alzheimer's disease from MRI images suffers from low sensitivity and is a very tedious process for neurologists. Therefore, there is a need for an automatic Computer Assisted Diagnosis (CAD) system that can detect AD at early stages with higher accuracy. In this research, we have proposed a novel AD-Lite Net model (trained from scratch) that could alleviate the aforementioned problem. The novelties of this research are: (I) We have proposed a very lightweight CNN model by incorporating Depth Wise Separable Convolutional (DWSC) layers and Global Average Pooling (GAP) layers. (II) We have leveraged a ``parallel concatenation block'' (pcb) in the proposed AD-Lite Net model. This pcb consists of a Transformation layer (Tx-layer) followed by two convolutional layers, which are then concatenated with the original base model. The Tx-layer converts the features into a very distinct kind of feature, which is imperative for detecting Alzheimer's disease. As a consequence, the proposed AD-Lite Net model with ``parallel concatenation'' converges faster and automatically mitigates the class imbalance problem in the MRI datasets in a very generalized way. To validate our proposed model, we have implemented it on three different MRI datasets. Furthermore, we have combined the ADNI and AD datasets and subsequently performed a 10-fold cross-validation experiment to verify the model's generalization ability. Extensive experimental results showed that our proposed model outperforms all the existing CNN models, and a recent Vision Transformer (ViT) model, by a significant margin.
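The parameter savings behind the lightweight design come from a standard factorization: a depthwise separable convolution replaces a k x k convolution over all channel pairs with a per-channel k x k filter plus a 1 x 1 pointwise channel mix. A minimal sketch of the arithmetic, with illustrative layer sizes that are not taken from AD-Lite Net:

```python
# Parameter counts for a standard conv vs. a depthwise separable conv.
# The layer sizes below (64 -> 128 channels, 3x3 kernel) are illustrative,
# not the actual AD-Lite Net configuration.

def standard_conv_params(c_in, c_out, k):
    # Every output channel filters all input channels with a k x k kernel.
    return c_in * c_out * k * k

def dws_conv_params(c_in, c_out, k):
    # Depthwise: one k x k filter per input channel.
    # Pointwise: a 1 x 1 conv that mixes channels.
    return c_in * k * k + c_in * c_out

std = standard_conv_params(64, 128, 3)  # 73728
dws = dws_conv_params(64, 128, 3)       # 576 + 8192 = 8768
print(std, dws, round(std / dws, 1))    # 73728 8768 8.4
```

For a 3 x 3 kernel the reduction approaches a factor of 9 as the channel count grows, which is what makes DWSC layers attractive for lightweight models.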
AD-Lite Net: A Lightweight and Concatenated CNN Model for Alzheimer's Detection from MRI Images. Santanu Roy, Archit Gupta, Shubhi Tiwari, Palak Sahu. DOI: https://doi.org/arxiv-2409.08170 (arXiv - EE - Image and Video Processing, 2024-09-12).
Runjia Li, Junlin Han, Luke Melas-Kyriazi, Chunyi Sun, Zhaochong An, Zhongrui Gui, Shuyang Sun, Philip Torr, Tomas Jakab
We present DreamBeast, a novel method based on score distillation sampling (SDS) for generating fantastical 3D animal assets composed of distinct parts. Existing SDS methods often struggle with this generation task due to a limited understanding of part-level semantics in text-to-image diffusion models. While recent diffusion models, such as Stable Diffusion 3, demonstrate a better part-level understanding, they are prohibitively slow and exhibit other common problems associated with single-view diffusion models. DreamBeast overcomes this limitation through a novel part-aware knowledge transfer mechanism. For each generated asset, we efficiently extract part-level knowledge from the Stable Diffusion 3 model into a 3D Part-Affinity implicit representation. This enables us to instantly generate Part-Affinity maps from arbitrary camera views, which we then use to modulate the guidance of a multi-view diffusion model during SDS to create 3D assets of fantastical animals. DreamBeast significantly enhances the quality of generated 3D creatures with user-specified part compositions while reducing computational overhead, as demonstrated by extensive quantitative and qualitative evaluations.
DreamBeast: Distilling 3D Fantastical Animals with Part-Aware Knowledge Transfer. DOI: https://doi.org/arxiv-2409.08271 (arXiv - EE - Image and Video Processing, 2024-09-12).
Heejong Kim, Leo Milecki, Mina C Moghadam, Fengbei Liu, Minh Nguyen, Eric Qiu, Abhishek Thanki, Mert R Sabuncu
Segmentation is a crucial task in the medical imaging field and is often an important first step or even a prerequisite to the analysis of medical volumes. Yet treatments such as surgery complicate the accurate delineation of regions of interest. The BraTS Post-Treatment 2024 Challenge published the first public dataset for post-surgery glioma segmentation and addresses this issue by fostering the development of automated segmentation tools for glioma in MRI data. In this effort, we propose two straightforward approaches to enhance the segmentation performance of deep learning-based methodologies. First, we incorporate an additional input based on a simple linear combination of the available MRI sequences, which highlights enhancing tumor. Second, we employ various ensembling methods to weigh the contributions of a battery of models. Our results demonstrate that these approaches significantly improve segmentation performance compared to baseline models, underscoring the effectiveness of these simple approaches in medical image segmentation tasks.
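The abstract does not spell out the linear combination here; a common choice for highlighting enhancing tumor is the difference between the post- and pre-contrast T1 sequences. A hedged numpy sketch of both ideas, where the T1ce - T1 combination and the equal ensemble weights are assumptions, not the paper's exact choices:

```python
import numpy as np

# Illustrative sketch: the specific artificial sequence (T1ce - T1,
# rescaled) and equal ensemble weights are assumptions.

def artificial_sequence(t1, t1ce):
    """Subtraction-style extra input that emphasizes contrast enhancement."""
    diff = t1ce.astype(np.float32) - t1.astype(np.float32)
    diff -= diff.min()
    return diff / (diff.max() + 1e-8)  # rescale to [0, 1]

def ensemble(prob_maps, weights=None):
    """Weighted average of per-model probability maps."""
    stacked = np.stack(prob_maps)
    if weights is None:
        weights = np.full(len(stacked), 1.0 / len(stacked))
    return np.tensordot(weights, stacked, axes=1)

rng = np.random.default_rng(0)
t1 = rng.random((4, 4))
t1ce = t1 + 0.5 * (rng.random((4, 4)) > 0.7)  # fake enhancing region
extra = artificial_sequence(t1, t1ce)          # extra network input
fused = ensemble([rng.random((4, 4)) for _ in range(3)])
```

In practice the fused probability map would then be thresholded to produce the final mask.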
Effective Segmentation of Post-Treatment Gliomas Using Simple Approaches: Artificial Sequence Generation and Ensemble Models. DOI: https://doi.org/arxiv-2409.08143 (arXiv - EE - Image and Video Processing, 2024-09-12).
Optical Coherence Tomography Angiography (OCTA) is a crucial imaging technique for visualizing retinal vasculature and diagnosing eye diseases such as diabetic retinopathy and glaucoma. However, precise segmentation of OCTA vasculature remains challenging due to the multi-scale vessel structures and noise from poor image quality and eye lesions. In this study, we propose OCTAMamba, a novel U-shaped network based on the Mamba architecture, designed to segment vasculature in OCTA accurately. OCTAMamba integrates a Quad Stream Efficient Mining Embedding Module for local feature extraction, a Multi-Scale Dilated Asymmetric Convolution Module to capture multi-scale vasculature, and a Focused Feature Recalibration Module to filter noise and highlight target areas. Our method achieves efficient global modeling and local feature extraction while maintaining linear complexity, making it suitable for low-computation medical applications. Extensive experiments on the OCTA 3M, OCTA 6M, and ROSSA datasets demonstrate that OCTAMamba outperforms state-of-the-art methods, providing a new reference for efficient OCTA segmentation. Code is available at https://github.com/zs1314/OCTAMamba
OCTAMamba: A State-Space Model Approach for Precision OCTA Vasculature Segmentation. Shun Zou, Zhuo Zhang, Guangwei Gao. DOI: https://doi.org/arxiv-2409.08000 (arXiv - EE - Image and Video Processing, 2024-09-12).
Retinal fundus photography offers a non-invasive way to diagnose and monitor a variety of retinal diseases, but it is prone to inherent quality glitches arising from systemic imperfections or operator/patient-related factors. However, high-quality retinal images are crucial for accurate diagnoses and automated analyses. Fundus image enhancement is typically formulated as a distribution alignment problem: finding a one-to-one mapping between a low-quality image and its high-quality counterpart. This paper proposes a context-informed optimal transport (OT) learning framework for tackling unpaired fundus image enhancement. In contrast to standard generative image enhancement methods, which struggle to handle contextual information (e.g., over-tampered local structures and unwanted artifacts), the proposed context-aware OT learning paradigm better preserves local structures and minimizes unwanted artifacts. Leveraging deep contextual features, we derive the proposed context-aware OT using the earth mover's distance and show that it has a solid theoretical guarantee. Experimental results on a large-scale dataset demonstrate the superiority of the proposed method over several state-of-the-art supervised and unsupervised methods in terms of signal-to-noise ratio, structural similarity index, and two downstream tasks. The code is available at https://github.com/Retinal-Research/Contextual-OT
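In one dimension the earth mover's distance underpinning the context-aware OT has a closed form: it equals the L1 distance between the two cumulative distributions. A toy pure-numpy illustration of that identity, unrelated to the paper's deep-feature formulation:

```python
import numpy as np

def emd_1d(p, q):
    """Earth mover's distance between two 1D histograms defined on the
    same unit-spaced bins: the L1 distance between their CDFs."""
    p = np.asarray(p, dtype=float) / np.sum(p)
    q = np.asarray(q, dtype=float) / np.sum(q)
    return np.abs(np.cumsum(p - q)).sum()

# Moving a unit point mass by two bins costs exactly 2 units of work.
print(emd_1d([1, 0, 0], [0, 0, 1]))  # 2.0
print(emd_1d([2, 2], [2, 2]))        # 0.0
```

The learned OT map in the paper plays the same role in a much higher-dimensional feature space, where no closed form exists.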
Context-Aware Optimal Transport Learning for Retinal Fundus Image Enhancement. Vamsi Krishna Vasa, Peijie Qiu, Wenhui Zhu, Yujian Xiong, Oana Dumitrascu, Yalin Wang. DOI: https://doi.org/arxiv-2409.07862 (arXiv - EE - Image and Video Processing, 2024-09-12).
Accurate lesion segmentation in whole-body PET/CT scans is crucial for cancer diagnosis and treatment planning, but limited datasets often hinder the performance of automated segmentation models. In this paper, we explore the potential of leveraging the deep prior from a generative model to serve as a data augmenter for automated lesion segmentation in PET/CT scans. We adapt the DiffTumor method, originally designed for CT images, to generate synthetic PET-CT images with lesions. Our approach trains the generative model on the AutoPET dataset and uses it to expand the training data. We then compare the performance of segmentation models trained on the original and augmented datasets. Our findings show that the model trained on the augmented dataset achieves a higher Dice score, demonstrating the potential of our data augmentation approach. In a nutshell, this work presents a promising direction for improving lesion segmentation in whole-body PET/CT scans with limited datasets, potentially enhancing the accuracy and reliability of cancer diagnostics.
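The Dice score used to compare the baseline and augmented models is the standard overlap metric between a predicted and a reference mask; a minimal implementation, where the smoothing constant eps is an implementation detail rather than anything from the paper:

```python
import numpy as np

def dice(pred, target, eps=1e-8):
    """Dice coefficient between two binary masks:
    2 * |intersection| / (|pred| + |target|)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    inter = np.logical_and(pred, target).sum()
    return 2.0 * inter / (pred.sum() + target.sum() + eps)

a = np.array([[1, 1], [0, 0]])
b = np.array([[1, 0], [0, 0]])
print(round(dice(a, b), 3))  # 2*1 / (2+1) -> 0.667
```

A Dice of 1.0 means perfect overlap; the challenge compares the mean of this score across test cases.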
AutoPET Challenge: Tumour Synthesis for Data Augmentation. Lap Yan Lennon Chan, Chenxin Li, Yixuan Yuan. DOI: https://doi.org/arxiv-2409.08068 (arXiv - EE - Image and Video Processing, 2024-09-12).
Daniel Capellán-Martín, Zhifan Jiang, Abhijeet Parida, Xinyang Liu, Van Lam, Hareem Nisar, Austin Tapp, Sarah Elsharkawi, Maria J. Ledesma-Carbayo, Syed Muhammad Anwar, Marius George Linguraru
Segmenting brain tumors in multi-parametric magnetic resonance imaging enables performing quantitative analysis in support of clinical trials and personalized patient care. This analysis provides the potential to impact clinical decision-making processes, including diagnosis and prognosis. In 2023, the well-established Brain Tumor Segmentation (BraTS) challenge presented a substantial expansion with eight tasks and 4,500 brain tumor cases. In this paper, we present a deep learning-based ensemble strategy that is evaluated for newly included tumor cases in three tasks: pediatric brain tumors (PED), intracranial meningioma (MEN), and brain metastases (MET). In particular, we ensemble outputs from state-of-the-art nnU-Net and Swin UNETR models on a region-wise basis. Furthermore, we implemented a targeted post-processing strategy based on a cross-validated threshold search to improve the segmentation results for tumor sub-regions. The evaluation of our proposed method on unseen test cases for the three tasks resulted in lesion-wise Dice scores for PED: 0.653, 0.809, 0.826; MEN: 0.876, 0.867, 0.849; and MET: 0.555, 0.6, 0.58; for the enhancing tumor, tumor core, and whole tumor, respectively. Our method was ranked first for PED, third for MEN, and fourth for MET.
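The cross-validated threshold search can be read as a per-sub-region grid search: sweep a probability cutoff over held-out cases and keep the cutoff that maximizes mean Dice. A sketch under assumed details (the candidate grid and the toy data are illustrative, not the paper's configuration):

```python
import numpy as np

def dice(pred, target, eps=1e-8):
    inter = np.logical_and(pred, target).sum()
    return 2.0 * inter / (pred.sum() + target.sum() + eps)

def best_threshold(prob_maps, gt_masks, grid=np.linspace(0.1, 0.9, 17)):
    """Threshold search for one tumor sub-region: pick the cutoff with
    the highest mean Dice over validation cases."""
    scores = [np.mean([dice(p >= t, g) for p, g in zip(prob_maps, gt_masks)])
              for t in grid]
    return grid[int(np.argmax(scores))]

rng = np.random.default_rng(0)
gts = [rng.random((8, 8)) > 0.5 for _ in range(4)]          # toy ground truth
probs = [0.7 * g + 0.3 * rng.random((8, 8)) for g in gts]    # noisy predictions
t_best = best_threshold(probs, gts)
print(t_best)
```

In the paper's setting the search would be run per sub-region (enhancing tumor, tumor core, whole tumor) on cross-validation folds rather than a single held-out set.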
Model Ensemble for Brain Tumor Segmentation in Magnetic Resonance Imaging. DOI: https://doi.org/arxiv-2409.08232 (arXiv - EE - Image and Video Processing, 2024-09-12).
Alexander Baumann, Leonardo Ayala, Alexander Studier-Fischer, Jan Sellner, Berkin Özdemir, Karl-Friedrich Kowalewski, Slobodan Ilic, Silvia Seidlitz, Lena Maier-Hein
Hyperspectral imaging (HSI) is emerging as a promising novel imaging modality with various potential surgical applications. Currently available cameras, however, suffer from poor integration into the clinical workflow because they require the lights to be switched off, or the camera to be manually recalibrated as soon as lighting conditions change. Given this critical bottleneck, the contribution of this paper is threefold: (1) We demonstrate that dynamically changing lighting conditions in the operating room dramatically affect the performance of HSI applications, namely physiological parameter estimation and surgical scene segmentation. (2) We propose a novel learning-based approach to automatically recalibrating hyperspectral images during surgery and show that it is sufficiently accurate to replace the tedious process of white reference-based recalibration. (3) Based on a total of 742 HSI cubes from a phantom, porcine models, and rats, we show that our recalibration method not only outperforms previously proposed methods but also generalizes across species, lighting conditions, and image processing tasks. Due to its simple workflow integration as well as high accuracy, speed, and generalization capabilities, our method could evolve as a central component in clinical surgical HSI.
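The tedious baseline that the learned approach replaces is classical white-reference (flat-field) calibration, which recovers reflectance per band as (raw - dark) / (white - dark) from dark and white reference measurements. A minimal numpy sketch of that baseline with toy numbers:

```python
import numpy as np

def white_reference_calibration(raw, white, dark, eps=1e-8):
    """Classical per-band reflectance calibration that the learned
    approach aims to replace: R = (raw - dark) / (white - dark)."""
    return (raw - dark) / (white - dark + eps)

# Toy 2-band example: a signal halfway between the dark and white
# references maps to 50 % reflectance in both bands.
raw = np.array([60.0, 110.0])
dark = np.array([10.0, 20.0])
white = np.array([110.0, 200.0])
refl = white_reference_calibration(raw, white, dark)
print(refl)  # ~[0.5, 0.5]
```

The clinical pain point is that white and dark references must be re-acquired whenever the illumination changes, which is exactly the manual step the paper's method removes.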
"Deep intra-operative illumination calibration of hyperspectral cameras" (arXiv - EE - Image and Video Processing, published 2024-09-11, https://doi.org/arxiv-2409.07094).
Chenjun Li, Dian Yang, Shun Yao, Shuyue Wang, Ye Wu, Le Zhang, Qiannuo Li, Kang Ik Kevin Cho, Johanna Seitz-Holland, Lipeng Ning, Jon Haitz Legarreta, Yogesh Rathi, Carl-Fredrik Westin, Lauren J. O'Donnell, Nir A. Sochen, Ofer Pasternak, Fan Zhang
In this study, we developed an Evidence-based Ensemble Neural Network, named EVENet, for anatomical brain parcellation using diffusion MRI. The key innovation of EVENet is the design of an evidential deep learning framework to quantify predictive uncertainty at each voxel during a single inference. Using EVENet, we obtained accurate parcellation and uncertainty estimates across different datasets from healthy and clinical populations and with different imaging acquisitions. The overall network includes five parallel subnetworks, each dedicated to learning the FreeSurfer parcellation for a certain diffusion MRI parameter. An evidence-based ensemble methodology is then proposed to fuse the individual outputs. We perform experimental evaluations on large-scale datasets from multiple imaging sources, including high-quality diffusion MRI data from healthy adults and clinical diffusion MRI data from participants with various brain diseases (schizophrenia, bipolar disorder, attention-deficit/hyperactivity disorder, Parkinson's disease, cerebral small vessel disease, and neurosurgical patients with brain tumors). Compared to several state-of-the-art methods, our experimental results demonstrate substantially improved parcellation accuracy across the multiple testing datasets despite the differences in dMRI acquisition protocols and health conditions. Furthermore, thanks to the uncertainty estimation, our EVENet approach demonstrates a good ability to detect abnormal brain regions in patients with lesions, enhancing the interpretability and reliability of the segmentation results.
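The abstract does not give the exact fusion rule, but a common evidential deep learning scheme (an assumption on our part, following the usual Dirichlet-evidence formulation) has each subnetwork output a non-negative evidence vector per voxel; evidences are pooled, converted to Dirichlet parameters, and the total Dirichlet strength yields both class probabilities and a vacuity-style uncertainty. A minimal per-voxel sketch:

```python
import numpy as np

def fuse_evidence(evidences):
    """Fuse per-subnetwork Dirichlet evidence vectors for one voxel.

    evidences: list of non-negative arrays of shape (num_classes,),
               one per subnetwork (e.g. one per diffusion MRI parameter).
    Returns (class_probabilities, uncertainty), where uncertainty is
    K / S, with K the number of classes and S the total Dirichlet strength.
    """
    total = np.sum(evidences, axis=0)    # pooled evidence across subnetworks
    alpha = total + 1.0                  # Dirichlet parameters (uniform prior)
    strength = alpha.sum()               # S = sum of alphas
    probs = alpha / strength             # expected class probabilities
    uncertainty = len(alpha) / strength  # high when pooled evidence is scarce
    return probs, uncertainty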
"EVENet: Evidence-based Ensemble Learning for Uncertainty-aware Brain Parcellation Using Diffusion MRI" (arXiv - EE - Image and Video Processing, published 2024-09-11, https://doi.org/arxiv-2409.07020).