André F. R. Guarda (Instituto de Telecomunicações, Lisbon, Portugal), Nuno M. M. Rodrigues (Instituto de Telecomunicações, Lisbon, Portugal; ESTG, Politécnico de Leiria, Leiria, Portugal), Fernando Pereira (Instituto de Telecomunicações, Lisbon, Portugal; Instituto Superior Técnico - Universidade de Lisboa, Lisbon, Portugal)
Efficient point cloud coding has become increasingly critical for applications such as virtual reality, autonomous driving, and digital twin systems, where rich and interactive 3D data representations can make a functional difference. Deep learning has emerged as a powerful tool in this domain, offering techniques that compress point clouds more efficiently than conventional coding methods while also enabling effective computer vision tasks in the compressed domain, thus making available, for the first time, a common compressed visual representation that serves both man and machine. Taking advantage of this potential, JPEG has recently finalized the JPEG Pleno Learning-based Point Cloud Coding (PCC) standard, which offers efficient lossy coding of static point clouds and targets both human visualization and machine processing by leveraging deep learning models for geometry and color coding. The geometry is processed directly in its original 3D form using sparse convolutional neural networks, while the color data is projected onto 2D images and encoded with the likewise learning-based JPEG AI standard. The goal of this paper is to provide a complete technical description of the JPEG PCC standard, along with a thorough benchmarking of its performance against the state of the art, while highlighting its main strengths and weaknesses. In terms of compression performance, JPEG PCC outperforms the conventional MPEG PCC standards, especially in geometry coding, where it achieves significant rate reductions. Its color compression performance is less competitive, but this is offset by the benefits of a fully learning-based coding framework for both geometry and color and the effective compressed-domain processing it enables.
The JPEG Pleno Learning-based Point Cloud Coding Standard: Serving Man and Machine. DOI: https://doi.org/arxiv-2409.08130 (arXiv - EE - Image and Video Processing, 2024-09-12).
Alzheimer's Disease (AD) is a non-curable progressive neurodegenerative disorder that affects the human brain, leading to a decline in memory, cognitive abilities, and eventually, the ability to carry out daily tasks. Manual diagnosis of Alzheimer's disease from MRI images suffers from low sensitivity and is a very tedious process for neurologists. Therefore, there is a need for an automatic Computer Assisted Diagnosis (CAD) system that can detect AD at early stages with higher accuracy. In this research, we have proposed a novel AD-Lite Net model (trained from scratch) that could alleviate the aforementioned problem. The novelties of this research are: (I) We have proposed a very lightweight CNN model by incorporating Depth Wise Separable Convolutional (DWSC) layers and Global Average Pooling (GAP) layers. (II) We have leveraged a ``parallel concatenation block'' (pcb) in the proposed AD-Lite Net model. This pcb consists of a Transformation layer (Tx-layer) followed by two convolutional layers, which are then concatenated with the original base model. The Tx-layer converts the features into a very distinct kind of feature, which is imperative for detecting Alzheimer's disease. As a consequence, the proposed AD-Lite Net model with ``parallel concatenation'' converges faster and automatically mitigates the class imbalance problem in the MRI datasets in a very generalized way. To validate our proposed model, we have implemented it on three different MRI datasets. Furthermore, we have combined the ADNI and AD datasets and subsequently performed a 10-fold cross-validation experiment to verify the model's generalization ability. Extensive experimental results showed that our proposed model outperforms all the existing CNN models, and a recent Vision Transformer (ViT) model, by a significant margin.
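The parameter savings behind the lightweight design come from a standard factorization: a depthwise separable convolution replaces a k x k convolution over all channel pairs with a per-channel k x k filter plus a 1 x 1 pointwise channel mix. A minimal sketch of the arithmetic, with illustrative layer sizes that are not taken from AD-Lite Net:

```python
# Parameter counts for a standard conv vs. a depthwise separable conv.
# The layer sizes below (64 -> 128 channels, 3x3 kernel) are illustrative,
# not the actual AD-Lite Net configuration.

def standard_conv_params(c_in, c_out, k):
    # Every output channel filters all input channels with a k x k kernel.
    return c_in * c_out * k * k

def dws_conv_params(c_in, c_out, k):
    # Depthwise: one k x k filter per input channel.
    # Pointwise: a 1 x 1 conv that mixes channels.
    return c_in * k * k + c_in * c_out

std = standard_conv_params(64, 128, 3)  # 73728
dws = dws_conv_params(64, 128, 3)       # 576 + 8192 = 8768
print(std, dws, round(std / dws, 1))    # 73728 8768 8.4
```

For a 3 x 3 kernel the reduction approaches a factor of 9 as the channel count grows, which is what makes DWSC layers attractive for lightweight models.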
AD-Lite Net: A Lightweight and Concatenated CNN Model for Alzheimer's Detection from MRI Images. Santanu Roy, Archit Gupta, Shubhi Tiwari, Palak Sahu. DOI: https://doi.org/arxiv-2409.08170 (arXiv - EE - Image and Video Processing, 2024-09-12).
Runjia Li, Junlin Han, Luke Melas-Kyriazi, Chunyi Sun, Zhaochong An, Zhongrui Gui, Shuyang Sun, Philip Torr, Tomas Jakab
We present DreamBeast, a novel method based on score distillation sampling (SDS) for generating fantastical 3D animal assets composed of distinct parts. Existing SDS methods often struggle with this generation task due to a limited understanding of part-level semantics in text-to-image diffusion models. While recent diffusion models, such as Stable Diffusion 3, demonstrate a better part-level understanding, they are prohibitively slow and exhibit other common problems associated with single-view diffusion models. DreamBeast overcomes this limitation through a novel part-aware knowledge transfer mechanism. For each generated asset, we efficiently extract part-level knowledge from the Stable Diffusion 3 model into a 3D Part-Affinity implicit representation. This enables us to instantly generate Part-Affinity maps from arbitrary camera views, which we then use to modulate the guidance of a multi-view diffusion model during SDS to create 3D assets of fantastical animals. DreamBeast significantly enhances the quality of generated 3D creatures with user-specified part compositions while reducing computational overhead, as demonstrated by extensive quantitative and qualitative evaluations.
DreamBeast: Distilling 3D Fantastical Animals with Part-Aware Knowledge Transfer. DOI: https://doi.org/arxiv-2409.08271 (arXiv - EE - Image and Video Processing, 2024-09-12).
Heejong Kim, Leo Milecki, Mina C Moghadam, Fengbei Liu, Minh Nguyen, Eric Qiu, Abhishek Thanki, Mert R Sabuncu
Segmentation is a crucial task in the medical imaging field and is often an important first step or even a prerequisite to the analysis of medical volumes. Yet treatments such as surgery complicate the accurate delineation of regions of interest. The BraTS Post-Treatment 2024 Challenge published the first public dataset for post-surgery glioma segmentation and addresses this issue by fostering the development of automated segmentation tools for glioma in MRI data. In this effort, we propose two straightforward approaches to enhance the segmentation performance of deep learning-based methodologies. First, we incorporate an additional input based on a simple linear combination of the available MRI sequences, which highlights enhancing tumor. Second, we employ various ensembling methods to weigh the contributions of a battery of models. Our results demonstrate that these approaches significantly improve segmentation performance compared to baseline models, underscoring the effectiveness of these simple approaches in medical image segmentation tasks.
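The abstract does not spell out the linear combination here; a common choice for highlighting enhancing tumor is the difference between the post- and pre-contrast T1 sequences. A hedged numpy sketch of both ideas, where the T1ce - T1 combination and the equal ensemble weights are assumptions, not the paper's exact choices:

```python
import numpy as np

# Illustrative sketch: the specific artificial sequence (T1ce - T1,
# rescaled) and equal ensemble weights are assumptions.

def artificial_sequence(t1, t1ce):
    """Subtraction-style extra input that emphasizes contrast enhancement."""
    diff = t1ce.astype(np.float32) - t1.astype(np.float32)
    diff -= diff.min()
    return diff / (diff.max() + 1e-8)  # rescale to [0, 1]

def ensemble(prob_maps, weights=None):
    """Weighted average of per-model probability maps."""
    stacked = np.stack(prob_maps)
    if weights is None:
        weights = np.full(len(stacked), 1.0 / len(stacked))
    return np.tensordot(weights, stacked, axes=1)

rng = np.random.default_rng(0)
t1 = rng.random((4, 4))
t1ce = t1 + 0.5 * (rng.random((4, 4)) > 0.7)  # fake enhancing region
extra = artificial_sequence(t1, t1ce)          # extra network input
fused = ensemble([rng.random((4, 4)) for _ in range(3)])
```

In practice the fused probability map would then be thresholded to produce the final mask.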
Effective Segmentation of Post-Treatment Gliomas Using Simple Approaches: Artificial Sequence Generation and Ensemble Models. DOI: https://doi.org/arxiv-2409.08143 (arXiv - EE - Image and Video Processing, 2024-09-12).
Optical Coherence Tomography Angiography (OCTA) is a crucial imaging technique for visualizing retinal vasculature and diagnosing eye diseases such as diabetic retinopathy and glaucoma. However, precise segmentation of OCTA vasculature remains challenging due to the multi-scale vessel structures and noise from poor image quality and eye lesions. In this study, we propose OCTAMamba, a novel U-shaped network based on the Mamba architecture, designed to segment vasculature in OCTA accurately. OCTAMamba integrates a Quad Stream Efficient Mining Embedding Module for local feature extraction, a Multi-Scale Dilated Asymmetric Convolution Module to capture multi-scale vasculature, and a Focused Feature Recalibration Module to filter noise and highlight target areas. Our method achieves efficient global modeling and local feature extraction while maintaining linear complexity, making it suitable for low-computation medical applications. Extensive experiments on the OCTA 3M, OCTA 6M, and ROSSA datasets demonstrate that OCTAMamba outperforms state-of-the-art methods, providing a new reference for efficient OCTA segmentation. Code is available at https://github.com/zs1314/OCTAMamba
OCTAMamba: A State-Space Model Approach for Precision OCTA Vasculature Segmentation. Shun Zou, Zhuo Zhang, Guangwei Gao. DOI: https://doi.org/arxiv-2409.08000 (arXiv - EE - Image and Video Processing, 2024-09-12).
Retinal fundus photography offers a non-invasive way to diagnose and monitor a variety of retinal diseases, but it is prone to inherent quality glitches arising from systemic imperfections or operator/patient-related factors. However, high-quality retinal images are crucial for accurate diagnoses and automated analyses. Fundus image enhancement is typically formulated as a distribution alignment problem: finding a one-to-one mapping between a low-quality image and its high-quality counterpart. This paper proposes a context-informed optimal transport (OT) learning framework for tackling unpaired fundus image enhancement. In contrast to standard generative image enhancement methods, which struggle to handle contextual information (e.g., over-tampered local structures and unwanted artifacts), the proposed context-aware OT learning paradigm better preserves local structures and minimizes unwanted artifacts. Leveraging deep contextual features, we derive the proposed context-aware OT using the earth mover's distance and show that it has a solid theoretical guarantee. Experimental results on a large-scale dataset demonstrate the superiority of the proposed method over several state-of-the-art supervised and unsupervised methods in terms of signal-to-noise ratio, structural similarity index, and two downstream tasks. The code is available at https://github.com/Retinal-Research/Contextual-OT
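In one dimension the earth mover's distance underpinning the context-aware OT has a closed form: it equals the L1 distance between the two cumulative distributions. A toy pure-numpy illustration of that identity, unrelated to the paper's deep-feature formulation:

```python
import numpy as np

def emd_1d(p, q):
    """Earth mover's distance between two 1D histograms defined on the
    same unit-spaced bins: the L1 distance between their CDFs."""
    p = np.asarray(p, dtype=float) / np.sum(p)
    q = np.asarray(q, dtype=float) / np.sum(q)
    return np.abs(np.cumsum(p - q)).sum()

# Moving a unit point mass by two bins costs exactly 2 units of work.
print(emd_1d([1, 0, 0], [0, 0, 1]))  # 2.0
print(emd_1d([2, 2], [2, 2]))        # 0.0
```

The learned OT map in the paper plays the same role in a much higher-dimensional feature space, where no closed form exists.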
Context-Aware Optimal Transport Learning for Retinal Fundus Image Enhancement. Vamsi Krishna Vasa, Peijie Qiu, Wenhui Zhu, Yujian Xiong, Oana Dumitrascu, Yalin Wang. DOI: https://doi.org/arxiv-2409.07862 (arXiv - EE - Image and Video Processing, 2024-09-12).
Accurate lesion segmentation in whole-body PET/CT scans is crucial for cancer diagnosis and treatment planning, but limited datasets often hinder the performance of automated segmentation models. In this paper, we explore the potential of leveraging the deep prior from a generative model to serve as a data augmenter for automated lesion segmentation in PET/CT scans. We adapt the DiffTumor method, originally designed for CT images, to generate synthetic PET-CT images with lesions. Our approach trains the generative model on the AutoPET dataset and uses it to expand the training data. We then compare the performance of segmentation models trained on the original and augmented datasets. Our findings show that the model trained on the augmented dataset achieves a higher Dice score, demonstrating the potential of our data augmentation approach. In a nutshell, this work presents a promising direction for improving lesion segmentation in whole-body PET/CT scans with limited datasets, potentially enhancing the accuracy and reliability of cancer diagnostics.
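The Dice score used to compare the baseline and augmented models is the standard overlap metric between a predicted and a reference mask; a minimal implementation, where the smoothing constant eps is an implementation detail rather than anything from the paper:

```python
import numpy as np

def dice(pred, target, eps=1e-8):
    """Dice coefficient between two binary masks:
    2 * |intersection| / (|pred| + |target|)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    inter = np.logical_and(pred, target).sum()
    return 2.0 * inter / (pred.sum() + target.sum() + eps)

a = np.array([[1, 1], [0, 0]])
b = np.array([[1, 0], [0, 0]])
print(round(dice(a, b), 3))  # 2*1 / (2+1) -> 0.667
```

A Dice of 1.0 means perfect overlap; the challenge compares the mean of this score across test cases.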
AutoPET Challenge: Tumour Synthesis for Data Augmentation. Lap Yan Lennon Chan, Chenxin Li, Yixuan Yuan. DOI: https://doi.org/arxiv-2409.08068 (arXiv - EE - Image and Video Processing, 2024-09-12).
Daniel Capellán-Martín, Zhifan Jiang, Abhijeet Parida, Xinyang Liu, Van Lam, Hareem Nisar, Austin Tapp, Sarah Elsharkawi, Maria J. Ledesma-Carbayo, Syed Muhammad Anwar, Marius George Linguraru
Segmenting brain tumors in multi-parametric magnetic resonance imaging enables performing quantitative analysis in support of clinical trials and personalized patient care. This analysis provides the potential to impact clinical decision-making processes, including diagnosis and prognosis. In 2023, the well-established Brain Tumor Segmentation (BraTS) challenge presented a substantial expansion with eight tasks and 4,500 brain tumor cases. In this paper, we present a deep learning-based ensemble strategy that is evaluated for newly included tumor cases in three tasks: pediatric brain tumors (PED), intracranial meningioma (MEN), and brain metastases (MET). In particular, we ensemble outputs from state-of-the-art nnU-Net and Swin UNETR models on a region-wise basis. Furthermore, we implemented a targeted post-processing strategy based on a cross-validated threshold search to improve the segmentation results for tumor sub-regions. The evaluation of our proposed method on unseen test cases for the three tasks resulted in lesion-wise Dice scores for PED: 0.653, 0.809, 0.826; MEN: 0.876, 0.867, 0.849; and MET: 0.555, 0.6, 0.58; for the enhancing tumor, tumor core, and whole tumor, respectively. Our method was ranked first for PED, third for MEN, and fourth for MET.
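The cross-validated threshold search can be read as a per-sub-region grid search: sweep a probability cutoff over held-out cases and keep the cutoff that maximizes mean Dice. A sketch under assumed details (the candidate grid and the toy data are illustrative, not the paper's configuration):

```python
import numpy as np

def dice(pred, target, eps=1e-8):
    inter = np.logical_and(pred, target).sum()
    return 2.0 * inter / (pred.sum() + target.sum() + eps)

def best_threshold(prob_maps, gt_masks, grid=np.linspace(0.1, 0.9, 17)):
    """Threshold search for one tumor sub-region: pick the cutoff with
    the highest mean Dice over validation cases."""
    scores = [np.mean([dice(p >= t, g) for p, g in zip(prob_maps, gt_masks)])
              for t in grid]
    return grid[int(np.argmax(scores))]

rng = np.random.default_rng(0)
gts = [rng.random((8, 8)) > 0.5 for _ in range(4)]          # toy ground truth
probs = [0.7 * g + 0.3 * rng.random((8, 8)) for g in gts]    # noisy predictions
t_best = best_threshold(probs, gts)
print(t_best)
```

In the paper's setting the search would be run per sub-region (enhancing tumor, tumor core, whole tumor) on cross-validation folds rather than a single held-out set.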
Model Ensemble for Brain Tumor Segmentation in Magnetic Resonance Imaging. DOI: https://doi.org/arxiv-2409.08232 (arXiv - EE - Image and Video Processing, 2024-09-12).
Alexander Baumann, Leonardo Ayala, Alexander Studier-Fischer, Jan Sellner, Berkin Özdemir, Karl-Friedrich Kowalewski, Slobodan Ilic, Silvia Seidlitz, Lena Maier-Hein
Hyperspectral imaging (HSI) is emerging as a promising novel imaging modality with various potential surgical applications. Currently available cameras, however, suffer from poor integration into the clinical workflow because they require the lights to be switched off, or the camera to be manually recalibrated as soon as lighting conditions change. Given this critical bottleneck, the contribution of this paper is threefold: (1) We demonstrate that dynamically changing lighting conditions in the operating room dramatically affect the performance of HSI applications, namely physiological parameter estimation and surgical scene segmentation. (2) We propose a novel learning-based approach to automatically recalibrating hyperspectral images during surgery and show that it is sufficiently accurate to replace the tedious process of white reference-based recalibration. (3) Based on a total of 742 HSI cubes from a phantom, porcine models, and rats, we show that our recalibration method not only outperforms previously proposed methods but also generalizes across species, lighting conditions, and image processing tasks. Due to its simple workflow integration as well as high accuracy, speed, and generalization capabilities, our method could evolve as a central component in clinical surgical HSI.
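The tedious baseline that the learned approach replaces is classical white-reference (flat-field) calibration, which recovers reflectance per band as (raw - dark) / (white - dark) from dark and white reference measurements. A minimal numpy sketch of that baseline with toy numbers:

```python
import numpy as np

def white_reference_calibration(raw, white, dark, eps=1e-8):
    """Classical per-band reflectance calibration that the learned
    approach aims to replace: R = (raw - dark) / (white - dark)."""
    return (raw - dark) / (white - dark + eps)

# Toy 2-band example: a signal halfway between the dark and white
# references maps to 50 % reflectance in both bands.
raw = np.array([60.0, 110.0])
dark = np.array([10.0, 20.0])
white = np.array([110.0, 200.0])
refl = white_reference_calibration(raw, white, dark)
print(refl)  # ~[0.5, 0.5]
```

The clinical pain point is that white and dark references must be re-acquired whenever the illumination changes, which is exactly the manual step the paper's method removes.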
"Deep intra-operative illumination calibration of hyperspectral cameras" (arXiv - EE - Image and Video Processing, published 2024-09-11, https://doi.org/arxiv-2409.07094).
Chenjun Li, Dian Yang, Shun Yao, Shuyue Wang, Ye Wu, Le Zhang, Qiannuo Li, Kang Ik Kevin Cho, Johanna Seitz-Holland, Lipeng Ning, Jon Haitz Legarreta, Yogesh Rathi, Carl-Fredrik Westin, Lauren J. O'Donnell, Nir A. Sochen, Ofer Pasternak, Fan Zhang
In this study, we developed an Evidence-based Ensemble Neural Network, named EVENet, for anatomical brain parcellation using diffusion MRI. The key innovation of EVENet is the design of an evidential deep learning framework to quantify predictive uncertainty at each voxel during a single inference. Using EVENet, we obtained accurate parcellation and uncertainty estimates across different datasets from healthy and clinical populations and with different imaging acquisitions. The overall network includes five parallel subnetworks, each dedicated to learning the FreeSurfer parcellation for a certain diffusion MRI parameter. An evidence-based ensemble methodology is then proposed to fuse the individual outputs. We perform experimental evaluations on large-scale datasets from multiple imaging sources, including high-quality diffusion MRI data from healthy adults and clinical diffusion MRI data from participants with various brain diseases (schizophrenia, bipolar disorder, attention-deficit/hyperactivity disorder, Parkinson's disease, cerebral small vessel disease, and neurosurgical patients with brain tumors). Compared to several state-of-the-art methods, our experimental results demonstrate substantially improved parcellation accuracy across the multiple testing datasets despite the differences in dMRI acquisition protocols and health conditions. Furthermore, thanks to the uncertainty estimation, our EVENet approach demonstrates a good ability to detect abnormal brain regions in patients with lesions, enhancing the interpretability and reliability of the segmentation results.
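The abstract does not give the exact fusion rule, but a common evidential deep learning scheme (an assumption on our part, following the usual Dirichlet-evidence formulation) has each subnetwork output a non-negative evidence vector per voxel; evidences are pooled, converted to Dirichlet parameters, and the total Dirichlet strength yields both class probabilities and a vacuity-style uncertainty. A minimal per-voxel sketch:

```python
import numpy as np

def fuse_evidence(evidences):
    """Fuse per-subnetwork Dirichlet evidence vectors for one voxel.

    evidences: list of non-negative arrays of shape (num_classes,),
               one per subnetwork (e.g. one per diffusion MRI parameter).
    Returns (class_probabilities, uncertainty), where uncertainty is
    K / S, with K the number of classes and S the total Dirichlet strength.
    """
    total = np.sum(evidences, axis=0)    # pooled evidence across subnetworks
    alpha = total + 1.0                  # Dirichlet parameters (uniform prior)
    strength = alpha.sum()               # S = sum of alphas
    probs = alpha / strength             # expected class probabilities
    uncertainty = len(alpha) / strength  # high when pooled evidence is scarce
    return probs, uncertainty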
"EVENet: Evidence-based Ensemble Learning for Uncertainty-aware Brain Parcellation Using Diffusion MRI" (arXiv - EE - Image and Video Processing, published 2024-09-11, https://doi.org/arxiv-2409.07020).