
Latest Publications: Medical image computing and computer-assisted intervention : MICCAI ... International Conference on Medical Image Computing and Computer-Assisted Intervention

Rad-ReStruct: A Novel VQA Benchmark and Method for Structured Radiology Reporting
Chantal Pellegrini, Matthias Keicher, Ege Özsoy, N. Navab
Radiology reporting is a crucial part of the communication between radiologists and other medical professionals, but it can be time-consuming and error-prone. One approach to alleviate this is structured reporting, which saves time and enables a more accurate evaluation than free-text reports. However, there is limited research on automating structured reporting, and no public benchmark is available for evaluating and comparing different methods. To close this gap, we introduce Rad-ReStruct, a new benchmark dataset that provides fine-grained, hierarchically ordered annotations in the form of structured reports for X-Ray images. We model the structured reporting task as hierarchical visual question answering (VQA) and propose hi-VQA, a novel method that considers prior context in the form of previously asked questions and answers for populating a structured radiology report. Our experiments show that hi-VQA achieves performance competitive with the state of the art on the medical VQA benchmark VQARad while performing best among methods without domain-specific vision-language pretraining, and provides a strong baseline on Rad-ReStruct. Our work represents a significant step towards the automated population of structured radiology reports and provides a valuable first benchmark for future research in this area. Our dataset and code are available at https://github.com/ChantalMP/Rad-ReStruct.
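To make the hierarchical VQA formulation concrete, the sketch below walks a toy report template top-down and feeds every previously asked question and answer back into the prompt of the next question. The template, the question wording, and the `vqa_model` callable are illustrative assumptions, not interfaces from the paper or its repository.

```python
# Minimal sketch: populating a structured report via hierarchical VQA with prior Q&A context.
# The report template, question wording, and vqa_model interface are illustrative only.

from typing import Callable, Dict, List, Tuple

# A toy hierarchical template: a parent question and its follow-up questions,
# which are only asked when the parent answer is "yes".
REPORT_TEMPLATE: Dict[str, List[str]] = {
    "Is there an abnormality in the lung fields?": [
        "Which side is affected?",
        "What is the most likely finding?",
    ],
}

def populate_report(image, vqa_model: Callable[[object, str], str]) -> List[Tuple[str, str]]:
    """Ask questions top-down, feeding previously asked Q&A pairs back as context."""
    history: List[Tuple[str, str]] = []
    for parent_q, follow_ups in REPORT_TEMPLATE.items():
        context = " ".join(f"Q: {q} A: {a}" for q, a in history)
        parent_a = vqa_model(image, f"{context} Q: {parent_q}")
        history.append((parent_q, parent_a))
        if parent_a.lower() != "yes":
            continue  # skip the finer-grained questions of this branch
        for q in follow_ups:
            context = " ".join(f"Q: {q_} A: {a_}" for q_, a_ in history)
            answer = vqa_model(image, f"{context} Q: {q}")
            history.append((q, answer))
    return history  # the ordered Q&A pairs form the structured report
```

Gating the follow-up questions on the parent answer is what makes the population hierarchical: finer-grained fields are only filled in when the coarser finding is present.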
DOI: 10.48550/arXiv.2307.05766 · Published 2023-07-11 · Pages 409-419
Citations: 2
Weakly-supervised positional contrastive learning: application to cirrhosis classification
Emma Sarfati, Alexandre Bône, Marc-Michel Rohé, P. Gori, I. Bloch
Large medical imaging datasets can be cheaply and quickly annotated with low-confidence, weak labels (e.g., radiological scores). Access to high-confidence labels, such as histology-based diagnoses, is rare and costly. Pretraining strategies, like contrastive learning (CL) methods, can leverage unlabeled or weakly-annotated datasets. These methods typically require large batch sizes, which poses a difficulty in the case of large 3D images at full resolution, due to limited GPU memory. Nevertheless, volumetric positional information about the spatial context of each 2D slice can be very important for some medical applications. In this work, we propose an efficient weakly-supervised positional (WSP) contrastive learning strategy where we integrate both the spatial context of each 2D slice and a weak label via a generic kernel-based loss function. We illustrate our method on cirrhosis prediction using a large volume of weakly-labeled images, namely radiological low-confidence annotations, and small strongly-labeled (i.e., high-confidence) datasets. The proposed model improves the classification AUC by 5% with respect to a baseline model on our internal dataset, and by 26% on the public LIHC dataset from the Cancer Genome Atlas. The code is available at: https://github.com/Guerbet-AI/wsp-contrastive.
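The kernel-based loss can be sketched as a contrastive objective in which the attraction between two slices is weighted by a Gaussian kernel on their volumetric positions and by agreement of their weak labels. The Gaussian kernel, the specific normalization, and the single-label-per-volume setup below are assumptions for illustration rather than the exact loss of the paper.

```python
import torch
import torch.nn.functional as F

def wsp_contrastive_loss(embeddings: torch.Tensor,
                         positions: torch.Tensor,
                         weak_labels: torch.Tensor,
                         sigma: float = 0.1,
                         temperature: float = 0.1) -> torch.Tensor:
    """Kernel-weighted contrastive loss over a batch of 2D-slice embeddings.

    embeddings:  (N, D) slice features from the encoder
    positions:   (N,)   slice positions normalized to [0, 1] within their volume
    weak_labels: (N,)   weak label of the parent volume (e.g. a radiological score)
    """
    z = F.normalize(embeddings, dim=1)
    sim = z @ z.t() / temperature                                   # (N, N) scaled cosine similarities

    # Gaussian kernel on positional distance: nearby slices are stronger positives.
    pos_diff = positions[:, None] - positions[None, :]
    w_pos = torch.exp(-pos_diff ** 2 / (2 * sigma ** 2))

    # Weak-label agreement: only slices from volumes with the same weak label attract.
    w_lab = (weak_labels[:, None] == weak_labels[None, :]).float()

    weights = w_pos * w_lab
    weights.fill_diagonal_(0.0)                                     # a slice is not its own positive

    self_mask = torch.eye(z.size(0), dtype=torch.bool, device=z.device)
    log_prob = sim - torch.logsumexp(sim.masked_fill(self_mask, float("-inf")), dim=1, keepdim=True)
    loss = -(weights * log_prob).sum(dim=1) / weights.sum(dim=1).clamp_min(1e-8)
    return loss.mean()
```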
DOI: 10.48550/arXiv.2307.04617 · Published 2023-07-10 · Pages 227-237
Citations: 0
Multimodal brain age estimation using interpretable adaptive population-graph learning
Kyriaki-Margarita Bintsi, V. Baltatzis, Rolandos Alexandros Potamias, A. Hammers, D. Rueckert
Brain age estimation is clinically important as it can provide valuable information in the context of neurodegenerative diseases such as Alzheimer's. Population graphs, which include multimodal imaging information of the subjects along with the relationships among the population, have been used in the literature along with Graph Convolutional Networks (GCNs) and have proved beneficial for a variety of medical imaging tasks. A population graph is usually static and constructed manually using non-imaging information. However, graph construction is not a trivial task and might significantly affect the performance of the GCN, which is inherently very sensitive to the graph structure. In this work, we propose a framework that learns a population graph structure optimized for the downstream task. An attention mechanism assigns weights to a set of imaging and non-imaging features (phenotypes), which are then used for edge extraction. The resulting graph is used to train the GCN. The entire pipeline can be trained end-to-end. Additionally, by visualizing the attention weights that were the most important for the graph construction, we increase the interpretability of the graph. We use the UK Biobank, which provides a large variety of neuroimaging and non-imaging phenotypes, to evaluate our method on brain age regression and classification. The proposed method outperforms competing static graph approaches and other state-of-the-art adaptive methods. We further show that the assigned attention scores indicate that there are both imaging and non-imaging phenotypes that are informative for brain age estimation and are in agreement with the relevant literature.
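A rough sketch of the graph-construction step described above: a learnable attention vector weights the phenotypes, the weighted pairwise distances give edge scores, and each subject keeps its k nearest neighbours. The softmax attention, the Gaussian-style edge score, and the kNN sparsification are assumptions for illustration; gradients reach the attention weights through the returned edge weights.

```python
import torch
import torch.nn as nn

class AttentionGraphBuilder(nn.Module):
    """Learn per-phenotype attention weights and extract a kNN population graph."""

    def __init__(self, num_phenotypes: int, k: int = 10):
        super().__init__()
        self.attn_logits = nn.Parameter(torch.zeros(num_phenotypes))  # one weight per phenotype
        self.k = k

    def forward(self, phenotypes: torch.Tensor):
        """phenotypes: (N, P) imaging and non-imaging features per subject.
        Returns (edge_index, edge_weight) usable by a GCN layer."""
        a = torch.softmax(self.attn_logits, dim=0)                    # attention over phenotypes
        diff = phenotypes[:, None, :] - phenotypes[None, :, :]        # (N, N, P) pairwise differences
        dist = (a * diff ** 2).sum(dim=-1)                            # attention-weighted squared distance
        dist.fill_diagonal_(float("inf"))                             # no self-loops
        scores = torch.exp(-dist)                                     # edge scores in (0, 1]

        knn = dist.topk(self.k, dim=1, largest=False).indices         # (N, k) nearest subjects
        src = torch.arange(phenotypes.size(0), device=phenotypes.device).repeat_interleave(self.k)
        dst = knn.reshape(-1)
        edge_index = torch.stack([src, dst], dim=0)                   # (2, N*k)
        edge_weight = scores[src, dst]                                # gradients reach the attention here
        return edge_index, edge_weight
```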
DOI: 10.48550/arXiv.2307.04639 · Published 2023-07-10 · Pages 195-204
Citations: 3
Partial Vessels Annotation-based Coronary Artery Segmentation with Self-training and Prototype Learning
Zheng Zhang, Xiaolei Zhang, Yaolei Qi, Guanyu Yang
Coronary artery segmentation on coronary-computed tomography angiography (CCTA) images is crucial for clinical use. Because the annotation process requires expertise and is labor-intensive, there is a growing demand for label-efficient learning algorithms. To this end, we propose partial vessels annotation (PVA) based on the challenges of coronary artery segmentation and clinical diagnostic characteristics. Further, we propose a progressive weakly supervised learning framework to achieve accurate segmentation under PVA. First, our proposed framework learns the local features of vessels to propagate the knowledge to unlabeled regions. Subsequently, it learns the global structure by utilizing the propagated knowledge, and corrects the errors introduced in the propagation process. Finally, it leverages the similarity between feature embeddings and the feature prototype to enhance testing outputs. Experiments on clinical data reveal that our proposed framework outperforms the competing methods under PVA (24.29% of vessels), and achieves comparable performance in trunk continuity to the baseline model using full annotation (100% of vessels).
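The prototype step can be sketched as follows: a vessel prototype is taken to be the mean embedding of confidently predicted vessel pixels, and the test-time output is blended with the cosine similarity between each pixel embedding and that prototype. The confidence threshold and the linear fusion below are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def prototype_enhanced_prediction(features: torch.Tensor,
                                  logits: torch.Tensor,
                                  conf_thresh: float = 0.9,
                                  alpha: float = 0.5) -> torch.Tensor:
    """features: (B, C, H, W) decoder embeddings; logits: (B, 1, H, W) vessel logits."""
    prob = torch.sigmoid(logits)

    # Vessel prototype: mean embedding of confidently predicted vessel pixels.
    mask = (prob > conf_thresh).float()
    proto = (features * mask).sum(dim=(0, 2, 3)) / mask.sum().clamp_min(1.0)   # (C,)

    # Cosine similarity of every pixel embedding to the prototype, mapped to [0, 1].
    sim = F.cosine_similarity(features, proto.view(1, -1, 1, 1), dim=1, eps=1e-8)
    sim = (sim.unsqueeze(1) + 1.0) / 2.0

    # Fuse the network probability with the prototype similarity map.
    return alpha * prob + (1.0 - alpha) * sim
```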
DOI: 10.48550/arXiv.2307.04472 · Published 2023-07-10 · Pages 297-306
Citations: 0
CoactSeg: Learning from Heterogeneous Data for New Multiple Sclerosis Lesion Segmentation
Yicheng Wu, Zhonghua Wu, Hengcan Shi, Bjoern Picker, W. Chong, Jianfei Cai
New lesion segmentation is essential to estimate the disease progression and therapeutic effects during multiple sclerosis (MS) clinical treatments. However, the expensive data acquisition and expert annotation restrict the feasibility of applying large-scale deep learning models. Since single-time-point samples with all-lesion labels are relatively easy to collect, exploiting them to train deep models is highly desirable for improving new lesion segmentation. Therefore, we propose a coaction segmentation (CoactSeg) framework to exploit heterogeneous data (i.e., new-lesion annotated two-time-point data and all-lesion annotated single-time-point data) for new MS lesion segmentation. The CoactSeg model is designed as a unified model, with the same three inputs (the baseline, follow-up, and their longitudinal brain differences) and the same three outputs (the corresponding all-lesion and new-lesion predictions), no matter which type of heterogeneous data is being used. Moreover, a simple and effective relation regularization is proposed to enforce the longitudinal relations among the three outputs and improve model learning. Extensive experiments demonstrate that utilizing the heterogeneous data and the proposed longitudinal relation constraint can significantly improve the performance for both new-lesion and all-lesion segmentation tasks. Meanwhile, we also introduce an in-house MS-23v1 dataset, comprising 38 Oceania single-time-point samples with all-lesion labels. Code and the dataset are released at https://github.com/ycwu1997/CoactSeg.
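One plausible instantiation of the relation regularization (an illustrative assumption, not necessarily the paper's exact formulation) is to tie the new-lesion output to the part of the follow-up all-lesion prediction that is absent from the baseline prediction:

```python
import torch

def longitudinal_relation_loss(p_base: torch.Tensor,
                               p_follow: torch.Tensor,
                               p_new: torch.Tensor) -> torch.Tensor:
    """All inputs are per-voxel probabilities in [0, 1] with the same shape.

    p_base:   all-lesion prediction on the baseline scan
    p_follow: all-lesion prediction on the follow-up scan
    p_new:    new-lesion prediction from the longitudinal branch
    """
    # New lesions should be lesions at follow-up that were not present at baseline.
    expected_new = p_follow * (1.0 - p_base)
    return torch.mean((p_new - expected_new) ** 2)
```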
DOI: 10.48550/arXiv.2307.04513 · Published 2023-07-10 · Pages 3-13
Citations: 0
Towards Generalizable Diabetic Retinopathy Grading in Unseen Domains
Haoxuan Che, Yu-Tsen Cheng, Haibo Jin, Haoxing Chen
Diabetic Retinopathy (DR) is a common complication of diabetes and a leading cause of blindness worldwide. Early and accurate grading of its severity is crucial for disease management. Although deep learning has shown great potential for automated DR grading, its real-world deployment is still challenging due to distribution shifts among source and target domains, known as the domain generalization problem. Existing works have mainly attributed the performance degradation to limited domain shifts caused by simple visual discrepancies, which cannot handle complex real-world scenarios. Instead, we present preliminary evidence suggesting the existence of three-fold generalization issues: visual and degradation style shifts, diagnostic pattern diversity, and data imbalance. To tackle these issues, we propose a novel unified framework named Generalizable Diabetic Retinopathy Grading Network (GDRNet). GDRNet consists of three vital components: fundus visual-artifact augmentation (FundusAug), dynamic hybrid-supervised loss (DahLoss), and domain-class-aware re-balancing (DCR). FundusAug generates realistic augmented images via visual transformation and image degradation, while DahLoss jointly leverages pixel-level consistency and image-level semantics to capture the diverse diagnostic patterns and build generalizable feature representations. Moreover, DCR mitigates the data imbalance from a domain-class view and avoids undesired over-emphasis on rare domain-class pairs. Finally, we design a publicly available benchmark for fair evaluations. Extensive comparison experiments against advanced methods and exhaustive ablation studies demonstrate the effectiveness and generalization ability of GDRNet.
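The domain-class-aware re-balancing idea can be sketched with inverse-frequency sampling weights over (domain, class) pairs; the clipping that keeps very rare pairs from being over-emphasized is an assumed detail, not the exact DCR mechanism of the paper.

```python
from collections import Counter
from typing import List, Tuple

def domain_class_sampling_weights(pairs: List[Tuple[str, int]],
                                  max_ratio: float = 10.0) -> List[float]:
    """pairs: one (domain, class) tuple per training sample.
    Returns per-sample weights usable with torch.utils.data.WeightedRandomSampler."""
    counts = Counter(pairs)
    raw = [1.0 / counts[p] for p in pairs]          # inverse (domain, class) pair frequency
    lo = min(raw)                                   # weight of the most common pair
    # Clip so that rare pairs are at most max_ratio times more likely than common ones.
    return [min(w, lo * max_ratio) for w in raw]
```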
DOI: 10.48550/arXiv.2307.04378 · Published 2023-07-10 · Pages 430-440
Citations: 5
Mitosis Detection from Partial Annotation by Dataset Generation via Frame-Order Flipping
Kazuya Nishimura, Ami Katanaya, S. Chuma, Ryoma Bise
Detection of mitosis events plays an important role in biomedical research. Deep-learning-based mitosis detection methods have achieved outstanding performance with a certain amount of labeled data. However, these methods require annotations for each imaging condition, and collecting labeled data involves time-consuming human labor. In this paper, we propose a mitosis detection method that can be trained with partially annotated sequences. The basic idea is to generate a fully labeled dataset from the partial labels and train a mitosis detection model with the generated dataset. First, we generate an image pair not containing mitosis events by frame-order flipping. Then, we paste mitosis events onto the image pair by alpha-blending pasting and thereby generate a fully labeled dataset. We demonstrate the performance of our method on four datasets, and we confirm that it outperforms comparison methods that also use partially labeled sequences.
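The dataset-generation idea can be sketched in two steps: reversing the frame order of a sequence turns any division into a merge, so pairs drawn from the flipped sequence are guaranteed free of forward-time mitosis events; annotated mitosis patches are then alpha-blended back in at known positions to create positives with exhaustive labels. Patch handling and the blending mask below are simplified assumptions.

```python
import numpy as np

def flip_frame_order(frames: np.ndarray) -> np.ndarray:
    """frames: (T, H, W) partially annotated sequence.
    Reversing time turns any division (one cell splitting into two) into a merge,
    so consecutive pairs taken from the flipped sequence contain no forward-time
    mitosis events and can serve as exhaustively labelled negatives."""
    return frames[::-1].copy()

def paste_mitosis(pair, patch_pair, top_left, alpha_mask):
    """Alpha-blend an annotated mitosis patch into both frames of a negative pair.

    pair:       two (H, W) frames without mitosis events
    patch_pair: two (h, w) patches showing a cell before and after division
    top_left:   (y, x) paste position
    alpha_mask: (h, w) blending weights in [0, 1]
    Returns the blended pair and the pasted location as the new positive label.
    """
    y, x = top_left
    h, w = alpha_mask.shape
    out = []
    for frame, patch in zip(pair, patch_pair):
        blended = frame.astype(np.float32)
        region = blended[y:y + h, x:x + w]
        blended[y:y + h, x:x + w] = alpha_mask * patch + (1.0 - alpha_mask) * region
        out.append(blended)
    label = (y + h // 2, x + w // 2)   # centre of the pasted division event
    return out, label
```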
DOI: 10.48550/arXiv.2307.04113 · Published 2023-07-09 · Pages 483-492
Citations: 0
Ariadne's Thread: Using Text Prompts to Improve Segmentation of Infected Areas from Chest X-ray images
Yishan Zhong, Mengqiu Xu, K. Liang, Kaixin Chen, Ming Wu
Segmentation of the infected areas of the lung is essential for quantifying the severity of lung diseases such as pulmonary infections. Existing medical image segmentation methods are almost exclusively uni-modal methods based on images alone. However, these image-only methods tend to produce inaccurate results unless trained with large amounts of annotated data. To overcome this challenge, we propose a language-driven segmentation method that uses text prompts to improve the segmentation result. Experiments on the QaTa-COV19 dataset indicate that our method improves the Dice score by at least 6.09% compared to uni-modal methods. Besides, our extended study reveals the flexibility of multi-modal methods in terms of the information granularity of text and demonstrates that multi-modal methods have a significant advantage over image-only methods in terms of the size of training data required.
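A minimal sketch of how a text prompt can steer the segmentation head: the prompt embedding produces per-channel scale and shift parameters that modulate the image features before the mask is predicted. This FiLM-style conditioning is chosen here for brevity and is an assumption; the paper's fusion mechanism may differ.

```python
import torch
import torch.nn as nn

class TextConditionedSegHead(nn.Module):
    """Fuse a text-prompt embedding into image features before predicting a mask."""

    def __init__(self, img_channels: int, text_dim: int):
        super().__init__()
        self.to_scale = nn.Linear(text_dim, img_channels)
        self.to_shift = nn.Linear(text_dim, img_channels)
        self.head = nn.Conv2d(img_channels, 1, kernel_size=1)

    def forward(self, img_feat: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        """img_feat: (B, C, H, W) decoder features; text_emb: (B, D) prompt embedding."""
        scale = self.to_scale(text_emb).unsqueeze(-1).unsqueeze(-1)    # (B, C, 1, 1)
        shift = self.to_shift(text_emb).unsqueeze(-1).unsqueeze(-1)
        cond = img_feat * (1.0 + scale) + shift                        # FiLM-style modulation
        return self.head(cond)                                         # (B, 1, H, W) mask logits
```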
DOI: 10.48550/arXiv.2307.03942 · Published 2023-07-08 · Pages 724-733
Citations: 0
VesselVAE: Recursive Variational Autoencoders for 3D Blood Vessel Synthesis
Paula Feldman, Miguel Fainstein, Viviana Siless, C. Delrieux, Emmanuel Iarussi
We present a data-driven generative framework for synthesizing blood vessel 3D geometry. This is a challenging task due to the complexity of vascular systems, which vary greatly in shape, size, and structure. Existing model-based methods provide some degree of control and variation in the structures produced, but fail to capture the diversity of actual anatomical data. We developed VesselVAE, a recursive variational neural network that fully exploits the hierarchical organization of the vessel and learns a low-dimensional manifold encoding branch connectivity along with geometry features describing the target surface. After training, the VesselVAE latent space can be sampled to generate new vessel geometries. To the best of our knowledge, this work is the first to utilize this technique for synthesizing blood vessels. Synthetic and real data reach similarity scores of .97 for radius, .95 for length, and .96 for tortuosity. By leveraging the power of deep neural networks, we generate 3D models of blood vessels that are both accurate and diverse, which is crucial for medical and surgical training, hemodynamic simulations, and many other purposes.
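The recursive decoding can be sketched as follows: starting from a latent code sampled from the prior, the decoder emits one node's geometry together with probabilities of spawning left and right child branches, then recurses into each predicted child with a transformed code. The feature layout (position plus radius), the thresholded branching, and the depth cap are illustrative assumptions rather than the paper's architecture.

```python
import torch
import torch.nn as nn

class RecursiveVesselDecoder(nn.Module):
    """Decode a latent code into a binary branching tree of vessel segments."""

    def __init__(self, latent_dim: int = 64):
        super().__init__()
        self.geometry = nn.Linear(latent_dim, 4)          # (x, y, z, radius) per node
        self.branch_prob = nn.Linear(latent_dim, 2)       # probabilities of left/right children
        self.left = nn.Linear(latent_dim, latent_dim)     # child latent codes
        self.right = nn.Linear(latent_dim, latent_dim)

    def decode(self, code: torch.Tensor, depth: int = 0, max_depth: int = 6) -> dict:
        node = {"geometry": self.geometry(code)}
        if depth < max_depth:
            p_left, p_right = torch.sigmoid(self.branch_prob(code)).unbind(-1)
            if p_left > 0.5:
                node["left"] = self.decode(torch.tanh(self.left(code)), depth + 1, max_depth)
            if p_right > 0.5:
                node["right"] = self.decode(torch.tanh(self.right(code)), depth + 1, max_depth)
        return node

# Sampling a new vessel tree from the prior, as one would with a trained VAE decoder:
# decoder = RecursiveVesselDecoder(); tree = decoder.decode(torch.randn(64))
```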
DOI: 10.48550/arXiv.2307.03592 · Published 2023-07-07 · Pages 67-76
Citations: 0
Unsupervised 3D out-of-distribution detection with latent diffusion models
M. Graham, W. H. Pinaya, P. Wright, Petru-Daniel Tudosiu, Y. Mah, J. Teo, H. Jäger, D. Werring, P. Nachev, S. Ourselin, M. Cardoso
Methods for out-of-distribution (OOD) detection that scale to 3D data are crucial components of any real-world clinical deep learning system. Classic denoising diffusion probabilistic models (DDPMs) have been recently proposed as a robust way to perform reconstruction-based OOD detection on 2D datasets, but do not trivially scale to 3D data. In this work, we propose to use Latent Diffusion Models (LDMs), which enable the scaling of DDPMs to high-resolution 3D medical data. We validate the proposed approach on near- and far-OOD datasets and compare it to a recently proposed, 3D-enabled approach using Latent Transformer Models (LTMs). Not only does the proposed LDM-based approach achieve statistically significant better performance, it also shows less sensitivity to the underlying latent representation, more favourable memory scaling, and produces better spatial anomaly maps. Code is available at https://github.com/marksgraham/ddpm-ood
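The reconstruction-based scoring can be sketched generically: compress the scan to a latent, perturb the latent partway along the noise schedule, denoise it with the diffusion model, decode, and use the reconstruction error as the OOD score, since in-distribution scans are reconstructed faithfully while OOD scans are not. The autoencoder and diffusion interfaces below are placeholders, not the API of the released code.

```python
import torch

@torch.no_grad()
def ldm_ood_score(volume: torch.Tensor, autoencoder, diffusion, t: int = 250) -> torch.Tensor:
    """Reconstruction-based OOD score for a 3D volume using a latent diffusion model.

    volume:      (B, 1, D, H, W) input scan
    autoencoder: object with encode(x) -> z and decode(z) -> x_hat (placeholder interface)
    diffusion:   object with add_noise(z, t) -> z_t and denoise(z_t, t) -> z_0 (placeholder)
    t:           how far along the noise schedule to perturb the latent
    """
    z = autoencoder.encode(volume)       # compress to a low-dimensional latent
    z_t = diffusion.add_noise(z, t)      # perturb the latent partway along the schedule
    z_hat = diffusion.denoise(z_t, t)    # project back onto the learned in-distribution manifold
    recon = autoencoder.decode(z_hat)

    # Per-sample mean squared reconstruction error; higher means more out-of-distribution.
    err = (volume - recon) ** 2
    return err.flatten(start_dim=1).mean(dim=1)
```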
DOI: 10.48550/arXiv.2307.03777 · Published 2023-07-07 · Pages 446-456
Citations: 0