OTRE: Where Optimal Transport Guided Unpaired Image-to-Image Translation Meets Regularization by Enhancing
Pub Date : 2023-06-01; DOI: 10.1007/978-3-031-34048-2_32; volume 13939, pages 415-427; open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10329768/pdf/nihms-1880857.pdf
Wenhui Zhu, Peijie Qiu, Oana M Dumitrascu, Jacob M Sobczak, Mohammad Farazi, Zhangsihao Yang, Keshav Nandakumar, Yalin Wang
Non-mydriatic retinal color fundus photography (CFP) is widely available because it does not require pupillary dilation; however, it is prone to poor quality due to operator error, system imperfections, or patient-related causes. Optimal retinal image quality is required for accurate medical diagnoses and automated analyses. Herein, we leveraged Optimal Transport (OT) theory to propose an unpaired image-to-image translation scheme for mapping low-quality retinal CFPs to high-quality counterparts. Furthermore, to improve the flexibility, robustness, and applicability of our image enhancement pipeline in clinical practice, we generalized a state-of-the-art model-based image reconstruction method, regularization by denoising, by plugging in priors learned by our OT-guided image-to-image translation network; we named the result regularization by enhancing (RE). We validated the integrated framework, OTRE, on three publicly available retinal image datasets by assessing the quality after enhancement and its performance on various downstream tasks, including diabetic retinopathy grading, vessel segmentation, and diabetic lesion segmentation. The experimental results demonstrated the superiority of our proposed framework over several state-of-the-art unsupervised competitors and a state-of-the-art supervised method.
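For readers unfamiliar with regularization by denoising (RED), the generalization described above amounts to swapping the denoiser in a RED-style iteration for the learned enhancement network. A minimal sketch under that reading; the `forward_op` and `enhancer` callables are hypothetical placeholders, not the authors' code:

```python
import torch

def re_reconstruct(y, forward_op, enhancer, lam=0.1, step=0.05, n_iters=100):
    """RED-style gradient iteration with an image-enhancement network
    plugged in as the prior (illustrative sketch, not the paper's code)."""
    x = y.clone()
    for _ in range(n_iters):
        # data-fidelity gradient of 0.5 * ||A(x) - y||^2
        data_grad = forward_op.adjoint(forward_op(x) - y)
        # regularization-by-enhancing gradient: lam * (x - E(x))
        with torch.no_grad():
            prior_grad = lam * (x - enhancer(x))
        x = x - step * (data_grad + prior_grad)
    return x
```

With `forward_op` set to the identity, each step simply pulls the current estimate toward its enhanced version, which is the intuition behind plugging an enhancer into the RED machinery.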
Token Sparsification for Faster Medical Image Segmentation
Pub Date : 2023-06-01; Epub Date: 2023-06-08; DOI: 10.1007/978-3-031-34048-2_57; volume 13939, pages 743-754; open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11056020/pdf/
Lei Zhou, Huidong Liu, Joseph Bae, Junjun He, Dimitris Samaras, Prateek Prasanna
Can we use sparse tokens for dense prediction, e.g., segmentation? Although token sparsification has been applied to Vision Transformers (ViT) to accelerate classification, it is still unknown how to perform segmentation from sparse tokens. To this end, we reformulate segmentation as a sparse encoding → token completion → dense decoding (SCD) pipeline. We first empirically show that naïvely applying existing approaches from classification token pruning and masked image modeling (MIM) leads to failure and inefficient training, caused by inappropriate sampling algorithms and the low quality of the restored dense features. In this paper, we propose Soft-topK Token Pruning (STP) and Multi-layer Token Assembly (MTA) to address these problems. In sparse encoding, STP predicts token importance scores with a lightweight sub-network and samples the topK tokens. The intractable topK gradients are approximated through a continuous perturbed score distribution. In token completion, MTA restores a full token sequence by assembling both sparse output tokens and pruned multi-layer intermediate ones. The last dense decoding stage is compatible with existing segmentation decoders, e.g., UNETR. Experiments show SCD pipelines equipped with STP and MTA are much faster than baselines without token pruning in both training (up to 120% higher throughput) and inference (up to 60.6% higher throughput) while maintaining segmentation quality. Code is available here: https://github.com/cvlab-stonybrook/TokenSparse-for-MedSeg.
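As a rough illustration of the sparse-encoding step, the sketch below scores tokens with a small sub-network and selects the top-K with a straight-through softmax relaxation. Note this relaxation is a common stand-in, not the paper's perturbed-score-distribution estimator, and `scorer` is a hypothetical module:

```python
import torch

def topk_token_pruning(tokens, scorer, k, tau=1.0):
    """Select k of N tokens using a lightweight scorer. A straight-through
    relaxation supplies gradients to the scorer (sketch only; the paper
    uses a perturbed score distribution instead)."""
    scores = scorer(tokens).squeeze(-1)             # (B, N) importance scores
    soft = torch.softmax(scores / tau, dim=-1)      # differentiable surrogate
    idx = scores.topk(k, dim=-1).indices            # hard top-K selection
    hard = torch.zeros_like(scores).scatter(-1, idx, 1.0)
    mask = hard + soft - soft.detach()              # straight-through gradients
    kept = tokens * mask.unsqueeze(-1)              # zero out pruned tokens
    return kept, idx
```

In a real pipeline the pruned tokens would be dropped (not just zeroed) to realize the throughput gains, and the intermediate ones cached for the token-completion stage.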
Heterogeneous Graph Convolutional Neural Network via Hodge-Laplacian for Brain Functional Data
Pub Date : 2023-06-01; Epub Date: 2023-06-08; DOI: 10.1007/978-3-031-34048-2_22; volume 13939, pages 278-290; open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11108189/pdf/
Jinghan Huang, Moo K Chung, Anqi Qiu
This study proposes a novel heterogeneous graph convolutional neural network (HGCNN) to handle complex brain fMRI data at regional and across-region levels. We provide a generic formulation of spectral filters on heterogeneous graphs via the k-th Hodge-Laplacian (HL) operator. In particular, we propose Laguerre polynomial approximations of HL spectral filters and prove that their spatial localization on graphs is related to the polynomial order. Furthermore, based on the bijection property of boundary operators on simplex graphs, we introduce a generic topological graph pooling (TGPool) method that can be applied to simplices of any dimension. We design HL-node, HL-edge, and HL-HGCNN neural networks to learn signal representations at the node level, the edge level, and both, respectively. Our experiments employ fMRI from the Adolescent Brain Cognitive Development study (ABCD; n=7693) to predict general intelligence. Our results demonstrate the advantage of the HL-edge network over the HL-node network when functional brain connectivity is used as the feature set. The HL-HGCNN outperforms state-of-the-art graph neural network (GNN) approaches, such as GAT, BrainGNN, dGCN, BrainNetCNN, and Hypergraph NN. The functional connectivity features learned by the HL-HGCNN are meaningful for interpreting neural circuits related to general intelligence.
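The Laguerre approximation can be made concrete with the three-term recurrence (k+1)P_{k+1}(t) = (2k+1-t)P_k(t) - kP_{k-1}(t): an order-K filter only touches K-hop neighborhoods, which is the spatial-localization property mentioned above. A small NumPy sketch; the dense-matrix layout of the Hodge-Laplacian is an assumption for brevity:

```python
import numpy as np

def laguerre_hl_filter(L_hodge, x, theta):
    """Apply the spectral filter sum_k theta[k] * P_k(L) x, where P_k are
    Laguerre polynomials evaluated at a (Hodge-)Laplacian L, using the
    three-term recurrence (k+1)P_{k+1} = (2k+1-t)P_k - k P_{k-1}."""
    p_prev = x.copy()                    # P_0(L) x = x
    out = theta[0] * p_prev
    if len(theta) == 1:
        return out
    p_curr = x - L_hodge @ x             # P_1(L) x = (I - L) x
    out = out + theta[1] * p_curr
    for k in range(1, len(theta) - 1):
        p_next = ((2 * k + 1) * p_curr - L_hodge @ p_curr - k * p_prev) / (k + 1)
        out = out + theta[k + 1] * p_next
        p_prev, p_curr = p_curr, p_next
    return out
```

Because each recurrence step applies L once, an order-K filter aggregates information from at most K hops, regardless of the graph's size.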
Mixup-Privacy: A simple yet effective approach for privacy-preserving segmentation
Pub Date : 2023-05-23; DOI: 10.48550/arXiv.2305.13756; pages 717-729
B. Kim, J. Dolz, Pierre-Marc Jodoin, Christian Desrosiers
Privacy protection of medical data is a legitimate obstacle for centralized machine learning applications. Here, we propose a client-server image segmentation system which allows the analysis of multi-centric medical images while preserving patient privacy. In this approach, the client protects the to-be-segmented patient image by mixing it with a reference image. As shown in our work, it is challenging to separate the image mixture into its exact original content, making the data unworkable and unrecognizable for an unauthorized person. This proxy image is sent to a server for processing. The server then returns the mixture of segmentation maps, which the client can revert to the correct target segmentation. Our system has two components: 1) a segmentation network on the server side which processes the image mixture, and 2) a segmentation unmixing network which recovers the correct segmentation map from the segmentation mixture. Furthermore, the whole system is trained end-to-end. The proposed method is validated on the task of MRI brain segmentation using images from two different datasets. Results show that the segmentation accuracy of our method is comparable to that of a system trained on raw images, and outperforms other privacy-preserving methods with little computational overhead.
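The client-server exchange can be pictured in a few lines of Python. The mixing coefficient and the `seg_net`/`unmix_net` interfaces below, including the use of a reference segmentation on the client, are illustrative assumptions rather than the paper's exact design:

```python
import torch

def client_protect(x, x_ref, alpha=0.5):
    """Client: hide the patient image inside a mixture with a reference
    image before sending it to the server (illustrative sketch)."""
    return alpha * x + (1.0 - alpha) * x_ref

def client_recover(unmix_net, mixed_pred, x_ref_seg):
    """Client: revert the server's mixed segmentation prediction to the
    target segmentation; feeding the reference segmentation as a second
    input is a guess at the interface, not the paper's design."""
    return unmix_net(torch.cat([mixed_pred, x_ref_seg], dim=1))

# Server side (untrusted): sees only the mixture.
#   mixed = client_protect(x, x_ref)
#   mixed_pred = seg_net(mixed)        # returned to the client
#   target_seg = client_recover(unmix_net, mixed_pred, x_ref_seg)
```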
Rethinking Boundary Detection in Deep Learning Models for Medical Image Segmentation
Pub Date : 2023-05-01; DOI: 10.48550/arXiv.2305.00678; pages 730-742
Yi-Mou Lin, Dong-Ming Zhang, Xiaori Fang, Yufan Chen, Kwang-Ting Cheng, Hao Chen
Medical image segmentation is a fundamental task in the medical image analysis community. In this paper, a novel network architecture, referred to as Convolution, Transformer, and Operator (CTO), is proposed. CTO combines Convolutional Neural Networks (CNNs), a Vision Transformer (ViT), and an explicit boundary detection operator to achieve high recognition accuracy while maintaining an optimal balance between accuracy and efficiency. CTO follows the standard encoder-decoder segmentation paradigm, where the encoder incorporates a popular CNN backbone for capturing local semantic information and a lightweight ViT assistant for integrating long-range dependencies. To enhance the learning capacity on boundaries, a boundary-guided decoder network is proposed that uses a boundary mask, obtained from a dedicated boundary detection operator, as explicit supervision to guide the decoding process. The performance of the proposed method is evaluated on six challenging medical image segmentation datasets, demonstrating that CTO achieves state-of-the-art accuracy with competitive model complexity.
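To illustrate what an explicit boundary detection operator can look like in this setting, here is a generic Sobel-based extraction of a boundary mask from a ground-truth segmentation, usable as decoder supervision; the paper's specific operator and threshold may differ:

```python
import torch
import torch.nn.functional as F

def boundary_mask(seg, thresh=0.1):
    """Derive a boundary map from a (B, 1, H, W) binary mask with a fixed
    Sobel operator (generic sketch of boundary supervision)."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
    ky = kx.t()
    kernel = torch.stack([kx, ky]).unsqueeze(1).to(seg.device)  # (2,1,3,3)
    grad = F.conv2d(seg.float(), kernel, padding=1)  # x- and y-gradients
    mag = grad.pow(2).sum(1, keepdim=True).sqrt()    # gradient magnitude
    return (mag > thresh).float()                    # (B, 1, H, W) boundary mask
```

Supervising the decoder with such a mask forces it to commit to sharp organ contours instead of diffuse region interiors.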
Live image-based neurosurgical guidance and roadmap generation using unsupervised embedding
Pub Date : 2023-03-31; DOI: 10.48550/arXiv.2303.18019; pages 107-118
Gary Sarwin, A. Carretta, V. Staartjes, M. Zoli, D. Mazzatenta, L. Regli, C. Serra, E. Konukoglu
Advanced minimally invasive neurosurgery navigation relies mainly on Magnetic Resonance Imaging (MRI) guidance. MRI guidance, however, only provides pre-operative information in the majority of cases. Once surgery begins, the value of this guidance diminishes to some extent because of the anatomical changes caused by the surgery itself. Guidance with live image feedback coming directly from the surgical device, e.g., an endoscope, can complement MRI-based navigation or serve as an alternative when MRI guidance is not feasible. With this motivation, we present a method for live image-only guidance leveraging a large dataset of annotated neurosurgical videos. First, we report the performance of a deep learning-based object detection method, YOLO, at detecting anatomical structures in neurosurgical images. Second, we present a method for generating neurosurgical roadmaps using unsupervised embedding, without assuming exact anatomical matches between patients, the presence of an extensive anatomical atlas, or the need for simultaneous localization and mapping. A generated roadmap encodes the common anatomical paths taken in the surgeries in the training set. At inference, the roadmap can be used to map a surgeon's current location on the path from live image feedback, providing guidance by predicting which structures should appear going forward or backward, much like a mapping application. Even though the embedding is not supervised by position information, we show that it correlates with location inside the brain and along the surgical path. We trained and evaluated the proposed method on a dataset of 166 transsphenoidal adenomectomy procedures.
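At inference time, the described lookup reduces to nearest-neighbor search along an ordered embedding path. A toy sketch, assuming the roadmap is stored as a sequence of embedding centroids (a deliberate simplification of the learned roadmap):

```python
import numpy as np

def locate_on_roadmap(frame_embedding, roadmap):
    """Nearest-neighbor lookup of a live frame on a roadmap stored as an
    ordered (T, D) array of embedding centroids along the surgical path;
    the array layout is an illustrative assumption, not the paper's."""
    dists = np.linalg.norm(roadmap - frame_embedding, axis=1)
    i = int(dists.argmin())                       # current position index
    nxt = roadmap[min(i + 1, len(roadmap) - 1)]   # expected next waypoint
    return i, nxt
```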
Pixel-Level Explanation of Multiple Instance Learning Models in Biomedical Single Cell Images
Pub Date : 2023-03-15; DOI: 10.48550/arXiv.2303.08632; pages 170-182
A. Sadafi, Oleksandra Adonkina, Ashkan Khakzar, P. Lienemann, Rudolf Matthias Hehr, D. Rueckert, N. Navab, C. Marr
Explainability is a key requirement for computer-aided diagnosis systems in clinical decision-making. Multiple instance learning with attention pooling provides instance-level explainability; however, for many clinical applications a deeper, pixel-level explanation is desirable, but has so far been missing. In this work, we investigate the use of four attribution methods to explain multiple instance learning models: GradCAM, Layer-Wise Relevance Propagation (LRP), Information Bottleneck Attribution (IBA), and InputIBA. With this collection of methods, we derive pixel-level explanations for the task of diagnosing blood cancer from patients' blood smears. We study two datasets of acute myeloid leukemia with over 100,000 single-cell images and observe how each attribution method performs on the multiple instance learning architecture, focusing on different properties of the single white blood cells. Additionally, we compare attribution maps with the annotations of a medical expert to see how the model's decision-making differs from the human standard. Our study addresses the challenge of implementing pixel-level explainability in multiple instance learning models and provides insights that help clinicians better understand and trust decisions from computer-aided diagnosis systems.
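Of the four methods, GradCAM is the simplest to write down and illustrates the general pattern: pool the output gradients at a target layer and use them to weight that layer's activations. A generic sketch with hooks on a hypothetical `target_layer`, not specific to the MIL models in the paper:

```python
import torch

def grad_cam(model, bag, target_layer):
    """Plain Grad-CAM for one instance bag: weight the target layer's
    activations by its spatially pooled output gradients (generic sketch;
    the four attribution methods in the paper differ in detail)."""
    acts, grads = [], []
    h1 = target_layer.register_forward_hook(lambda m, i, o: acts.append(o))
    h2 = target_layer.register_full_backward_hook(lambda m, gi, go: grads.append(go[0]))
    score = model(bag)                                 # bag-level prediction
    score.sum().backward()
    h1.remove(); h2.remove()
    w = grads[0].mean(dim=(-2, -1), keepdim=True)      # pooled gradients
    cam = torch.relu((w * acts[0]).sum(dim=1))         # weighted activations
    return cam / (cam.amax(dim=(-2, -1), keepdim=True) + 1e-8)
```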
HALOS: Hallucination-free Organ Segmentation after Organ Resection Surgery
Pub Date : 2023-03-14; DOI: 10.48550/arXiv.2303.07717; pages 667-678
Anne-Marie Rickmann, Murong Xu, Thomas Wolf, Oksana P. Kovalenko, C. Wachinger
The wide range of research in deep learning-based medical image segmentation has pushed the boundaries in a multitude of applications. A clinically relevant problem that has received less attention is the handling of scans with irregular anatomy, e.g., after organ resection. State-of-the-art segmentation models often produce organ hallucinations, i.e., false-positive predictions of organs, which cannot be alleviated by oversampling or post-processing. Motivated by the increasing need for robust deep learning models, we propose HALOS, a method for abdominal organ segmentation in MR images that handles cases after organ resection surgery. To this end, we combine missing-organ classification and multi-organ segmentation into a multi-task model, yielding a classification-assisted segmentation pipeline. The segmentation network learns to incorporate knowledge about organ existence via feature fusion modules. Extensive experiments on a small labeled test set and on large-scale UK Biobank data demonstrate the effectiveness of our approach in terms of higher segmentation Dice scores and a near-zero false-positive prediction rate.
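One way to read "incorporate knowledge about organ existence via feature fusion" is as a learned gate on decoder features driven by the classification head. The module below sketches that reading; its shapes and naming are assumptions, not the paper's implementation:

```python
import torch
import torch.nn as nn

class ExistenceFusion(nn.Module):
    """Gate decoder features by predicted organ-existence probabilities,
    a sketch of classification-assisted segmentation (module layout is
    an assumption, not the HALOS architecture)."""
    def __init__(self, n_organs, channels):
        super().__init__()
        self.proj = nn.Linear(n_organs, channels)

    def forward(self, feats, existence_logits):
        # feats: (B, C, H, W); existence_logits: (B, n_organs)
        gate = torch.sigmoid(self.proj(existence_logits))  # (B, C) in [0, 1]
        return feats * gate.unsqueeze(-1).unsqueeze(-1)    # broadcast over H, W
```

The intended effect is that a confident "organ absent" prediction suppresses the feature channels that would otherwise drive a hallucinated segmentation.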
NeurEPDiff: Neural Operators to Predict Geodesics in Deformation Spaces
Pub Date : 2023-03-13; DOI: 10.48550/arXiv.2303.07115; pages 588-600
Nian Wu, Miaomiao Zhang
This paper presents NeurEPDiff, a novel network that rapidly predicts geodesics in deformation spaces generated by the well-known Euler-Poincaré differential equation (EPDiff). To achieve this, we develop a neural operator that, for the first time, learns the evolving trajectory of geodesic deformations parameterized in the tangent space of diffeomorphisms (a.k.a. velocity fields). In contrast to previous methods that purely fit the training images, our proposed NeurEPDiff learns a nonlinear mapping function between the time-dependent velocity fields. A composition of integral operators and smooth activation functions is formulated in each layer of NeurEPDiff to effectively approximate such mappings. Because NeurEPDiff rapidly provides the numerical solution of EPDiff (given any initial condition), it significantly reduces the computational cost of geodesic shooting of diffeomorphisms in a high-dimensional image space. Additionally, the discretization/resolution-invariance of NeurEPDiff makes its performance generalizable to multiple image resolutions after offline training. We demonstrate the effectiveness of NeurEPDiff in registering two image datasets: 2D synthetic data and 3D brain magnetic resonance imaging (MRI). Registration accuracy and computational efficiency are compared with state-of-the-art diffeomorphic registration algorithms based on geodesic shooting.
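A "composition of integral operators and smooth activation functions" is the Fourier-neural-operator recipe, which is also where the resolution invariance comes from: the spectral weights live on a fixed number of low-frequency modes, independent of grid size. A sketch of one such layer for 1-D fields; the width and mode counts are free choices, not the paper's:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpectralLayer(nn.Module):
    """One Fourier-domain integral-operator layer followed by a smooth
    activation, the kind of building block a NeurEPDiff-style operator
    stacks (1-D sketch; requires modes <= N // 2 + 1)."""
    def __init__(self, width, modes):
        super().__init__()
        self.modes = modes
        scale = 1.0 / (width * width)
        self.weight = nn.Parameter(
            scale * torch.randn(width, width, modes, dtype=torch.cfloat))
        self.skip = nn.Conv1d(width, width, 1)   # pointwise linear path

    def forward(self, v):                        # v: (B, width, N)
        v_hat = torch.fft.rfft(v)                # (B, width, N // 2 + 1)
        out = torch.zeros_like(v_hat)
        out[:, :, :self.modes] = torch.einsum(   # mix channels per kept mode
            "bim,iom->bom", v_hat[:, :, :self.modes], self.weight)
        w = torch.fft.irfft(out, n=v.size(-1))
        return F.gelu(w + self.skip(v))          # smooth activation
```

Because the learned weights only touch the kept modes, the same trained layer can be evaluated on a finer or coarser grid without retraining.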
A Surface-normal Based Neural Framework for Colonoscopy Reconstruction
Pub Date : 2023-03-13; DOI: 10.48550/arXiv.2303.07264; pages 797-809
Shuxian Wang, Yubo Zhang, Sarah K. McGill, J. Rosenman, Jan-Michael Frahm, Soumyadip Sengupta, S. Pizer
Reconstructing a 3D surface from colonoscopy video is challenging because illumination and reflectivity variations across video frames can cause defective shape predictions. To overcome this challenge, we exploit the characteristics of surface normal vectors and develop a two-step neural framework that significantly improves colonoscopy reconstruction quality. A normal-based depth initialization network, trained with a self-supervised normal-consistency loss, provides a depth-map initialization to a normal-depth refinement module, which uses the relationship between illumination and surface normals to recursively refine the frame-wise normal and depth predictions. Our framework's depth accuracy on phantom colonoscopy data demonstrates the value of exploiting surface normals in colonoscopy reconstruction, especially on en face views. Due to its low depth error, the prediction from our framework requires only limited post-processing to be clinically applicable for real-time colonoscopy reconstruction.
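The depth-normal link that the refinement module exploits can be seen in the standard finite-difference construction below. This is a generic sketch under a simplified camera model (ignoring the full perspective correction), not the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def normals_from_depth(depth):
    """Approximate surface normals from a (B, 1, H, W) depth map with
    central finite differences under a simplified (orthographic) camera
    model; the basis of a depth-normal consistency term (generic sketch)."""
    dz_du = (depth[..., :, 2:] - depth[..., :, :-2]) / 2.0   # d(depth)/d(col)
    dz_dv = (depth[..., 2:, :] - depth[..., :-2, :]) / 2.0   # d(depth)/d(row)
    dz_du = F.pad(dz_du, (1, 1, 0, 0))                       # restore width
    dz_dv = F.pad(dz_dv, (0, 0, 1, 1))                       # restore height
    n = torch.cat([-dz_du, -dz_dv, torch.ones_like(depth)], dim=1)
    return n / n.norm(dim=1, keepdim=True).clamp_min(1e-8)   # (B, 3, H, W)
```

A consistency loss can then compare these derived normals against the normal network's predictions, coupling the two outputs during refinement.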