Ultrafast Plane-Wave (PW) imaging often produces artifacts and shadows that vary with insonification angles. We propose a novel approach using Implicit Neural Representations (INRs) to compactly encode multi-planar sequences while preserving crucial orientation-dependent information. To our knowledge, this is the first application of INRs for PW angular interpolation. Our method employs a Multi-Layer Perceptron (MLP)-based model with a concise physics-enhanced rendering technique. Quantitative evaluations using SSIM, PSNR, and standard ultrasound metrics, along with qualitative visual assessments, confirm the effectiveness of our approach. Additionally, our method demonstrates significant storage efficiency, with model weights requiring 530 KB compared to 8 MB for directly storing the 75 PW images, achieving a notable compression ratio of approximately 15:1.
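A minimal sketch of the coordinate-MLP idea behind such an INR, assuming a generic (lateral position, depth, steering angle) → intensity parameterization with Fourier feature encoding; the layer sizes, encoding, and toy training data are illustrative and do not reproduce the paper's physics-enhanced rendering step.

```python
# Coordinate-MLP implicit representation of a multi-angle plane-wave sequence.
# The (x, z, angle) -> intensity mapping and Fourier features are generic INR
# choices, not the paper's exact architecture.
import torch
import torch.nn as nn


class FourierFeatures(nn.Module):
    def __init__(self, in_dim=3, n_freqs=8):
        super().__init__()
        # Fixed log-spaced frequencies applied to each input coordinate.
        self.register_buffer("freqs", 2.0 ** torch.arange(n_freqs, dtype=torch.float32) * torch.pi)
        self.out_dim = in_dim * n_freqs * 2

    def forward(self, coords):                      # coords: (N, 3)
        proj = coords[..., None] * self.freqs       # (N, 3, n_freqs)
        feats = torch.cat([proj.sin(), proj.cos()], dim=-1)
        return feats.flatten(start_dim=-2)          # (N, 6 * n_freqs)


class PlaneWaveINR(nn.Module):
    """MLP mapping (lateral x, depth z, steering angle) to pixel intensity."""

    def __init__(self, hidden=128, layers=4):
        super().__init__()
        enc = FourierFeatures()
        dims = [enc.out_dim] + [hidden] * layers + [1]
        blocks = [enc]
        for i in range(len(dims) - 1):
            blocks.append(nn.Linear(dims[i], dims[i + 1]))
            if i < len(dims) - 2:
                blocks.append(nn.ReLU())
        self.net = nn.Sequential(*blocks)

    def forward(self, coords):
        return self.net(coords)


# Fit the network to sampled pixels of the multi-angle sequence (toy data here).
model = PlaneWaveINR()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
coords = torch.rand(4096, 3) * 2 - 1     # normalized (x, z, angle) samples
target = torch.rand(4096, 1)             # corresponding pixel intensities
for _ in range(100):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(coords), target)
    loss.backward()
    opt.step()
```

The compression figure then follows from comparing the size of the trained weights against the raw image stack (here roughly 8 MB / 530 KB ≈ 15:1).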
{"title":"Compact Implicit Neural Representations for Plane Wave Images","authors":"Mathilde Monvoisin, Yuxin Zhang, Diana Mateus","doi":"arxiv-2409.11370","DOIUrl":"https://doi.org/arxiv-2409.11370","url":null,"abstract":"Ultrafast Plane-Wave (PW) imaging often produces artifacts and shadows that\u0000vary with insonification angles. We propose a novel approach using Implicit\u0000Neural Representations (INRs) to compactly encode multi-planar sequences while\u0000preserving crucial orientation-dependent information. To our knowledge, this is\u0000the first application of INRs for PW angular interpolation. Our method employs\u0000a Multi-Layer Perceptron (MLP)-based model with a concise physics-enhanced\u0000rendering technique. Quantitative evaluations using SSIM, PSNR, and standard\u0000ultrasound metrics, along with qualitative visual assessments, confirm the\u0000effectiveness of our approach. Additionally, our method demonstrates\u0000significant storage efficiency, with model weights requiring 530 KB compared to\u00008 MB for directly storing the 75 PW images, achieving a notable compression\u0000ratio of approximately 15:1.","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":"2 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142263235","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ryugo Morita, Hitoshi Nishimura, Ko Watanabe, Andreas Dengel, Jinjia Zhou
In recent years, deep learning-based image compression, particularly through generative models, has emerged as a pivotal area of research. Despite significant advancements, challenges persist, including diminished sharpness and quality in reconstructed images, learning inefficiencies due to mode collapse, and data loss during transmission. To address these issues, we propose a novel compression model that incorporates a denoising step based on diffusion models, significantly enhancing reconstruction fidelity by leveraging sub-information (e.g., edge and depth) extracted from the latent space. Empirical experiments demonstrate that our model achieves results superior or comparable to existing models in terms of image quality and compression efficiency. Notably, by introducing an edge estimation network that preserves the integrity of reconstructed images, our model excels in scenarios with partial image loss or excessive noise, offering a robust solution to the current limitations of image compression.
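For illustration, a hedged sketch of conditioning a denoising step on edge sub-information: the Sobel edge extractor and the tiny residual conv net are stand-ins for the paper's edge estimation network and diffusion-based denoiser, which are not specified in the abstract.

```python
# Refine a degraded reconstruction using its edge map as side information.
# Sobel edges and the small conv net are illustrative stand-ins only.
import torch
import torch.nn as nn
import torch.nn.functional as F


def sobel_edges(img):                                       # img: (B, 1, H, W)
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
    kernels = torch.stack([kx, kx.t()]).unsqueeze(1)        # (2, 1, 3, 3)
    grads = F.conv2d(img, kernels, padding=1)               # (B, 2, H, W)
    return grads.pow(2).sum(dim=1, keepdim=True).sqrt()     # gradient magnitude


class EdgeConditionedDenoiser(nn.Module):
    """Small conv net that refines a noisy image given its edge map."""

    def __init__(self, width=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, 1, 3, padding=1),
        )

    def forward(self, noisy, edges):
        # Predict a residual correction conditioned on the edge map.
        return noisy + self.net(torch.cat([noisy, edges], dim=1))


clean = torch.rand(1, 1, 64, 64)
noisy = (clean + 0.1 * torch.randn_like(clean)).clamp(0, 1)  # simulated loss/noise
edges = sobel_edges(noisy)                                   # edges estimated at the decoder side
denoised = EdgeConditionedDenoiser()(noisy, edges)
```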
{"title":"Edge-based Denoising Image Compression","authors":"Ryugo Morita, Hitoshi Nishimura, Ko Watanabe, Andreas Dengel, Jinjia Zhou","doi":"arxiv-2409.10978","DOIUrl":"https://doi.org/arxiv-2409.10978","url":null,"abstract":"In recent years, deep learning-based image compression, particularly through\u0000generative models, has emerged as a pivotal area of research. Despite\u0000significant advancements, challenges such as diminished sharpness and quality\u0000in reconstructed images, learning inefficiencies due to mode collapse, and data\u0000loss during transmission persist. To address these issues, we propose a novel\u0000compression model that incorporates a denoising step with diffusion models,\u0000significantly enhancing image reconstruction fidelity by sub-information(e.g.,\u0000edge and depth) from leveraging latent space. Empirical experiments demonstrate\u0000that our model achieves superior or comparable results in terms of image\u0000quality and compression efficiency when measured against the existing models.\u0000Notably, our model excels in scenarios of partial image loss or excessive noise\u0000by introducing an edge estimation network to preserve the integrity of\u0000reconstructed images, offering a robust solution to the current limitations of\u0000image compression.","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":"23 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142263239","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Huidong Xie, Liang Guo, Alexandre Velo, Zhao Liu, Qiong Liu, Xueqi Guo, Bo Zhou, Xiongchao Chen, Yu-Jung Tsai, Tianshun Miao, Menghua Xia, Yi-Hwa Liu, Ian S. Armstrong, Ge Wang, Richard E. Carson, Albert J. Sinusas, Chi Liu
Rb-82 is a radioactive isotope widely used for cardiac PET imaging. Despite its numerous benefits, several factors limit its image quality and quantitative accuracy. First, the short half-life of Rb-82 results in noisy dynamic frames, and the low signal-to-noise ratio leads to inaccurate and biased image quantification. Noisy dynamic frames also produce highly noisy parametric images, and the noise level varies substantially across dynamic frames because of radiotracer decay and the short half-life. Existing denoising methods are not applicable to this task because paired training inputs/labels are unavailable and the methods cannot generalize across varying noise levels. Second, Rb-82 emits high-energy positrons; compared with tracers such as F-18, Rb-82 positrons travel a longer distance before annihilation, which degrades spatial resolution. The goal of this study is to propose a self-supervised method for simultaneous (1) noise-aware dynamic image denoising and (2) positron range correction for Rb-82 cardiac PET imaging. Tested on a series of PET scans from a cohort of normal volunteers, the proposed method produced images with superior visual quality. To demonstrate the improvement in image quantification, we compared image-derived input functions (IDIFs) with arterial input functions (AIFs) obtained from continuous arterial blood samples. The IDIF derived from the proposed method reduced the mean AUC difference from 11.09% to 7.58% relative to the original dynamic frames. The proposed method also improved the quantification of myocardial blood flow (MBF), validated against O-15 water scans, with the mean MBF difference decreasing from 0.43 to 0.09 compared to the original dynamic frames. We also conducted a generalizability experiment on 37 patient scans obtained from a different country using a different scanner.
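A small sketch of the AUC-difference comparison between IDIF and AIF curves mentioned above, using the trapezoidal rule; the sampling times and activity values are synthetic placeholders, not data from the study.

```python
# Percent AUC difference between an image-derived input function (IDIF) and the
# arterial input function (AIF). Curves below are synthetic placeholders.
import numpy as np

t = np.array([0, 10, 20, 30, 45, 60, 90, 120, 180, 240], dtype=float)   # time (s, assumed)
aif = np.array([0, 5, 40, 85, 60, 35, 20, 12, 7, 5], dtype=float)       # blood samples (a.u.)
idif = np.array([0, 4, 33, 76, 56, 33, 19, 12, 7, 5], dtype=float)      # from denoised frames


def trapezoid_auc(values, times):
    # Trapezoidal rule: sum of interval widths times mean of the endpoints.
    return np.sum((values[1:] + values[:-1]) / 2.0 * np.diff(times))


auc_aif = trapezoid_auc(aif, t)
auc_idif = trapezoid_auc(idif, t)
auc_diff_pct = 100.0 * abs(auc_idif - auc_aif) / auc_aif
print(f"AUC difference: {auc_diff_pct:.2f}%")
```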
{"title":"Noise-aware Dynamic Image Denoising and Positron Range Correction for Rubidium-82 Cardiac PET Imaging via Self-supervision","authors":"Huidong Xie, Liang Guo, Alexandre Velo, Zhao Liu, Qiong Liu, Xueqi Guo, Bo Zhou, Xiongchao Chen, Yu-Jung Tsai, Tianshun Miao, Menghua Xia, Yi-Hwa Liu, Ian S. Armstrong, Ge Wang, Richard E. Carson, Albert J. Sinusas, Chi Liu","doi":"arxiv-2409.11543","DOIUrl":"https://doi.org/arxiv-2409.11543","url":null,"abstract":"Rb-82 is a radioactive isotope widely used for cardiac PET imaging. Despite\u0000numerous benefits of 82-Rb, there are several factors that limits its image\u0000quality and quantitative accuracy. First, the short half-life of 82-Rb results\u0000in noisy dynamic frames. Low signal-to-noise ratio would result in inaccurate\u0000and biased image quantification. Noisy dynamic frames also lead to highly noisy\u0000parametric images. The noise levels also vary substantially in different\u0000dynamic frames due to radiotracer decay and short half-life. Existing denoising\u0000methods are not applicable for this task due to the lack of paired training\u0000inputs/labels and inability to generalize across varying noise levels. Second,\u000082-Rb emits high-energy positrons. Compared with other tracers such as 18-F,\u000082-Rb travels a longer distance before annihilation, which negatively affect\u0000image spatial resolution. Here, the goal of this study is to propose a\u0000self-supervised method for simultaneous (1) noise-aware dynamic image denoising\u0000and (2) positron range correction for 82-Rb cardiac PET imaging. Tested on a\u0000series of PET scans from a cohort of normal volunteers, the proposed method\u0000produced images with superior visual quality. To demonstrate the improvement in\u0000image quantification, we compared image-derived input functions (IDIFs) with\u0000arterial input functions (AIFs) from continuous arterial blood samples. The\u0000IDIF derived from the proposed method led to lower AUC differences, decreasing\u0000from 11.09% to 7.58% on average, compared to the original dynamic frames. The\u0000proposed method also improved the quantification of myocardium blood flow\u0000(MBF), as validated against 15-O-water scans, with mean MBF differences\u0000decreased from 0.43 to 0.09, compared to the original dynamic frames. We also\u0000conducted a generalizability experiment on 37 patient scans obtained from a\u0000different country using a different scanner.","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":"30 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142263136","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Emile Saillard, Aurélie Levillain, David Mitton, Jean-Baptiste Pialat, Cyrille Confavreux, Hélène Follet, Thomas Grenier
Purpose: Bone metastases have a major impact on patients' quality of life, and their diversity in size and location makes their segmentation complex. Manual segmentation is time-consuming, and expert segmentations are subject to operator variability, which makes obtaining accurate and reproducible segmentations of bone metastases on CT scans a challenging yet important task. Materials and Methods: Deep learning methods tackle segmentation tasks efficiently but require large datasets with expert manual segmentations to generalize to new images. We propose an automated data synthesis pipeline using 3D Denoising Diffusion Probabilistic Models (DDPM) to enhance the segmentation of femoral metastases from CT volumes of patients. We used 29 existing lesions and 26 healthy femurs to create new, realistic synthetic metastatic images, and trained a DDPM to improve the diversity and realism of the simulated volumes. We also investigated operator variability in manual segmentation. Results: We created 5675 new volumes, trained 3D U-Net segmentation models on real and synthetic data to compare segmentation performance, and evaluated model performance as a function of the amount of synthetic data used in training. Conclusion: Segmentation models trained with synthetic data outperformed those trained on real volumes only, and these models performed especially well when accounting for operator variability.
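As an illustration of the overlap comparisons implied above, a short sketch computing Dice agreement between a model mask and two operators' masks; Dice is assumed here as the segmentation metric, and the masks are random placeholders rather than CT segmentations.

```python
# Dice overlap between a predicted lesion mask and two manual segmentations,
# as a stand-in for the performance / operator-variability comparison.
import numpy as np


def dice(a, b, eps=1e-8):
    a, b = a.astype(bool), b.astype(bool)
    return (2.0 * np.logical_and(a, b).sum() + eps) / (a.sum() + b.sum() + eps)


rng = np.random.default_rng(0)
model_mask = rng.random((64, 64, 64)) > 0.7     # placeholder model prediction
operator_1 = rng.random((64, 64, 64)) > 0.7     # placeholder manual segmentation
operator_2 = rng.random((64, 64, 64)) > 0.7     # second operator

print("model vs operator 1:", round(dice(model_mask, operator_1), 3))
print("model vs operator 2:", round(dice(model_mask, operator_2), 3))
print("operator 1 vs 2    :", round(dice(operator_1, operator_2), 3))
```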
{"title":"Enhanced segmentation of femoral bone metastasis in CT scans of patients using synthetic data generation with 3D diffusion models","authors":"Emile Saillard, Aurélie Levillain, David Mitton, Jean-Baptiste Pialat, Cyrille Confavreux, Hélène Follet, Thomas Grenier","doi":"arxiv-2409.11011","DOIUrl":"https://doi.org/arxiv-2409.11011","url":null,"abstract":"Purpose: Bone metastasis have a major impact on the quality of life of\u0000patients and they are diverse in terms of size and location, making their\u0000segmentation complex. Manual segmentation is time-consuming, and expert\u0000segmentations are subject to operator variability, which makes obtaining\u0000accurate and reproducible segmentations of bone metastasis on CT-scans a\u0000challenging yet important task to achieve. Materials and Methods: Deep learning\u0000methods tackle segmentation tasks efficiently but require large datasets along\u0000with expert manual segmentations to generalize on new images. We propose an\u0000automated data synthesis pipeline using 3D Denoising Diffusion Probabilistic\u0000Models (DDPM) to enchance the segmentation of femoral metastasis from CT-scan\u0000volumes of patients. We used 29 existing lesions along with 26 healthy femurs\u0000to create new realistic synthetic metastatic images, and trained a DDPM to\u0000improve the diversity and realism of the simulated volumes. We also\u0000investigated the operator variability on manual segmentation. Results: We\u0000created 5675 new volumes, then trained 3D U-Net segmentation models on real and\u0000synthetic data to compare segmentation performance, and we evaluated the\u0000performance of the models depending on the amount of synthetic data used in\u0000training. Conclusion: Our results showed that segmentation models trained with\u0000synthetic data outperformed those trained on real volumes only, and that those\u0000models perform especially well when considering operator variability.","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":"30 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142263238","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Numerous deep learning-based solutions have been proposed for histopathological image analysis over the past years. While they usually demonstrate exceptionally high accuracy, one key question is whether their precision might be affected by low-level image properties that are not related to histopathology but are caused by microscopy image handling and pre-processing. In this paper, we analyze the popular NCT-CRC-HE-100K colorectal cancer dataset used in numerous prior works and show that both this dataset and the results obtained on it may be affected by data-specific biases. The most prominent dataset issues we reveal are inappropriate color normalization, severe JPEG artifacts that are inconsistent between classes, and completely corrupted tissue samples resulting from incorrect handling of the image dynamic range. We show that even the simplest model using only 3 features per image (red, green, and blue color intensities) can reach over 50% accuracy on this 9-class dataset, while a color histogram that does not explicitly capture cell morphology yields over 82% accuracy. Moreover, a basic ImageNet-pretrained EfficientNet-B0 model achieves over 97.7% accuracy on this dataset, outperforming all previously proposed solutions for this task, including dedicated foundation histopathological models and large cell morphology-aware neural networks. The NCT-CRC-HE dataset is publicly available and can be freely used to replicate the presented results. The code and pre-trained models used in this paper are available at https://github.com/gmalivenko/NCT-CRC-HE-experiments
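A sketch of the described bias probe, classifying tiles from per-image color statistics alone; the synthetic tiles, logistic-regression classifier, and 8-bin histogram settings are assumptions standing in for the actual NCT-CRC-HE experiments.

```python
# Classify tiles from color statistics only: high accuracy on real data would
# indicate class-correlated color biases rather than morphology being learned.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(600, 64, 64, 3), dtype=np.uint8)  # placeholder tiles
labels = rng.integers(0, 9, size=600)                                  # 9 tissue classes

# Variant 1: three features per image (mean R, G, B intensity).
mean_rgb = images.reshape(len(images), -1, 3).mean(axis=1)

# Variant 2: an 8-bin-per-channel color histogram (no morphology information).
hists = np.stack([
    np.concatenate([np.histogram(img[..., c], bins=8, range=(0, 256))[0] for c in range(3)])
    for img in images
]).astype(float)

for name, feats in [("mean RGB", mean_rgb), ("color histogram", hists)]:
    x_tr, x_te, y_tr, y_te = train_test_split(feats, labels, random_state=0)
    acc = LogisticRegression(max_iter=2000).fit(x_tr, y_tr).score(x_te, y_te)
    print(f"{name}: accuracy {acc:.2f}")
```

On random placeholder data the accuracy stays near chance; the paper's point is that on the real dataset such color-only features already score far above chance.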
{"title":"NCT-CRC-HE: Not All Histopathological Datasets Are Equally Useful","authors":"Andrey Ignatov, Grigory Malivenko","doi":"arxiv-2409.11546","DOIUrl":"https://doi.org/arxiv-2409.11546","url":null,"abstract":"Numerous deep learning-based solutions have been proposed for\u0000histopathological image analysis over the past years. While they usually\u0000demonstrate exceptionally high accuracy, one key question is whether their\u0000precision might be affected by low-level image properties not related to\u0000histopathology but caused by microscopy image handling and pre-processing. In\u0000this paper, we analyze a popular NCT-CRC-HE-100K colorectal cancer dataset used\u0000in numerous prior works and show that both this dataset and the obtained\u0000results may be affected by data-specific biases. The most prominent revealed\u0000dataset issues are inappropriate color normalization, severe JPEG artifacts\u0000inconsistent between different classes, and completely corrupted tissue samples\u0000resulting from incorrect image dynamic range handling. We show that even the\u0000simplest model using only 3 features per image (red, green and blue color\u0000intensities) can demonstrate over 50% accuracy on this 9-class dataset, while\u0000using color histogram not explicitly capturing cell morphology features yields\u0000over 82% accuracy. Moreover, we show that a basic EfficientNet-B0 ImageNet\u0000pretrained model can achieve over 97.7% accuracy on this dataset, outperforming\u0000all previously proposed solutions developed for this task, including dedicated\u0000foundation histopathological models and large cell morphology-aware neural\u0000networks. The NCT-CRC-HE dataset is publicly available and can be freely used\u0000to replicate the presented results. The codes and pre-trained models used in\u0000this paper are available at\u0000https://github.com/gmalivenko/NCT-CRC-HE-experiments","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":"14 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142263135","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Saif Khalid, Hatem A. Rashwan, Saddam Abdulwahab, Mohamed Abdel-Nasser, Facundo Manuel Quiroga, Domenec Puig
The performance of computer-aided diagnosis (CAD) systems for retinal diseases depends on the quality of the retinal images being screened. Many studies have therefore been developed to evaluate and assess the quality of such retinal images. However, most of them did not investigate the relationship between the accuracy of the developed models and the quality of the visualizations produced by interpretability methods for distinguishing between gradable and non-gradable retinal images. Consequently, this paper presents a novel framework called FGR-Net that automatically assesses and interprets underlying fundus image quality by merging an autoencoder network with a classifier network. The FGR-Net model also provides an interpretable quality assessment through visualizations. In particular, FGR-Net uses a deep autoencoder to reconstruct the input image in order to extract the visual characteristics of the input fundus images via self-supervised learning. The features extracted by the autoencoder are then fed into a deep classifier network to distinguish between gradable and ungradable fundus images. FGR-Net is evaluated with different interpretability methods, which indicate that the autoencoder is a key factor in forcing the classifier to focus on the relevant structures of the fundus image, such as the fovea, optic disc, and prominent blood vessels. Additionally, the interpretability methods can provide visual feedback to help ophthalmologists understand how our model evaluates the quality of fundus images. The experimental results show the superiority of FGR-Net over state-of-the-art quality assessment methods, with an accuracy of 89% and an F1-score of 87%.
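A minimal sketch of the autoencoder-plus-classifier coupling described above, assuming a joint reconstruction-and-classification loss; the layer sizes and loss weighting are illustrative, not FGR-Net's actual design.

```python
# Reconstruct the fundus image and feed the encoder features to a
# gradable/ungradable head; both losses are optimized jointly.
import torch
import torch.nn as nn


class AutoencoderClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 2),
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), self.classifier(z)


model = AutoencoderClassifier()
images = torch.rand(4, 3, 128, 128)                  # placeholder fundus images
labels = torch.randint(0, 2, (4,))                   # gradable vs ungradable
recon, logits = model(images)
loss = nn.functional.mse_loss(recon, images) + nn.functional.cross_entropy(logits, labels)
loss.backward()
```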
{"title":"FGR-Net:Interpretable fundus imagegradeability classification based on deepreconstruction learning","authors":"Saif Khalid, Hatem A. Rashwan, Saddam Abdulwahab, Mohamed Abdel-Nasser, Facundo Manuel Quiroga, Domenec Puig","doi":"arxiv-2409.10246","DOIUrl":"https://doi.org/arxiv-2409.10246","url":null,"abstract":"The performance of diagnostic Computer-Aided Design (CAD) systems for retinal\u0000diseases depends on the quality of the retinal images being screened. Thus,\u0000many studies have been developed to evaluate and assess the quality of such\u0000retinal images. However, most of them did not investigate the relationship\u0000between the accuracy of the developed models and the quality of the\u0000visualization of interpretability methods for distinguishing between gradable\u0000and non-gradable retinal images. Consequently, this paper presents a novel\u0000framework called FGR-Net to automatically assess and interpret underlying\u0000fundus image quality by merging an autoencoder network with a classifier\u0000network. The FGR-Net model also provides an interpretable quality assessment\u0000through visualizations. In particular, FGR-Net uses a deep autoencoder to\u0000reconstruct the input image in order to extract the visual characteristics of\u0000the input fundus images based on self-supervised learning. The extracted\u0000features by the autoencoder are then fed into a deep classifier network to\u0000distinguish between gradable and ungradable fundus images. FGR-Net is evaluated\u0000with different interpretability methods, which indicates that the autoencoder\u0000is a key factor in forcing the classifier to focus on the relevant structures\u0000of the fundus images, such as the fovea, optic disk, and prominent blood\u0000vessels. Additionally, the interpretability methods can provide visual feedback\u0000for ophthalmologists to understand how our model evaluates the quality of\u0000fundus images. The experimental results showed the superiority of FGR-Net over\u0000the state-of-the-art quality assessment methods, with an accuracy of 89% and an\u0000F1-score of 87%.","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":"18 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142262990","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We propose depth from coupled optical differentiation, a low-computation, passive-lighting 3D sensing mechanism. It is based on our discovery that per-pixel object distance can be rigorously determined from a coupled pair of optical derivatives of a defocused image using a simple, closed-form relationship. Unlike previous depth-from-defocus (DfD) methods that leverage spatial derivatives of the image to estimate scene depths, the proposed mechanism uses only optical derivatives, making it significantly more robust to noise. Furthermore, unlike many previous DfD algorithms that impose requirements on the aperture code, this relationship is proven to hold for a broad range of aperture codes. We build the first 3D sensor based on depth from coupled optical differentiation. Its optical assembly includes a deformable lens and a motorized iris, which enable dynamic adjustments of the optical power and aperture radius. The sensor captures two pairs of images: one pair with a differential change of optical power and the other with a differential change of aperture scale. From the four images, a depth and confidence map can be generated with only 36 floating-point operations per output pixel (FLOPOP), more than ten times fewer than the previous lowest-computation passive-lighting depth sensing solution known to us. Additionally, the depth map generated by the proposed sensor demonstrates more than twice the working range of previous DfD methods while requiring significantly less computation.
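A hedged sketch of assembling the two optical derivatives from the four captures by finite differences; the `depth_from_derivatives` mapping below is a hypothetical placeholder for illustration only, not the paper's closed-form relationship, which the abstract does not state.

```python
# Form the optical derivatives w.r.t. optical power and aperture scale from the
# two differential image pairs. Random arrays stand in for the four captures.
import numpy as np

rng = np.random.default_rng(0)
h, w = 120, 160
i_power_lo, i_power_hi = rng.random((h, w)), rng.random((h, w))  # pair 1: optical power +/- step
i_ap_lo, i_ap_hi = rng.random((h, w)), rng.random((h, w))        # pair 2: aperture scale +/- step
d_power, d_ap = 0.05, 0.02                                       # differential step sizes (assumed units)

dI_dpower = (i_power_hi - i_power_lo) / d_power   # optical derivative w.r.t. optical power
dI_dap = (i_ap_hi - i_ap_lo) / d_ap               # optical derivative w.r.t. aperture scale


def depth_from_derivatives(d_pow, d_aper, eps=1e-6):
    # Hypothetical per-pixel mapping used purely as a placeholder; the actual
    # closed-form relationship between the two derivatives is in the paper.
    return d_aper / (d_pow + eps)


depth = depth_from_derivatives(dI_dpower, dI_dap)
confidence = np.abs(dI_dpower)   # assumed heuristic: small derivatives -> unreliable estimate
```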
{"title":"Depth from Coupled Optical Differentiation","authors":"Junjie Luo, Yuxuan Liu, Emma Alexander, Qi Guo","doi":"arxiv-2409.10725","DOIUrl":"https://doi.org/arxiv-2409.10725","url":null,"abstract":"We propose depth from coupled optical differentiation, a low-computation\u0000passive-lighting 3D sensing mechanism. It is based on our discovery that\u0000per-pixel object distance can be rigorously determined by a coupled pair of\u0000optical derivatives of a defocused image using a simple, closed-form\u0000relationship. Unlike previous depth-from-defocus (DfD) methods that leverage\u0000spatial derivatives of the image to estimate scene depths, the proposed\u0000mechanism's use of only optical derivatives makes it significantly more robust\u0000to noise. Furthermore, unlike many previous DfD algorithms with requirements on\u0000aperture code, this relationship is proved to be universal to a broad range of\u0000aperture codes. We build the first 3D sensor based on depth from coupled optical\u0000differentiation. Its optical assembly includes a deformable lens and a\u0000motorized iris, which enables dynamic adjustments to the optical power and\u0000aperture radius. The sensor captures two pairs of images: one pair with a\u0000differential change of optical power and the other with a differential change\u0000of aperture scale. From the four images, a depth and confidence map can be\u0000generated with only 36 floating point operations per output pixel (FLOPOP),\u0000more than ten times lower than the previous lowest passive-lighting depth\u0000sensing solution to our knowledge. Additionally, the depth map generated by the\u0000proposed sensor demonstrates more than twice the working range of previous DfD\u0000methods while using significantly lower computation.","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":"48 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142262987","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Balint Kovacs, Shuhan Xiao, Maximilian Rokuss, Constantin Ulrich, Fabian Isensee, Klaus H. Maier-Hein
The third autoPET challenge introduced a new data-centric task this year, shifting the focus from model development to improving metastatic lesion segmentation on PET/CT images through data quality and handling strategies. In response, we developed targeted methods to enhance segmentation performance tailored to the characteristics of PET/CT imaging. Our approach encompasses two key elements. First, to address potential alignment errors between the CT and PET modalities as well as the prevalence of punctate lesions, we modified the baseline data augmentation scheme and extended it with misalignment augmentation. This adaptation aims to improve segmentation accuracy, particularly for tiny metastatic lesions. Second, to tackle the variability in image dimensions, which significantly affects prediction time, we implemented a dynamic ensembling and test-time augmentation (TTA) strategy. This method optimizes the use of ensembling and TTA within a 5-minute prediction time limit, effectively leveraging the generalization potential for both small and large images. Both of our solutions are designed to be robust across different tracers and institutional settings, offering a general yet imaging-specific approach to the multi-tracer and multi-institutional challenges of the competition. We made the challenge repository with our modifications publicly available at https://github.com/MIC-DKFZ/miccai2024_autopet3_datacentric.
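A sketch of a misalignment augmentation in the spirit described above, jittering the CT volume relative to PET by a small random translation while keeping the label fixed; the offset range and interpolation settings are assumptions rather than the submission's exact configuration.

```python
# Simulate CT/PET misalignment by shifting the CT channel relative to PET.
import numpy as np
from scipy.ndimage import shift


def misalign_ct(ct_volume, pet_volume, max_shift_vox=3, rng=None):
    """Randomly translate CT relative to PET to mimic inter-modality misalignment."""
    rng = rng or np.random.default_rng()
    offset = rng.uniform(-max_shift_vox, max_shift_vox, size=3)       # (z, y, x) shift in voxels
    ct_shifted = shift(ct_volume, offset, order=1, mode="nearest")    # linear interpolation
    return ct_shifted, pet_volume, offset


rng = np.random.default_rng(0)
ct = rng.random((64, 64, 64))     # placeholder CT volume
pet = rng.random((64, 64, 64))    # placeholder PET volume
ct_aug, pet_aug, applied = misalign_ct(ct, pet, rng=rng)
print("applied CT offset (voxels):", np.round(applied, 2))
```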
{"title":"Data-Centric Strategies for Overcoming PET/CT Heterogeneity: Insights from the AutoPET III Lesion Segmentation Challenge","authors":"Balint Kovacs, Shuhan Xiao, Maximilian Rokuss, Constantin Ulrich, Fabian Isensee, Klaus H. Maier-Hein","doi":"arxiv-2409.10120","DOIUrl":"https://doi.org/arxiv-2409.10120","url":null,"abstract":"The third autoPET challenge introduced a new data-centric task this year,\u0000shifting the focus from model development to improving metastatic lesion\u0000segmentation on PET/CT images through data quality and handling strategies. In\u0000response, we developed targeted methods to enhance segmentation performance\u0000tailored to the characteristics of PET/CT imaging. Our approach encompasses two\u0000key elements. First, to address potential alignment errors between CT and PET\u0000modalities as well as the prevalence of punctate lesions, we modified the\u0000baseline data augmentation scheme and extended it with misalignment\u0000augmentation. This adaptation aims to improve segmentation accuracy,\u0000particularly for tiny metastatic lesions. Second, to tackle the variability in\u0000image dimensions significantly affecting the prediction time, we implemented a\u0000dynamic ensembling and test-time augmentation (TTA) strategy. This method\u0000optimizes the use of ensembling and TTA within a 5-minute prediction time\u0000limit, effectively leveraging the generalization potential for both small and\u0000large images. Both of our solutions are designed to be robust across different\u0000tracers and institutional settings, offering a general, yet imaging-specific\u0000approach to the multi-tracer and multi-institutional challenges of the\u0000competition. We made the challenge repository with our modifications publicly\u0000available at url{https://github.com/MIC-DKFZ/miccai2024_autopet3_datacentric}.","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":"189 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142262994","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alexander Koch, Orhun Utku Aydin, Adam Hilbert, Jana Rieger, Satoru Tanioka, Fujimaro Ishida, Dietmar Frey
Cerebrovascular disease often requires multiple imaging modalities for accurate diagnosis, treatment, and monitoring. Computed Tomography Angiography (CTA) and Time-of-Flight Magnetic Resonance Angiography (TOF-MRA) are two common non-invasive angiography techniques, each with distinct strengths in accessibility, safety, and diagnostic accuracy. While CTA is more widely used in acute stroke due to its faster acquisition times and higher diagnostic accuracy, TOF-MRA is preferred for its safety, as it avoids radiation exposure and contrast agent-related health risks. Despite the predominant role of CTA in clinical workflows, there is a scarcity of open-source CTA data, limiting the research and development of AI models for tasks such as large vessel occlusion detection and aneurysm segmentation. This study explores diffusion-based image-to-image translation models to generate synthetic CTA images from TOF-MRA input. We demonstrate the modality conversion from TOF-MRA to CTA and show that diffusion models outperform a traditional U-Net-based approach. Our work compares different state-of-the-art diffusion architectures and samplers, offering recommendations for optimal model performance in this cross-modality translation task.
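For orientation, a minimal sketch of conditional denoising-diffusion training for image-to-image translation, with the TOF-MRA condition concatenated to the noisy CTA target; the toy network, schedule, and tensors are illustrative, timestep conditioning is omitted for brevity, and none of this represents the architectures or samplers compared in the paper.

```python
# Conditional epsilon-prediction training step: the network sees the noisy CTA
# concatenated with its TOF-MRA condition and predicts the added noise.
import torch
import torch.nn as nn

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alpha_bars = torch.cumprod(1.0 - betas, dim=0)

eps_model = nn.Sequential(                    # stand-in for a UNet noise predictor
    nn.Conv2d(2, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 1, 3, padding=1),
)
opt = torch.optim.Adam(eps_model.parameters(), lr=1e-4)

tof_mra = torch.rand(4, 1, 64, 64)            # conditioning modality (placeholder)
cta = torch.rand(4, 1, 64, 64)                # target modality (placeholder)

t = torch.randint(0, T, (4,))
noise = torch.randn_like(cta)
a_bar = alpha_bars[t].view(-1, 1, 1, 1)
noisy_cta = a_bar.sqrt() * cta + (1 - a_bar).sqrt() * noise   # forward diffusion

pred = eps_model(torch.cat([noisy_cta, tof_mra], dim=1))      # condition by concatenation
loss = nn.functional.mse_loss(pred, noise)
loss.backward()
opt.step()
```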
{"title":"Cross-modality image synthesis from TOF-MRA to CTA using diffusion-based models","authors":"Alexander Koch, Orhun Utku Aydin, Adam Hilbert, Jana Rieger, Satoru Tanioka, Fujimaro Ishida, Dietmar Frey","doi":"arxiv-2409.10089","DOIUrl":"https://doi.org/arxiv-2409.10089","url":null,"abstract":"Cerebrovascular disease often requires multiple imaging modalities for\u0000accurate diagnosis, treatment, and monitoring. Computed Tomography Angiography\u0000(CTA) and Time-of-Flight Magnetic Resonance Angiography (TOF-MRA) are two\u0000common non-invasive angiography techniques, each with distinct strengths in\u0000accessibility, safety, and diagnostic accuracy. While CTA is more widely used\u0000in acute stroke due to its faster acquisition times and higher diagnostic\u0000accuracy, TOF-MRA is preferred for its safety, as it avoids radiation exposure\u0000and contrast agent-related health risks. Despite the predominant role of CTA in\u0000clinical workflows, there is a scarcity of open-source CTA data, limiting the\u0000research and development of AI models for tasks such as large vessel occlusion\u0000detection and aneurysm segmentation. This study explores diffusion-based\u0000image-to-image translation models to generate synthetic CTA images from TOF-MRA\u0000input. We demonstrate the modality conversion from TOF-MRA to CTA and show that\u0000diffusion models outperform a traditional U-Net-based approach. Our work\u0000compares different state-of-the-art diffusion architectures and samplers,\u0000offering recommendations for optimal model performance in this cross-modality\u0000translation task.","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":"19 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142262991","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hyperspectral imaging has been widely used for the spectral and spatial identification of target molecules, yet it is often contaminated by sophisticated noise. Current denoising methods generally rely on independent and identically distributed noise statistics and perform poorly when removing non-independent noise. Here, we demonstrate Self-supervised PErmutation Noise2noise Denoising (SPEND), a deep learning denoising architecture tailor-made for removing non-independent noise from a single hyperspectral image stack. We use hyperspectral stimulated Raman scattering and mid-infrared photothermal microscopy as the testbeds, where the noise is spatially correlated and spectrally varied. Starting from a single hyperspectral image, SPEND permutes the odd and even spectral frames to generate two stacks with identical noise properties and uses these pairs for efficient self-supervised noise-to-noise training. SPEND achieved an 8-fold signal-to-noise improvement without access to ground-truth data and enabled accurate mapping of low-concentration biomolecules in both the fingerprint and silent regions, demonstrating its robustness in sophisticated cellular environments.
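A short sketch of the odd/even spectral-frame pairing described above, forming two sub-stacks from one acquisition and training a denoiser with a noise-to-noise objective; the tiny conv net and stack shape are assumptions, not SPEND's actual architecture.

```python
# Split one hyperspectral stack into odd/even spectral frames and use the two
# sub-stacks as input/target for self-supervised noise-to-noise training.
import torch
import torch.nn as nn

stack = torch.rand(1, 128, 64, 64)             # (batch, spectral frames, H, W), placeholder data
stack_a = stack[:, 0::2]                       # even spectral frames
stack_b = stack[:, 1::2]                       # odd spectral frames (matched noise statistics)

denoiser = nn.Sequential(                      # per-frame stand-in denoiser
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 1, 3, padding=1),
)
opt = torch.optim.Adam(denoiser.parameters(), lr=1e-3)

# Noise-to-noise objective: predict one noisy sub-stack from the other.
frames_in = stack_a.reshape(-1, 1, 64, 64)
frames_tgt = stack_b.reshape(-1, 1, 64, 64)
loss = nn.functional.mse_loss(denoiser(frames_in), frames_tgt)
loss.backward()
opt.step()
```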
{"title":"Self-Supervised Elimination of Non-Independent Noise in Hyperspectral Imaging","authors":"Guangrui Ding, Chang Liu, Jiaze Yin, Xinyan Teng, Yuying Tan, Hongjian He, Haonan Lin, Lei Tian, Ji-Xin Cheng","doi":"arxiv-2409.09910","DOIUrl":"https://doi.org/arxiv-2409.09910","url":null,"abstract":"Hyperspectral imaging has been widely used for spectral and spatial\u0000identification of target molecules, yet often contaminated by sophisticated\u0000noise. Current denoising methods generally rely on independent and identically\u0000distributed noise statistics, showing corrupted performance for non-independent\u0000noise removal. Here, we demonstrate Self-supervised PErmutation Noise2noise\u0000Denoising (SPEND), a deep learning denoising architecture tailor-made for\u0000removing non-independent noise from a single hyperspectral image stack. We\u0000utilize hyperspectral stimulated Raman scattering and mid-infrared photothermal\u0000microscopy as the testbeds, where the noise is spatially correlated and\u0000spectrally varied. Based on single hyperspectral images, SPEND permutates odd\u0000and even spectral frames to generate two stacks with identical noise\u0000properties, and uses the pairs for efficient self-supervised noise-to-noise\u0000training. SPEND achieved an 8-fold signal-to-noise improvement without having\u0000access to the ground truth data. SPEND enabled accurate mapping of low\u0000concentration biomolecules in both fingerprint and silent regions,\u0000demonstrating its robustness in sophisticated cellular environments.","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":"92 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142262992","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}