In recent years, significant progress has been made in tumor segmentation within the field of digital pathology. However, variations in organs, tissue preparation methods, and image acquisition processes can lead to domain discrepancies among digital pathology images. To address this problem, in this paper we use Rein, a parameter-efficient fine-tuning method, to fine-tune various vision foundation models (VFMs) for the MICCAI 2024 Cross-Organ and Cross-Scanner Adenocarcinoma Segmentation challenge (COSAS2024). The core of Rein is a set of learnable tokens, which are directly linked to instances and improve functionality at the instance level in each layer. In the data environment of the COSAS2024 challenge, extensive experiments demonstrate that Rein fine-tunes the VFMs to achieve satisfactory results. Specifically, we used Rein to fine-tune ConvNeXt and DINOv2. Our team used the former to achieve scores of 0.7719 and 0.7557 in the preliminary and final test phases of Task 1, respectively, while the latter achieved scores of 0.8848 and 0.8192 in the preliminary and final test phases of Task 2. Code is available at GitHub.
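The token mechanism described above can be illustrated with a minimal numpy sketch. This is an illustrative reading of the idea, not the authors' implementation: frozen backbone features at one layer attend to a small set of learnable tokens, and the token read-out (projected by a learnable matrix `W`, an assumed name) is added back as a residual refinement.

```python
import numpy as np

def rein_refine(feats, tokens, W):
    """Refine frozen-backbone features with learnable tokens (sketch).

    feats:  (n_patches, d) features from one frozen VFM layer
    tokens: (m, d) learnable tokens, the main trained state
    W:      (d, d) learnable projection applied to the token read-out
    """
    # similarity between every patch feature and every token
    attn = feats @ tokens.T / np.sqrt(feats.shape[1])    # (n, m)
    attn = np.exp(attn - attn.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)              # softmax over tokens
    delta = (attn @ tokens) @ W                          # per-patch refinement
    return feats + delta                                 # residual update

rng = np.random.default_rng(0)
f = rng.normal(size=(16, 8))
out = rein_refine(f, rng.normal(size=(4, 8)), 0.1 * rng.normal(size=(8, 8)))
print(out.shape)  # (16, 8)
```

Because only the tokens and projections are trained while the VFM stays frozen, the number of updated parameters is small, which is what makes the approach parameter-efficient.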
"Cross-Organ and Cross-Scanner Adenocarcinoma Segmentation using Rein to Fine-tune Vision Foundation Models" — Pengzhou Cai, Xueyuan Zhang, Ze Zhao. arXiv:2409.11752 (arXiv - EE - Image and Video Processing, 2024-09-18).
Yang Liu, Yahui Li, Rui Li, Liming Zhou, Lanxue Dang, Huiyu Mu, Qiang Ge
Convolutional neural networks (CNNs) perform well in hyperspectral image (HSI) classification tasks, but their high energy consumption and complex network structure make them difficult to apply directly on edge computing devices. Spiking neural networks (SNNs) have recently developed rapidly in HSI classification tasks thanks to their low energy consumption and event-driven characteristics, but they usually require a long time step to achieve optimal accuracy. To address these problems, this paper builds a spiking neural network (SNN-SWMR) based on the leaky integrate-and-fire (LIF) neuron model for HSI classification. The network uses the spiking width mixed residual (SWMR) module as its basic unit for feature extraction. The SWMR module is composed of spiking mixed convolution (SMC), which can effectively extract spatial-spectral features. Secondly, this paper designs a simple and efficient arcsine approximate derivative (AAD), which solves the non-differentiability of spike firing by fitting the Dirac function; through AAD, we can directly train supervised spiking neural networks. Finally, this paper conducts comparative experiments with multiple advanced SNN-based HSI classification algorithms on six public hyperspectral datasets. Experimental results show that the AAD function is robust and provides a good fit. Meanwhile, compared with other algorithms, SNN-SWMR reduces the time step by about 84% and the training and testing time by about 63% and 70%, respectively, at the same accuracy. This study addresses a key problem of SNN-based HSI classification algorithms and has important practical significance for promoting their application on edge devices such as spaceborne and airborne platforms.
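The two core ingredients above — the LIF neuron and a surrogate derivative for the non-differentiable spike — can be sketched as follows. The surrogate shown is one plausible arcsine-based form (the derivative of the smooth step `(2/pi)*arcsin(tanh(alpha*(v - v_th)))`, which peaks at the threshold like a smeared Dirac delta); the paper's exact AAD expression may differ.

```python
import numpy as np

def lif_step(v, i_in, beta=0.9, v_th=1.0):
    """One leaky integrate-and-fire step with hard reset after a spike."""
    v = beta * v + i_in                  # leak, then integrate input current
    s = (v >= v_th).astype(float)        # Heaviside spike (forward pass)
    return v * (1.0 - s), s              # reset membrane where a spike fired

def aad_surrogate(v, v_th=1.0, alpha=2.0):
    """Arcsine-style surrogate gradient for the spike (assumed form, not
    the paper's exact formula): derivative of
    (2/pi) * arcsin(tanh(alpha * (v - v_th))) = (2*alpha/pi) * sech(alpha*(v - v_th)).
    Used in the backward pass in place of the Dirac delta."""
    return (2.0 * alpha / np.pi) / np.cosh(alpha * (v - v_th))
```

During training, the forward pass uses the hard threshold in `lif_step` while backpropagation substitutes `aad_surrogate` for the spike's derivative, which is what makes direct supervised training of the SNN possible.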
"Hyperspectral Image Classification Based on Faster Residual Multi-branch Spiking Neural Network" — Yang Liu, Yahui Li, Rui Li, Liming Zhou, Lanxue Dang, Huiyu Mu, Qiang Ge. arXiv:2409.11619 (arXiv - EE - Image and Video Processing, 2024-09-18).
Seongmin Hong, Jaehyeok Bae, Jongho Lee, Se Young Chun
Compressed sensing (CS) has emerged to overcome the inefficiency of Nyquist sampling. However, traditional optimization-based reconstruction is slow and cannot yield an exact image in practice. Deep learning-based reconstruction has been a promising alternative, outperforming optimization-based reconstruction in accuracy and computation speed. Finding an efficient sampling method for deep learning-based reconstruction, especially for Fourier CS, remains a challenge. Existing joint optimization of sampling-reconstruction works (H1) optimize the sampling mask but have low potential, as the mask is not adaptive to each data point. Adaptive sampling (H2) also has the disadvantages of difficult optimization and Pareto sub-optimality. Here, we propose a novel adaptive selection of sampling-reconstruction (H1.5) framework that selects the best sampling mask and reconstruction network for each input. We provide theorems showing that our method has a higher potential than H1 and effectively solves the Pareto sub-optimality problem of sampling-reconstruction by using separate reconstruction networks for different sampling masks. To select the best sampling mask, we propose to quantify the high-frequency Bayesian uncertainty of the input using a super-resolution space generation model. Our method outperforms joint optimization of sampling-reconstruction (H1) and adaptive sampling (H2), achieving significant improvements on several Fourier CS problems.
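The H1.5 selection step can be sketched with a simple stand-in for the uncertainty estimate: variance across super-resolution samples of the input, scored on the high-frequency part of its spectrum, then used to route the input to a (mask, network) pair. The scoring function and routing interface are assumptions for illustration, not the paper's estimator.

```python
import numpy as np

def hf_uncertainty(sr_samples):
    """Stand-in for the high-frequency Bayesian uncertainty: pixel-wise
    variance over super-resolution samples (shape (S, H, W)), then the
    mean spectral magnitude of that variance map outside the central
    low-frequency quarter."""
    var = sr_samples.var(axis=0)                          # (H, W)
    spec = np.abs(np.fft.fftshift(np.fft.fft2(var)))
    h, w = spec.shape
    low = np.zeros_like(spec, dtype=bool)
    low[h // 4: 3 * h // 4, w // 4: 3 * w // 4] = True    # center = low freq
    return spec[~low].mean()

def select_pair(sr_samples, pairs, thresholds):
    """Route the input to the (mask, network) pair trained for its
    uncertainty band; `pairs`/`thresholds` are hypothetical, with
    len(thresholds) == len(pairs) - 1."""
    u = hf_uncertainty(sr_samples)
    return pairs[int(np.searchsorted(thresholds, u))]
```

Because each reconstruction network only ever sees inputs whose mask it was trained for, the Pareto trade-off of one shared network (H1) is avoided, which is the point of the separate-networks design.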
"Adaptive Selection of Sampling-Reconstruction in Fourier Compressed Sensing" — Seongmin Hong, Jaehyeok Bae, Jongho Lee, Se Young Chun. arXiv:2409.11738 (arXiv - EE - Image and Video Processing, 2024-09-18).
Jue Jiang, Chloe Min Seo Choi, Maria Thor, Joseph O. Deasy, Harini Veeraraghavan
Background: Voxel-based analysis (VBA) for population-level radiotherapy (RT) outcomes modeling requires topology-preserving inter-patient deformable image registration (DIR) that preserves tumors on moving images while avoiding unrealistic deformations due to tumors occurring on fixed images. Purpose: We developed a tumor-aware recurrent registration (TRACER) deep learning (DL) method and evaluated its suitability for VBA. Methods: TRACER consists of encoder layers implemented with a stacked 3D convolutional long short-term memory network (3D-CLSTM), followed by decoder and spatial transform layers that compute a dense deformation vector field (DVF). Multiple CLSTM steps are used to compute a progressive sequence of deformations. Input conditioning was applied by including tumor segmentations with the 3D image pairs as input channels. Bidirectional tumor rigidity, image similarity, and deformation smoothness losses were used to optimize the network in an unsupervised manner. TRACER and multiple DL methods were trained with 204 3D CT image pairs from patients with lung cancers (LC) and evaluated using (a) Dataset I (N = 308 pairs) with DL-segmented LCs, (b) Dataset II (N = 765 pairs) with manually delineated LCs, and (c) Dataset III with 42 LC patients treated with RT. Results: TRACER accurately aligned normal tissues. It best preserved tumors, indicated by the smallest tumor volume differences of 0.24%, 0.40%, and 0.13% and mean square errors in CT intensities of 0.005, 0.005, and 0.004, computed between original and resampled moving-image tumors for Datasets I, II, and III, respectively. It also resulted in the smallest planned RT tumor dose difference computed between original and resampled moving images, 0.01 Gy and 0.013 Gy when using a female and a male reference, respectively.
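The "progressive sequence of deformations" produced by the recurrent CLSTM steps amounts to composing successive deformation fields. A minimal 2D sketch (nearest-neighbour lookup for brevity; real DIR pipelines use trilinear warping, and this is not the TRACER code):

```python
import numpy as np

def compose_dvfs(dvfs):
    """Compose a progressive sequence of dense deformation fields (sketch).
    Each field has shape (2, H, W) holding per-pixel (dy, dx) displacements.
    Composition rule: total_k(x) = d_k(x + total_{k-1}(x)) + total_{k-1}(x),
    i.e. each new field is sampled at the already-deformed positions."""
    total = np.zeros_like(dvfs[0])
    _, H, W = total.shape
    ys, xs = np.mgrid[0:H, 0:W]
    for d in dvfs:
        yy = np.clip(np.round(ys + total[0]).astype(int), 0, H - 1)
        xx = np.clip(np.round(xs + total[1]).astype(int), 0, W - 1)
        total = total + np.stack([d[0][yy, xx], d[1][yy, xx]])
    return total
```

Splitting one large deformation into several small composed steps is what lets each step stay smooth (and hence easier to keep topology-preserving) while the composition still reaches large inter-patient displacements.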
"Tumor aware recurrent inter-patient deformable image registration of computed tomography scans with lung cancer" — Jue Jiang, Chloe Min Seo Choi, Maria Thor, Joseph O. Deasy, Harini Veeraraghavan. arXiv:2409.11910 (arXiv - EE - Image and Video Processing, 2024-09-18).
Tianyu Zhang, Haotian Zhang, Yuqi Li, Li Li, Dong Liu
Learned image compression (LIC) has achieved state-of-the-art rate-distortion performance and is deemed promising for next-generation image compression techniques. However, pre-trained LIC models usually suffer from significant performance degradation when applied to out-of-training-domain images, implying poor generalization capabilities. To tackle this problem, we propose a few-shot domain adaptation method for LIC that integrates plug-and-play adapters into pre-trained models. Drawing inspiration from the analogy between latent channels and frequency components, we examine domain gaps in LIC and observe that out-of-training-domain images disrupt the pre-trained channel-wise decomposition. Consequently, we introduce a method for channel-wise re-allocation using convolution-based adapters and low-rank adapters, which are lightweight and compatible with mainstream LIC schemes. Extensive experiments across multiple domains and multiple representative LIC schemes demonstrate that our method significantly enhances pre-trained models, achieving performance comparable to H.266/VVC intra coding with merely 25 target-domain samples. Additionally, our method matches the performance of full-model fine-tuning while transmitting fewer than 2% of the parameters.
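The parameter economy of a low-rank adapter is easy to see in a sketch. This is a generic LoRA-style adapter on channel vectors, shown only to illustrate the parameter count, not the paper's exact adapter design:

```python
import numpy as np

def lowrank_adapter(x, A, B, scale=1.0):
    """LoRA-style low-rank adapter (sketch): y = x + scale * (x @ A) @ B.
    Only A (C, r) and B (r, C) are trained and transmitted: 2*C*r
    parameters, a fraction 2*r/C of a full C x C layer
    (about 1.6% for C = 512, r = 4)."""
    return x + scale * (x @ A) @ B

C, r = 512, 4
rng = np.random.default_rng(1)
x = rng.normal(size=(10, C))
A = rng.normal(size=(C, r)) / np.sqrt(C)
B = np.zeros((r, C))          # zero-init: the adapter starts as an identity
print(np.allclose(lowrank_adapter(x, A, B), x))  # True
```

Zero-initializing `B` is the usual trick so that the adapted model starts exactly at the pre-trained behavior and only drifts as the few target-domain samples are fitted.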
"Few-Shot Domain Adaptation for Learned Image Compression" — Tianyu Zhang, Haotian Zhang, Yuqi Li, Li Li, Dong Liu. arXiv:2409.11111 (arXiv - EE - Image and Video Processing, 2024-09-17).
Whole Slide Images (WSIs) are critical for various clinical applications, including histopathological analysis. However, current deep learning approaches in this field predominantly focus on individual tumor types, limiting model generalization and scalability. This relatively narrow focus ultimately stems from the inherent heterogeneity in histopathology and the diverse morphological and molecular characteristics of different tumors. To this end, we propose a novel approach for multi-cohort WSI analysis, designed to leverage the diversity of different tumor types. We introduce a Cohort-Aware Attention module, enabling the capture of both shared and tumor-specific pathological patterns, enhancing cross-tumor generalization. Furthermore, we construct an adversarial cohort regularization mechanism to minimize cohort-specific biases through mutual information minimization. Additionally, we develop a hierarchical sample balancing strategy to mitigate cohort imbalances and promote unbiased learning. Together, these form a cohesive framework for unbiased multi-cohort WSI analysis. Extensive experiments on a uniquely constructed multi-cancer dataset demonstrate significant improvements in generalization, providing a scalable solution for WSI classification across diverse cancer types. Our code for the experiments is publicly available at .
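Two of the ingredients above can be sketched in a few lines: attention pooling driven by a shared query plus a per-cohort query, and the gradient-reversal backward rule that implements the adversarial cohort regularization. Names and shapes are illustrative assumptions, not the authors' code.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cohort_aware_pool(feats, q_shared, q_cohort):
    """Pool WSI patch features into a slide embedding using a shared
    attention query plus a per-cohort query, so shared and tumor-specific
    patterns both shape the weights. feats: (n_patches, d); queries: (d,)."""
    a = softmax(feats @ (q_shared + q_cohort))   # (n_patches,) weights
    return a @ feats                             # (d,) slide embedding

def grad_reversal(grad, lam=1.0):
    """Backward rule of a gradient reversal layer: identity in the forward
    pass, gradient flipped by -lam in the backward pass, so the encoder is
    pushed to *remove* cohort-identifying information (minimizing mutual
    information with the cohort label)."""
    return -lam * grad
```

The cohort classifier on top of the reversed gradient gets better at predicting the cohort while the encoder, receiving the flipped gradient, gets worse at exposing it, which is the adversarial balance the regularization relies on.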
"Multi-Cohort Framework with Cohort-Aware Attention and Adversarial Mutual-Information Minimization for Whole Slide Image Classification" — Sharon Peled, Yosef E. Maruvka, Moti Freiman. arXiv:2409.11119 (arXiv - EE - Image and Video Processing, 2024-09-17).
Out-of-distribution (OOD) detection is crucial for enhancing the generalization of AI models used in mammogram screening. Given the challenge of limited prior knowledge about OOD samples in external datasets, unsupervised generative learning is a preferable solution, training the model to discern the normal characteristics of in-distribution (ID) data. The hypothesis is that during inference the model reconstructs ID samples accurately, while OOD samples exhibit poorer reconstruction due to their divergence from normality. Inspired by state-of-the-art (SOTA) hybrid architectures combining CNNs and transformers, we developed a novel backbone, HAND, for detecting OOD samples in large-scale digital screening mammogram studies. To boost learning efficiency, we incorporated synthetic OOD samples and a parallel discriminator in the latent space to distinguish between ID and OOD samples. Gradient reversal applied to the OOD reconstruction loss penalizes the model for learning OOD reconstructions. An anomaly score is computed by weighting the reconstruction and discriminator losses. On an internal RSNA mammogram held-out test set and an external Mayo Clinic hand-curated dataset, the proposed HAND model outperformed encoder-based and GAN-based baselines and, interestingly, also outperformed the hybrid CNN+transformer baselines. The proposed HAND pipeline therefore offers an automated, efficient computational solution for domain-specific quality checks on external screening mammograms, yielding actionable insights without direct exposure to the private medical imaging data.
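The weighted anomaly score described above can be written down directly. The weights and the exact combination are assumptions for illustration (the paper does not state them here): high reconstruction error and a low discriminator probability of being in-distribution both push the score up.

```python
import numpy as np

def anomaly_score(x, x_hat, p_id, w_rec=0.5, w_disc=0.5):
    """Combine reconstruction error and discriminator output into one
    anomaly score (sketch; w_rec/w_disc are assumed, not the paper's
    values). p_id is the discriminator's probability that the latent
    came from an in-distribution sample."""
    rec = float(np.mean((x - x_hat) ** 2))      # reconstruction MSE
    return w_rec * rec + w_disc * (1.0 - p_id)  # higher = more anomalous
```

A perfectly reconstructed ID sample with a confident discriminator scores near zero; a badly reconstructed sample the discriminator rejects scores high, and a single threshold on this scalar then flags OOD studies.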
"Unsupervised Hybrid framework for ANomaly Detection (HAND) -- applied to Screening Mammogram" — Zhemin Zhang, Bhavika Patel, Bhavik Patel, Imon Banerjee. arXiv:2409.11534 (arXiv - EE - Image and Video Processing, 2024-09-17).
Micro-CT scanning of rocks significantly enhances our understanding of pore-scale physics in porous media. With advancements in pore-scale simulation methods, such as pore network models, it is now possible to accurately simulate multiphase flow properties, including relative permeability, from CT-scanned rock samples. However, the limited number of CT-scanned samples and the challenge of connecting pore-scale networks to field-scale rock properties often make it difficult to use pore-scale simulated properties in realistic field-scale reservoir simulations. Deep learning approaches that create synthetic 3D rock structures allow us to simulate variations in CT rock structures, which can then be used to compute representative rock properties and flow functions. However, most current deep learning methods for 3D rock structure synthesis do not consider rock properties derived from well observations, lacking a direct link between pore-scale structures and field-scale data. We present a method to construct 3D rock structures constrained to observed rock properties using generative adversarial networks (GANs), with conditioning accomplished through a gradual Gaussian deformation process. We begin by pre-training a Wasserstein GAN to reconstruct 3D rock structures. Subsequently, we use a pore network model simulator to compute rock properties. The latent vectors used for image generation in the GAN are progressively altered with the Gaussian deformation approach to produce 3D rock structures constrained by the well-derived conditioning data. This GAN and Gaussian deformation approach enables high-resolution synthetic image generation and reproduces user-defined rock properties such as porosity, permeability, and pore size distribution. Our research provides a novel way to link GAN-generated models to field-derived quantities.
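The gradual (Gaussian) deformation rule, well known from geostatistical history matching, has a one-line form. The key property is that the deformed latent remains a valid standard-normal GAN input for every value of the deformation parameter:

```python
import numpy as np

def gradual_deformation(z1, z2, t):
    """Gradual Gaussian deformation of a GAN latent vector:
    z(t) = z1*cos(2*pi*t) + z2*sin(2*pi*t). If z1 and z2 are independent
    standard normals, z(t) is standard normal for every t (since
    cos^2 + sin^2 = 1), so the latent stays on the generator's input
    distribution while t is tuned until the simulated rock properties
    match the well-derived conditioning data."""
    return z1 * np.cos(2.0 * np.pi * t) + z2 * np.sin(2.0 * np.pi * t)
```

In the workflow above, one would repeatedly draw a fresh `z2`, search over `t` for the best property match (porosity, permeability) as reported by the pore network simulator, and restart from the best `z(t)` — a sketch of the loop, not the authors' exact schedule.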
{"title":"Using Physics Informed Generative Adversarial Networks to Model 3D porous media","authors":"Zihan Ren, Sanjay Srinivasan","doi":"arxiv-2409.11541","url":"https://doi.org/arxiv-2409.11541","journal":"arXiv - EE - Image and Video Processing","publicationDate":"2024-09-17"}
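The gradual Gaussian deformation used for conditioning above can be sketched in a few lines. The key property is that a combination cos(t)·z1 + sin(t)·z2 of two independent standard-normal latent vectors is itself standard normal, so every intermediate vector remains a valid GAN input while the generated structure morphs smoothly. The sketch below is illustrative, not the authors' code: `simulate_property` stands in for the full generator-plus-pore-network-simulator pipeline, and the simple keep-the-best line search along each deformation path is an assumption.

```python
import numpy as np

def gaussian_deform(z_cur, z_new, t):
    """Combine two standard-normal latent vectors.

    For any t, cos(t)*z_cur + sin(t)*z_new is again standard normal,
    so the deformed vector stays a valid input to the GAN generator.
    """
    return np.cos(t) * z_cur + np.sin(t) * z_new

def calibrate(simulate_property, target, dim=128, n_outer=20, n_steps=50, seed=0):
    """Search latent space for a structure matching a target rock property.

    `simulate_property` is a stand-in for generator + pore-network
    simulation (e.g. returning porosity); it maps a latent vector to
    a scalar. Each outer iteration proposes a fresh random direction
    and keeps the best point found along the deformation path.
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(dim)
    best_err = abs(simulate_property(z) - target)
    for _ in range(n_outer):
        z_new = rng.standard_normal(dim)   # fresh Gaussian proposal
        z_base = z
        for t in np.linspace(0.0, np.pi / 2, n_steps):
            z_try = gaussian_deform(z_base, z_new, t)
            err = abs(simulate_property(z_try) - target)
            if err < best_err:             # monotone improvement only
                best_err, z = err, z_try
    return z, best_err
```

Because the mismatch can only shrink across iterations, the loop progressively pulls the generated structure toward the well-derived conditioning value without ever leaving the generator's latent distribution.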
Hao Fang, Zhe Liu, Yi Feng, Zhen Qiu, Pierre Bagnaninchi, Yunjie Yang
Multi-frequency Electrical Impedance Tomography (mfEIT) is a promising biomedical imaging technique that estimates tissue conductivities across different frequencies. Current state-of-the-art (SOTA) algorithms, which rely on supervised learning and Multiple Measurement Vectors (MMV), require extensive training data, making them time-consuming, costly, and less practical for widespread applications. Moreover, the dependency on training data in supervised MMV methods can introduce erroneous conductivity contrasts across frequencies, posing significant concerns in biomedical applications. To address these challenges, we propose a novel unsupervised learning approach based on Multi-Branch Attention Image Prior (MAIP) for mfEIT reconstruction. Our method employs a carefully designed Multi-Branch Attention Network (MBA-Net) to represent multiple frequency-dependent conductivity images and simultaneously reconstructs mfEIT images by iteratively updating its parameters. By leveraging the implicit regularization capability of the MBA-Net, our algorithm can capture significant inter- and intra-frequency correlations, enabling robust mfEIT reconstruction without the need for training data. Through simulation and real-world experiments, our approach demonstrates performance comparable to, or better than, SOTA algorithms while exhibiting superior generalization capability. These results suggest that the MAIP-based method can be used to improve the reliability and applicability of mfEIT in various settings.
{"title":"Multi-frequency Electrical Impedance Tomography Reconstruction with Multi-Branch Attention Image Prior","authors":"Hao Fang, Zhe Liu, Yi Feng, Zhen Qiu, Pierre Bagnaninchi, Yunjie Yang","doi":"arxiv-2409.10794","url":"https://doi.org/arxiv-2409.10794","journal":"arXiv - EE - Image and Video Processing","publicationDate":"2024-09-17"}
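The training-data-free principle behind MAIP — an untrained network whose architecture acts as an implicit regularizer, with the image recovered by iteratively updating the network's parameters against the measurements — can be illustrated with a heavily simplified sketch. Everything here is an assumption, not the paper's method: the forward model is a fixed linear operator `A` (real EIT forward models are nonlinear), the network is a one-hidden-layer MLP rather than the multi-branch attention MBA-Net, and plain gradient descent with hand-derived gradients replaces the paper's optimizer.

```python
import numpy as np

def untrained_prior_recon(A, v, latent_dim=32, hidden=64, n_iter=300, lr=1e-4, seed=0):
    """Reconstruct x from measurements v ~= A @ x with no training data.

    The image is parameterized as x = W2 @ tanh(W1 @ z) for a fixed
    random code z; only the weights W1, W2 are fitted to the data, so
    the (untrained) network itself supplies the regularization.
    """
    m, n = A.shape
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(latent_dim)             # fixed input code
    W1 = rng.standard_normal((hidden, latent_dim)) * 0.1
    W2 = rng.standard_normal((n, hidden)) * 0.1
    losses = []
    for _ in range(n_iter):
        h = np.tanh(W1 @ z)                         # hidden activations
        x = W2 @ h                                  # current image estimate
        r = A @ x - v                               # data residual
        losses.append(float(r @ r))
        dx = 2.0 * (A.T @ r)                        # dLoss/dx
        dW2 = np.outer(dx, h)                       # dLoss/dW2
        dh = W2.T @ dx
        dW1 = np.outer(dh * (1.0 - h**2), z)        # tanh'(p) = 1 - tanh(p)^2
        W1 -= lr * dW1
        W2 -= lr * dW2
    return W2 @ np.tanh(W1 @ z), losses
```

In the same spirit, MAIP fits one network jointly to all frequencies, which is what lets it exploit inter- and intra-frequency correlations without a training set.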
Armand Collin, Arthur Boschet, Mathieu Boudreau, Julien Cohen-Adad
Quantifying axon and myelin properties (e.g., axon diameter, myelin thickness, g-ratio) in histology images can provide useful information about microstructural changes caused by neurodegenerative diseases. Automatic tissue segmentation is an important tool for these datasets, as a single stained section can contain up to thousands of axons. Advances in deep learning have made this task quick and reliable with minimal overhead, but a deep learning model trained by one research group will hardly ever be usable by other groups due to differences in their histology training data. This is partly due to subject diversity (different body parts, species, genetics, pathologies) and also to the range of modern microscopy imaging techniques resulting in a wide variability of image features (i.e., contrast, resolution). There is a pressing need to make AI accessible to neuroscience researchers to facilitate and accelerate their workflow, but publicly available models are scarce and poorly maintained. Our approach is to aggregate data from multiple imaging modalities (bright field, electron microscopy, Raman spectroscopy) and species (mouse, rat, rabbit, human), to create an open-source, durable tool for axon and myelin segmentation. Our generalist model makes it easier for researchers to process their data and can be fine-tuned for better performance on specific domains. We study the benefits of different aggregation schemes. This multi-domain segmentation model performs better than single-modality dedicated learners (p=0.03077), generalizes better on out-of-distribution data and is easier to use and maintain. Importantly, we package the segmentation tool into a well-maintained open-source software ecosystem (see https://github.com/axondeepseg/axondeepseg).
{"title":"Multi-Domain Data Aggregation for Axon and Myelin Segmentation in Histology Images","authors":"Armand Collin, Arthur Boschet, Mathieu Boudreau, Julien Cohen-Adad","doi":"arxiv-2409.11552","url":"https://doi.org/arxiv-2409.11552","journal":"arXiv - EE - Image and Video Processing","publicationDate":"2024-09-17"}
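One concrete way to aggregate training data across modalities and species is domain-balanced batch sampling, so that a large modality (e.g. electron microscopy) does not swamp a small one. This is a hypothetical illustration only — the paper compares several aggregation schemes, and this sketch is not claimed to be any of them; the domain names and round-robin policy are assumptions.

```python
import random

def balanced_domain_batches(datasets, batch_size, n_batches, seed=0):
    """Yield batches that mix samples evenly across domains.

    `datasets` maps a domain name to its list of samples. Slots in each
    batch are assigned round-robin over the domains, so every domain
    contributes (roughly) equally regardless of its dataset size.
    """
    rng = random.Random(seed)
    domains = sorted(datasets)
    for _ in range(n_batches):
        batch = []
        for i in range(batch_size):
            d = domains[i % len(domains)]          # round-robin over domains
            batch.append((d, rng.choice(datasets[d])))
        yield batch
```

With `datasets = {"EM": em_images, "BF": brightfield_images, "Raman": raman_images}` and a batch size divisible by the number of domains, every batch contains the same number of samples from each modality.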