Deep learning-based Magnetic Resonance (MR) reconstruction methods have focused on generating high-quality images, but they often overlook the impact on downstream tasks (e.g., segmentation) that utilize the reconstructed images. Cascading a separately trained reconstruction network and a downstream task network has been shown to introduce performance degradation due to error propagation and domain gaps between training datasets. To mitigate this issue, downstream task-oriented reconstruction optimization has been proposed for a single downstream task. Expanding this optimization to multi-task scenarios is not straightforward. In this work, we extended this optimization to sequentially introduced multiple downstream tasks and demonstrated that a single MR reconstruction network can be optimized for multiple downstream tasks by deploying continual learning (MOST). MOST integrated techniques from replay-based continual learning and an image-guided loss to overcome catastrophic forgetting. Comparative experiments demonstrated that MOST outperformed a reconstruction network without finetuning, a reconstruction network with naïve finetuning, and conventional continual learning methods. This advancement empowers the application of a single MR reconstruction network to multiple downstream tasks. The source code is available at: https://github.com/SNU-LIST/MOST
{"title":"MOST: MR reconstruction Optimization for multiple downStream Tasks via continual learning","authors":"Hwihun Jeong, Se Young Chun, Jongho Lee","doi":"arxiv-2409.10394","DOIUrl":"https://doi.org/arxiv-2409.10394","url":null,"abstract":"Deep learning-based Magnetic Resonance (MR) reconstruction methods have\u0000focused on generating high-quality images but they often overlook the impact on\u0000downstream tasks (e.g., segmentation) that utilize the reconstructed images.\u0000Cascading separately trained reconstruction network and downstream task network\u0000has been shown to introduce performance degradation due to error propagation\u0000and domain gaps between training datasets. To mitigate this issue, downstream\u0000task-oriented reconstruction optimization has been proposed for a single\u0000downstream task. Expanding this optimization to multi-task scenarios is not\u0000straightforward. In this work, we extended this optimization to sequentially\u0000introduced multiple downstream tasks and demonstrated that a single MR\u0000reconstruction network can be optimized for multiple downstream tasks by\u0000deploying continual learning (MOST). MOST integrated techniques from\u0000replay-based continual learning and image-guided loss to overcome catastrophic\u0000forgetting. Comparative experiments demonstrated that MOST outperformed a\u0000reconstruction network without finetuning, a reconstruction network with\u0000na\"ive finetuning, and conventional continual learning methods. This\u0000advancement empowers the application of a single MR reconstruction network for\u0000multiple downstream tasks. The source code is available at:\u0000https://github.com/SNU-LIST/MOST","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":"27 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142262988","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recent advancements in single image super-resolution have been predominantly driven by token mixers and transformer architectures. WaveMixSR utilized the WaveMix architecture, employing a two-dimensional discrete wavelet transform for spatial token mixing, achieving superior performance in super-resolution tasks with remarkable resource efficiency. In this work, we present an enhanced version of the WaveMixSR architecture by (1) replacing the traditional transpose convolution layer with a pixel shuffle operation and (2) implementing a multistage design for higher resolution tasks ($4\times$). Our experiments demonstrate that our enhanced model -- WaveMixSR-V2 -- outperforms other architectures in multiple super-resolution tasks, achieving state-of-the-art results on the BSD100 dataset, while consuming fewer resources and exhibiting higher parameter efficiency, lower latency, and higher throughput. Our code is available at https://github.com/pranavphoenix/WaveMixSR.
{"title":"WaveMixSR-V2: Enhancing Super-resolution with Higher Efficiency","authors":"Pranav Jeevan, Neeraj Nixon, Amit Sethi","doi":"arxiv-2409.10582","DOIUrl":"https://doi.org/arxiv-2409.10582","url":null,"abstract":"Recent advancements in single image super-resolution have been predominantly\u0000driven by token mixers and transformer architectures. WaveMixSR utilized the\u0000WaveMix architecture, employing a two-dimensional discrete wavelet transform\u0000for spatial token mixing, achieving superior performance in super-resolution\u0000tasks with remarkable resource efficiency. In this work, we present an enhanced\u0000version of the WaveMixSR architecture by (1) replacing the traditional\u0000transpose convolution layer with a pixel shuffle operation and (2) implementing\u0000a multistage design for higher resolution tasks ($4times$). Our experiments\u0000demonstrate that our enhanced model -- WaveMixSR-V2 -- outperforms other\u0000architectures in multiple super-resolution tasks, achieving state-of-the-art\u0000for the BSD100 dataset, while also consuming fewer resources, exhibits higher\u0000parameter efficiency, lower latency and higher throughput. Our code is\u0000available at https://github.com/pranavphoenix/WaveMixSR.","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":"18 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142263247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We propose an end-to-end attribute compression method for dense point clouds. The proposed method combines a frequency sampling module, an adaptive scale feature extraction module with geometry assistance, and a global hyperprior entropy model. The frequency sampling module uses a Hamming window and the Fast Fourier Transform to extract high-frequency components of the point cloud. The difference between the original point cloud and the sampled point cloud is divided into multiple sub-point clouds. These sub-point clouds are then partitioned using an octree, providing a structured input for feature extraction. The feature extraction module integrates adaptive convolutional layers and uses offset-attention to capture both local and global features. Then, a geometry-assisted attribute feature refinement module is used to refine the extracted attribute features. Finally, a global hyperprior model is introduced for entropy encoding. This model propagates hyperprior parameters from the deepest (base) layer to the other layers, further enhancing the encoding efficiency. At the decoder, a mirrored network is used to progressively restore features and reconstruct the color attribute through transposed convolutional layers. The proposed method encodes base layer information at a low bitrate and progressively adds enhancement layer information to improve reconstruction accuracy. Compared to the latest G-PCC test model (TMC13v23) under the MPEG common test conditions (CTCs), the proposed method achieved an average Bjontegaard delta bitrate reduction of 24.58% for the Y component (21.23% for YUV combined) on the MPEG Category Solid dataset and 22.48% for the Y component (17.19% for YUV combined) on the MPEG Category Dense dataset. This is the first instance of a learning-based codec outperforming the G-PCC standard on these datasets under the MPEG CTCs.
{"title":"SPAC: Sampling-based Progressive Attribute Compression for Dense Point Clouds","authors":"Xiaolong Mao, Hui Yuan, Tian Guo, Shiqi Jiang, Raouf Hamzaoui, Sam Kwong","doi":"arxiv-2409.10293","DOIUrl":"https://doi.org/arxiv-2409.10293","url":null,"abstract":"We propose an end-to-end attribute compression method for dense point clouds.\u0000The proposed method combines a frequency sampling module, an adaptive scale\u0000feature extraction module with geometry assistance, and a global hyperprior\u0000entropy model. The frequency sampling module uses a Hamming window and the Fast\u0000Fourier Transform to extract high-frequency components of the point cloud. The\u0000difference between the original point cloud and the sampled point cloud is\u0000divided into multiple sub-point clouds. These sub-point clouds are then\u0000partitioned using an octree, providing a structured input for feature\u0000extraction. The feature extraction module integrates adaptive convolutional\u0000layers and uses offset-attention to capture both local and global features.\u0000Then, a geometry-assisted attribute feature refinement module is used to refine\u0000the extracted attribute features. Finally, a global hyperprior model is\u0000introduced for entropy encoding. This model propagates hyperprior parameters\u0000from the deepest (base) layer to the other layers, further enhancing the\u0000encoding efficiency. At the decoder, a mirrored network is used to\u0000progressively restore features and reconstruct the color attribute through\u0000transposed convolutional layers. The proposed method encodes base layer\u0000information at a low bitrate and progressively adds enhancement layer\u0000information to improve reconstruction accuracy. Compared to the latest G-PCC\u0000test model (TMC13v23) under the MPEG common test conditions (CTCs), the\u0000proposed method achieved an average Bjontegaard delta bitrate reduction of\u000024.58% for the Y component (21.23% for YUV combined) on the MPEG Category Solid\u0000dataset and 22.48% for the Y component (17.19% for YUV combined) on the MPEG\u0000Category Dense dataset. This is the first instance of a learning-based codec\u0000outperforming the G-PCC standard on these datasets under the MPEG CTCs.","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":"30 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142262989","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Liu Li, Hanchun Wang, Matthew Baugh, Qiang Ma, Weitong Zhang, Cheng Ouyang, Daniel Rueckert, Bernhard Kainz
Although existing medical image segmentation methods provide impressive pixel-wise accuracy, they often neglect topological correctness, making their segmentations unusable for many downstream tasks. One option is to retrain such models whilst including a topology-driven loss component. However, this is computationally expensive and often impractical. A better solution would be a versatile plug-and-play topology refinement method that is compatible with any domain-specific segmentation pipeline. Directly training a post-processing model to mitigate topological errors often fails, as such models tend to be biased towards the topological errors of a target segmentation network. The diversity of these errors is confined to the information provided by a labelled training set, which is especially problematic for small datasets. Our method solves this problem by training a model-agnostic topology refinement network with synthetic segmentations that cover a wide variety of topological errors. Inspired by the Stone-Weierstrass theorem, we synthesize topology-perturbation masks with randomly sampled coefficients of orthogonal polynomial bases, which ensures a complete and unbiased representation. In practice, we verified the efficiency and effectiveness of our method with multiple families of polynomial bases, and we show that our universal plug-and-play topology refinement network outperforms both existing topology-driven learning-based methods and post-processing methods. We also show that combining our method with learning-based models provides an effortless add-on that can further improve the performance of existing approaches.
{"title":"Universal Topology Refinement for Medical Image Segmentation with Polynomial Feature Synthesis","authors":"Liu Li, Hanchun Wang, Matthew Baugh, Qiang Ma, Weitong Zhang, Cheng Ouyang, Daniel Rueckert, Bernhard Kainz","doi":"arxiv-2409.09796","DOIUrl":"https://doi.org/arxiv-2409.09796","url":null,"abstract":"Although existing medical image segmentation methods provide impressive\u0000pixel-wise accuracy, they often neglect topological correctness, making their\u0000segmentations unusable for many downstream tasks. One option is to retrain such\u0000models whilst including a topology-driven loss component. However, this is\u0000computationally expensive and often impractical. A better solution would be to\u0000have a versatile plug-and-play topology refinement method that is compatible\u0000with any domain-specific segmentation pipeline. Directly training a\u0000post-processing model to mitigate topological errors often fails as such models\u0000tend to be biased towards the topological errors of a target segmentation\u0000network. The diversity of these errors is confined to the information provided\u0000by a labelled training set, which is especially problematic for small datasets.\u0000Our method solves this problem by training a model-agnostic topology refinement\u0000network with synthetic segmentations that cover a wide variety of topological\u0000errors. Inspired by the Stone-Weierstrass theorem, we synthesize\u0000topology-perturbation masks with randomly sampled coefficients of orthogonal\u0000polynomial bases, which ensures a complete and unbiased representation.\u0000Practically, we verified the efficiency and effectiveness of our methods as\u0000being compatible with multiple families of polynomial bases, and show evidence\u0000that our universal plug-and-play topology refinement network outperforms both\u0000existing topology-driven learning-based and post-processing methods. We also\u0000show that combining our method with learning-based models provides an\u0000effortless add-on, which can further improve the performance of existing\u0000approaches.","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":"74 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142269825","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In-memory computing (IMC) hardware accelerators allow more than 10x improvements in peak efficiency and performance for matrix-vector multiplications (MVM) compared to conventional digital designs. As a result, they have gained great interest for the acceleration of neural network workloads. Nevertheless, these potential gains are only achieved when the utilization of the computational resources is maximized and the overhead from loading operands into the memory array is minimized. To this end, this paper proposes a novel mapping algorithm for the weights in the IMC macro, based on efficient packing of the weights of network layers in the available memory. The algorithm (1) minimizes weight loading times while (2) maximally exploiting the parallelism of the IMC computational fabric. A set of case studies is carried out to show achievable trade-offs for the MLPerf Tiny benchmark on IMC architectures, with potential $10$-$100\times$ EDP improvements.
{"title":"Pack my weights and run! Minimizing overheads for in-memory computing accelerators","authors":"Pouya Houshmand, Marian Verhelst","doi":"arxiv-2409.11437","DOIUrl":"https://doi.org/arxiv-2409.11437","url":null,"abstract":"In-memory computing hardware accelerators allow more than 10x improvements in\u0000peak efficiency and performance for matrix-vector multiplications (MVM)\u0000compared to conventional digital designs. For this, they have gained great\u0000interest for the acceleration of neural network workloads. Nevertheless, these\u0000potential gains are only achieved when the utilization of the computational\u0000resources is maximized and the overhead from loading operands in the memory\u0000array minimized. To this aim, this paper proposes a novel mapping algorithm for\u0000the weights in the IMC macro, based on efficient packing of the weights of\u0000network layers in the available memory. The algorithm realizes 1) minimization\u0000of weight loading times while at the same time 2) maximally exploiting the\u0000parallelism of the IMC computational fabric. A set of case studies are carried\u0000out to show achievable trade-offs for the MLPerf Tiny benchmark\u0000cite{mlperftiny} on IMC architectures, with potential $10-100times$ EDP\u0000improvements.","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":"18 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142263234","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we present a new machine learning workflow with unsupervised learning techniques to identify domains within atomic force microscopy (AFM) images obtained from polymer films. The goal of the workflow is to identify the spatial locations of the two types of polymer domains with little to no manual intervention and to calculate the domain size distributions, which in turn can help classify the phase-separated state of the material as macrophase- or microphase-ordered or disordered. We briefly review existing approaches from other fields, such as computer vision and signal processing, that are applicable to the above tasks, which arise frequently in polymer science and engineering. We then test these computer vision and signal processing approaches on our AFM image dataset to identify the strengths and limitations of each for the first task. For this domain segmentation task, we found that a workflow using the discrete Fourier transform (DFT) or discrete cosine transform (DCT) with variance statistics as the feature works best. The popular ResNet50 deep learning approach from computer vision exhibited relatively poorer performance on the domain segmentation task for our AFM images compared to the DFT- and DCT-based workflows. For the second task, for each of the 144 input AFM images, we used the existing PoreSpy Python package to calculate the domain size distribution from the output of the DFT-based workflow. The information and open-source code we share in this paper can serve as a guide for researchers in the polymer and soft materials fields who need ML modeling and workflows for automated analyses of AFM images from polymer samples that may have crystalline or amorphous domains, sharp or rough interfaces between domains, or microphase- or macrophase-separated domains.
{"title":"Machine Learning for Analyzing Atomic Force Microscopy (AFM) Images Generated from Polymer Blends","authors":"Aanish Paruchuri, Yunfei Wang, Xiaodan Gu, Arthi Jayaraman","doi":"arxiv-2409.11438","DOIUrl":"https://doi.org/arxiv-2409.11438","url":null,"abstract":"In this paper we present a new machine learning workflow with unsupervised\u0000learning techniques to identify domains within atomic force microscopy images\u0000obtained from polymer films. The goal of the workflow is to identify the\u0000spatial location of the two types of polymer domains with little to no manual\u0000intervention and calculate the domain size distributions which in turn can help\u0000qualify the phase separated state of the material as macrophase or microphase\u0000ordered or disordered domains. We briefly review existing approaches used in\u0000other fields, computer vision and signal processing that can be applicable for\u0000the above tasks that happen frequently in the field of polymer science and\u0000engineering. We then test these approaches from computer vision and signal\u0000processing on the AFM image dataset to identify the strengths and limitations\u0000of each of these approaches for our first task. For our first domain\u0000segmentation task, we found that the workflow using discrete Fourier transform\u0000or discrete cosine transform with variance statistics as the feature works the\u0000best. The popular ResNet50 deep learning approach from computer vision field\u0000exhibited relatively poorer performance in the domain segmentation task for our\u0000AFM images as compared to the DFT and DCT based workflows. For the second task,\u0000for each of 144 input AFM images, we then used an existing porespy python\u0000package to calculate the domain size distribution from the output of that image\u0000from DFT based workflow. The information and open source codes we share in this\u0000paper can serve as a guide for researchers in the polymer and soft materials\u0000fields who need ML modeling and workflows for automated analyses of AFM images\u0000from polymer samples that may have crystalline or amorphous domains, sharp or\u0000rough interfaces between domains, or micro or macrophase separated domains.","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":"9 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142269822","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Magnetic Resonance Imaging (MRI) requires a trade-off between resolution, signal-to-noise ratio, and scan time, making high-resolution (HR) acquisition challenging. Therefore, super-resolution for MR images is a feasible solution. However, most existing methods face challenges in accurately learning a continuous volumetric representation from low-resolution images or require HR images for supervision. To address these challenges, we propose a novel method for MR image super-resolution based on a two-factor representation. Specifically, we factorize intensity signals into a linear combination of learnable basis and coefficient factors, enabling an efficient continuous volumetric representation from a low-resolution MR image. Besides, we introduce a coordinate-based encoding to capture structural relationships between sparse voxels, facilitating smooth completion in unobserved regions. Experiments on the BraTS 2019 and MSSEG 2016 datasets demonstrate that our method achieves state-of-the-art performance, providing superior visual fidelity and robustness, particularly at large up-sampling scales of MR image super-resolution.
{"title":"Learning Two-factor Representation for Magnetic Resonance Image Super-resolution","authors":"Weifeng Wei, Heng Chen, Pengxiang Su","doi":"arxiv-2409.09731","DOIUrl":"https://doi.org/arxiv-2409.09731","url":null,"abstract":"Magnetic Resonance Imaging (MRI) requires a trade-off between resolution,\u0000signal-to-noise ratio, and scan time, making high-resolution (HR) acquisition\u0000challenging. Therefore, super-resolution for MR image is a feasible solution.\u0000However, most existing methods face challenges in accurately learning a\u0000continuous volumetric representation from low-resolution image or require HR\u0000image for supervision. To solve these challenges, we propose a novel method for\u0000MR image super-resolution based on two-factor representation. Specifically, we\u0000factorize intensity signals into a linear combination of learnable basis and\u0000coefficient factors, enabling efficient continuous volumetric representation\u0000from low-resolution MR image. Besides, we introduce a coordinate-based encoding\u0000to capture structural relationships between sparse voxels, facilitating smooth\u0000completion in unobserved regions. Experiments on BraTS 2019 and MSSEG 2016\u0000datasets demonstrate that our method achieves state-of-the-art performance,\u0000providing superior visual fidelity and robustness, particularly in large\u0000up-sampling scale MR image super-resolution.","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":"30 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142262995","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recent advances in computer-aided diagnosis for histopathology have been largely driven by the use of deep learning models for automated image analysis. While these networks can perform on par with medical experts, their performance can be impeded by out-of-distribution data. The Cross-Organ and Cross-Scanner Adenocarcinoma Segmentation (COSAS) challenge aimed to address the task of cross-domain adenocarcinoma segmentation in the presence of morphological and scanner-induced domain shifts. In this paper, we present a U-Net-based segmentation framework designed to tackle this challenge. Our approach achieved segmentation scores of 0.8020 for the cross-organ track and 0.8527 for the cross-scanner track on the final challenge test sets, ranking it as the best-performing submission.
{"title":"Domain and Content Adaptive Convolutions for Cross-Domain Adenocarcinoma Segmentation","authors":"Frauke Wilm, Mathias Öttl, Marc Aubreville, Katharina Breininger","doi":"arxiv-2409.09797","DOIUrl":"https://doi.org/arxiv-2409.09797","url":null,"abstract":"Recent advances in computer-aided diagnosis for histopathology have been\u0000largely driven by the use of deep learning models for automated image analysis.\u0000While these networks can perform on par with medical experts, their performance\u0000can be impeded by out-of-distribution data. The Cross-Organ and Cross-Scanner\u0000Adenocarcinoma Segmentation (COSAS) challenge aimed to address the task of\u0000cross-domain adenocarcinoma segmentation in the presence of morphological and\u0000scanner-induced domain shifts. In this paper, we present a U-Net-based\u0000segmentation framework designed to tackle this challenge. Our approach achieved\u0000segmentation scores of 0.8020 for the cross-organ track and 0.8527 for the\u0000cross-scanner track on the final challenge test sets, ranking it the\u0000best-performing submission.","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":"4 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142262993","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The fundamental problem with ultrasound-guided diagnosis is that the acquired images are often 2-D cross-sections of a 3-D anatomy, potentially missing important anatomical details. This limitation leads to challenges in echocardiography, such as poor visualization of heart valves or foreshortening of ventricles. Clinicians must interpret these images with inherent uncertainty, a nuance absent in machine learning's one-hot labels. We propose Re-Training for Uncertainty (RT4U), a data-centric method to introduce uncertainty to weakly informative inputs in the training set. This simple approach can be incorporated into existing state-of-the-art aortic stenosis classification methods to further improve their accuracy. When combined with conformal prediction techniques, RT4U can yield adaptively sized prediction sets that are guaranteed to contain the ground truth class with high probability. We validate the effectiveness of RT4U on three diverse datasets: a public (TMED-2) and a private AS dataset, along with a CIFAR-10-derived toy dataset. Results show improvement on all the datasets.
{"title":"Reliable Multi-View Learning with Conformal Prediction for Aortic Stenosis Classification in Echocardiography","authors":"Ang Nan Gu, Michael Tsang, Hooman Vaseli, Teresa Tsang, Purang Abolmaesumi","doi":"arxiv-2409.09680","DOIUrl":"https://doi.org/arxiv-2409.09680","url":null,"abstract":"The fundamental problem with ultrasound-guided diagnosis is that the acquired\u0000images are often 2-D cross-sections of a 3-D anatomy, potentially missing\u0000important anatomical details. This limitation leads to challenges in ultrasound\u0000echocardiography, such as poor visualization of heart valves or foreshortening\u0000of ventricles. Clinicians must interpret these images with inherent\u0000uncertainty, a nuance absent in machine learning's one-hot labels. We propose\u0000Re-Training for Uncertainty (RT4U), a data-centric method to introduce\u0000uncertainty to weakly informative inputs in the training set. This simple\u0000approach can be incorporated to existing state-of-the-art aortic stenosis\u0000classification methods to further improve their accuracy. When combined with\u0000conformal prediction techniques, RT4U can yield adaptively sized prediction\u0000sets which are guaranteed to contain the ground truth class to a high accuracy.\u0000We validate the effectiveness of RT4U on three diverse datasets: a public\u0000(TMED-2) and a private AS dataset, along with a CIFAR-10-derived toy dataset.\u0000Results show improvement on all the datasets.","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":"48 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142262996","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper reports a Dynamic Vision Sensor (DVS) event camera that is 6x more sensitive at 14x lower illumination than existing commercial and prototype cameras. Event cameras output a sparse stream of brightness change events. Their high dynamic range (HDR), quick response, and high temporal resolution provide key advantages for scientific applications that involve low lighting conditions and sparse visual events. However, current DVS are hindered by low sensitivity resulting from shot noise and pixel-to-pixel mismatch. Commercial DVS have a minimum brightness change threshold of >10%. Sensitive prototypes achieved thresholds as low as 1%, but required kilo-lux illumination. Our SciDVS prototype, fabricated in a 180 nm CMOS image sensor process, achieves 1.7% sensitivity at a chip illumination of 0.7 lx and 18 Hz bandwidth. Novel features of SciDVS are (1) an auto-centering in-pixel preamplifier providing intrascene HDR and increased sensitivity, (2) improved control of bandwidth to limit shot noise, and (3) optional pixel binning, allowing the user to trade spatial resolution for sensitivity.
{"title":"SciDVS: A Scientific Event Camera with 1.7% Temporal Contrast Sensitivity at 0.7 lux","authors":"Rui Graca, Sheng Zhou, Brian McReynolds, Tobi Delbruck","doi":"arxiv-2409.09648","DOIUrl":"https://doi.org/arxiv-2409.09648","url":null,"abstract":"This paper reports a Dynamic Vision Sensor (DVS) event camera that is 6x more\u0000sensitive at 14x lower illumination than existing commercial and prototype\u0000cameras. Event cameras output a sparse stream of brightness change events.\u0000Their high dynamic range (HDR), quick response, and high temporal resolution\u0000provide key advantages for scientific applications that involve low lighting\u0000conditions and sparse visual events. However, current DVS are hindered by low\u0000sensitivity, resulting from shot noise and pixel-to-pixel mismatch. Commercial\u0000DVS have a minimum brightness change threshold of >10%. Sensitive prototypes\u0000achieved as low as 1%, but required kilo-lux illumination. Our SciDVS prototype\u0000fabricated in a 180nm CMOS image sensor process achieves 1.7% sensitivity at\u0000chip illumination of 0.7 lx and 18 Hz bandwidth. Novel features of SciDVS are\u0000(1) an auto-centering in-pixel preamplifier providing intrascene HDR and\u0000increased sensitivity, (2) improved control of bandwidth to limit shot noise,\u0000and (3) optional pixel binning, allowing the user to trade spatial resolution\u0000for sensitivity.","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":"33 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142262998","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}