Deep learning-based Magnetic Resonance (MR) reconstruction methods have focused on generating high-quality images, but they often overlook the impact on downstream tasks (e.g., segmentation) that utilize the reconstructed images. Cascading a separately trained reconstruction network and a downstream task network has been shown to introduce performance degradation due to error propagation and domain gaps between training datasets. To mitigate this issue, downstream task-oriented reconstruction optimization has been proposed for a single downstream task. Expanding this optimization to multi-task scenarios is not straightforward. In this work, we extended this optimization to sequentially introduced multiple downstream tasks and demonstrated that a single MR reconstruction network can be optimized for multiple downstream tasks by deploying continual learning (MOST). MOST integrated techniques from replay-based continual learning and an image-guided loss to overcome catastrophic forgetting. Comparative experiments demonstrated that MOST outperformed a reconstruction network without finetuning, a reconstruction network with naïve finetuning, and conventional continual learning methods. This advancement empowers the application of a single MR reconstruction network to multiple downstream tasks. The source code is available at: https://github.com/SNU-LIST/MOST
{"title":"MOST: MR reconstruction Optimization for multiple downStream Tasks via continual learning","authors":"Hwihun Jeong, Se Young Chun, Jongho Lee","doi":"arxiv-2409.10394","DOIUrl":"https://doi.org/arxiv-2409.10394","url":null,"abstract":"Deep learning-based Magnetic Resonance (MR) reconstruction methods have\u0000focused on generating high-quality images but they often overlook the impact on\u0000downstream tasks (e.g., segmentation) that utilize the reconstructed images.\u0000Cascading separately trained reconstruction network and downstream task network\u0000has been shown to introduce performance degradation due to error propagation\u0000and domain gaps between training datasets. To mitigate this issue, downstream\u0000task-oriented reconstruction optimization has been proposed for a single\u0000downstream task. Expanding this optimization to multi-task scenarios is not\u0000straightforward. In this work, we extended this optimization to sequentially\u0000introduced multiple downstream tasks and demonstrated that a single MR\u0000reconstruction network can be optimized for multiple downstream tasks by\u0000deploying continual learning (MOST). MOST integrated techniques from\u0000replay-based continual learning and image-guided loss to overcome catastrophic\u0000forgetting. Comparative experiments demonstrated that MOST outperformed a\u0000reconstruction network without finetuning, a reconstruction network with\u0000na\"ive finetuning, and conventional continual learning methods. This\u0000advancement empowers the application of a single MR reconstruction network for\u0000multiple downstream tasks. The source code is available at:\u0000https://github.com/SNU-LIST/MOST","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":"27 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142262988","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recent advancements in single image super-resolution have been predominantly driven by token mixers and transformer architectures. WaveMixSR utilized the WaveMix architecture, employing a two-dimensional discrete wavelet transform for spatial token mixing, achieving superior performance in super-resolution tasks with remarkable resource efficiency. In this work, we present an enhanced version of the WaveMixSR architecture by (1) replacing the traditional transpose convolution layer with a pixel shuffle operation and (2) implementing a multistage design for higher resolution tasks ($4\times$). Our experiments demonstrate that our enhanced model -- WaveMixSR-V2 -- outperforms other architectures in multiple super-resolution tasks, achieving state-of-the-art results on the BSD100 dataset, while consuming fewer resources and exhibiting higher parameter efficiency, lower latency, and higher throughput. Our code is available at https://github.com/pranavphoenix/WaveMixSR.
{"title":"WaveMixSR-V2: Enhancing Super-resolution with Higher Efficiency","authors":"Pranav Jeevan, Neeraj Nixon, Amit Sethi","doi":"arxiv-2409.10582","DOIUrl":"https://doi.org/arxiv-2409.10582","url":null,"abstract":"Recent advancements in single image super-resolution have been predominantly\u0000driven by token mixers and transformer architectures. WaveMixSR utilized the\u0000WaveMix architecture, employing a two-dimensional discrete wavelet transform\u0000for spatial token mixing, achieving superior performance in super-resolution\u0000tasks with remarkable resource efficiency. In this work, we present an enhanced\u0000version of the WaveMixSR architecture by (1) replacing the traditional\u0000transpose convolution layer with a pixel shuffle operation and (2) implementing\u0000a multistage design for higher resolution tasks ($4times$). Our experiments\u0000demonstrate that our enhanced model -- WaveMixSR-V2 -- outperforms other\u0000architectures in multiple super-resolution tasks, achieving state-of-the-art\u0000for the BSD100 dataset, while also consuming fewer resources, exhibits higher\u0000parameter efficiency, lower latency and higher throughput. Our code is\u0000available at https://github.com/pranavphoenix/WaveMixSR.","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":"18 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142263247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We propose an end-to-end attribute compression method for dense point clouds. The proposed method combines a frequency sampling module, an adaptive scale feature extraction module with geometry assistance, and a global hyperprior entropy model. The frequency sampling module uses a Hamming window and the Fast Fourier Transform to extract high-frequency components of the point cloud. The difference between the original point cloud and the sampled point cloud is divided into multiple sub-point clouds. These sub-point clouds are then partitioned using an octree, providing a structured input for feature extraction. The feature extraction module integrates adaptive convolutional layers and uses offset-attention to capture both local and global features. Then, a geometry-assisted attribute feature refinement module is used to refine the extracted attribute features. Finally, a global hyperprior model is introduced for entropy encoding. This model propagates hyperprior parameters from the deepest (base) layer to the other layers, further enhancing the encoding efficiency. At the decoder, a mirrored network is used to progressively restore features and reconstruct the color attribute through transposed convolutional layers. The proposed method encodes base layer information at a low bitrate and progressively adds enhancement layer information to improve reconstruction accuracy. Compared to the latest G-PCC test model (TMC13v23) under the MPEG common test conditions (CTCs), the proposed method achieved an average Bjontegaard delta bitrate reduction of 24.58% for the Y component (21.23% for YUV combined) on the MPEG Category Solid dataset and 22.48% for the Y component (17.19% for YUV combined) on the MPEG Category Dense dataset. This is the first instance of a learning-based codec outperforming the G-PCC standard on these datasets under the MPEG CTCs.
{"title":"SPAC: Sampling-based Progressive Attribute Compression for Dense Point Clouds","authors":"Xiaolong Mao, Hui Yuan, Tian Guo, Shiqi Jiang, Raouf Hamzaoui, Sam Kwong","doi":"arxiv-2409.10293","DOIUrl":"https://doi.org/arxiv-2409.10293","url":null,"abstract":"We propose an end-to-end attribute compression method for dense point clouds.\u0000The proposed method combines a frequency sampling module, an adaptive scale\u0000feature extraction module with geometry assistance, and a global hyperprior\u0000entropy model. The frequency sampling module uses a Hamming window and the Fast\u0000Fourier Transform to extract high-frequency components of the point cloud. The\u0000difference between the original point cloud and the sampled point cloud is\u0000divided into multiple sub-point clouds. These sub-point clouds are then\u0000partitioned using an octree, providing a structured input for feature\u0000extraction. The feature extraction module integrates adaptive convolutional\u0000layers and uses offset-attention to capture both local and global features.\u0000Then, a geometry-assisted attribute feature refinement module is used to refine\u0000the extracted attribute features. Finally, a global hyperprior model is\u0000introduced for entropy encoding. This model propagates hyperprior parameters\u0000from the deepest (base) layer to the other layers, further enhancing the\u0000encoding efficiency. At the decoder, a mirrored network is used to\u0000progressively restore features and reconstruct the color attribute through\u0000transposed convolutional layers. The proposed method encodes base layer\u0000information at a low bitrate and progressively adds enhancement layer\u0000information to improve reconstruction accuracy. Compared to the latest G-PCC\u0000test model (TMC13v23) under the MPEG common test conditions (CTCs), the\u0000proposed method achieved an average Bjontegaard delta bitrate reduction of\u000024.58% for the Y component (21.23% for YUV combined) on the MPEG Category Solid\u0000dataset and 22.48% for the Y component (17.19% for YUV combined) on the MPEG\u0000Category Dense dataset. This is the first instance of a learning-based codec\u0000outperforming the G-PCC standard on these datasets under the MPEG CTCs.","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":"30 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142262989","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Liu Li, Hanchun Wang, Matthew Baugh, Qiang Ma, Weitong Zhang, Cheng Ouyang, Daniel Rueckert, Bernhard Kainz
Although existing medical image segmentation methods provide impressive pixel-wise accuracy, they often neglect topological correctness, making their segmentations unusable for many downstream tasks. One option is to retrain such models whilst including a topology-driven loss component. However, this is computationally expensive and often impractical. A better solution would be a versatile plug-and-play topology refinement method that is compatible with any domain-specific segmentation pipeline. Directly training a post-processing model to mitigate topological errors often fails, as such models tend to be biased towards the topological errors of a target segmentation network. The diversity of these errors is confined to the information provided by a labelled training set, which is especially problematic for small datasets. Our method solves this problem by training a model-agnostic topology refinement network with synthetic segmentations that cover a wide variety of topological errors. Inspired by the Stone-Weierstrass theorem, we synthesize topology-perturbation masks with randomly sampled coefficients of orthogonal polynomial bases, which ensures a complete and unbiased representation. In practice, we verified the efficiency and effectiveness of our method with multiple families of polynomial bases, and we show that our universal plug-and-play topology refinement network outperforms both existing topology-driven learning-based methods and post-processing methods. We also show that combining our method with learning-based models provides an effortless add-on that can further improve the performance of existing approaches.
{"title":"Universal Topology Refinement for Medical Image Segmentation with Polynomial Feature Synthesis","authors":"Liu Li, Hanchun Wang, Matthew Baugh, Qiang Ma, Weitong Zhang, Cheng Ouyang, Daniel Rueckert, Bernhard Kainz","doi":"arxiv-2409.09796","DOIUrl":"https://doi.org/arxiv-2409.09796","url":null,"abstract":"Although existing medical image segmentation methods provide impressive\u0000pixel-wise accuracy, they often neglect topological correctness, making their\u0000segmentations unusable for many downstream tasks. One option is to retrain such\u0000models whilst including a topology-driven loss component. However, this is\u0000computationally expensive and often impractical. A better solution would be to\u0000have a versatile plug-and-play topology refinement method that is compatible\u0000with any domain-specific segmentation pipeline. Directly training a\u0000post-processing model to mitigate topological errors often fails as such models\u0000tend to be biased towards the topological errors of a target segmentation\u0000network. The diversity of these errors is confined to the information provided\u0000by a labelled training set, which is especially problematic for small datasets.\u0000Our method solves this problem by training a model-agnostic topology refinement\u0000network with synthetic segmentations that cover a wide variety of topological\u0000errors. Inspired by the Stone-Weierstrass theorem, we synthesize\u0000topology-perturbation masks with randomly sampled coefficients of orthogonal\u0000polynomial bases, which ensures a complete and unbiased representation.\u0000Practically, we verified the efficiency and effectiveness of our methods as\u0000being compatible with multiple families of polynomial bases, and show evidence\u0000that our universal plug-and-play topology refinement network outperforms both\u0000existing topology-driven learning-based and post-processing methods. We also\u0000show that combining our method with learning-based models provides an\u0000effortless add-on, which can further improve the performance of existing\u0000approaches.","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":"74 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142269825","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In-memory computing (IMC) hardware accelerators allow more than 10x improvements in peak efficiency and performance for matrix-vector multiplications (MVM) compared to conventional digital designs. As a result, they have gained great interest for the acceleration of neural network workloads. Nevertheless, these potential gains are only achieved when the utilization of the computational resources is maximized and the overhead from loading operands into the memory array is minimized. To this end, this paper proposes a novel mapping algorithm for the weights in the IMC macro, based on efficient packing of the weights of network layers in the available memory. The algorithm (1) minimizes weight loading times while (2) maximally exploiting the parallelism of the IMC computational fabric. A set of case studies is carried out to show achievable trade-offs for the MLPerf Tiny benchmark on IMC architectures, with potential $10$-$100\times$ EDP improvements.
{"title":"Pack my weights and run! Minimizing overheads for in-memory computing accelerators","authors":"Pouya Houshmand, Marian Verhelst","doi":"arxiv-2409.11437","DOIUrl":"https://doi.org/arxiv-2409.11437","url":null,"abstract":"In-memory computing hardware accelerators allow more than 10x improvements in\u0000peak efficiency and performance for matrix-vector multiplications (MVM)\u0000compared to conventional digital designs. For this, they have gained great\u0000interest for the acceleration of neural network workloads. Nevertheless, these\u0000potential gains are only achieved when the utilization of the computational\u0000resources is maximized and the overhead from loading operands in the memory\u0000array minimized. To this aim, this paper proposes a novel mapping algorithm for\u0000the weights in the IMC macro, based on efficient packing of the weights of\u0000network layers in the available memory. The algorithm realizes 1) minimization\u0000of weight loading times while at the same time 2) maximally exploiting the\u0000parallelism of the IMC computational fabric. A set of case studies are carried\u0000out to show achievable trade-offs for the MLPerf Tiny benchmark\u0000cite{mlperftiny} on IMC architectures, with potential $10-100times$ EDP\u0000improvements.","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":"18 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142263234","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we present a new machine learning workflow with unsupervised learning techniques to identify domains within atomic force microscopy (AFM) images obtained from polymer films. The goal of the workflow is to identify the spatial locations of the two types of polymer domains with little to no manual intervention and to calculate the domain size distributions, which in turn can help classify the phase-separated state of the material as macrophase- or microphase-ordered or disordered. We briefly review existing approaches from other fields, such as computer vision and signal processing, that are applicable to the above tasks, which arise frequently in polymer science and engineering. We then test these computer vision and signal processing approaches on our AFM image dataset to identify the strengths and limitations of each for the first task. For this domain segmentation task, we found that a workflow using the discrete Fourier transform (DFT) or discrete cosine transform (DCT) with variance statistics as the feature works best. The popular ResNet50 deep learning approach from computer vision exhibited relatively poorer performance on the domain segmentation task for our AFM images compared to the DFT- and DCT-based workflows. For the second task, for each of the 144 input AFM images, we used the existing PoreSpy Python package to calculate the domain size distribution from the output of the DFT-based workflow. The information and open-source code we share in this paper can serve as a guide for researchers in the polymer and soft materials fields who need ML modeling and workflows for automated analyses of AFM images from polymer samples that may have crystalline or amorphous domains, sharp or rough interfaces between domains, or microphase- or macrophase-separated domains.
{"title":"Machine Learning for Analyzing Atomic Force Microscopy (AFM) Images Generated from Polymer Blends","authors":"Aanish Paruchuri, Yunfei Wang, Xiaodan Gu, Arthi Jayaraman","doi":"arxiv-2409.11438","DOIUrl":"https://doi.org/arxiv-2409.11438","url":null,"abstract":"In this paper we present a new machine learning workflow with unsupervised\u0000learning techniques to identify domains within atomic force microscopy images\u0000obtained from polymer films. The goal of the workflow is to identify the\u0000spatial location of the two types of polymer domains with little to no manual\u0000intervention and calculate the domain size distributions which in turn can help\u0000qualify the phase separated state of the material as macrophase or microphase\u0000ordered or disordered domains. We briefly review existing approaches used in\u0000other fields, computer vision and signal processing that can be applicable for\u0000the above tasks that happen frequently in the field of polymer science and\u0000engineering. We then test these approaches from computer vision and signal\u0000processing on the AFM image dataset to identify the strengths and limitations\u0000of each of these approaches for our first task. For our first domain\u0000segmentation task, we found that the workflow using discrete Fourier transform\u0000or discrete cosine transform with variance statistics as the feature works the\u0000best. The popular ResNet50 deep learning approach from computer vision field\u0000exhibited relatively poorer performance in the domain segmentation task for our\u0000AFM images as compared to the DFT and DCT based workflows. For the second task,\u0000for each of 144 input AFM images, we then used an existing porespy python\u0000package to calculate the domain size distribution from the output of that image\u0000from DFT based workflow. The information and open source codes we share in this\u0000paper can serve as a guide for researchers in the polymer and soft materials\u0000fields who need ML modeling and workflows for automated analyses of AFM images\u0000from polymer samples that may have crystalline or amorphous domains, sharp or\u0000rough interfaces between domains, or micro or macrophase separated domains.","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":"9 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142269822","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Magnetic Resonance Imaging (MRI) requires a trade-off between resolution, signal-to-noise ratio, and scan time, making high-resolution (HR) acquisition challenging. Therefore, super-resolution for MR images is a feasible solution. However, most existing methods face challenges in accurately learning a continuous volumetric representation from low-resolution images or require HR images for supervision. To address these challenges, we propose a novel method for MR image super-resolution based on a two-factor representation. Specifically, we factorize intensity signals into a linear combination of learnable basis and coefficient factors, enabling an efficient continuous volumetric representation from a low-resolution MR image. Besides, we introduce a coordinate-based encoding to capture structural relationships between sparse voxels, facilitating smooth completion in unobserved regions. Experiments on the BraTS 2019 and MSSEG 2016 datasets demonstrate that our method achieves state-of-the-art performance, providing superior visual fidelity and robustness, particularly at large up-sampling scales of MR image super-resolution.
{"title":"Learning Two-factor Representation for Magnetic Resonance Image Super-resolution","authors":"Weifeng Wei, Heng Chen, Pengxiang Su","doi":"arxiv-2409.09731","DOIUrl":"https://doi.org/arxiv-2409.09731","url":null,"abstract":"Magnetic Resonance Imaging (MRI) requires a trade-off between resolution,\u0000signal-to-noise ratio, and scan time, making high-resolution (HR) acquisition\u0000challenging. Therefore, super-resolution for MR image is a feasible solution.\u0000However, most existing methods face challenges in accurately learning a\u0000continuous volumetric representation from low-resolution image or require HR\u0000image for supervision. To solve these challenges, we propose a novel method for\u0000MR image super-resolution based on two-factor representation. Specifically, we\u0000factorize intensity signals into a linear combination of learnable basis and\u0000coefficient factors, enabling efficient continuous volumetric representation\u0000from low-resolution MR image. Besides, we introduce a coordinate-based encoding\u0000to capture structural relationships between sparse voxels, facilitating smooth\u0000completion in unobserved regions. Experiments on BraTS 2019 and MSSEG 2016\u0000datasets demonstrate that our method achieves state-of-the-art performance,\u0000providing superior visual fidelity and robustness, particularly in large\u0000up-sampling scale MR image super-resolution.","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":"30 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142262995","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recent advances in computer-aided diagnosis for histopathology have been largely driven by the use of deep learning models for automated image analysis. While these networks can perform on par with medical experts, their performance can be impeded by out-of-distribution data. The Cross-Organ and Cross-Scanner Adenocarcinoma Segmentation (COSAS) challenge aimed to address the task of cross-domain adenocarcinoma segmentation in the presence of morphological and scanner-induced domain shifts. In this paper, we present a U-Net-based segmentation framework designed to tackle this challenge. Our approach achieved segmentation scores of 0.8020 for the cross-organ track and 0.8527 for the cross-scanner track on the final challenge test sets, ranking it as the best-performing submission.
{"title":"Domain and Content Adaptive Convolutions for Cross-Domain Adenocarcinoma Segmentation","authors":"Frauke Wilm, Mathias Öttl, Marc Aubreville, Katharina Breininger","doi":"arxiv-2409.09797","DOIUrl":"https://doi.org/arxiv-2409.09797","url":null,"abstract":"Recent advances in computer-aided diagnosis for histopathology have been\u0000largely driven by the use of deep learning models for automated image analysis.\u0000While these networks can perform on par with medical experts, their performance\u0000can be impeded by out-of-distribution data. The Cross-Organ and Cross-Scanner\u0000Adenocarcinoma Segmentation (COSAS) challenge aimed to address the task of\u0000cross-domain adenocarcinoma segmentation in the presence of morphological and\u0000scanner-induced domain shifts. In this paper, we present a U-Net-based\u0000segmentation framework designed to tackle this challenge. Our approach achieved\u0000segmentation scores of 0.8020 for the cross-organ track and 0.8527 for the\u0000cross-scanner track on the final challenge test sets, ranking it the\u0000best-performing submission.","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":"4 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142262993","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The fundamental problem with ultrasound-guided diagnosis is that the acquired images are often 2-D cross-sections of a 3-D anatomy, potentially missing important anatomical details. This limitation leads to challenges in echocardiography, such as poor visualization of heart valves or foreshortening of ventricles. Clinicians must interpret these images with inherent uncertainty, a nuance absent in machine learning's one-hot labels. We propose Re-Training for Uncertainty (RT4U), a data-centric method to introduce uncertainty to weakly informative inputs in the training set. This simple approach can be incorporated into existing state-of-the-art aortic stenosis classification methods to further improve their accuracy. When combined with conformal prediction techniques, RT4U can yield adaptively sized prediction sets that are guaranteed to contain the ground truth class with high probability. We validate the effectiveness of RT4U on three diverse datasets: a public (TMED-2) and a private AS dataset, along with a CIFAR-10-derived toy dataset. Results show improvement on all the datasets.
{"title":"Reliable Multi-View Learning with Conformal Prediction for Aortic Stenosis Classification in Echocardiography","authors":"Ang Nan Gu, Michael Tsang, Hooman Vaseli, Teresa Tsang, Purang Abolmaesumi","doi":"arxiv-2409.09680","DOIUrl":"https://doi.org/arxiv-2409.09680","url":null,"abstract":"The fundamental problem with ultrasound-guided diagnosis is that the acquired\u0000images are often 2-D cross-sections of a 3-D anatomy, potentially missing\u0000important anatomical details. This limitation leads to challenges in ultrasound\u0000echocardiography, such as poor visualization of heart valves or foreshortening\u0000of ventricles. Clinicians must interpret these images with inherent\u0000uncertainty, a nuance absent in machine learning's one-hot labels. We propose\u0000Re-Training for Uncertainty (RT4U), a data-centric method to introduce\u0000uncertainty to weakly informative inputs in the training set. This simple\u0000approach can be incorporated to existing state-of-the-art aortic stenosis\u0000classification methods to further improve their accuracy. When combined with\u0000conformal prediction techniques, RT4U can yield adaptively sized prediction\u0000sets which are guaranteed to contain the ground truth class to a high accuracy.\u0000We validate the effectiveness of RT4U on three diverse datasets: a public\u0000(TMED-2) and a private AS dataset, along with a CIFAR-10-derived toy dataset.\u0000Results show improvement on all the datasets.","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":"48 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142262996","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper reports a Dynamic Vision Sensor (DVS) event camera that is 6x more sensitive at 14x lower illumination than existing commercial and prototype cameras. Event cameras output a sparse stream of brightness change events. Their high dynamic range (HDR), quick response, and high temporal resolution provide key advantages for scientific applications that involve low lighting conditions and sparse visual events. However, current DVS are hindered by low sensitivity resulting from shot noise and pixel-to-pixel mismatch. Commercial DVS have a minimum brightness change threshold of >10%. Sensitive prototypes achieved thresholds as low as 1%, but required kilo-lux illumination. Our SciDVS prototype, fabricated in a 180 nm CMOS image sensor process, achieves 1.7% sensitivity at a chip illumination of 0.7 lx and 18 Hz bandwidth. Novel features of SciDVS are (1) an auto-centering in-pixel preamplifier providing intrascene HDR and increased sensitivity, (2) improved control of bandwidth to limit shot noise, and (3) optional pixel binning, allowing the user to trade spatial resolution for sensitivity.
{"title":"SciDVS: A Scientific Event Camera with 1.7% Temporal Contrast Sensitivity at 0.7 lux","authors":"Rui Graca, Sheng Zhou, Brian McReynolds, Tobi Delbruck","doi":"arxiv-2409.09648","DOIUrl":"https://doi.org/arxiv-2409.09648","url":null,"abstract":"This paper reports a Dynamic Vision Sensor (DVS) event camera that is 6x more\u0000sensitive at 14x lower illumination than existing commercial and prototype\u0000cameras. Event cameras output a sparse stream of brightness change events.\u0000Their high dynamic range (HDR), quick response, and high temporal resolution\u0000provide key advantages for scientific applications that involve low lighting\u0000conditions and sparse visual events. However, current DVS are hindered by low\u0000sensitivity, resulting from shot noise and pixel-to-pixel mismatch. Commercial\u0000DVS have a minimum brightness change threshold of >10%. Sensitive prototypes\u0000achieved as low as 1%, but required kilo-lux illumination. Our SciDVS prototype\u0000fabricated in a 180nm CMOS image sensor process achieves 1.7% sensitivity at\u0000chip illumination of 0.7 lx and 18 Hz bandwidth. Novel features of SciDVS are\u0000(1) an auto-centering in-pixel preamplifier providing intrascene HDR and\u0000increased sensitivity, (2) improved control of bandwidth to limit shot noise,\u0000and (3) optional pixel binning, allowing the user to trade spatial resolution\u0000for sensitivity.","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":"33 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142262998","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}