Calum Green, Sharif Ahmed, Shashidhara Marathe, Liam Perera, Alberto Leonardi, Killian Gmyrek, Daniele Dini, James Le Houx
Machine learning techniques are being increasingly applied in medical and physical sciences across a variety of imaging modalities; however, an important issue when developing these tools is the availability of good quality training data. Here we present a unique, multimodal synchrotron dataset of a bespoke zinc-doped Zeolite 13X sample that can be used to develop advanced deep learning and data fusion pipelines. Multi-resolution micro X-ray computed tomography was performed on a zinc-doped Zeolite 13X fragment to characterise its pores and features, before spatially resolved X-ray diffraction computed tomography was carried out to characterise the homogeneous distribution of sodium and zinc phases. Zinc absorption was controlled to create a simple, spatially isolated, two-phase material. Both raw and processed data are available as a series of Zenodo entries. Altogether, we present a spatially resolved, three-dimensional, multimodal, multi-resolution dataset that can be used for the development of machine learning techniques, including super-resolution, multimodal data fusion, and 3D reconstruction algorithms.
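As an illustration of how such a multi-resolution dataset could feed a super-resolution pipeline, the following minimal sketch (not part of the released processing code) pairs high-resolution micro-CT patches with synthetically downsampled counterparts; the random stand-in volume, 4x scale factor, and patch size are assumptions for demonstration only.

```python
# Illustrative sketch: build paired low-/high-resolution patches from a 3D
# micro-CT volume for super-resolution training. The volume here is random.
import numpy as np

def extract_lr_hr_pairs(hr_volume: np.ndarray, scale: int = 4, patch: int = 64, n: int = 8, seed: int = 0):
    """Sample HR patches and synthesise matching LR patches by block-averaging."""
    rng = np.random.default_rng(seed)
    pairs = []
    for _ in range(n):
        z, y, x = (rng.integers(0, s - patch + 1) for s in hr_volume.shape)
        hr = hr_volume[z:z + patch, y:y + patch, x:x + patch]
        # Block-average to mimic a lower-resolution acquisition of the same region.
        lr = hr.reshape(patch // scale, scale, patch // scale, scale, patch // scale, scale).mean(axis=(1, 3, 5))
        pairs.append((lr.astype(np.float32), hr.astype(np.float32)))
    return pairs

if __name__ == "__main__":
    volume = np.random.rand(128, 128, 128)  # stand-in for a reconstructed micro-CT volume
    pairs = extract_lr_hr_pairs(volume)
    print(pairs[0][0].shape, pairs[0][1].shape)  # (16, 16, 16) (64, 64, 64)
```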
{"title":"Three-Dimensional, Multimodal Synchrotron Data for Machine Learning Applications","authors":"Calum Green, Sharif Ahmed, Shashidhara Marathe, Liam Perera, Alberto Leonardi, Killian Gmyrek, Daniele Dini, James Le Houx","doi":"arxiv-2409.07322","DOIUrl":"https://doi.org/arxiv-2409.07322","url":null,"abstract":"Machine learning techniques are being increasingly applied in medical and\u0000physical sciences across a variety of imaging modalities; however, an important\u0000issue when developing these tools is the availability of good quality training\u0000data. Here we present a unique, multimodal synchrotron dataset of a bespoke\u0000zinc-doped Zeolite 13X sample that can be used to develop advanced deep\u0000learning and data fusion pipelines. Multi-resolution micro X-ray computed\u0000tomography was performed on a zinc-doped Zeolite 13X fragment to characterise\u0000its pores and features, before spatially resolved X-ray diffraction computed\u0000tomography was carried out to characterise the homogeneous distribution of\u0000sodium and zinc phases. Zinc absorption was controlled to create a simple,\u0000spatially isolated, two-phase material. Both raw and processed data is\u0000available as a series of Zenodo entries. Altogether we present a spatially\u0000resolved, three-dimensional, multimodal, multi-resolution dataset that can be\u0000used for the development of machine learning techniques. Such techniques\u0000include development of super-resolution, multimodal data fusion, and 3D\u0000reconstruction algorithm development.","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142200279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wangduo Xie, Richard Schoonhoven, Tristan van Leeuwen, Matthew B. Blaschko
Computed tomography (CT) reconstruction plays a crucial role in industrial nondestructive testing and medical diagnosis. Sparse view CT reconstruction aims to reconstruct high-quality CT images while only using a small number of projections, which helps to improve the detection speed of industrial assembly lines and is also meaningful for reducing radiation in medical scenarios. Sparse CT reconstruction methods based on implicit neural representations (INRs) have recently shown promising performance, but still produce artifacts because of the difficulty of obtaining useful prior information. In this work, we incorporate a powerful prior: the total number of material categories of objects. To utilize the prior, we design AC-IND, a self-supervised method based on Attenuation Coefficient Estimation and Implicit Neural Distribution. Specifically, our method first transforms the traditional INR from scalar mapping to probability distribution mapping. Then we design a compact attenuation coefficient estimator initialized with values from a rough reconstruction and fast segmentation. Finally, our algorithm finishes the CT reconstruction by jointly optimizing the estimator and the generated distribution. Through experiments, we find that our method not only outperforms the comparative methods in sparse CT reconstruction but also can automatically generate semantic segmentation maps.
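The distribution-mapping idea described above can be sketched as follows; this is a hedged illustration rather than the authors' AC-IND implementation, and the number of materials, network size, and initial attenuation values are placeholders.

```python
# Sketch: an implicit network maps a 2D coordinate to a categorical distribution
# over K material classes; learnable per-material attenuation coefficients turn
# that distribution into an expected attenuation value at each point.
import torch
import torch.nn as nn

class DistributionINR(nn.Module):
    def __init__(self, num_materials: int = 3, hidden: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_materials),
        )
        # One attenuation coefficient per material; in practice these could be
        # initialised from a rough reconstruction plus a fast segmentation.
        self.mu = nn.Parameter(torch.linspace(0.0, 0.5, num_materials))

    def forward(self, coords: torch.Tensor):
        probs = torch.softmax(self.mlp(coords), dim=-1)   # per-point material distribution
        attenuation = (probs * self.mu).sum(dim=-1)       # expected attenuation coefficient
        return attenuation, probs

model = DistributionINR()
xy = torch.rand(1024, 2) * 2 - 1                 # normalised pixel coordinates
mu_pred, probs = model(xy)
print(mu_pred.shape, probs.argmax(dim=-1).shape)  # attenuation values and a per-point material label
```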
{"title":"AC-IND: Sparse CT reconstruction based on attenuation coefficient estimation and implicit neural distribution","authors":"Wangduo Xie, Richard Schoonhoven, Tristan van Leeuwen, Matthew B. Blaschko","doi":"arxiv-2409.07171","DOIUrl":"https://doi.org/arxiv-2409.07171","url":null,"abstract":"Computed tomography (CT) reconstruction plays a crucial role in industrial\u0000nondestructive testing and medical diagnosis. Sparse view CT reconstruction\u0000aims to reconstruct high-quality CT images while only using a small number of\u0000projections, which helps to improve the detection speed of industrial assembly\u0000lines and is also meaningful for reducing radiation in medical scenarios.\u0000Sparse CT reconstruction methods based on implicit neural representations\u0000(INRs) have recently shown promising performance, but still produce artifacts\u0000because of the difficulty of obtaining useful prior information. In this work,\u0000we incorporate a powerful prior: the total number of material categories of\u0000objects. To utilize the prior, we design AC-IND, a self-supervised method based\u0000on Attenuation Coefficient Estimation and Implicit Neural Distribution.\u0000Specifically, our method first transforms the traditional INR from scalar\u0000mapping to probability distribution mapping. Then we design a compact\u0000attenuation coefficient estimator initialized with values from a rough\u0000reconstruction and fast segmentation. Finally, our algorithm finishes the CT\u0000reconstruction by jointly optimizing the estimator and the generated\u0000distribution. Through experiments, we find that our method not only outperforms\u0000the comparative methods in sparse CT reconstruction but also can automatically\u0000generate semantic segmentation maps.","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142200247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Recent advancements in predicting pedestrian crossing intentions for Autonomous Vehicles using Computer Vision and Deep Neural Networks are promising. However, the black-box nature of DNNs poses challenges in understanding how the model works and how input features contribute to final predictions. This lack of interpretability limits trust in model performance and hinders informed decisions on feature selection, representation, and model optimisation, thereby affecting the efficacy of future research in the field. To address this, we introduce Context-aware Permutation Feature Importance (CAPFI), a novel approach tailored for pedestrian intention prediction. CAPFI enables more interpretable and reliable assessments of feature importance by leveraging subdivided scenario contexts, mitigating the randomness of feature values through targeted shuffling. This aims to reduce variance and prevent biased estimations in importance scores during permutations. We divide the Pedestrian Intention Estimation (PIE) dataset into 16 comparable context sets, measure the baseline performance of five distinct neural network architectures for intention prediction in each context, and assess input feature importance using CAPFI. We observed nuanced differences among models across various contextual characteristics. The research reveals the critical role of pedestrian bounding boxes and ego-vehicle speed in predicting pedestrian intentions and, through cross-context permutation evaluation, potential prediction biases due to the speed feature. We propose an alternative feature representation that uses the proximity change rate to capture dynamic pedestrian-vehicle locomotion, thereby enhancing the contributions of input features to intention prediction. These findings underscore the importance of contextual features and their diversity for developing accurate and robust intent-predictive models.
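A minimal sketch of context-aware permutation feature importance as described above, assuming a generic scoring function and toy data rather than the PIE dataset or the paper's exact protocol:

```python
# Sketch: split the data into context subsets, then shuffle each feature
# *within* a context before re-scoring, so the importance estimate is not
# confounded by cross-context variation of that feature.
import numpy as np

def capfi(model_score, X, y, contexts, n_repeats: int = 10, seed: int = 0):
    """model_score(X, y) -> scalar metric (higher is better)."""
    rng = np.random.default_rng(seed)
    unique_contexts = np.unique(contexts)
    importances = np.zeros((len(unique_contexts), X.shape[1]))
    for ci, c in enumerate(unique_contexts):
        idx = np.where(contexts == c)[0]
        base = model_score(X[idx], y[idx])
        for f in range(X.shape[1]):
            drops = []
            for _ in range(n_repeats):
                Xp = X[idx].copy()
                Xp[:, f] = rng.permutation(Xp[:, f])   # shuffle one feature inside this context only
                drops.append(base - model_score(Xp, y[idx]))
            importances[ci, f] = np.mean(drops)
    return importances  # shape: (n_contexts, n_features)

# Toy usage: a "model" whose score depends only on feature 0.
score = lambda X, y: -np.mean((X[:, 0] - y) ** 2)
X = np.random.rand(200, 3); y = X[:, 0]; ctx = np.repeat([0, 1], 100)
print(capfi(score, X, y, ctx).round(3))
```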
{"title":"Feature Importance in Pedestrian Intention Prediction: A Context-Aware Review","authors":"Mohsen Azarmi, Mahdi Rezaei, He Wang, Ali Arabian","doi":"arxiv-2409.07645","DOIUrl":"https://doi.org/arxiv-2409.07645","url":null,"abstract":"Recent advancements in predicting pedestrian crossing intentions for\u0000Autonomous Vehicles using Computer Vision and Deep Neural Networks are\u0000promising. However, the black-box nature of DNNs poses challenges in\u0000understanding how the model works and how input features contribute to final\u0000predictions. This lack of interpretability delimits the trust in model\u0000performance and hinders informed decisions on feature selection,\u0000representation, and model optimisation; thereby affecting the efficacy of\u0000future research in the field. To address this, we introduce Context-aware\u0000Permutation Feature Importance (CAPFI), a novel approach tailored for\u0000pedestrian intention prediction. CAPFI enables more interpretability and\u0000reliable assessments of feature importance by leveraging subdivided scenario\u0000contexts, mitigating the randomness of feature values through targeted\u0000shuffling. This aims to reduce variance and prevent biased estimations in\u0000importance scores during permutations. We divide the Pedestrian Intention\u0000Estimation (PIE) dataset into 16 comparable context sets, measure the baseline\u0000performance of five distinct neural network architectures for intention\u0000prediction in each context, and assess input feature importance using CAPFI. We\u0000observed nuanced differences among models across various contextual\u0000characteristics. The research reveals the critical role of pedestrian bounding\u0000boxes and ego-vehicle speed in predicting pedestrian intentions, and potential\u0000prediction biases due to the speed feature through cross-context permutation\u0000evaluation. We propose an alternative feature representation by considering\u0000proximity change rate for rendering dynamic pedestrian-vehicle locomotion,\u0000thereby enhancing the contributions of input features to intention prediction.\u0000These findings underscore the importance of contextual features and their\u0000diversity to develop accurate and robust intent-predictive models.","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142200242","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ho Man Kwan, Ge Gao, Fan Zhang, Andrew Gower, David Bull
Recent advances in implicit neural representation (INR)-based video coding have demonstrated its potential to compete with both conventional and other learning-based approaches. With INR methods, a neural network is trained to overfit a video sequence, with its parameters compressed to obtain a compact representation of the video content. However, although promising results have been achieved, the best INR-based methods are still outperformed by the latest standard codecs, such as VVC VTM, partially due to the simple model compression techniques employed. In this paper, rather than focusing on representation architectures as in many existing works, we propose a novel INR-based video compression framework, Neural Video Representation Compression (NVRC), targeting compression of the representation. Based on the novel entropy coding and quantization models proposed, NVRC is, for the first time, able to optimize an INR-based video codec in a fully end-to-end manner. To further minimize the additional bitrate overhead introduced by the entropy models, we also propose a new model compression framework for coding all the network, quantization, and entropy model parameters hierarchically. Our experiments show that NVRC outperforms many conventional and learning-based benchmark codecs, with a 24% average coding gain over VVC VTM (Random Access) on the UVG dataset, measured in PSNR. As far as we are aware, this is the first time an INR-based video codec has achieved such performance. The implementation of NVRC will be released at www.github.com.
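The end-to-end rate-distortion idea behind compressing representation parameters can be illustrated with the generic sketch below; it uses straight-through quantisation and a simple factorised Gaussian entropy model as stand-ins, not NVRC's actual entropy coding or hierarchical parameter coding.

```python
# Sketch: quantise a parameter tensor with a learnable step (straight-through
# rounding) and trade off a proxy rate term against a reconstruction loss.
import torch
import torch.nn as nn

def quantise_ste(w: torch.Tensor, step: torch.Tensor) -> torch.Tensor:
    """Round w to multiples of step; straight-through gradients for w and step."""
    scaled = w / step
    return step * (scaled + (torch.round(scaled) - scaled).detach())

def gaussian_rate_bits(q: torch.Tensor, sigma: torch.Tensor) -> torch.Tensor:
    """Approximate bits to code q under N(0, sigma^2) (continuous proxy)."""
    dist = torch.distributions.Normal(0.0, sigma)
    return -dist.log_prob(q).sum() / torch.log(torch.tensor(2.0))

X, target = torch.randn(256, 64), torch.randn(256)   # toy fitting problem standing in for an INR
w = nn.Parameter(torch.randn(64) * 0.1)              # one "network" weight tensor
log_step = nn.Parameter(torch.tensor(-4.0))          # learnable quantisation step (log domain)
log_sigma = nn.Parameter(torch.tensor(-2.0))         # entropy-model scale parameter
opt = torch.optim.Adam([w, log_step, log_sigma], lr=1e-2)

for _ in range(200):
    q = quantise_ste(w, log_step.exp())
    distortion = ((X @ q - target) ** 2).mean()
    rate = gaussian_rate_bits(q, log_sigma.exp())
    loss = distortion + 1e-3 * rate                   # rate-distortion trade-off (lambda assumed)
    opt.zero_grad(); loss.backward(); opt.step()

print(f"step: {log_step.exp().item():.4f}, estimated rate: {rate.item():.1f} bits")
```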
{"title":"NVRC: Neural Video Representation Compression","authors":"Ho Man Kwan, Ge Gao, Fan Zhang, Andrew Gower, David Bull","doi":"arxiv-2409.07414","DOIUrl":"https://doi.org/arxiv-2409.07414","url":null,"abstract":"Recent advances in implicit neural representation (INR)-based video coding\u0000have demonstrated its potential to compete with both conventional and other\u0000learning-based approaches. With INR methods, a neural network is trained to\u0000overfit a video sequence, with its parameters compressed to obtain a compact\u0000representation of the video content. However, although promising results have\u0000been achieved, the best INR-based methods are still out-performed by the latest\u0000standard codecs, such as VVC VTM, partially due to the simple model compression\u0000techniques employed. In this paper, rather than focusing on representation\u0000architectures as in many existing works, we propose a novel INR-based video\u0000compression framework, Neural Video Representation Compression (NVRC),\u0000targeting compression of the representation. Based on the novel entropy coding\u0000and quantization models proposed, NVRC, for the first time, is able to optimize\u0000an INR-based video codec in a fully end-to-end manner. To further minimize the\u0000additional bitrate overhead introduced by the entropy models, we have also\u0000proposed a new model compression framework for coding all the network,\u0000quantization and entropy model parameters hierarchically. Our experiments show\u0000that NVRC outperforms many conventional and learning-based benchmark codecs,\u0000with a 24% average coding gain over VVC VTM (Random Access) on the UVG dataset,\u0000measured in PSNR. As far as we are aware, this is the first time an INR-based\u0000video codec achieving such performance. The implementation of NVRC will be\u0000released at www.github.com.","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142200278","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Gaia Romana De Paolis, Dimitrios Lenis, Johannes Novotny, Maria Wimmer, Astrid Berg, Theresa Neubauer, Philip Matthias Winter, David Major, Ariharasudhan Muthusami, Gerald Schröcker, Martin Mienkina, Katja Bühler
Efficient and fast reconstruction of anatomical structures plays a crucial role in clinical practice. Minimizing retrieval and processing times not only potentially enhances swift response and decision-making in critical scenarios but also supports interactive surgical planning and navigation. Recent methods attempt to solve the medical shape reconstruction problem by utilizing implicit neural functions. However, their performance suffers in terms of generalization and computation time, a critical metric for real-time applications. To address these challenges, we propose to leverage meta-learning to improve the initialization of the network parameters, reducing inference time by an order of magnitude while maintaining high accuracy. We evaluate our approach on three public datasets covering different anatomical shapes and modalities, namely CT and MRI. Our experimental results show that our model can handle various input configurations, such as sparse slices with different orientations and spacings. Additionally, we demonstrate that our method exhibits strong transferable capabilities in generalizing to shape domains unobserved at training time.
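A hedged sketch of the general meta-learned-initialisation idea follows; it uses the simple Reptile update and a toy sphere signed-distance task, which are illustrative assumptions rather than the paper's method or data.

```python
# Sketch: meta-learn an initialisation for a coordinate network so that a new
# shape needs only a few gradient steps from the learned starting point.
import copy
import torch
import torch.nn as nn

def make_inr():
    return nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 1))

def sample_task():
    """Toy 'shape': signed distance to a sphere with a random radius."""
    r = 0.3 + 0.4 * torch.rand(1)
    pts = torch.rand(512, 3) * 2 - 1
    return pts, pts.norm(dim=1, keepdim=True) - r

meta_net = make_inr()
meta_lr, inner_lr, inner_steps = 0.05, 1e-2, 5

for _ in range(200):                          # meta-training over many shapes
    net = copy.deepcopy(meta_net)
    opt = torch.optim.SGD(net.parameters(), lr=inner_lr)
    pts, sdf = sample_task()
    for _ in range(inner_steps):              # brief per-shape adaptation
        loss = ((net(pts) - sdf) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():                     # Reptile: move the initialisation toward the adapted weights
        for p_meta, p_task in zip(meta_net.parameters(), net.parameters()):
            p_meta += meta_lr * (p_task - p_meta)

pts, sdf = sample_task()                      # unseen shape: a few steps now suffice
net = copy.deepcopy(meta_net)
opt = torch.optim.SGD(net.parameters(), lr=inner_lr)
for _ in range(inner_steps):
    loss = ((net(pts) - sdf) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
print(f"post-adaptation MSE: {loss.item():.4f}")
```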
{"title":"Fast Medical Shape Reconstruction via Meta-learned Implicit Neural Representations","authors":"Gaia Romana De Paolis, Dimitrios Lenis, Johannes Novotny, Maria Wimmer, Astrid Berg, Theresa Neubauer, Philip Matthias Winter, David Major, Ariharasudhan Muthusami, Gerald Schröcker, Martin Mienkina, Katja Bühler","doi":"arxiv-2409.07100","DOIUrl":"https://doi.org/arxiv-2409.07100","url":null,"abstract":"Efficient and fast reconstruction of anatomical structures plays a crucial\u0000role in clinical practice. Minimizing retrieval and processing times not only\u0000potentially enhances swift response and decision-making in critical scenarios\u0000but also supports interactive surgical planning and navigation. Recent methods\u0000attempt to solve the medical shape reconstruction problem by utilizing implicit\u0000neural functions. However, their performance suffers in terms of generalization\u0000and computation time, a critical metric for real-time applications. To address\u0000these challenges, we propose to leverage meta-learning to improve the network\u0000parameters initialization, reducing inference time by an order of magnitude\u0000while maintaining high accuracy. We evaluate our approach on three public\u0000datasets covering different anatomical shapes and modalities, namely CT and\u0000MRI. Our experimental results show that our model can handle various input\u0000configurations, such as sparse slices with different orientations and spacings.\u0000Additionally, we demonstrate that our method exhibits strong transferable\u0000capabilities in generalizing to shape domains unobserved at training time.","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142200256","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PET/CT is extensively used in imaging malignant tumors because it highlights areas of increased glucose metabolism, indicative of cancerous activity. Accurate 3D lesion segmentation in PET/CT imaging is essential for effective oncological diagnostics and treatment planning. In this study, we developed an advanced 3D residual U-Net model for the Automated Lesion Segmentation in Whole-Body PET/CT - Multitracer Multicenter Generalization (autoPET III) Challenge, which will be held jointly with the 2024 Medical Image Computing and Computer Assisted Intervention (MICCAI) conference in Marrakesh, Morocco. The proposed model incorporates a novel sample attention boosting technique that enhances segmentation performance by adjusting the contribution of challenging cases during training, improving generalization across FDG and PSMA tracers. The proposed model outperformed the challenge baseline model on the preliminary test set on the Grand Challenge platform, and our team currently ranks 2nd among 497 participants worldwide from 53 countries (accessed 2024/9/4), with a Dice score of 0.8700, a False Negative Volume of 19.3969, and a False Positive Volume of 1.0857.
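The abstract does not spell out the boosting rule, so the following is only a generic sketch of weighting challenging cases during training: per-sample Dice losses are reweighted so that harder samples contribute more to the batch loss; the softmax temperature and toy tensors are assumptions.

```python
# Sketch: compute a per-sample soft Dice loss, then give higher-loss (harder)
# samples a larger weight in the batch average.
import torch

def soft_dice_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Per-sample soft Dice loss for binary masks; pred are probabilities."""
    dims = tuple(range(1, pred.dim()))
    inter = (pred * target).sum(dims)
    denom = pred.sum(dims) + target.sum(dims)
    return 1 - (2 * inter + eps) / (denom + eps)

def boosted_batch_loss(pred, target, temperature: float = 1.0):
    per_sample = soft_dice_loss(pred, target)                          # shape: (batch,)
    weights = torch.softmax(per_sample.detach() / temperature, dim=0)  # harder samples get larger weight
    return (weights * per_sample).sum()

logits = torch.randn(4, 1, 32, 32, 32, requires_grad=True)    # toy 3D segmentation outputs
target = (torch.rand(4, 1, 32, 32, 32) > 0.9).float()
loss = boosted_batch_loss(torch.sigmoid(logits), target)
loss.backward()
print(loss.item())
```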
{"title":"Dual channel CW nnU-Net for 3D PET-CT Lesion Segmentation in 2024 autoPET III Challenge","authors":"Ching-Wei Wang, Ting-Sheng Su, Keng-Wei Liu","doi":"arxiv-2409.07144","DOIUrl":"https://doi.org/arxiv-2409.07144","url":null,"abstract":"PET/CT is extensively used in imaging malignant tumors because it highlights\u0000areas of increased glucose metabolism, indicative of cancerous activity.\u0000Accurate 3D lesion segmentation in PET/CT imaging is essential for effective\u0000oncological diagnostics and treatment planning. In this study, we developed an\u0000advanced 3D residual U-Net model for the Automated Lesion Segmentation in\u0000Whole-Body PET/CT - Multitracer Multicenter Generalization (autoPET III)\u0000Challenge, which will be held jointly with 2024 Medical Image Computing and\u0000Computer Assisted Intervention (MICCAI) conference at Marrakesh, Morocco.\u0000Proposed model incorporates a novel sample attention boosting technique to\u0000enhance segmentation performance by adjusting the contribution of challenging\u0000cases during training, improving generalization across FDG and PSMA tracers.\u0000The proposed model outperformed the challenge baseline model in the preliminary\u0000test set on the Grand Challenge platform, and our team is currently ranking in\u0000the 2nd place among 497 participants worldwide from 53 countries (accessed\u0000date: 2024/9/4), with Dice score of 0.8700, False Negative Volume of 19.3969\u0000and False Positive Volume of 1.0857.","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142200253","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Low-light image enhancement, particularly in cross-domain tasks such as mapping from the raw domain to the sRGB domain, remains a significant challenge. Many deep learning-based methods have been developed to address this issue and have shown promising results in recent years. However, single-stage methods, which attempt to unify the complex mapping across both domains, suffer from limited denoising performance. In contrast, two-stage approaches typically decompose a raw image with a color filter array (CFA) into a four-channel RGGB format before feeding it into a neural network. However, this strategy overlooks the critical role of demosaicing within the Image Signal Processing (ISP) pipeline, leading to color distortions under varying lighting conditions, especially in low-light scenarios. To address these issues, we design a novel Mamba scanning mechanism, called RAWMamba, to effectively handle raw images with different CFAs. Furthermore, we present a Retinex Decomposition Module (RDM) grounded in the Retinex prior, which decouples illumination from reflectance to facilitate more effective denoising and automatic non-linear exposure correction. By bridging demosaicing and denoising, better raw image enhancement is achieved. Experimental evaluations conducted on the public SID and MCR datasets demonstrate that our proposed RAWMamba achieves state-of-the-art performance on cross-domain mapping.
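A generic Retinex-style decomposition can be sketched as below; this is not the paper's RDM, and the tiny convolutional branches, packed four-channel input, and loss terms are illustrative assumptions.

```python
# Sketch: one branch predicts a smooth illumination map, another the
# reflectance, with the constraint that their product reproduces the input.
import torch
import torch.nn as nn

class RetinexDecomposition(nn.Module):
    def __init__(self, channels: int = 4):                  # e.g. packed RGGB raw input
        super().__init__()
        self.illum = nn.Sequential(nn.Conv2d(channels, 16, 3, padding=1), nn.ReLU(),
                                   nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid())
        self.reflect = nn.Sequential(nn.Conv2d(channels, 16, 3, padding=1), nn.ReLU(),
                                     nn.Conv2d(16, channels, 3, padding=1), nn.Sigmoid())

    def forward(self, x):
        L = self.illum(x)                                    # single-channel illumination
        R = self.reflect(x)                                  # per-channel reflectance
        return R, L

model = RetinexDecomposition()
x = torch.rand(2, 4, 64, 64)                                 # toy packed raw patches
R, L = model(x)
recon_loss = ((R * L - x) ** 2).mean()                       # reconstruction constraint: input ~ R * L
smooth_loss = (L[..., :, 1:] - L[..., :, :-1]).abs().mean()  # encourage a smooth illumination map
print(recon_loss.item(), smooth_loss.item())
```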
{"title":"Retinex-RAWMamba: Bridging Demosaicing and Denoising for Low-Light RAW Image Enhancement","authors":"Xianmin Chen, Peiliang Huang, Xiaoxu Feng, Dingwen Zhang, Longfei Han, Junwei Han","doi":"arxiv-2409.07040","DOIUrl":"https://doi.org/arxiv-2409.07040","url":null,"abstract":"Low-light image enhancement, particularly in cross-domain tasks such as\u0000mapping from the raw domain to the sRGB domain, remains a significant\u0000challenge. Many deep learning-based methods have been developed to address this\u0000issue and have shown promising results in recent years. However, single-stage\u0000methods, which attempt to unify the complex mapping across both domains,\u0000leading to limited denoising performance. In contrast, two-stage approaches\u0000typically decompose a raw image with color filter arrays (CFA) into a\u0000four-channel RGGB format before feeding it into a neural network. However, this\u0000strategy overlooks the critical role of demosaicing within the Image Signal\u0000Processing (ISP) pipeline, leading to color distortions under varying lighting\u0000conditions, especially in low-light scenarios. To address these issues, we\u0000design a novel Mamba scanning mechanism, called RAWMamba, to effectively handle\u0000raw images with different CFAs. Furthermore, we present a Retinex Decomposition\u0000Module (RDM) grounded in Retinex prior, which decouples illumination from\u0000reflectance to facilitate more effective denoising and automatic non-linear\u0000exposure correction. By bridging demosaicing and denoising, better raw image\u0000enhancement is achieved. Experimental evaluations conducted on public datasets\u0000SID and MCR demonstrate that our proposed RAWMamba achieves state-of-the-art\u0000performance on cross-domain mapping.","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142200280","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In the field of Alzheimer's disease diagnosis, segmentation and classification tasks are inherently interconnected. Sharing knowledge between models for these tasks can significantly improve training efficiency, particularly when training data is scarce. However, traditional knowledge distillation techniques often struggle to bridge the gap between segmentation and classification due to the distinct nature of tasks and different model architectures. To address this challenge, we propose a dual-stream pipeline that facilitates cross-task and cross-architecture knowledge sharing. Our approach introduces a dual-stream embedding module that unifies feature representations from segmentation and classification models, enabling dimensional integration of these features to guide the classification model. We validated our method on multiple 3D datasets for Alzheimer's disease diagnosis, demonstrating significant improvements in classification performance, especially on small datasets. Furthermore, we extended our pipeline with a residual temporal attention mechanism for early diagnosis, utilizing images taken before the atrophy of patients' brain mass. This advancement shows promise in enabling diagnosis approximately six months earlier in mild and asymptomatic stages, offering critical time for intervention.
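The cross-architecture feature-sharing idea can be illustrated with the hedged sketch below, in which pooled features from a segmentation encoder and a classification backbone are projected into a shared embedding space and fused; the backbones, feature dimensions, and fusion head are placeholders, not the DS-ViT modules.

```python
# Sketch: project features from two different models into one embedding space
# and fuse them before the classification head.
import torch
import torch.nn as nn

class DualStreamHead(nn.Module):
    def __init__(self, seg_dim: int, cls_dim: int, embed_dim: int = 256, num_classes: int = 2):
        super().__init__()
        self.proj_seg = nn.Linear(seg_dim, embed_dim)   # unify segmentation features
        self.proj_cls = nn.Linear(cls_dim, embed_dim)   # unify classification features
        self.head = nn.Linear(2 * embed_dim, num_classes)

    def forward(self, seg_feat, cls_feat):
        fused = torch.cat([self.proj_seg(seg_feat), self.proj_cls(cls_feat)], dim=-1)
        return self.head(fused)

# Toy usage with pooled features from two hypothetical backbones.
seg_feat = torch.randn(8, 512)   # e.g. globally pooled features of a segmentation encoder
cls_feat = torch.randn(8, 768)   # e.g. the class token of a vision transformer
logits = DualStreamHead(512, 768)(seg_feat, cls_feat)
print(logits.shape)              # torch.Size([8, 2])
```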
{"title":"DS-ViT: Dual-Stream Vision Transformer for Cross-Task Distillation in Alzheimer's Early Diagnosis","authors":"Ke Chen, Yifeng Wang, Yufei Zhou, Haohan Wang","doi":"arxiv-2409.07584","DOIUrl":"https://doi.org/arxiv-2409.07584","url":null,"abstract":"In the field of Alzheimer's disease diagnosis, segmentation and\u0000classification tasks are inherently interconnected. Sharing knowledge between\u0000models for these tasks can significantly improve training efficiency,\u0000particularly when training data is scarce. However, traditional knowledge\u0000distillation techniques often struggle to bridge the gap between segmentation\u0000and classification due to the distinct nature of tasks and different model\u0000architectures. To address this challenge, we propose a dual-stream pipeline\u0000that facilitates cross-task and cross-architecture knowledge sharing. Our\u0000approach introduces a dual-stream embedding module that unifies feature\u0000representations from segmentation and classification models, enabling\u0000dimensional integration of these features to guide the classification model. We\u0000validated our method on multiple 3D datasets for Alzheimer's disease diagnosis,\u0000demonstrating significant improvements in classification performance,\u0000especially on small datasets. Furthermore, we extended our pipeline with a\u0000residual temporal attention mechanism for early diagnosis, utilizing images\u0000taken before the atrophy of patients' brain mass. This advancement shows\u0000promise in enabling diagnosis approximately six months earlier in mild and\u0000asymptomatic stages, offering critical time for intervention.","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142200239","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Conventional radiography is a widely used imaging technology for diagnosing, monitoring, and prognosticating musculoskeletal (MSK) diseases because of its easy availability, versatility, and cost-effectiveness. In conventional radiographs, bone overlaps are prevalent and can impede the accurate assessment of bone characteristics by radiologists or algorithms, posing significant challenges to conventional and computer-aided diagnoses. This work initiated the study of a challenging scenario, bone layer separation in conventional radiographs, in which separating overlapped bone regions enables the independent assessment of the bone characteristics of each bone layer and lays the groundwork for MSK disease diagnosis and its automation. This work proposed a Bone Layer Separation GAN (BLS-GAN) framework that can produce high-quality bone layer images with reasonable bone characteristics and texture. The framework introduces a reconstructor based on conventional radiography imaging principles, which achieves efficient reconstruction and mitigates the recurrent calculations and training instability caused by soft tissue in the overlapped regions. Additionally, pre-training with synthetic images was implemented to enhance the stability of both the training process and the results. The generated images passed a visual Turing test and improved performance in downstream tasks. This work affirms the feasibility of extracting bone layer images from conventional radiographs, which holds promise for leveraging bone layer separation technology to facilitate more comprehensive analytical research in MSK diagnosis, monitoring, and prognosis. Code and dataset will be made available.
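The imaging principle such a reconstructor can build on is sketched below under a simplified Beer-Lambert assumption: attenuation adds across overlapping layers, so re-composing separated layers gives a consistency check against the observed radiograph. This is an illustration, not the authors' reconstructor.

```python
# Sketch: compose per-layer attenuation maps into one radiograph and measure
# how well predicted layers explain the observed image.
import numpy as np

def compose_layers(attenuation_maps, i0: float = 1.0) -> np.ndarray:
    """Compose per-layer attenuation line integrals into one radiograph intensity."""
    total = np.sum(attenuation_maps, axis=0)      # attenuation adds across overlapping layers
    return i0 * np.exp(-total)                    # Beer-Lambert: I = I0 * exp(-sum of mu*t)

def reconstruction_consistency(pred_layers, observed_intensity) -> float:
    """MSE between the re-composed radiograph and the observed one."""
    return float(np.mean((compose_layers(pred_layers) - observed_intensity) ** 2))

layer_a = np.random.rand(64, 64) * 0.5            # toy attenuation map of bone layer A
layer_b = np.random.rand(64, 64) * 0.5            # toy attenuation map of bone layer B
observed = compose_layers([layer_a, layer_b])
print(reconstruction_consistency([layer_a, layer_b], observed))  # ~0 for a perfect separation
```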
{"title":"BLS-GAN: A Deep Layer Separation Framework for Eliminating Bone Overlap in Conventional Radiographs","authors":"Haolin Wang, Yafei Ou, Prasoon Ambalathankandy, Gen Ota, Pengyu Dai, Masayuki Ikebe, Kenji Suzuki, Tamotsu Kamishima","doi":"arxiv-2409.07304","DOIUrl":"https://doi.org/arxiv-2409.07304","url":null,"abstract":"Conventional radiography is the widely used imaging technology in diagnosing,\u0000monitoring, and prognosticating musculoskeletal (MSK) diseases because of its\u0000easy availability, versatility, and cost-effectiveness. In conventional\u0000radiographs, bone overlaps are prevalent, and can impede the accurate\u0000assessment of bone characteristics by radiologists or algorithms, posing\u0000significant challenges to conventional and computer-aided diagnoses. This work\u0000initiated the study of a challenging scenario - bone layer separation in\u0000conventional radiographs, in which separate overlapped bone regions enable the\u0000independent assessment of the bone characteristics of each bone layer and lay\u0000the groundwork for MSK disease diagnosis and its automation. This work proposed\u0000a Bone Layer Separation GAN (BLS-GAN) framework that can produce high-quality\u0000bone layer images with reasonable bone characteristics and texture. This\u0000framework introduced a reconstructor based on conventional radiography imaging\u0000principles, which achieved efficient reconstruction and mitigates the recurrent\u0000calculations and training instability issues caused by soft tissue in the\u0000overlapped regions. Additionally, pre-training with synthetic images was\u0000implemented to enhance the stability of both the training process and the\u0000results. The generated images passed the visual Turing test, and improved\u0000performance in downstream tasks. This work affirms the feasibility of\u0000extracting bone layer images from conventional radiographs, which holds promise\u0000for leveraging bone layer separation technology to facilitate more\u0000comprehensive analytical research in MSK diagnosis, monitoring, and prognosis.\u0000Code and dataset will be made available.","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142200260","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Somayeh Pakdelmoez (Department of Biomedical Engineering, Amirkabir University of Technology, Tehran, Iran), Saba Omidikia (Department of Biomedical Engineering, Amirkabir University of Technology, Tehran, Iran), Seyyed Ali Seyyedsalehi (Department of Biomedical Engineering, Amirkabir University of Technology, Tehran, Iran), Seyyede Zohreh Seyyedsalehi (Department of Biomedical Engineering, Faculty of Health, Tehran Medical Sciences, Islamic Azad University, Tehran, Iran)
Diabetic retinopathy (DR) is a consequence of diabetes mellitus characterized by vascular damage within the retinal tissue. Timely detection is paramount to mitigate the risk of vision loss. However, training robust grading models is hindered by a shortage of annotated data, particularly for severe cases. This paper proposes a framework for controllably generating high-fidelity and diverse DR fundus images, thereby improving classifier performance in DR grading and detection. We achieve comprehensive control over DR severity and visual features (optic disc, vessel structure, lesion areas) within generated images solely through a conditional StyleGAN, eliminating the need for feature masks or auxiliary networks. Specifically, leveraging the SeFa algorithm to identify meaningful semantics within the latent space, we manipulate the DR images generated conditionally on grades, further enhancing the dataset diversity. Additionally, we propose a novel, effective SeFa-based data augmentation strategy, helping the classifier focus on discriminative regions while ignoring redundant features. Using this approach, a ResNet50 model trained for DR detection achieves 98.09% accuracy, 99.44% specificity, 99.45% precision, and an F1-score of 98.09%. Moreover, incorporating synthetic images generated by conditional StyleGAN into ResNet50 training for DR grading yields 83.33% accuracy, a quadratic kappa score of 87.64%, 95.67% specificity, and 72.24% precision. Extensive experiments conducted on the APTOS 2019 dataset demonstrate the exceptional realism of the generated images and the superior performance of our classifier compared to recent studies.
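A hedged sketch of SeFa-style latent manipulation follows: semantic directions are taken as the top eigenvectors of A^T A for the weight A of the layer that first transforms the latent code, and a latent is moved along them; the random stand-in weight and latent dimension are assumptions, not the trained conditional StyleGAN.

```python
# Sketch: factorise a latent-transform weight to obtain semantic directions,
# then sweep a latent code along the strongest direction.
import torch

def sefa_directions(weight: torch.Tensor, k: int = 5) -> torch.Tensor:
    """Top-k semantic directions for a latent-transform weight of shape (out_dim, in_dim)."""
    eigvals, eigvecs = torch.linalg.eigh(weight.t() @ weight)  # eigenvectors of A^T A, ascending order
    return eigvecs[:, -k:].flip(-1).t()                        # strongest first, shape (k, in_dim)

latent_dim = 512
A = torch.randn(1024, latent_dim)          # stand-in for the layer that first maps the latent code
directions = sefa_directions(A, k=3)

z = torch.randn(1, latent_dim)             # latent code of one generated fundus image
for alpha in (-3.0, 0.0, 3.0):             # sweep along the strongest direction
    z_edit = z + alpha * directions[0]
    # generator(z_edit, dr_grade) would be called here to render and inspect the attribute change
    print(alpha, z_edit.norm().item())
```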
{"title":"Controllable retinal image synthesis using conditional StyleGAN and latent space manipulation for improved diagnosis and grading of diabetic retinopathy","authors":"Somayeh PakdelmoezDepartment of Biomedical Engineering, Amirkabir University of Technology, Tehran, Iran, Saba OmidikiaDepartment of Biomedical Engineering, Amirkabir University of Technology, Tehran, Iran, Seyyed Ali SeyyedsalehiDepartment of Biomedical Engineering, Amirkabir University of Technology, Tehran, Iran, Seyyede Zohreh SeyyedsalehiDepartment of Biomedical Engineering, Faculty of Health, Tehran Medical Sciences, Islamic Azad University, Tehran, Iran","doi":"arxiv-2409.07422","DOIUrl":"https://doi.org/arxiv-2409.07422","url":null,"abstract":"Diabetic retinopathy (DR) is a consequence of diabetes mellitus characterized\u0000by vascular damage within the retinal tissue. Timely detection is paramount to\u0000mitigate the risk of vision loss. However, training robust grading models is\u0000hindered by a shortage of annotated data, particularly for severe cases. This\u0000paper proposes a framework for controllably generating high-fidelity and\u0000diverse DR fundus images, thereby improving classifier performance in DR\u0000grading and detection. We achieve comprehensive control over DR severity and\u0000visual features (optic disc, vessel structure, lesion areas) within generated\u0000images solely through a conditional StyleGAN, eliminating the need for feature\u0000masks or auxiliary networks. Specifically, leveraging the SeFa algorithm to\u0000identify meaningful semantics within the latent space, we manipulate the DR\u0000images generated conditionally on grades, further enhancing the dataset\u0000diversity. Additionally, we propose a novel, effective SeFa-based data\u0000augmentation strategy, helping the classifier focus on discriminative regions\u0000while ignoring redundant features. Using this approach, a ResNet50 model\u0000trained for DR detection achieves 98.09% accuracy, 99.44% specificity, 99.45%\u0000precision, and an F1-score of 98.09%. Moreover, incorporating synthetic images\u0000generated by conditional StyleGAN into ResNet50 training for DR grading yields\u000083.33% accuracy, a quadratic kappa score of 87.64%, 95.67% specificity, and\u000072.24% precision. Extensive experiments conducted on the APTOS 2019 dataset\u0000demonstrate the exceptional realism of the generated images and the superior\u0000performance of our classifier compared to recent studies.","PeriodicalId":501289,"journal":{"name":"arXiv - EE - Image and Video Processing","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142200245","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}