Deep Learning Techniques for Hand Vein Biometrics: A Comprehensive Review
Mustapha Hemis, Hamza Kheddar, Sami Bourouis, Nasir Saleem
arXiv:2409.07128 (2024-09-11)
Biometric authentication has garnered significant attention as a secure and efficient method of identity verification. Among the various modalities, hand vein biometrics, including finger vein, palm vein, and dorsal hand vein recognition, offer unique advantages due to their high accuracy, low susceptibility to forgery, and non-intrusiveness. The vein patterns within the hand are highly complex and distinct for each individual, making them an ideal biometric identifier. Additionally, hand vein recognition is contactless, enhancing user convenience and hygiene compared to other modalities such as fingerprint or iris recognition. Furthermore, the veins are internally located, rendering them less susceptible to damage or alteration and thus enhancing the security and reliability of the biometric system. The combination of these factors makes hand vein biometrics a highly effective and secure method for identity verification. This review delves into the latest advancements in deep learning techniques applied to finger vein, palm vein, and dorsal hand vein recognition. It covers the essential fundamentals of hand vein biometrics, summarizes publicly available datasets, and discusses the state-of-the-art metrics used to evaluate these three modalities. Moreover, it provides a comprehensive overview of proposed approaches for finger, palm, dorsal, and multimodal vein recognition, offering insights into the best performance achieved, data augmentation techniques, and effective transfer learning methods, along with the associated pretrained deep learning models. Finally, the review addresses open research challenges and outlines future directions and perspectives, encouraging researchers to enhance existing methods and propose innovative techniques.
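
As a concrete illustration of the transfer-learning recipe such reviews survey, the sketch below fine-tunes an ImageNet-pretrained backbone on vein images. It is a minimal sketch only: the ResNet-18 choice, the frozen layers, the hypothetical veins/train folder layout, and all hyperparameters are assumptions for illustration, not taken from any specific surveyed method.

```python
# Minimal transfer-learning sketch for finger-vein identification (illustrative only).
# Assumes near-infrared vein images arranged in per-subject class folders;
# all paths and hyperparameters are hypothetical.
import torch
import torch.nn as nn
from torchvision import models, transforms, datasets

NUM_SUBJECTS = 100  # hypothetical number of enrolled identities

# Start from an ImageNet-pretrained backbone, as many surveyed works do.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, NUM_SUBJECTS)  # replace classifier head

# Freeze early layers; fine-tune only the last block and the new head.
for name, p in model.named_parameters():
    p.requires_grad = name.startswith(("layer4", "fc"))

preprocess = transforms.Compose([
    transforms.Grayscale(num_output_channels=3),  # vein images are single-channel
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
train_set = datasets.ImageFolder("veins/train", transform=preprocess)  # hypothetical layout
loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

optimizer = torch.optim.Adam((p for p in model.parameters() if p.requires_grad), lr=1e-4)
criterion = nn.CrossEntropyLoss()
for images, labels in loader:  # one epoch shown for brevity
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```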

RICAU-Net: Residual-block Inspired Coordinate Attention U-Net for Segmentation of Small and Sparse Calcium Lesions in Cardiac CT
Doyoung Park, Jinsoo Kim, Qi Chang, Shuang Leng, Liang Zhong, Lohendran Baskaran
arXiv:2409.06993 (2024-09-11)
The Agatston score, the sum of the calcification in the four main coronary arteries, has been widely used in the diagnosis of coronary artery disease (CAD). However, many studies have emphasized the importance of the vessel-specific Agatston score, as calcification in a specific vessel is significantly correlated with the occurrence of coronary heart disease (CHD). In this paper, we propose the Residual-block Inspired Coordinate Attention U-Net (RICAU-Net), which incorporates coordinate attention in two distinct ways and uses a customized combo loss function for lesion-specific coronary artery calcium (CAC) segmentation. This approach aims to tackle the severe class imbalance associated with small and sparse lesions, particularly CAC in the left main coronary artery (LM), which is generally small and, owing to its anatomy, the scarcest lesion in the dataset. The proposed method was compared with six other methods using the Dice score, precision, and recall. Our approach achieved the highest per-lesion Dice scores for all four lesion types, especially for CAC in the LM. Ablation studies demonstrated the significance of the positional information contributed by the coordinate attention, and of the customized loss function, in segmenting small and sparse lesions under severe class imbalance.
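
The paper's combo loss is customized and not fully specified in the abstract; a common combo formulation for this kind of imbalance mixes Dice loss with weighted cross-entropy. The sketch below shows that generic pattern only; the mixing weight alpha and the per-class weights (including the large weight for a rare class such as the LM) are hypothetical placeholders, not the authors' values.

```python
# Illustrative combo loss (Dice + weighted cross-entropy) for class-imbalanced
# segmentation. Not the authors' exact formulation.
import torch
import torch.nn.functional as F

def combo_loss(logits, target, class_weights, alpha=0.5, eps=1e-6):
    """logits: (B, C, H, W); target: (B, H, W) integer labels."""
    # Weighted cross-entropy upweights rare classes.
    ce = F.cross_entropy(logits, target, weight=class_weights)

    # Soft multi-class Dice over the batch.
    probs = F.softmax(logits, dim=1)
    one_hot = F.one_hot(target, num_classes=logits.shape[1]).permute(0, 3, 1, 2).float()
    dims = (0, 2, 3)
    intersection = (probs * one_hot).sum(dims)
    cardinality = probs.sum(dims) + one_hot.sum(dims)
    dice = (2 * intersection + eps) / (cardinality + eps)
    dice_loss = 1 - dice.mean()

    return alpha * ce + (1 - alpha) * dice_loss

# Example with 5 classes (background + 4 vessel-specific CAC lesions);
# a rare class such as LM gets a larger hypothetical weight.
weights = torch.tensor([0.1, 1.0, 1.0, 1.0, 4.0])
loss = combo_loss(torch.randn(2, 5, 64, 64), torch.randint(0, 5, (2, 64, 64)), weights)
```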

CWT-Net: Super-resolution of Histopathology Images Using a Cross-scale Wavelet-based Transformer
Feiyang Jia, Zhineng Chen, Ziying Song, Lin Liu, Caiyan Jia
arXiv:2409.07092 (2024-09-11)
Super-resolution (SR) aims to enhance the quality of low-resolution images and has been widely applied in medical imaging. We found that the design principles of most existing methods are shaped by SR tasks on real-world photographs and overlook the significance of the multi-level structure of pathological images, even though these methods achieve respectable scores on objective metrics. In this work, we examine two super-resolution working paradigms and propose a novel network called CWT-Net, which leverages a cross-scale image wavelet transform and a Transformer architecture. Our network consists of two branches: one dedicated to learning super-resolution and the other to high-frequency wavelet features. To generate high-resolution histopathology images, the Transformer module shares and fuses features from both branches at various stages. Notably, we designed a specialized wavelet reconstruction module to effectively enhance the wavelet-domain features and to let the network operate in different modes, allowing additional relevant information to be introduced from cross-scale images. Our experimental results demonstrate that the model significantly outperforms state-of-the-art methods in both quantitative and visual evaluations and can substantially boost the accuracy of image diagnostic networks.
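
To make the high-frequency wavelet branch concrete: a single-level 2-D discrete wavelet transform splits a patch into one approximation band and three detail bands, and the detail bands carry exactly the high-frequency structure such a branch would learn from. Below is a minimal sketch using PyWavelets; it is not the authors' code, and the Haar wavelet is an arbitrary choice for illustration.

```python
# Sketch of the idea behind a high-frequency wavelet branch: a 2-D DWT splits a
# histopathology patch into an approximation band and three detail bands.
import numpy as np
import pywt

patch = np.random.rand(256, 256)  # stand-in for a grayscale histopathology patch

# Single-level 2-D discrete wavelet transform (Haar chosen for simplicity).
cA, (cH, cV, cD) = pywt.dwt2(patch, "haar")

# cH/cV/cD hold horizontal/vertical/diagonal high-frequency detail; stacking them
# gives a 3-channel input a detail-learning branch could consume.
high_freq = np.stack([cH, cV, cD], axis=0)  # shape (3, 128, 128)

# The inverse transform reconstructs the patch exactly, mirroring the role of a
# wavelet reconstruction module.
reconstructed = pywt.idwt2((cA, (cH, cV, cD)), "haar")
assert np.allclose(reconstructed, patch)
```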

3DGCQA: A Quality Assessment Database for 3D AI-Generated Contents
Yingjie Zhou, Zicheng Zhang, Farong Wen, Jun Jia, Yanwei Jiang, Xiaohong Liu, Xiongkuo Min, Guangtao Zhai
arXiv:2409.07236 (2024-09-11)
Although 3D generated content (3DGC) offers advantages in reducing production costs and accelerating design timelines, its quality often falls short of professionally produced 3D content. Common quality issues frequently affect 3DGC, highlighting the importance of timely and effective quality assessment. Such evaluations not only ensure a higher standard of 3DGCs for end-users but also provide critical insights for advancing generative technologies. To address existing gaps in this domain, this paper introduces a novel 3DGC quality assessment dataset, 3DGCQA, built using 7 representative text-to-3D generation methods. During the dataset's construction, 50 fixed prompts are used to generate content with every method, yielding 313 textured meshes that constitute the 3DGCQA dataset. Visual inspection reveals 6 common distortion categories in the generated 3DGCs. To further explore their quality, a subjective quality assessment is conducted by human evaluators, whose ratings reveal significant variation in quality across the generation methods. Additionally, several objective quality assessment algorithms are tested on the 3DGCQA dataset. The results expose limitations in the performance of existing algorithms and underscore the need for more specialized quality assessment methods. To provide a valuable resource for future research and development in 3D content generation and quality assessment, the dataset has been open-sourced at https://github.com/zyj-2000/3DGCQA.
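
Benchmarking objective metrics against subjective ratings, as done on 3DGCQA, is conventionally reported as rank and linear correlations between predicted scores and mean opinion scores (MOS). A hedged sketch follows; the scores below are random placeholders, whereas the real MOS come from the human study.

```python
# Standard correlation-based evaluation of an objective quality metric against MOS.
import numpy as np
from scipy.stats import spearmanr, pearsonr, kendalltau

mos = np.random.rand(313)                     # placeholder MOS for the 313 meshes
predicted = mos + 0.1 * np.random.randn(313)  # placeholder objective-metric scores

srocc, _ = spearmanr(predicted, mos)   # monotonicity of the prediction
plcc, _ = pearsonr(predicted, mos)     # linear agreement
krocc, _ = kendalltau(predicted, mos)  # rank agreement
print(f"SROCC={srocc:.3f}  PLCC={plcc:.3f}  KROCC={krocc:.3f}")
```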

Quantifying Knee Cartilage Shape and Lesion: From Image to Metrics
Yongcheng Yao, Weitian Chen
arXiv:2409.07361 (2024-09-11)
Imaging features of knee articular cartilage have been shown to be potential imaging biomarkers for knee osteoarthritis. Despite recent methodological advancements in image analysis techniques such as image segmentation, registration, and domain-specific image computing algorithms, only a few works focus on building fully automated pipelines for imaging feature extraction. In this study, we developed a deep-learning-based medical image analysis application for knee cartilage morphometrics, the CartiMorph Toolbox (CMT). We propose a 2-stage joint template learning and registration network, CMT-reg. We trained the model on the OAI-ZIB dataset and assessed its performance in template-to-image registration. CMT-reg achieved competitive results compared with other state-of-the-art models. We integrated the proposed model into an automated pipeline for quantifying cartilage shape and lesions (specifically, full-thickness cartilage loss). The toolbox provides a comprehensive, user-friendly solution for medical image analysis and data visualization. The software and models are available at https://github.com/YongchengYAO/CMT-AMAI24paper.
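
The core operation behind a template-to-image registration network such as CMT-reg is warping the template with a predicted dense displacement field. The 2-D PyTorch sketch below shows only that operation under simplified assumptions; the actual model, losses, and 3-D handling live in the authors' toolbox.

```python
# Warping a template by a dense displacement field, the building block of
# learning-based registration. 2-D for brevity; registration of knee MRI is 3-D.
import torch
import torch.nn.functional as F

def warp(template, displacement):
    """template: (B, 1, H, W); displacement: (B, 2, H, W) in pixels, (dx, dy)."""
    B, _, H, W = template.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().unsqueeze(0) + displacement
    # Normalize to [-1, 1] as required by grid_sample, channel-last (x, y) order.
    grid_x = 2 * grid[:, 0] / (W - 1) - 1
    grid_y = 2 * grid[:, 1] / (H - 1) - 1
    norm_grid = torch.stack((grid_x, grid_y), dim=-1)  # (B, H, W, 2)
    return F.grid_sample(template, norm_grid, align_corners=True)

# Zero displacement is the identity warp.
moved = warp(torch.rand(1, 1, 64, 64), torch.zeros(1, 2, 64, 64))
```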

Efficient One-Step Diffusion Refinement for Snapshot Compressive Imaging
Yunzhen Wang, Haijin Zeng, Shaoguang Huang, Hongyu Chen, Hongyan Zhang
arXiv:2409.07417 (2024-09-11)
Coded Aperture Snapshot Spectral Imaging (CASSI) is a crucial technique for capturing three-dimensional multispectral images (MSIs) through the complex inverse task of reconstructing these images from coded two-dimensional measurements. Current state-of-the-art methods, predominantly end-to-end, face limitations in reconstructing high-frequency details and often rely on constrained datasets like KAIST and CAVE, resulting in models with poor generalizability. In response to these challenges, this paper introduces a novel one-step Diffusion Probabilistic Model within a self-supervised adaptation framework for Snapshot Compressive Imaging (SCI). Our approach leverages a pretrained SCI reconstruction network to generate initial predictions from two-dimensional measurements. Subsequently, a one-step diffusion model produces high-frequency residuals to enhance these initial predictions. Additionally, acknowledging the high costs associated with collecting MSIs, we develop a self-supervised paradigm based on the Equivariant Imaging (EI) framework. Experimental results validate the superiority of our model compared to previous methods, showcasing its simplicity and adaptability to various end-to-end or unfolding techniques.
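
The two-stage inference described above can be summarized in a few lines: an initial prediction from the pretrained reconstruction network, then a single denoising call that adds a high-frequency residual. In the sketch below both networks are dummy placeholders and the fixed timestep t_star is a hypothetical parameter, not a value from the paper.

```python
# Schematic of one-step diffusion refinement for SCI. All modules are stand-ins.
import torch

# Placeholder for the pretrained SCI reconstruction network (28-band MSI cube).
recon_net = lambda y: torch.zeros(1, 28, 256, 256)
# Placeholder one-step diffusion refiner: predicts a residual at a fixed timestep.
diffusion_net = lambda x, t, cond: 0.01 * torch.randn_like(x)

def reconstruct(measurement, t_star=0.25):
    x0 = recon_net(measurement)                        # stage 1: initial MSI estimate
    residual = diffusion_net(x0, t_star, measurement)  # stage 2: single denoising step
    return x0 + residual                               # high-frequency detail restored

cube = reconstruct(torch.rand(1, 1, 256, 256))  # coded 2-D snapshot in, MSI cube out
```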

Towards Predicting Temporal Changes in a Patient's Chest X-ray Images based on Electronic Health Records
Daeun Kyung, Junu Kim, Tackeun Kim, Edward Choi
arXiv:2409.07012 (2024-09-11)
Chest X-ray (CXR) imaging is an important diagnostic tool used in hospitals to assess patient conditions and monitor changes over time. Generative models, specifically diffusion-based models, have shown promise in generating realistic synthetic X-rays. However, these models mainly focus on conditional generation from single-time-point data, i.e., typically CXRs taken at a specific time together with their corresponding reports, which limits their clinical utility, particularly for capturing temporal changes. To address this limitation, we propose a novel framework, EHRXDiff, which predicts future CXR images by integrating previous CXRs with subsequent medical events such as prescriptions and laboratory results. Our framework dynamically tracks and predicts disease progression based on a latent diffusion model conditioned on the previous CXR image and a history of medical events. We comprehensively evaluate the framework across three key aspects: clinical consistency, demographic consistency, and visual realism. We demonstrate that our framework generates high-quality, realistic future images that capture potential temporal changes, suggesting its potential for further development as a clinical simulation tool. This could offer valuable insights for patient monitoring and treatment planning in the medical field.
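
At the level of tensor shapes, the conditioning described for EHRXDiff amounts to combining the previous CXR's latent with an embedding of the intervening event history inside a latent diffusion model. The sketch below is a shape-level illustration only; every module in it is a hypothetical stand-in, not the authors' architecture.

```python
# Shape-level sketch of conditioning a latent diffusion model on a prior CXR
# and a medical-event history. All modules are hypothetical placeholders.
import torch
import torch.nn as nn

img_encoder = nn.Conv2d(1, 4, kernel_size=8, stride=8)  # stand-in VAE encoder
event_embed = nn.Embedding(1000, 64)                    # stand-in event-code embedding

prev_cxr = torch.rand(1, 1, 256, 256)
event_codes = torch.randint(0, 1000, (1, 12))           # e.g., prescriptions, labs

z_prev = img_encoder(prev_cxr)                # (1, 4, 32, 32) latent of previous CXR
e_hist = event_embed(event_codes).mean(dim=1) # (1, 64) pooled event history

noisy_future = torch.randn_like(z_prev)       # future-CXR latent being denoised
cond = torch.cat([noisy_future, z_prev], dim=1)  # channel-wise image conditioning
# A denoising U-Net would take `cond` plus `e_hist` (e.g., via cross-attention)
# and predict the noise at each diffusion step.
```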

Attention Down-Sampling Transformer, Relative Ranking and Self-Consistency for Blind Image Quality Assessment
Mohammed Alsaafin, Musab Alsheikh, Saeed Anwar, Muhammad Usman
arXiv:2409.07115 (2024-09-11)
No-reference image quality assessment (NR-IQA) is the challenging task of estimating image quality without the original reference. We introduce an improved mechanism to extract local and non-local information from images via different transformer encoders and CNNs. The Transformer encoders mitigate locality bias and generate a non-local representation by sequentially processing CNN features, which inherently capture local visual structures. A stronger connection between subjective and objective assessments is established by sorting within batches of images based on relative distance information. A self-consistency approach to self-supervision is presented, explicitly addressing the degradation of NR-IQA models under equivariant transformations. Our approach ensures model robustness by maintaining consistency between an image and its horizontally flipped equivalent. In empirical evaluation on five popular image quality assessment datasets, the proposed model outperforms alternative NR-IQA algorithms, especially on smaller datasets. Code is available at https://github.com/mas94/ADTRS
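
The flip self-consistency constraint is easy to state in code: the model's quality score for an image and for its horizontal mirror should agree. Below is a minimal sketch with a dummy scoring head; the paper's actual network combines transformer encoders and CNNs.

```python
# Flip self-consistency for NR-IQA: penalize disagreement between the quality
# scores of an image and its horizontally flipped version.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 1))  # dummy quality head

def self_consistency_loss(images):
    flipped = torch.flip(images, dims=[-1])  # horizontal flip (last dim is width)
    return nn.functional.mse_loss(model(images), model(flipped))

loss = self_consistency_loss(torch.rand(8, 3, 64, 64))
```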

Event-based Mosaicing Bundle Adjustment
Shuang Guo, Guillermo Gallego
arXiv:2409.07365 (2024-09-11)
We tackle the problem of mosaicing bundle adjustment (BA), i.e., the simultaneous refinement of camera orientations and the scene map, for a purely rotating event camera. We formulate the problem as a regularized non-linear least-squares optimization. The objective function is defined using the linearized event generation model in the camera orientations and the panoramic gradient map of the scene. We show that this BA optimization has an exploitable block-diagonal sparsity structure, so the problem can be solved efficiently. To the best of our knowledge, this is the first work to leverage such sparsity to speed up optimization in the context of event-based cameras, without the need to convert events into image-like representations. We evaluate our method, called EMBA, on both synthetic and real-world datasets and show its effectiveness (a 50% decrease in photometric error), yielding results of unprecedented quality. In addition, we demonstrate EMBA with high-spatial-resolution event cameras, yielding delicate panoramas in the wild, even without an initial map. Project page: https://github.com/tub-rip/emba
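
The computational payoff of the block-diagonal sparsity is that the normal equations of the least-squares problem decouple: each diagonal block can be solved independently, avoiding one large dense solve. A toy numerical illustration of that equivalence follows; it is not EMBA's actual solver.

```python
# Block-diagonal normal equations: four independent 3x3 solves reproduce the
# solution of the full 12x12 dense solve, at far lower cost for large systems.
import numpy as np

rng = np.random.default_rng(0)
blocks = [rng.standard_normal((3, 3)) for _ in range(4)]
H_blocks = [B @ B.T + np.eye(3) for B in blocks]  # SPD diagonal blocks
g = rng.standard_normal(12)                       # gradient vector

# Dense solve over the assembled 12x12 system ...
H = np.zeros((12, 12))
for i, Hb in enumerate(H_blocks):
    H[3*i:3*i+3, 3*i:3*i+3] = Hb
delta_dense = np.linalg.solve(H, g)

# ... equals four independent block solves.
delta_blockwise = np.concatenate(
    [np.linalg.solve(Hb, g[3*i:3*i+3]) for i, Hb in enumerate(H_blocks)])
assert np.allclose(delta_dense, delta_blockwise)
```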

Performance Assessment of Feature Detection Methods for 2-D FS Sonar Imagery
Hitesh Kyatham, Shahriar Negahdaripour, Michael Xu, Xiaomin Lin, Miao Yu, Yiannis Aloimonos
arXiv:2409.07004 (2024-09-11)
Underwater robot perception is crucial in scientific subsea exploration and commercial operations. The key challenges include non-uniform lighting and poor visibility in turbid environments. High-frequency forward-look sonar cameras address these issues by providing high-resolution imagery at ranges of up to tens of meters, despite the complexities posed by a high degree of speckle noise and a lack of color and texture. In particular, robust feature detection is an essential first step for automated object recognition, localization, navigation, and 3-D mapping. Various local feature detectors developed for RGB images are not well suited to sonar data. To assess their performance, we evaluate a number of feature detectors on real sonar images from five different sonar devices. Performance metrics such as detection accuracy, false positives, and robustness to variations in target characteristics and sonar devices are applied to analyze the experimental results. The study provides deeper insight into the bottlenecks of feature detection for sonar data and informs the development of more effective methods.
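
The evaluation protocol can be sketched with off-the-shelf OpenCV detectors run on a sonar frame. The image path below is hypothetical, and the paper's study uses richer metrics (detection accuracy, false positives, cross-device robustness) than the raw keypoint counts shown here.

```python
# Running several classical OpenCV feature detectors on a sonar frame.
import cv2

image = cv2.imread("sonar_frame.png", cv2.IMREAD_GRAYSCALE)  # hypothetical path
assert image is not None, "replace sonar_frame.png with a real sonar image"

detectors = {
    "SIFT": cv2.SIFT_create(),
    "ORB": cv2.ORB_create(),
    "AKAZE": cv2.AKAZE_create(),
    "BRISK": cv2.BRISK_create(),
}
for name, det in detectors.items():
    keypoints = det.detect(image, None)
    print(f"{name}: {len(keypoints)} keypoints")
```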