Deep Learning Techniques for Hand Vein Biometrics: A Comprehensive Review
Mustapha Hemis, Hamza Kheddar, Sami Bourouis, Nasir Saleem
arXiv:2409.07128 (2024-09-11)
Biometric authentication has garnered significant attention as a secure and efficient method of identity verification. Among the various modalities, hand vein biometrics, including finger vein, palm vein, and dorsal hand vein recognition, offer unique advantages due to their high accuracy, low susceptibility to forgery, and non-intrusiveness. The vein patterns within the hand are highly complex and distinct for each individual, making them an ideal biometric identifier. Additionally, hand vein recognition is contactless, enhancing user convenience and hygiene compared to other modalities such as fingerprint or iris recognition. Furthermore, the veins are internally located, rendering them less susceptible to damage or alteration and thus enhancing the security and reliability of the biometric system. The combination of these factors makes hand vein biometrics a highly effective and secure method for identity verification. This review delves into the latest advancements in deep learning techniques applied to finger vein, palm vein, and dorsal hand vein recognition. It covers the essential fundamentals of hand vein biometrics, summarizes publicly available datasets, and discusses the state-of-the-art metrics used to evaluate these three modalities. Moreover, it provides a comprehensive overview of proposed approaches for finger, palm, dorsal, and multimodal vein recognition, offering insights into the best performance achieved, data augmentation techniques, and effective transfer learning methods, along with the associated pretrained deep learning models. Finally, the review addresses open research challenges and outlines future directions and perspectives, encouraging researchers to enhance existing methods and propose innovative techniques.
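
As a concrete illustration of the transfer-learning recipe such reviews survey, the sketch below fine-tunes an ImageNet-pretrained backbone on vein images. It is a minimal sketch only: the ResNet-18 choice, the frozen layers, the hypothetical veins/train folder layout, and all hyperparameters are assumptions for illustration, not taken from any specific surveyed method.

```python
# Minimal transfer-learning sketch for finger-vein identification (illustrative only).
# Assumes near-infrared vein images arranged in per-subject class folders;
# all paths and hyperparameters are hypothetical.
import torch
import torch.nn as nn
from torchvision import models, transforms, datasets

NUM_SUBJECTS = 100  # hypothetical number of enrolled identities

# Start from an ImageNet-pretrained backbone, as many surveyed works do.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, NUM_SUBJECTS)  # replace classifier head

# Freeze early layers; fine-tune only the last block and the new head.
for name, p in model.named_parameters():
    p.requires_grad = name.startswith(("layer4", "fc"))

preprocess = transforms.Compose([
    transforms.Grayscale(num_output_channels=3),  # vein images are single-channel
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
train_set = datasets.ImageFolder("veins/train", transform=preprocess)  # hypothetical layout
loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

optimizer = torch.optim.Adam((p for p in model.parameters() if p.requires_grad), lr=1e-4)
criterion = nn.CrossEntropyLoss()
for images, labels in loader:  # one epoch shown for brevity
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```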

RICAU-Net: Residual-block Inspired Coordinate Attention U-Net for Segmentation of Small and Sparse Calcium Lesions in Cardiac CT
Doyoung Park, Jinsoo Kim, Qi Chang, Shuang Leng, Liang Zhong, Lohendran Baskaran
arXiv:2409.06993 (2024-09-11)
The Agatston score, the sum of the calcification in the four main coronary arteries, has been widely used in the diagnosis of coronary artery disease (CAD). However, many studies have emphasized the importance of the vessel-specific Agatston score, as calcification in a specific vessel is significantly correlated with the occurrence of coronary heart disease (CHD). In this paper, we propose the Residual-block Inspired Coordinate Attention U-Net (RICAU-Net), which incorporates coordinate attention in two distinct ways and uses a customized combo loss function for lesion-specific coronary artery calcium (CAC) segmentation. This approach aims to tackle the severe class imbalance associated with small and sparse lesions, particularly CAC in the left main coronary artery (LM), which is generally small and, owing to its anatomy, the scarcest lesion in the dataset. The proposed method was compared with six other methods using the Dice score, precision, and recall. Our approach achieved the highest per-lesion Dice scores for all four lesion types, especially for CAC in the LM. Ablation studies demonstrated the significance of the positional information contributed by the coordinate attention, and of the customized loss function, in segmenting small and sparse lesions under severe class imbalance.
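
The paper's combo loss is customized and not fully specified in the abstract; a common combo formulation for this kind of imbalance mixes Dice loss with weighted cross-entropy. The sketch below shows that generic pattern only; the mixing weight alpha and the per-class weights (including the large weight for a rare class such as the LM) are hypothetical placeholders, not the authors' values.

```python
# Illustrative combo loss (Dice + weighted cross-entropy) for class-imbalanced
# segmentation. Not the authors' exact formulation.
import torch
import torch.nn.functional as F

def combo_loss(logits, target, class_weights, alpha=0.5, eps=1e-6):
    """logits: (B, C, H, W); target: (B, H, W) integer labels."""
    # Weighted cross-entropy upweights rare classes.
    ce = F.cross_entropy(logits, target, weight=class_weights)

    # Soft multi-class Dice over the batch.
    probs = F.softmax(logits, dim=1)
    one_hot = F.one_hot(target, num_classes=logits.shape[1]).permute(0, 3, 1, 2).float()
    dims = (0, 2, 3)
    intersection = (probs * one_hot).sum(dims)
    cardinality = probs.sum(dims) + one_hot.sum(dims)
    dice = (2 * intersection + eps) / (cardinality + eps)
    dice_loss = 1 - dice.mean()

    return alpha * ce + (1 - alpha) * dice_loss

# Example with 5 classes (background + 4 vessel-specific CAC lesions);
# a rare class such as LM gets a larger hypothetical weight.
weights = torch.tensor([0.1, 1.0, 1.0, 1.0, 4.0])
loss = combo_loss(torch.randn(2, 5, 64, 64), torch.randint(0, 5, (2, 64, 64)), weights)
```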

CWT-Net: Super-resolution of Histopathology Images Using a Cross-scale Wavelet-based Transformer
Feiyang Jia, Zhineng Chen, Ziying Song, Lin Liu, Caiyan Jia
arXiv:2409.07092 (2024-09-11)
Super-resolution (SR) aims to enhance the quality of low-resolution images and has been widely applied in medical imaging. We found that the design principles of most existing methods are shaped by SR tasks on real-world photographs and overlook the significance of the multi-level structure of pathological images, even though these methods achieve respectable scores on objective metrics. In this work, we examine two super-resolution working paradigms and propose a novel network called CWT-Net, which leverages a cross-scale image wavelet transform and a Transformer architecture. Our network consists of two branches: one dedicated to learning super-resolution and the other to high-frequency wavelet features. To generate high-resolution histopathology images, the Transformer module shares and fuses features from both branches at various stages. Notably, we designed a specialized wavelet reconstruction module to effectively enhance the wavelet-domain features and to let the network operate in different modes, allowing additional relevant information to be introduced from cross-scale images. Our experimental results demonstrate that the model significantly outperforms state-of-the-art methods in both quantitative and visual evaluations and can substantially boost the accuracy of image diagnostic networks.
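
To make the high-frequency wavelet branch concrete: a single-level 2-D discrete wavelet transform splits a patch into one approximation band and three detail bands, and the detail bands carry exactly the high-frequency structure such a branch would learn from. Below is a minimal sketch using PyWavelets; it is not the authors' code, and the Haar wavelet is an arbitrary choice for illustration.

```python
# Sketch of the idea behind a high-frequency wavelet branch: a 2-D DWT splits a
# histopathology patch into an approximation band and three detail bands.
import numpy as np
import pywt

patch = np.random.rand(256, 256)  # stand-in for a grayscale histopathology patch

# Single-level 2-D discrete wavelet transform (Haar chosen for simplicity).
cA, (cH, cV, cD) = pywt.dwt2(patch, "haar")

# cH/cV/cD hold horizontal/vertical/diagonal high-frequency detail; stacking them
# gives a 3-channel input a detail-learning branch could consume.
high_freq = np.stack([cH, cV, cD], axis=0)  # shape (3, 128, 128)

# The inverse transform reconstructs the patch exactly, mirroring the role of a
# wavelet reconstruction module.
reconstructed = pywt.idwt2((cA, (cH, cV, cD)), "haar")
assert np.allclose(reconstructed, patch)
```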

3DGCQA: A Quality Assessment Database for 3D AI-Generated Contents
Yingjie Zhou, Zicheng Zhang, Farong Wen, Jun Jia, Yanwei Jiang, Xiaohong Liu, Xiongkuo Min, Guangtao Zhai
arXiv:2409.07236 (2024-09-11)
Although 3D generated content (3DGC) offers advantages in reducing production costs and accelerating design timelines, its quality often falls short of professionally produced 3D content. Common quality issues frequently affect 3DGC, highlighting the importance of timely and effective quality assessment. Such evaluations not only ensure a higher standard of 3DGCs for end-users but also provide critical insights for advancing generative technologies. To address existing gaps in this domain, this paper introduces a novel 3DGC quality assessment dataset, 3DGCQA, built using 7 representative text-to-3D generation methods. During the dataset's construction, 50 fixed prompts are used to generate content with every method, yielding 313 textured meshes that constitute the 3DGCQA dataset. Visual inspection reveals 6 common distortion categories in the generated 3DGCs. To further explore their quality, a subjective quality assessment is conducted by human evaluators, whose ratings reveal significant variation in quality across the generation methods. Additionally, several objective quality assessment algorithms are tested on the 3DGCQA dataset. The results expose limitations in the performance of existing algorithms and underscore the need for more specialized quality assessment methods. To provide a valuable resource for future research and development in 3D content generation and quality assessment, the dataset has been open-sourced at https://github.com/zyj-2000/3DGCQA.
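
Benchmarking objective metrics against subjective ratings, as done on 3DGCQA, is conventionally reported as rank and linear correlations between predicted scores and mean opinion scores (MOS). A hedged sketch follows; the scores below are random placeholders, whereas the real MOS come from the human study.

```python
# Standard correlation-based evaluation of an objective quality metric against MOS.
import numpy as np
from scipy.stats import spearmanr, pearsonr, kendalltau

mos = np.random.rand(313)                     # placeholder MOS for the 313 meshes
predicted = mos + 0.1 * np.random.randn(313)  # placeholder objective-metric scores

srocc, _ = spearmanr(predicted, mos)   # monotonicity of the prediction
plcc, _ = pearsonr(predicted, mos)     # linear agreement
krocc, _ = kendalltau(predicted, mos)  # rank agreement
print(f"SROCC={srocc:.3f}  PLCC={plcc:.3f}  KROCC={krocc:.3f}")
```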

Quantifying Knee Cartilage Shape and Lesion: From Image to Metrics
Yongcheng Yao, Weitian Chen
arXiv:2409.07361 (2024-09-11)
Imaging features of knee articular cartilage have been shown to be potential imaging biomarkers for knee osteoarthritis. Despite recent methodological advancements in image analysis techniques such as image segmentation, registration, and domain-specific image computing algorithms, only a few works focus on building fully automated pipelines for imaging feature extraction. In this study, we developed a deep-learning-based medical image analysis application for knee cartilage morphometrics, the CartiMorph Toolbox (CMT). We propose a 2-stage joint template learning and registration network, CMT-reg. We trained the model on the OAI-ZIB dataset and assessed its performance in template-to-image registration. CMT-reg achieved competitive results compared with other state-of-the-art models. We integrated the proposed model into an automated pipeline for quantifying cartilage shape and lesions (specifically, full-thickness cartilage loss). The toolbox provides a comprehensive, user-friendly solution for medical image analysis and data visualization. The software and models are available at https://github.com/YongchengYAO/CMT-AMAI24paper.
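
The core operation behind a template-to-image registration network such as CMT-reg is warping the template with a predicted dense displacement field. The 2-D PyTorch sketch below shows only that operation under simplified assumptions; the actual model, losses, and 3-D handling live in the authors' toolbox.

```python
# Warping a template by a dense displacement field, the building block of
# learning-based registration. 2-D for brevity; registration of knee MRI is 3-D.
import torch
import torch.nn.functional as F

def warp(template, displacement):
    """template: (B, 1, H, W); displacement: (B, 2, H, W) in pixels, (dx, dy)."""
    B, _, H, W = template.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().unsqueeze(0) + displacement
    # Normalize to [-1, 1] as required by grid_sample, channel-last (x, y) order.
    grid_x = 2 * grid[:, 0] / (W - 1) - 1
    grid_y = 2 * grid[:, 1] / (H - 1) - 1
    norm_grid = torch.stack((grid_x, grid_y), dim=-1)  # (B, H, W, 2)
    return F.grid_sample(template, norm_grid, align_corners=True)

# Zero displacement is the identity warp.
moved = warp(torch.rand(1, 1, 64, 64), torch.zeros(1, 2, 64, 64))
```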

Efficient One-Step Diffusion Refinement for Snapshot Compressive Imaging
Yunzhen Wang, Haijin Zeng, Shaoguang Huang, Hongyu Chen, Hongyan Zhang
arXiv:2409.07417 (2024-09-11)
Coded Aperture Snapshot Spectral Imaging (CASSI) is a crucial technique for capturing three-dimensional multispectral images (MSIs) through the complex inverse task of reconstructing these images from coded two-dimensional measurements. Current state-of-the-art methods, predominantly end-to-end, face limitations in reconstructing high-frequency details and often rely on constrained datasets like KAIST and CAVE, resulting in models with poor generalizability. In response to these challenges, this paper introduces a novel one-step Diffusion Probabilistic Model within a self-supervised adaptation framework for Snapshot Compressive Imaging (SCI). Our approach leverages a pretrained SCI reconstruction network to generate initial predictions from two-dimensional measurements. Subsequently, a one-step diffusion model produces high-frequency residuals to enhance these initial predictions. Additionally, acknowledging the high costs associated with collecting MSIs, we develop a self-supervised paradigm based on the Equivariant Imaging (EI) framework. Experimental results validate the superiority of our model compared to previous methods, showcasing its simplicity and adaptability to various end-to-end or unfolding techniques.
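
The two-stage inference described above can be summarized in a few lines: an initial prediction from the pretrained reconstruction network, then a single denoising call that adds a high-frequency residual. In the sketch below both networks are dummy placeholders and the fixed timestep t_star is a hypothetical parameter, not a value from the paper.

```python
# Schematic of one-step diffusion refinement for SCI. All modules are stand-ins.
import torch

# Placeholder for the pretrained SCI reconstruction network (28-band MSI cube).
recon_net = lambda y: torch.zeros(1, 28, 256, 256)
# Placeholder one-step diffusion refiner: predicts a residual at a fixed timestep.
diffusion_net = lambda x, t, cond: 0.01 * torch.randn_like(x)

def reconstruct(measurement, t_star=0.25):
    x0 = recon_net(measurement)                        # stage 1: initial MSI estimate
    residual = diffusion_net(x0, t_star, measurement)  # stage 2: single denoising step
    return x0 + residual                               # high-frequency detail restored

cube = reconstruct(torch.rand(1, 1, 256, 256))  # coded 2-D snapshot in, MSI cube out
```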

Towards Predicting Temporal Changes in a Patient's Chest X-ray Images based on Electronic Health Records
Daeun Kyung, Junu Kim, Tackeun Kim, Edward Choi
arXiv:2409.07012 (2024-09-11)
Chest X-ray (CXR) imaging is an important diagnostic tool used in hospitals to assess patient conditions and monitor changes over time. Generative models, specifically diffusion-based models, have shown promise in generating realistic synthetic X-rays. However, these models mainly focus on conditional generation from single-time-point data, i.e., typically CXRs taken at a specific time together with their corresponding reports, which limits their clinical utility, particularly for capturing temporal changes. To address this limitation, we propose a novel framework, EHRXDiff, which predicts future CXR images by integrating previous CXRs with subsequent medical events such as prescriptions and laboratory results. Our framework dynamically tracks and predicts disease progression based on a latent diffusion model conditioned on the previous CXR image and a history of medical events. We comprehensively evaluate the framework across three key aspects: clinical consistency, demographic consistency, and visual realism. We demonstrate that our framework generates high-quality, realistic future images that capture potential temporal changes, suggesting its potential for further development as a clinical simulation tool. This could offer valuable insights for patient monitoring and treatment planning in the medical field.
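
At the level of tensor shapes, the conditioning described for EHRXDiff amounts to combining the previous CXR's latent with an embedding of the intervening event history inside a latent diffusion model. The sketch below is a shape-level illustration only; every module in it is a hypothetical stand-in, not the authors' architecture.

```python
# Shape-level sketch of conditioning a latent diffusion model on a prior CXR
# and a medical-event history. All modules are hypothetical placeholders.
import torch
import torch.nn as nn

img_encoder = nn.Conv2d(1, 4, kernel_size=8, stride=8)  # stand-in VAE encoder
event_embed = nn.Embedding(1000, 64)                    # stand-in event-code embedding

prev_cxr = torch.rand(1, 1, 256, 256)
event_codes = torch.randint(0, 1000, (1, 12))           # e.g., prescriptions, labs

z_prev = img_encoder(prev_cxr)                # (1, 4, 32, 32) latent of previous CXR
e_hist = event_embed(event_codes).mean(dim=1) # (1, 64) pooled event history

noisy_future = torch.randn_like(z_prev)       # future-CXR latent being denoised
cond = torch.cat([noisy_future, z_prev], dim=1)  # channel-wise image conditioning
# A denoising U-Net would take `cond` plus `e_hist` (e.g., via cross-attention)
# and predict the noise at each diffusion step.
```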

Attention Down-Sampling Transformer, Relative Ranking and Self-Consistency for Blind Image Quality Assessment
Mohammed Alsaafin, Musab Alsheikh, Saeed Anwar, Muhammad Usman
arXiv:2409.07115 (2024-09-11)
No-reference image quality assessment (NR-IQA) is the challenging task of estimating image quality without the original reference. We introduce an improved mechanism to extract local and non-local information from images via different transformer encoders and CNNs. The Transformer encoders mitigate locality bias and generate a non-local representation by sequentially processing CNN features, which inherently capture local visual structures. A stronger connection between subjective and objective assessments is established by sorting within batches of images based on relative distance information. A self-consistency approach to self-supervision is presented, explicitly addressing the degradation of NR-IQA models under equivariant transformations. Our approach ensures model robustness by maintaining consistency between an image and its horizontally flipped equivalent. In empirical evaluation on five popular image quality assessment datasets, the proposed model outperforms alternative NR-IQA algorithms, especially on smaller datasets. Code is available at https://github.com/mas94/ADTRS
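
The flip self-consistency constraint is easy to state in code: the model's quality score for an image and for its horizontal mirror should agree. Below is a minimal sketch with a dummy scoring head; the paper's actual network combines transformer encoders and CNNs.

```python
# Flip self-consistency for NR-IQA: penalize disagreement between the quality
# scores of an image and its horizontally flipped version.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 1))  # dummy quality head

def self_consistency_loss(images):
    flipped = torch.flip(images, dims=[-1])  # horizontal flip (last dim is width)
    return nn.functional.mse_loss(model(images), model(flipped))

loss = self_consistency_loss(torch.rand(8, 3, 64, 64))
```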

Event-based Mosaicing Bundle Adjustment
Shuang Guo, Guillermo Gallego
arXiv:2409.07365 (2024-09-11)
We tackle the problem of mosaicing bundle adjustment (BA), i.e., the simultaneous refinement of camera orientations and the scene map, for a purely rotating event camera. We formulate the problem as a regularized non-linear least-squares optimization. The objective function is defined using the linearized event generation model in the camera orientations and the panoramic gradient map of the scene. We show that this BA optimization has an exploitable block-diagonal sparsity structure, so the problem can be solved efficiently. To the best of our knowledge, this is the first work to leverage such sparsity to speed up optimization in the context of event-based cameras, without the need to convert events into image-like representations. We evaluate our method, called EMBA, on both synthetic and real-world datasets and show its effectiveness (a 50% decrease in photometric error), yielding results of unprecedented quality. In addition, we demonstrate EMBA with high-spatial-resolution event cameras, yielding delicate panoramas in the wild, even without an initial map. Project page: https://github.com/tub-rip/emba
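
The computational payoff of the block-diagonal sparsity is that the normal equations of the least-squares problem decouple: each diagonal block can be solved independently, avoiding one large dense solve. A toy numerical illustration of that equivalence follows; it is not EMBA's actual solver.

```python
# Block-diagonal normal equations: four independent 3x3 solves reproduce the
# solution of the full 12x12 dense solve, at far lower cost for large systems.
import numpy as np

rng = np.random.default_rng(0)
blocks = [rng.standard_normal((3, 3)) for _ in range(4)]
H_blocks = [B @ B.T + np.eye(3) for B in blocks]  # SPD diagonal blocks
g = rng.standard_normal(12)                       # gradient vector

# Dense solve over the assembled 12x12 system ...
H = np.zeros((12, 12))
for i, Hb in enumerate(H_blocks):
    H[3*i:3*i+3, 3*i:3*i+3] = Hb
delta_dense = np.linalg.solve(H, g)

# ... equals four independent block solves.
delta_blockwise = np.concatenate(
    [np.linalg.solve(Hb, g[3*i:3*i+3]) for i, Hb in enumerate(H_blocks)])
assert np.allclose(delta_dense, delta_blockwise)
```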

Performance Assessment of Feature Detection Methods for 2-D FS Sonar Imagery
Hitesh Kyatham, Shahriar Negahdaripour, Michael Xu, Xiaomin Lin, Miao Yu, Yiannis Aloimonos
arXiv:2409.07004 (2024-09-11)
Underwater robot perception is crucial in scientific subsea exploration and commercial operations. The key challenges include non-uniform lighting and poor visibility in turbid environments. High-frequency forward-look sonar cameras address these issues by providing high-resolution imagery at ranges of up to tens of meters, despite the complexities posed by a high degree of speckle noise and a lack of color and texture. In particular, robust feature detection is an essential first step for automated object recognition, localization, navigation, and 3-D mapping. Various local feature detectors developed for RGB images are not well suited to sonar data. To assess their performance, we evaluate a number of feature detectors on real sonar images from five different sonar devices. Performance metrics such as detection accuracy, false positives, and robustness to variations in target characteristics and sonar devices are applied to analyze the experimental results. The study provides deeper insight into the bottlenecks of feature detection for sonar data and informs the development of more effective methods.
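
The evaluation protocol can be sketched with off-the-shelf OpenCV detectors run on a sonar frame. The image path below is hypothetical, and the paper's study uses richer metrics (detection accuracy, false positives, cross-device robustness) than the raw keypoint counts shown here.

```python
# Running several classical OpenCV feature detectors on a sonar frame.
import cv2

image = cv2.imread("sonar_frame.png", cv2.IMREAD_GRAYSCALE)  # hypothetical path
assert image is not None, "replace sonar_frame.png with a real sonar image"

detectors = {
    "SIFT": cv2.SIFT_create(),
    "ORB": cv2.ORB_create(),
    "AKAZE": cv2.AKAZE_create(),
    "BRISK": cv2.BRISK_create(),
}
for name, det in detectors.items():
    keypoints = det.detect(image, None)
    print(f"{name}: {len(keypoints)} keypoints")
```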