
Image and Vision Computing: Latest Publications

Automated anthropometric measurements from 3D point clouds of scanned bodies
IF 4.2 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-10-24 | DOI: 10.1016/j.imavis.2024.105306
Nahuel E. Garcia-D’Urso, Antonio Macia-Lillo, Higinio Mora-Mora, Jorge Azorin-Lopez, Andres Fuster-Guillo
Anthropometry plays a critical role across numerous sectors, particularly within healthcare and fashion, by facilitating the analysis of the human body structure. The significance of anthropometric data cannot be overstated; it is crucial for assessing nutritional status among children and adults alike, enabling early detection of conditions such as malnutrition, obesity, and being overweight. Furthermore, it is instrumental in creating tailored dietary interventions. This study introduces a novel automated technique for extracting anthropometric measurements from any body part. The proposed method leverages a parametric model to accurately determine the measurement parameters from either an unstructured point cloud or a mesh. We conducted a comprehensive evaluation of our approach by comparing perimetral measurements from over 400 body scans with expert assessments and existing state-of-the-art methods. The results demonstrate that our approach significantly surpasses the current methods for measuring the waist, hip, thigh, chest, and wrist perimeters with exceptional accuracy. These findings indicate the potential of our method to automate anthropometric analysis and offer efficient and accurate measurements for various applications in healthcare and fashion industries.
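For intuition, one common way to approximate a single circumference from an unstructured scan is to slice the cloud at the measurement height and measure the resulting 2D contour. The sketch below is a generic illustration of that idea (the slice height, tolerance, and convex-hull approximation are assumptions, not the authors' parametric-model pipeline):

```python
import numpy as np
from scipy.spatial import ConvexHull

def slice_perimeter(points, height, tol=0.01):
    """Estimate a body circumference by slicing a point cloud at a given
    z height and measuring the perimeter of the 2D convex hull.

    points : (N, 3) array of x, y, z coordinates in metres.
    height : z value of the measurement plane (e.g. waist height).
    tol    : half-thickness of the slice in metres.
    """
    band = points[np.abs(points[:, 2] - height) < tol]
    if len(band) < 3:
        raise ValueError("not enough points in the slice")
    hull = ConvexHull(band[:, :2])        # project the slice onto the XY plane
    ring = band[hull.vertices, :2]        # hull vertices in counterclockwise order
    # sum of edge lengths around the closed polygon (overestimates concave contours)
    return np.linalg.norm(np.roll(ring, -1, axis=0) - ring, axis=1).sum()

# waist = slice_perimeter(scan_points, height=1.05)  # scan_points is hypothetical
```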
Citations: 0
Gaussian error loss function for image smoothing
IF 4.2 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-10-22 | DOI: 10.1016/j.imavis.2024.105300
Wenzheng Dong, Lanling Zeng, Shunli Ji, Yang Yang
Edge-preserving image smoothing plays an important role in the fields of image processing and computational photography, and is widely used for a variety of applications. Edge-preserving filters based on global optimization models have attracted widespread attention due to their high smoothing quality. According to existing research, the edge-preserving capability is strongly correlated with the penalty function used for gradient regularization. By analyzing the edge-stopping function of existing penalties, we demonstrate that existing image smoothing models are not adequately edge-preserving. In this paper, based on a Gaussian error function (ERF), we propose a Gaussian error loss function (ERLF), which shows stronger edge-preserving capability. We embed the proposed loss function into a global optimization model for edge-preserving image smoothing. In addition, we propose an efficient solution based on additive half-quadratic minimization and Fourier-domain optimization that is capable of processing 720P color images in real time (over 20 fps) on an NVIDIA RTX 3070 GPU. We have experimented with the proposed filter on a number of low-level vision tasks. Both quantitative and qualitative experimental results show that the proposed filter outperforms existing filters. Therefore, it is practical for real applications.
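The abstract does not give the exact form of the ERLF; as a rough illustration of how an erf-based, saturating penalty can be plugged into a global smoothing objective, consider the following sketch (the penalty form, the scale sigma, and the discrete-gradient objective are assumptions):

```python
import numpy as np
from scipy.special import erf

def erf_penalty(grad, sigma=0.05):
    """A saturating, erf-based penalty on image gradients.

    For small |grad| the penalty grows roughly linearly, while for large
    |grad| it flattens out, so strong edges incur little extra cost and
    are therefore preserved during smoothing.
    """
    return erf(np.abs(grad) / sigma)

def smoothing_energy(u, f, lam=1.0, sigma=0.05):
    """Global-optimization style objective: data fidelity + gradient penalty.

    u, f : 2D arrays (candidate smoothed output and input image).
    lam  : weight of the regularization term.
    """
    gx = np.diff(u, axis=1)
    gy = np.diff(u, axis=0)
    data = np.sum((u - f) ** 2)
    reg = np.sum(erf_penalty(gx, sigma)) + np.sum(erf_penalty(gy, sigma))
    return data + lam * reg
```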
Citations: 0
Multi-object tracking using score-driven hierarchical association strategy between predicted tracklets and objects
IF 4.2 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-10-19 | DOI: 10.1016/j.imavis.2024.105303
Tianyi Zhao, Guanci Yang, Yang Li, Minglang Lu, Haoran Sun
Machine vision is one of the major technologies to guarantee intelligent robots’ human-centered embodied intelligence. Especially in complex dynamic scenes involving multiple people, Multi-Object Tracking (MOT), which can accurately identify and track specific targets, significantly influences intelligent robots’ performance regarding behavior perception and monitoring, autonomous decision-making, and providing personalized humanoid services. To solve the problem of lost targets and identity switches caused by the scale variations of objects and frequent overlaps during the tracking process, this paper presents a multi-object tracking method using a score-driven hierarchical association strategy between predicted tracklets and objects (ScoreMOT). Firstly, a motion prediction of occluded objects based on bounding box variation (MPOBV) is proposed to estimate the position of occluded objects. MPOBV models the motion state of the object using the bounding box and confidence score. Then, a score-driven hierarchical association strategy between predicted tracklets and objects (SHAS) is proposed to correctly associate them in frequently overlapping scenarios. SHAS associates the predicted tracklets and detected objects with different confidence in different stages. Comparisons with 16 state-of-the-art methods are conducted on the Multiple Object Tracking Benchmark 20 (MOT20) and DanceTrack datasets, and ScoreMOT outperforms the compared methods.
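As a conceptual sketch of score-driven, two-stage association between predicted tracklets and detections (in the spirit of SHAS, but with assumed thresholds, an IoU cost, and Hungarian matching rather than the paper's exact procedure):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def associate(tracklets, detections, scores, hi=0.6, min_iou=0.3):
    """Two-stage, score-driven matching of predicted tracklets to detections.

    Stage 1 matches high-confidence detections first; stage 2 tries the
    remaining tracklets against low-confidence detections.
    Returns a list of (tracklet_index, detection_index) pairs.
    """
    matches, free_tracks = [], list(range(len(tracklets)))
    for keep_high in (True, False):
        det_idx = [i for i, s in enumerate(scores)
                   if (s >= hi) == keep_high and all(i != d for _, d in matches)]
        if not free_tracks or not det_idx:
            continue
        cost = np.array([[1.0 - iou(tracklets[t], detections[d]) for d in det_idx]
                         for t in free_tracks])
        rows, cols = linear_sum_assignment(cost)
        for r, c in zip(rows, cols):
            if 1.0 - cost[r, c] >= min_iou:       # reject weak geometric matches
                matches.append((free_tracks[r], det_idx[c]))
        free_tracks = [t for t in free_tracks if all(t != m for m, _ in matches)]
    return matches
```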
Citations: 0
A review of recent advances in 3D Gaussian Splatting for optimization and reconstruction
IF 4.2 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-10-19 | DOI: 10.1016/j.imavis.2024.105304
Jie Luo, Tianlun Huang, Weijun Wang, Wei Feng
3D Gaussian Splatting (3DGS) represents a significant breakthrough in computer graphics and vision, offering an explicit scene representation and novel view synthesis without the reliance on neural networks, unlike Neural Radiance Fields (NeRF). This paper provides a comprehensive survey of recent research on 3DGS optimization and reconstruction, with a particular focus on studies featuring published or forthcoming open-source code. In terms of optimization, the paper examines techniques such as compression, densification, splitting, anti-aliasing, and reflection enhancement. For reconstruction, it explores methods including surface mesh extraction, sparse-view object and scene reconstruction, large-scale scene reconstruction, and dynamic object and scene reconstruction. Through comparative analysis and case studies, the paper highlights the practical advantages of 3DGS and outlines future research directions, offering valuable insights for advancing the field.
Citations: 0
FHLight: A novel method of indoor scene illumination estimation using improved loss function
IF 4.2 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-10-19 | DOI: 10.1016/j.imavis.2024.105299
Yang Wang , Ao Wang , Shijia Song , Fan Xie , Chang Ma , Jiawei Xu , Lijun Zhao
In augmented reality tasks, especially in indoor scenes, achieving illumination consistency between virtual objects and real environments is a critical challenge. Current mainstream methods fall into two categories: illumination parameter regression and illumination map generation. Among these two categories, few works can effectively recover both high-frequency and low-frequency illumination information within indoor scenes. In this work, we argue that effective restoration of low-frequency illumination information forms the foundation for capturing high-frequency illumination details. To this end, we propose a novel illumination estimation method called FHLight. Technically, we use a low-frequency spherical harmonic irradiance map (LFSHIM) restored by the low-frequency illumination regression network (LFIRN) as prior information to guide the high-frequency illumination generator (HFIG) to restore the illumination map. Furthermore, we suggest an improved loss function to optimize the network training procedure, ensuring that the model accurately restores both low-frequency and high-frequency illumination information within the scene. We compare FHLight with several competitive methods, and the results demonstrate significant improvements in metrics such as RMSE, si-RMSE, and angular error. In addition, visual experiments further confirm that FHLight is capable of generating scene illumination maps with genuine frequencies, effectively resolving the illumination consistency issue between virtual objects and real scenes. The code is available at https://github.com/WA-tyro/FHLight.git.
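For context, a second-order spherical harmonic irradiance map encodes low-frequency lighting with just nine coefficients per color channel. The sketch below evaluates such lighting at a set of surface normals (the coefficient layout is an assumption, and any Lambertian convolution factors are assumed to be folded into the coefficients already):

```python
import numpy as np

def sh_irradiance(coeffs, normals):
    """Evaluate low-frequency (2nd-order) SH lighting at unit normals.

    coeffs  : (9, 3) array of RGB spherical-harmonic coefficients.
    normals : (N, 3) array of unit surface normals.
    Returns : (N, 3) array of irradiance values.
    """
    x, y, z = normals[:, 0], normals[:, 1], normals[:, 2]
    basis = np.stack([
        0.282095 * np.ones_like(x),          # Y00
        0.488603 * y,                        # Y1-1
        0.488603 * z,                        # Y10
        0.488603 * x,                        # Y11
        1.092548 * x * y,                    # Y2-2
        1.092548 * y * z,                    # Y2-1
        0.315392 * (3.0 * z * z - 1.0),      # Y20
        1.092548 * x * z,                    # Y21
        0.546274 * (x * x - y * y),          # Y22
    ], axis=1)                               # (N, 9)
    return basis @ coeffs                    # (N, 3)
```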
Citations: 0
Feature differences reduction and specific features preserving network for RGB-T salient object detection
IF 4.2 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-10-18 | DOI: 10.1016/j.imavis.2024.105302
Qiqi Xu, Zhenguang Di, Haoyu Dong, Gang Yang
In RGB-T salient object detection, effective utilization of the different characteristics of RGB and thermal modalities is essential to achieve accurate detection. Most previous methods focus only on reducing the differences between modalities, which may ignore modality-specific features that are crucial for salient object detection, leading to suboptimal results. To address the above issue, an RGB-T SOD network that simultaneously considers the reduction of modality differences and the preservation of specific features is proposed. Specifically, we construct a modality differences reduction and specific features preserving module (MDRSFPM) which aims to bridge the gap between modalities and enhance the specific features of each modality. In MDRSFPM, the dynamic vector generated by the interaction of RGB and thermal features is used to reduce modality differences, and then a dual branch is constructed to deal with the RGB and thermal modalities separately, employing a combination of channel-level and spatial-level operations to preserve their respective specific features. In addition, a multi-scale global feature enhancement module (MGFEM) is proposed to enhance global contextual information to provide guidance information for the subsequent decoding stage, so that the model can more easily localize the salient objects. Furthermore, our approach includes a fully fusion and gate module (FFGM) that utilizes dynamically generated importance maps to selectively filter and fuse features during the decoding process. Extensive experiments demonstrate that our proposed model remarkably surpasses other state-of-the-art models on three publicly available RGB-T datasets. Our code will be released at https://github.com/JOOOOKII/FRPNet.
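As a generic illustration of gating with a dynamically generated importance map, the sketch below fuses RGB and thermal feature maps through a learned per-pixel gate (the module structure and shapes are assumptions and do not reproduce the paper's FFGM):

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Fuse RGB and thermal feature maps with a learned importance map."""

    def __init__(self, channels):
        super().__init__()
        # a 1x1 conv over the concatenated features produces a per-pixel gate
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, rgb_feat, thermal_feat):
        g = self.gate(torch.cat([rgb_feat, thermal_feat], dim=1))
        # g close to 1 keeps RGB information, close to 0 favors thermal
        return g * rgb_feat + (1.0 - g) * thermal_feat

# fusion = GatedFusion(channels=64)
# fused = fusion(rgb_features, thermal_features)  # both (B, 64, H, W)
```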
Citations: 0
Pyramid quaternion discrete cosine transform based ConvNet for cancelable face recognition
IF 4.2 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-10-13 | DOI: 10.1016/j.imavis.2024.105301
Zhuhong Shao , Zuowei Zhang , Leding Li , Hailiang Li , Xuanyi Li , Bicao Li , Yuanyuan Shang , Bin Chen
In the current face-scanning era, identity authentication can be performed quickly and conveniently, but face images simultaneously carry sensitive information. In this context, we introduce a novel cancelable face recognition methodology using a quaternion transform based convolutional network. Firstly, face images in different modalities (e.g., RGB and depth or near-infrared) are encoded into a full quaternion matrix for synchronous processing. Based on the designed multiresolution quaternion singular value decomposition, we can obtain a pyramid representation. The representations are then transformed through random projection to make the process noninvertible. Even if the feature template is compromised, a new one can be generated. Subsequently, a three-stream convolutional network is developed to learn features, where predefined filters stem from the quaternion two-dimensional discrete cosine transform basis. Extensive experiments on the TIII-D, NVIE and CASIA datasets have demonstrated that the proposed method obtains competitive performance while satisfying the redistributability and irreversibility requirements.
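To illustrate the revocable, non-invertible random-projection step in general terms (a sketch only; the quaternion DCT feature extraction that precedes it is not reproduced, and the feature dimensions are assumptions):

```python
import numpy as np

def issue_projection(user_key, in_dim=1024, out_dim=256):
    """Generate a user-specific random projection from a revocable key.

    If a stored template is compromised, issuing a new key yields a new
    projection matrix and therefore a completely new template.
    """
    rng = np.random.default_rng(user_key)
    return rng.standard_normal((out_dim, in_dim)) / np.sqrt(out_dim)

def protect(features, projection):
    """Map raw features to a cancelable template.

    Because out_dim < in_dim, the mapping is many-to-one and the original
    features cannot be uniquely recovered from the stored template.
    """
    return projection @ features

# P_old = issue_projection(user_key=2024)
# P_new = issue_projection(user_key=2025)   # re-issued after a leak
# template = protect(face_features, P_new)  # face_features is hypothetical
```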
Citations: 0
A decision support system for acute lymphoblastic leukemia detection based on explainable artificial intelligence
IF 4.2 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-10-11 | DOI: 10.1016/j.imavis.2024.105298
Angelo Genovese, Vincenzo Piuri, Fabio Scotti
The detection of acute lymphoblastic leukemia (ALL) via deep learning (DL) has received great interest because of its high accuracy in detecting lymphoblasts without the need for handcrafted feature extraction. However, current DL models, such as convolutional neural networks and vision Transformers, are extremely complex, making them black boxes that perform classification in an obscure way. To compensate for this and increase the explainability of the decisions made by such methods, in this paper, we propose an innovative decision support system for ALL detection that is based on DL and explainable artificial intelligence (XAI). Our approach first introduces causality into the decision with a metric learning approach, enabling a decision to be made by analyzing the most similar images in the database. Second, our method integrates XAI techniques to allow even non-trained personnel to obtain an informed decision by analyzing which regions of the images are most similar and how the samples are organized in the latent space. The results on publicly available ALL databases confirm the validity of our approach in opening the black box while achieving similar or superior accuracy to that of existing approaches.
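A minimal sketch of the similarity-based decision idea: classify a query by its nearest neighbors in a learned embedding space and return those neighbors as evidence for the decision (the embedding model and the value of k are assumptions):

```python
import numpy as np

def explainable_predict(query_emb, gallery_embs, gallery_labels, k=5):
    """Classify a query by its k nearest gallery embeddings.

    gallery_labels are non-negative integer class ids. Returns the majority
    label and the indices of the retrieved neighbors, which can be shown to
    the clinician as the images that drove the decision.
    """
    d = np.linalg.norm(gallery_embs - query_emb, axis=1)   # Euclidean distances
    nearest = np.argsort(d)[:k]
    votes = gallery_labels[nearest]
    label = np.bincount(votes).argmax()
    return label, nearest

# emb = encoder(image)            # encoder is a hypothetical metric-learning model
# label, evidence = explainable_predict(emb, train_embs, train_labels)
```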
Citations: 0
Parameter efficient finetuning of text-to-image models with trainable self-attention layer
IF 4.2 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-10-10 | DOI: 10.1016/j.imavis.2024.105296
Zhuoyuan Li, Yi Sun
We propose a novel model to efficiently finetune pretrained Text-to-Image models by introducing additional image prompts. The model integrates information from image prompts into the text-to-image (T2I) diffusion process by locking the parameters of the large T2I model and reusing its trainable copy, rather than relying on additional adapters. The trainable copy guides the model by injecting its trainable self-attention features into the original diffusion model, enabling the synthesis of a new specific concept. We also apply Low-Rank Adaptation (LoRA) to restrict the trainable parameters in the self-attention layers. Furthermore, the network is optimized alongside a text embedding that serves as an object identifier to generate contextually relevant visual content. Our model is simple and effective, with a small memory footprint, yet can achieve comparable performance to a fully fine-tuned T2I model in both qualitative and quantitative evaluations.
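For reference, the standard LoRA reparameterization adds a trainable low-rank update to a frozen linear projection; a minimal sketch applied to a self-attention projection is shown below (the rank, the scaling, and the `to_q` attribute name are assumptions):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update (LoRA)."""

    def __init__(self, base: nn.Linear, rank=4, alpha=4.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)       # lock the pretrained weights
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.up.weight)                # start as an identity update
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.up(self.down(x))

# attn.to_q = LoRALinear(attn.to_q)  # wrap a self-attention query projection
```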
Citations: 0
Global information regulation network for multimodal sentiment analysis
IF 4.2 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-10-10 | DOI: 10.1016/j.imavis.2024.105297
Shufan Xie, Qiaohong Chen, Xian Fang, Qi Sun
Human language is considered multimodal, containing natural language, visual elements, and acoustic signals. Multimodal Sentiment Analysis (MSA) concentrates on the integration of various modalities to capture the sentiment polarity or intensity expressed in human language. Nevertheless, the absence of a comprehensive strategy for processing and integrating multimodal representations results in the inclusion of inaccurate or noisy data from diverse modalities in the ultimate decision-making process, potentially leading to the neglect of crucial information within or across modalities. To address this issue, we propose the Global Information Regulation Network (GIRN), a novel framework designed to regulate information flow and decision-making processes across various stages, ranging from unimodal feature extraction to multimodal outcome prediction. Specifically, before modal fusion stage, we maximize the mutual information between modalities and refine the input signals through random feature erasing, yielding a more robust unimodal representation. In the process of modal fusion, we enhance the traditional Transformer encoder through the gate mechanism and stacked attention to dynamically fuse the target and auxiliary modalities. After modal fusion, cross-hierarchical contrastive learning and decision gate are employed to integrate the valuable information represented in different categories and hierarchies. Extensive experiments conducted on the CMU-MOSI and CMU-MOSEI datasets suggest that our methodology outperforms existing approaches across nearly all criteria.
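As a simple illustration of random feature erasing used to harden a unimodal representation before fusion (the erase probability and the channel-wise masking granularity are assumptions):

```python
import torch

def random_feature_erasing(features, p=0.1, training=True):
    """Randomly zero out feature dimensions of a unimodal sequence.

    features : (batch, seq_len, dim) tensor from one modality.
    p        : probability of erasing each feature dimension.
    """
    if not training or p <= 0:
        return features
    keep = (torch.rand(features.shape[-1], device=features.device) >= p).float()
    return features * keep          # broadcast over batch and sequence dims

# text_feat = random_feature_erasing(text_feat, p=0.1, training=model.training)
```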
Citations: 0