
Latest Publications in Computer Vision and Image Understanding

Slope-Track: Multiple Object Tracking on Ski Slopes
IF 3.5 CAS Tier 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-02-01 DOI: 10.1016/j.cviu.2026.104663
M’Saydez Campbell, Christophe Ducottet, Damien Muselet, Rémi Emonet
In this paper, we introduce Slope-Track, a novel multiple object tracking (MOT) dataset designed to reflect the complexities of real ski slope environments. The dataset contains over 96,000 frames collected from 10 different ski resorts under various weather and visibility conditions. Slope-Track addresses significant challenges in slope monitoring, including small object sizes, occlusions, fast and irregular motion, and low appearance consistency. It is densely annotated with bounding boxes and object identities, facilitating the evaluation of detection and tracking algorithms. We analyze the dataset’s characteristics, comparing it to existing MOT datasets, and the results demonstrate that Slope-Track combines challenges found across other datasets. Additionally, we benchmark a range of existing tracking algorithms and propose a new module that improves motion-based association by accounting for the specific shape of trajectories along ski slopes. Our results demonstrate that incorporating appearance features can have a mixed impact, depending on how they are used within each tracking algorithm; in contrast, motion-based methods and spatial association strategies show more reliable performance. Overall, we provide a challenging benchmark for evaluating and improving multi-object tracking systems in real-world outdoor environments. The dataset and code can be found at https://slopetrack.github.io/.
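For readers unfamiliar with the association step such trackers build on, the following minimal sketch (not the paper's slope-specific module; the function names and the 0.3 threshold are illustrative assumptions) matches motion-predicted track boxes to detections by IoU with the Hungarian algorithm.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)


def associate(predicted_boxes, detections, iou_threshold=0.3):
    """Match motion-predicted track boxes to current detections (hypothetical helper)."""
    if not predicted_boxes or not detections:
        return [], list(range(len(predicted_boxes))), list(range(len(detections)))
    # Cost = 1 - IoU, so the Hungarian algorithm maximizes total overlap.
    cost = np.array([[1.0 - iou(t, d) for d in detections] for t in predicted_boxes])
    rows, cols = linear_sum_assignment(cost)
    matches = [(r, c) for r, c in zip(rows, cols) if 1.0 - cost[r, c] >= iou_threshold]
    matched_t = {r for r, _ in matches}
    matched_d = {c for _, c in matches}
    unmatched_tracks = [i for i in range(len(predicted_boxes)) if i not in matched_t]
    unmatched_detections = [j for j in range(len(detections)) if j not in matched_d]
    return matches, unmatched_tracks, unmatched_detections
```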
Citations: 0
A comprehensive review on advances in instance-level 6D object pose tracking
IF 3.5 CAS Tier 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-02-01 DOI: 10.1016/j.cviu.2026.104667
Yanming Wu, Hui Zhang, Patrick Vandewalle, Peter Slaets, Eric Demeester
Instance-level 6D object pose tracking involves tracking a known object in 3D space and estimating its six degrees of freedom (6DoF) pose across consecutive images, starting from the initial pose in the first frame. This technology has wide-ranging applications in various fields, including robotics, augmented reality, and human–machine interaction. Over the years, significant progress has been made in this field. Many methods tackle the problem of instance-level 6D pose tracking from RGB images. These techniques can be classified based on their use of keypoints, edges, region information or direct optimization. Additionally, the availability of affordable RGB-D sensors has prompted the utilization of depth data for 6D pose tracking. Another notable advancement is the adoption of deep neural networks, which have shown promising results. Despite these developments, survey studies on the latest advancements in this field are lacking. Therefore, this work aims to fill this gap by providing a comprehensive review of recent progress in instance-level 6D object tracking, covering the aforementioned advancements. This paper provides a detailed examination of metrics, datasets, and methodology employed in this field. Based on the problem modeling approach, methods reviewed in this paper are categorized into optimization-based, learning-based, filtering-based approaches and hybrid approaches that combine various techniques. Furthermore, quantitative results on several publicly available datasets are presented and analyzed, along with applications and open challenges for future research directions.
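Among the evaluation metrics such a review covers, the ADD error (average distance of model points) is a standard measure of instance-level 6D pose accuracy; the sketch below assumes a known object model and two pose hypotheses, and the function name is a placeholder rather than any library API.

```python
import numpy as np


def add_metric(model_points, R_gt, t_gt, R_est, t_est):
    """Average Distance of model points (ADD) between a ground-truth and an estimated pose.

    model_points: (N, 3) array of points sampled on the object model.
    R_gt, R_est: (3, 3) rotation matrices; t_gt, t_est: (3,) translation vectors.
    """
    pts_gt = model_points @ R_gt.T + t_gt
    pts_est = model_points @ R_est.T + t_est
    return float(np.linalg.norm(pts_gt - pts_est, axis=1).mean())


# A pose is commonly counted as correct when ADD falls below 10% of the object diameter.
```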
Citations: 0
Interaction-aware representation learning for action quality assessment in freestyle skiing big air
IF 3.5 CAS Tier 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-02-01 DOI: 10.1016/j.cviu.2026.104634
Shiyue Chen, Yanchao Liu, Ziyue Wang, Xina Cheng, Takeshi Ikenaga
Freestyle skiing big air requires precise athlete–ski coordination to determine both technical difficulty and execution quality. Accurate action quality assessment in this discipline therefore necessitates explicit modeling of human–object interactions. However, most existing methods rely on video-level or human-centric representations, overlooking structured athlete-ski relationships and limiting evaluation of control and stability. To address this, we construct a freestyle skiing big air dataset with fine-grained annotations, including frame-level athlete-ski bounding boxes and performance-related metadata. Based on this dataset, we propose an interaction-aware framework that captures athlete–ski coordination by combining instance-level appearance and positional features through spatiotemporal reasoning. Furthermore, to avoid commonly used uniform sampling diluting performance-critical moments in long sequences, we introduce a training-free entropy-based sampling strategy that exploits athlete–ski geometric dynamics to identify performance-critical moments such as take-off, rotation, and landing, thereby reducing redundancy. Together, these designs address where to look and when to focus in big air assessment. Extensive experiments demonstrate that our method achieves a Spearman’s rank correlation of 0.7173 on the proposed dataset, outperforming state-of-the-art methods.
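For reference, a Spearman's rank correlation such as the 0.7173 reported above is computed between predicted and ground-truth quality scores; the minimal example below uses SciPy with made-up placeholder scores.

```python
from scipy.stats import spearmanr

# Hypothetical per-clip quality scores: model predictions vs. judge annotations.
predicted_scores = [72.3, 85.1, 64.0, 90.5, 78.2]
judge_scores = [70.0, 88.0, 60.0, 92.0, 75.0]

rho, p_value = spearmanr(predicted_scores, judge_scores)
print(f"Spearman's rank correlation: {rho:.4f} (p = {p_value:.3f})")
```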
Citations: 0
3D-aware virtual try-on using only 2D inputs
IF 3.5 CAS Tier 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-02-01 DOI: 10.1016/j.cviu.2026.104661
Jaeyoon Lee, Hojoon Jung, Jongwon Choi
We present 3DFit, a novel 3D-aware virtual try-on framework that synthesizes realistic try-on images using only 2D inputs. Unlike previous methods that either ignore 3D body geometry or rely entirely on 3D clothing models, 3DFit utilizes 3D human meshes estimated from 2D images and adaptively transforms 3D clothing templates guided by 2D clothing images. We further introduce a warping strategy that integrates 3D information into 2D clothing images using a set of pre-designed 3D templates, enabling efficient adaptation to various body shapes and poses. As a result, our method supports accurate and personalized virtual try-on experiences. Experimental results on the VITON-HD dataset demonstrate that 3DFit outperforms existing methods in preserving garment structure and maintaining high visual quality across a wide range of body types and poses.
Citations: 0
Robust video anomaly detection via causal feature-guided data augmentation
IF 3.5 CAS Tier 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-29 DOI: 10.1016/j.cviu.2026.104671
Zhaoyi Li, Chenrui Shi, Che Sun, Yuwei Wu
Data augmentation is a commonly used technique to learn distinguishing features for accurately identifying video anomaly patterns. However, traditional data augmentation methods introduce spurious features, i.e., correlations (such as lighting conditions) that lack causal links to anomalies, yielding models that excel on specific test sets but fail in real-world scenarios with varying conditions. In this paper, we propose a novel robust video anomaly detection method by designing a causal feature-guided augmentation technique that strategically enhances the learnability of predictive patterns while suppressing spurious correlations. Specifically, we first disentangle causal features, i.e., those directly predictive of labels, from spurious features via a causal generative model. We then perturb the causal features to increase their variability in the training data, compelling the model to focus on invariant patterns while ignoring spurious correlations. The augmented samples are reconstructed with the refined causal representations, enhancing the model’s discriminative capability. Furthermore, we introduce a systematic evaluation framework with three increasing difficulty levels to assess robustness: seen/unseen variations, cross-dataset generalization, and cross-domain adaptation. This evaluation examines system stability under varying conditions, closely aligning with real-world surveillance deployment requirements. Experimental results validate both the effectiveness and robustness of our method.
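As a loose illustration of the "perturb only the causal features" idea (the paper's causal generative model and reconstruction step are not reproduced here), the sketch below adds noise to a designated subset of latent dimensions that is assumed to have been identified as causal by an upstream disentanglement step; every name is hypothetical.

```python
import numpy as np


def perturb_causal_features(latent, causal_idx, noise_std=0.1, rng=None):
    """Perturb only the designated 'causal' dimensions of a latent feature batch.

    latent: (B, D) feature matrix.
    causal_idx: indices assumed (hypothetically) to be causal, e.g. from a
    disentanglement model; the remaining 'spurious' dimensions are left untouched.
    """
    rng = rng if rng is not None else np.random.default_rng()
    augmented = latent.copy()
    noise = rng.normal(0.0, noise_std, size=(latent.shape[0], len(causal_idx)))
    augmented[:, causal_idx] += noise
    return augmented
```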
Citations: 0
CSN: A compact semantic segmentation network for visual scene perception in assistive navigation
IF 3.5 CAS Tier 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-22 DOI: 10.1016/j.cviu.2026.104665
Yunjia Lei, Son Lam Phung, Yang Di, Abdesselam Bouzerdoum
Accuracy and efficiency are essential in assistive navigation algorithms to ensure accessibility and reliability for visually impaired individuals. However, existing deep learning models often require substantial computational resources to achieve high accuracy, making them impractical for deployment on mobile devices. To address this problem, we introduce CSN, a compact semantic segmentation network designed for assistive navigation, optimizing performance in resource-constrained environments. With CSN, we introduce two innovative modules, the cascaded atrous multi-scale enhancement (CAME) layer and the dual-path residual bottleneck (DPRB) block. The CAME layer efficiently enhances multi-scale representation through feature resampling, while the DPRB block improves feature refinement with minimal computational cost. These modules enable CSN to achieve robust and reliable segmentation across diverse and complex pedestrian environments. The proposed approach achieves the best result on the challenging TrueSight dataset, demonstrating superior prediction accuracy and computational efficiency compared to state-of-the-art lightweight models. CSN achieves a mean intersection over union of 60.99%, while maintaining a low computational cost of 83.46 giga floating-point operations, a compact model size of 8.41 million parameters, and a real-time inference speed of 52.59 frames per second.
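For context, the mean intersection-over-union figure quoted above is conventionally computed from a per-class confusion matrix; a minimal sketch follows, assuming integer label maps and an ignore index (both assumptions of this illustration, not details taken from the paper).

```python
import numpy as np


def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean intersection-over-union between predicted and ground-truth label maps."""
    mask = target != ignore_index
    pred, target = pred[mask].astype(int), target[mask].astype(int)
    # Confusion matrix: rows index ground-truth classes, columns predicted classes.
    cm = np.bincount(num_classes * target + pred,
                     minlength=num_classes ** 2).reshape(num_classes, num_classes)
    intersection = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - intersection
    valid = union > 0  # ignore classes absent from both prediction and ground truth
    return float((intersection[valid] / union[valid]).mean())
```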
Citations: 0
Biometric technology roadmapping for personalized augmentative and alternative communication
IF 3.5 CAS Tier 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-22 DOI: 10.1016/j.cviu.2026.104664
Svetlana Yanushkevich, Eva Berepiki, Philip Ciunkiewicz, Vlad Shmerko, Gregor Wolbring, Richard Guest
The purpose of this paper is to provide a biometric-based technology roadmap to advance personalized Augmentative and Alternative Communication (AAC) systems for individuals with disabilities. The proposed technology roadmap introduces two core components: an AAC biometric register and interoperable technological modules. The biometric register provides a structured framework for capturing and transforming physiological and behavioral traits to enable adaptive and context-aware communication. The interoperable module design supports reconfigurable AAC architectures, ensuring compatibility across diverse interfaces and devices. Together, these mechanisms establish a foundation for automated and scalable personalization in AAC technologies. The study demonstrates that the proposed methodology for technology roadmapping effectively connects established research with emerging computational practices. This is confirmed by the results of case studies such as sign language recognition.
Citations: 0
HGAvatar: One-shot high-quality 3D Gaussian head avatar
IF 3.5 CAS Tier 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-21 DOI: 10.1016/j.cviu.2026.104660
Xueping Wang, Jun Xu, Feihu Yan, Guangzhe Zhao
3D Gaussian Splatting (3DGS) has sparked growing interest among researchers in 3D head avatars, an important topic in the fields of digital humans and artificial intelligence. Due to the limitations of monocular videos and the complexity of facial textures and motions, creating high-fidelity 3D head avatars still poses significant challenges. In this paper, we propose a High-Quality 3D Gaussian Head Avatar (HGAvatar) that models head avatars with realistic textures and accurate poses within the limitations of monocular video. Specifically, we design a Detail Optimization Module (DOM) to enhance the capability to capture intricate facial texture details. Additionally, we introduce a Geometric Deformation Module (GDM), which decouples head pose from facial expression and utilizes them as guiding factors to further adjust the properties of the Gaussian ellipsoid, allowing the model to learn more accurate fine details of facial motions. Both qualitative and quantitative experiments demonstrate that the proposed method better preserves fine facial texture details and more accurately reproduces subtle differences in facial dynamics than state-of-the-art methods.
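For readers new to 3D Gaussian Splatting, each primitive's ellipsoid is typically parameterized by a per-axis scale vector and a rotation quaternion, giving the covariance Sigma = R S S^T R^T; the sketch below shows only this standard parameterization (it is not the paper's Geometric Deformation Module).

```python
import numpy as np


def quaternion_to_rotation(q):
    """Rotation matrix from a unit quaternion given as (w, x, y, z)."""
    w, x, y, z = np.asarray(q, dtype=float) / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ])


def gaussian_covariance(scales, quat):
    """3D covariance of one Gaussian primitive: Sigma = R S S^T R^T."""
    R = quaternion_to_rotation(quat)
    S = np.diag(np.asarray(scales, dtype=float))
    return R @ S @ S.T @ R.T
```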
Citations: 0
Spatial Sensitive Grad-CAM++: Towards High-Quality Visual Explanations for Object Detectors via Weighted Combination of Gradient Maps
IF 3.5 CAS Tier 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-16 DOI: 10.1016/j.cviu.2026.104658
Toshinori Yamauchi
Visual explanation for object detectors is crucial for ensuring model reliability and promoting their application across a variety of domains. However, existing visual explanation methods often produce low-quality heat maps that highlight only parts of important regions for detected instances. This limitation reduces the interpretability of object detectors and hinders effective model analysis. To address this issue, we propose Spatial Sensitive Grad-CAM++, a visual explanation method for object detectors designed to enhance the quality of heat maps. The proposed method introduces a weighted combination of gradient maps into the heat map computation, taking into account the architecture of object detectors. This approach enables a more accurate representation of each CNN channel’s contribution to the final heat map, resulting in high-quality heat maps that better identify important regions. In quantitative evaluations using the deletion and insertion metrics, we confirm that the proposed method outperforms existing methods by approximately 30% and 8%, respectively. In qualitative evaluations, we further demonstrate the superiority of the proposed method. These results suggest that the proposed method provides more faithful explanations, allowing for more accurate and reliable model analysis.
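To ground the phrase "weighted combination of gradient maps", the sketch below shows classic Grad-CAM for an image classifier, where spatially averaged gradients weight the activation maps of a chosen layer; the Spatial Sensitive Grad-CAM++ proposed in the paper extends this idea to object detectors and is not reproduced here.

```python
import torch
import torch.nn.functional as F


def grad_cam(model, image, target_layer, class_idx):
    """Classic Grad-CAM heat map: ReLU of a gradient-weighted sum of activation maps."""
    activations, gradients = {}, {}

    def forward_hook(_module, _inputs, output):
        activations["value"] = output

    def backward_hook(_module, _grad_input, grad_output):
        gradients["value"] = grad_output[0]

    h_fwd = target_layer.register_forward_hook(forward_hook)
    h_bwd = target_layer.register_full_backward_hook(backward_hook)
    try:
        score = model(image)[0, class_idx]  # assumes (1, num_classes) logits
        model.zero_grad()
        score.backward()
    finally:
        h_fwd.remove()
        h_bwd.remove()

    feats = activations["value"]                    # (1, C, H, W) feature maps
    grads = gradients["value"]                      # (1, C, H, W) gradients of the score
    weights = grads.mean(dim=(2, 3), keepdim=True)  # per-channel weights
    cam = F.relu((weights * feats).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[-2:], mode="bilinear", align_corners=False)
    return (cam / (cam.max() + 1e-8)).squeeze()
```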
Citations: 0
BiPG-FER: Bi-intelligence probabilistic graph for facial expression inference driven by action units
IF 3.5 CAS Tier 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-14 DOI: 10.1016/j.cviu.2026.104655
Fei Wan, Ruicong Zhi
Investigating the associations between facial action units (AUs) and emotions (EMOs) helps to eliminate the constraints imposed by predefined emotion patterns. However, accurately modeling the soft, probabilistic AU–EMO relationships and inferring emotional states from AU sequences remains a challenging task. To address this, we propose a Bi-intelligence Probabilistic Graph model (BiPG-FER), which flexibly learns interpretable AU–AU and AU–EMO associations and enables automatic facial expression inference from AU sequences. In the input phase, a small portion of external prior knowledge is incorporated to mitigate the high-entropy fluctuations often caused by random initialization. We construct a two-layer fully connected AU–EMO association graph and develop an end-to-end architecture with a masking mechanism that dynamically updates the AU–AU and AU–EMO relationships by computing joint probabilities. An oversampling strategy, combined with adaptive thresholding and a data-contribution-aware reweighting scheme, is introduced to address the skewed post-distribution of emotion labels. Finally, we design a strategy that preserves previous model weights and generates pseudo-samples based on the top-k conditional AU–EMO probabilities, allowing the model to evolve smoothly in a continuously changing and heterogeneous data stream. Experimental results demonstrate that the proposed BiPG-FER effectively produces interpretable probabilistic associations while improving recognition performance on both micro-expression and macro-expression datasets.
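As a count-based illustration of soft AU–EMO association (the paper learns these relations end-to-end through its probabilistic graph, which is not reproduced here), the sketch below estimates P(EMO | AU) from binary AU activations with Laplace smoothing; all names and shapes are hypothetical.

```python
import numpy as np


def conditional_emotion_probs(au_matrix, emo_labels, num_emotions, smoothing=1.0):
    """Estimate P(EMO | AU) from binary AU activations and emotion labels.

    au_matrix: (N, num_aus) binary activations per sample.
    emo_labels: (N,) integer emotion labels in [0, num_emotions).
    Returns a (num_aus, num_emotions) table whose rows sum to one.
    """
    num_aus = au_matrix.shape[1]
    counts = np.full((num_aus, num_emotions), smoothing)  # Laplace smoothing
    for au_vector, emo in zip(au_matrix, emo_labels):
        counts[au_vector.astype(bool), int(emo)] += 1.0
    return counts / counts.sum(axis=1, keepdims=True)
```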
Citations: 0