
Computer Vision and Image Understanding: Latest Articles

Spatial Sensitive Grad-CAM++: Towards High-Quality Visual Explanations for Object Detectors via Weighted Combination of Gradient Maps
IF 3.5; Zone 3, Computer Science; Q2, Computer Science, Artificial Intelligence; Pub Date: 2026-01-16; DOI: 10.1016/j.cviu.2026.104658
Toshinori Yamauchi
Visual explanation for object detectors is crucial for ensuring model reliability and promoting their application across a variety of domains. However, existing visual explanation methods often produce low-quality heat maps that highlight only parts of important regions for detected instances. This limitation reduces the interpretability of object detectors and hinders effective model analysis. To address this issue, we propose Spatial Sensitive Grad-CAM++, a visual explanation method for object detectors designed to enhance the quality of heat maps. The proposed method introduces a weighted combination of gradient maps into the heat map computation, taking into account the architecture of object detectors. This approach enables a more accurate representation of each CNN channel’s contribution to the final heat map, resulting in high-quality heat maps that better identify important regions. In quantitative evaluations using the deletion and insertion metrics, we confirm that the proposed method outperforms existing methods by approximately 30% and 8%, respectively. In qualitative evaluations, we further demonstrate the superiority of the proposed method. These results suggest that the proposed method provides more faithful explanations, allowing for more accurate and reliable model analysis.
Computer Vision and Image Understanding, Volume 264, Article 104658.
Citations: 0
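As an illustration of the deletion metric cited in the Spatial Sensitive Grad-CAM++ abstract above, the following is a minimal Python sketch, not the paper's evaluation code. It assumes the common formulation in which pixels are removed in order of decreasing saliency and the area under the resulting score curve is reported, with a lower area indicating a more faithful heat map. The names `deletion_auc`, `score_fn`, and `toy_score`, and the four-pixel toy "image", are hypothetical.

```python
from typing import Callable, List, Sequence


def deletion_auc(
    pixels: Sequence[float],
    saliency: Sequence[float],
    score_fn: Callable[[List[float]], float],
    baseline: float = 0.0,
) -> float:
    """Area under the score curve as pixels are removed, most salient first."""
    order = sorted(range(len(pixels)), key=lambda i: saliency[i], reverse=True)
    current = list(pixels)
    scores = [score_fn(current)]
    for i in order:
        current[i] = baseline  # "delete" the pixel by setting it to the baseline
        scores.append(score_fn(current))
    steps = len(order)
    if steps == 0:
        return scores[0]
    # Trapezoidal rule over the fraction of pixels removed (0 to 1).
    return sum((scores[k] + scores[k + 1]) / 2 for k in range(steps)) / steps


# Toy stand-in for a detector score (assumption): depends only on pixels 0 and 2.
def toy_score(x: List[float]) -> float:
    return (x[0] + x[2]) / 2


image = [1.0, 0.2, 1.0, 0.1]
faithful_map = [0.9, 0.1, 0.8, 0.0]    # highlights the pixels the score uses
unfaithful_map = [0.0, 0.8, 0.1, 0.9]  # highlights irrelevant pixels

auc_faithful = deletion_auc(image, faithful_map, toy_score)
auc_unfaithful = deletion_auc(image, unfaithful_map, toy_score)
print(auc_faithful, auc_unfaithful)
```

The insertion metric runs the same loop in the opposite direction, starting from a blank input and adding pixels back in saliency order; there a higher area is better.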
BiPG-FER: Bi-intelligence probabilistic graph for facial expression inference drived by action units
IF 3.5; Zone 3, Computer Science; Q2, Computer Science, Artificial Intelligence; Pub Date: 2026-01-14; DOI: 10.1016/j.cviu.2026.104655
Fei Wan, Ruicong Zhi
Investigating the associations between facial action units (AUs) and emotions (EMOs) helps to eliminate the constraints imposed by predefined emotion patterns. However, accurately modeling the soft, probabilistic AU–EMO relationships and inferring emotional states from AU sequences remains a challenging task. To address this, we propose a Bi-intelligence Probabilistic Graph model (BiPG-FER), which flexibly learns interpretable AU–AU and AU–EMO associations and enables automatic facial expression inference from AU sequences. In the input phase, a small portion of external prior knowledge is incorporated to mitigate the high-entropy fluctuations often caused by random initialization. We construct a two-layer fully connected AU–EMO association graph and develop an end-to-end architecture with a masking mechanism that dynamically updates the AU–AU and AU–EMO relationships by computing joint probabilities. An oversampling strategy, combined with adaptive thresholding and a data-contribution-aware reweighting scheme, is introduced to address the skewed post-distribution of emotion labels. Finally, we design a strategy that preserves previous model weights and generates pseudo-samples based on the top-k conditional AU–EMO probabilities, allowing the model to evolve smoothly in a continuously changing and heterogeneous data stream. Experimental results demonstrate that the proposed BiPG-FER effectively produces interpretable probabilistic associations while improving recognition performance on both micro-expression and macro-expression datasets.
Computer Vision and Image Understanding, Volume 264, Article 104655.
Citations: 0
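The BiPG-FER abstract above refers to conditional AU–EMO probabilities and a top-k selection over them. As rough intuition only, the sketch below estimates P(emotion | AU) from co-occurrence counts and returns the k largest entries. It is not the paper's learned graph model; the function names and the four toy samples are hypothetical and chosen purely for illustration.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Sample = Tuple[Set[str], str]  # (active action units, emotion label)


def conditional_emo_given_au(samples: List[Sample]) -> Dict[Tuple[str, str], float]:
    """Estimate P(emotion | AU) as count(AU, emotion) / count(AU)."""
    au_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for aus, emotion in samples:
        for au in aus:
            au_counts[au] += 1
            pair_counts[(au, emotion)] += 1
    return {pair: n / au_counts[pair[0]] for pair, n in pair_counts.items()}


def top_k(probs: Dict[Tuple[str, str], float], k: int) -> List[Tuple[Tuple[str, str], float]]:
    """Return the k (AU, emotion) pairs with the highest conditional probability."""
    return sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]


# Toy data (assumption): not taken from any dataset.
toy_samples: List[Sample] = [
    ({"AU6", "AU12"}, "happiness"),
    ({"AU12"}, "happiness"),
    ({"AU1", "AU4", "AU15"}, "sadness"),
    ({"AU4"}, "anger"),
]

probs = conditional_emo_given_au(toy_samples)
best = top_k(probs, 2)
print(probs[("AU4", "sadness")], probs[("AU12", "happiness")], best)
```

In this toy set AU4 appears once with sadness and once with anger, so it is ambiguous on its own, while AU12 only ever co-occurs with happiness.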
Cascaded-parallel decoders and anchor-guided query generator for human-object interaction
IF 3.5; Zone 3, Computer Science; Q2, Computer Science, Artificial Intelligence; Pub Date: 2026-01-12; DOI: 10.1016/j.cviu.2026.104648
Hang Su, Hong-Bo Zhang, Jia-Yin Luo, Jing-Hua Liu, Zhen-Zhen Sun, Ji-Xiang Du
Transformer-based methods have shown strong potential in human-object interaction (HOI) detection, yet challenges remain due to task interference and unstable query initialization. To overcome these limitations, the paper adopts a cascaded-parallel decoder architecture that balances the efficiency of one-stage models with the task decoupling advantages of two-stage designs. A key component is an anchor-guided query generator, which explicitly incorporates human-object spatial information into query initialization. This provides queries with strong spatial awareness, stabilizes training, and significantly improves human and object localization, a crucial prerequisite for accurate HOI detection. In addition, interaction features are further refined by modeling multi-relational cues within each triplet, facilitating more reliable verb classification. Extensive experiments on the HICO-DET and V-COCO datasets demonstrate that the proposed method achieves superior performance compared to state-of-the-art approaches.
Computer Vision and Image Understanding, Volume 264, Article 104648.
Citations: 0
TOTNet: Occlusion-aware temporal tracking for robust ball detection in sports videos
IF 3.5; Zone 3, Computer Science; Q2, Computer Science, Artificial Intelligence; Pub Date: 2026-01-12; DOI: 10.1016/j.cviu.2026.104657
Hao Xu, Arbind Agrahari Baniya, Sam Wells, Mohamed Reda Bouadjenek, Richard Dazeley, Sunil Aryal
Ball tracking is a fundamental problem in computer vision, particularly in sports analytics, where it underpins tasks such as analyzing ball movement in soccer and basketball or detecting bounce locations in tennis and table tennis. Most existing methods are developed and evaluated on resource-rich, commercial sports footage with ideal camera angles, high-resolution imagery, and multiple viewpoints. In contrast, many other sports contexts, including semi-professional leagues, local amateur competitions, and Paralympic sports, lack these resources. Footage in these settings often comes from single, fixed, and suboptimal viewpoints, where occlusion becomes a dominant challenge for automated tracking. Existing methods frequently fall short in such conditions because their architectures and training strategies do not explicitly account for prolonged or full occlusion. To address this gap, we present the Table Tennis Australia (TTA) dataset, the first professionally annotated Paralympic table tennis benchmark with dense visibility labels, captured under realistic single-view conditions. With 2,396 occluded instances (including 998 fully occluded), TTA is the most occlusion-rich publicly available dataset to date. Alongside the dataset, we propose the Temporal Occlusion Tracking Network (TOTNet), a novel tracking system designed to maintain localization accuracy even under extended occlusion. Through comprehensive experiments on four sports tracking datasets, TOTNet achieves state-of-the-art performance, with substantial gains in full-occlusion scenarios. We release the dataset, code, and evaluation scripts to foster reproducibility and future research in occlusion-robust tracking for low-resource sports; all materials are available at https://github.com/AugustRushG/TOTNet.
Computer Vision and Image Understanding, Volume 264, Article 104657.
Citations: 0
STAR Block: Adaptive spatio-temporal recalibration for action quality assessment
IF 3.5; Zone 3, Computer Science; Q2, Computer Science, Artificial Intelligence; Pub Date: 2026-01-12; DOI: 10.1016/j.cviu.2026.104656
Junhao Sun, Lanfei Zhao
Action Quality Assessment (AQA) aims to quantitatively evaluate the execution quality of complex human actions, which poses significant challenges due to the need for jointly modeling spatio-temporal dynamics and semantic structures. Existing approaches typically rely on static single-branch architectures, limiting their capacity to balance local fine-grained details and global rhythmic dependencies, especially in high-complexity scenarios. To address these limitations, we propose a novel Spatio-Temporal Adaptive Recalibration (STAR) Block, which enables highly discriminative representation learning via a multi-dimensional modeling strategy. Specifically, we first design a Multi-Scale Context Encoder to capture subtle local cues by leveraging parallel convolutions across spatial, temporal, and joint domains, enhancing the perception of motion details and short-term dynamics. Second, we introduce an Axial Attention-Based Global Dependency Modeling Module, which efficiently captures long-range temporal relationships while preserving the original spatio-temporal structure, thus reinforcing the understanding of phase coherence and motion rhythm. Third, a Dynamic Attention-Guided Adaptive Feature Fusion mechanism is proposed to integrate multi-path temporal semantics by assigning adaptive weights to local and global representations, enabling dynamic equilibrium in temporal modeling. Across multiple metrics, our STAR Block delivers remarkably superior performance with significant margins over state-of-the-art methods, achieving an average Spearman’s ρ improvement of 1.56% on AQA-7, 0.57% on MTL-AQA with DD supervision, and near-perfect 99.52% accuracy on FR-FS, as proven by extensive evaluations.
Computer Vision and Image Understanding, Volume 264, Article 104656.
Citations: 0
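The STAR Block abstract above reports its gains in Spearman's ρ, the rank correlation between predicted and ground-truth quality scores. For readers unfamiliar with the metric, here is a small standard-library sketch that computes it as the Pearson correlation of average ranks. The function names and the four example scores are hypothetical and are not taken from AQA-7 or MTL-AQA.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho; undefined (division by zero) if either input is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Toy scores (assumption): predicted vs. judge-assigned quality for four clips.
predicted = [85.0, 60.5, 92.0, 70.0]
judged = [80.0, 65.0, 95.0, 60.0]
rho = spearman_rho(predicted, judged)
print(rho)
```

Because only ranks matter, the metric rewards a model for ordering performances correctly even when its absolute scores are offset or rescaled.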
Indoor UAV navigation using event cameras and intermediate frame reconstruction
IF 3.5; Zone 3, Computer Science; Q2, Computer Science, Artificial Intelligence; Pub Date: 2026-01-10; DOI: 10.1016/j.cviu.2026.104650
David Tejero-Ruiz, David Solís-Martín, Francisco J. Pérez-Grau, Joaquín Borrego-Díaz
Indoor UAV navigation faces significant challenges due to GPS signal absence and limitations of conventional visual-inertial systems under challenging lighting and motion conditions. This paper presents an event-based visual-inertial odometry system that addresses these limitations through intermediate frame reconstruction from event streams combined with established odometry algorithms. The approach leverages event cameras’ unique characteristics — microsecond temporal resolution, high dynamic range (120 dB), and motion blur immunity — to maintain stable navigation performance under conditions that cause conventional systems to fail. The system achieves real-time operation at 30 Hz frame reconstruction and 20 Hz pose estimation on embedded hardware, consuming 15 W power while adding only 50 g to the UAV platform. Experimental validation in controlled indoor environments demonstrates mean absolute pose errors of 26–42 cm across different operational conditions, comparable to conventional visual-inertial systems. Critically, the system maintains stable performance during rapid lighting transitions, showing only 59% performance degradation compared to baseline conditions, while conventional cameras typically experience complete tracking failure. The results establish event-based visual-inertial odometry as a viable alternative for indoor UAV navigation, particularly in applications requiring environmental robustness over marginal accuracy improvements under optimal conditions.
Computer Vision and Image Understanding, Volume 264, Article 104650.
Citations: 0
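The event-camera abstract above does not say how its intermediate frames are reconstructed, so the following shows only the simplest possible stand-in: summing event polarities per pixel over fixed time windows of about 33,333 µs, which matches the 30 Hz frame rate quoted in the abstract. The event tuple layout, the function name `accumulate_events`, and the toy events are all assumptions; the paper's actual reconstruction may differ substantially.

```python
from typing import List, Tuple

# (timestamp in microseconds, x, y, polarity of +1 or -1) -- assumed layout
Event = Tuple[int, int, int, int]
Frame = List[List[int]]


def accumulate_events(
    events: List[Event], width: int, height: int, window_us: int
) -> List[Frame]:
    """Bin events into consecutive time windows and sum polarities per pixel."""
    if not events:
        return []
    t_start = min(e[0] for e in events)
    t_end = max(e[0] for e in events)
    n_frames = (t_end - t_start) // window_us + 1
    frames = [[[0] * width for _ in range(height)] for _ in range(n_frames)]
    for t, x, y, polarity in events:
        frames[(t - t_start) // window_us][y][x] += polarity
    return frames


WINDOW_US = 1_000_000 // 30  # 33,333 microseconds per frame at 30 Hz
toy_events: List[Event] = [(0, 0, 0, 1), (10_000, 0, 0, 1), (40_000, 1, 1, -1)]
frames = accumulate_events(toy_events, width=2, height=2, window_us=WINDOW_US)
print(len(frames), frames[0], frames[1])
```

The first two toy events fall in the first window and the third in the second, so the result is two frames, each of which could then be passed to a conventional frame-based odometry pipeline.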
UAVDet: A CNN–Mamba hybrid network for efficient small object detection in UAV imagery
IF 3.5; Zone 3, Computer Science; Q2, Computer Science, Artificial Intelligence; Pub Date: 2026-01-07; DOI: 10.1016/j.cviu.2026.104637
Yiming Yang, Feng Guo, Pei Niu
Real-time object detection is pivotal in traffic-related Unmanned Aerial Vehicle (UAV) applications. However, UAV imagery presents significant challenges due to the predominance of small objects and complex backgrounds. Traditional backbones generally perform aggressive early-stage downsampling, causing the loss of fine-grained features. To address these issues, we propose UAVDet, a real-time detection model that combines Convolutional Neural Network (CNN) and Mamba architectures. First, we revisit the conventional backbone design by reconfiguring its depth and width, with a focus on preserving fine-grained details crucial for small object detection. Second, we propose the Cross Stage Partial Mamba (CSPMB) module, which integrates the Mamba structure into the CNN framework to enhance global feature representation and improve robustness against complex background interference. Third, we design a Tiny-focused Feature Pyramid Network (TFPN) by rebalancing the feature fusion flow and replacing the large-object detection head with a tiny-object detection head, which significantly improves the perception of small objects. Comprehensive experiments on the VisDrone dataset show that our method improves AP and AP_S by 4.5% and 5.0%, respectively, while reducing parameters by 84.9% compared to the baseline. It also reaches 53 FPS on an RTX 4090, exceeding the 30 FPS real-time threshold. Additional evaluations on UAVDT and DroneVehicle further verify the method’s robust generalization. These results indicate the effectiveness of the developed method in UAV image detection.
Computer Vision and Image Understanding, Volume 264, Article 104637.
引用次数: 0
Local–global collaborative feature learning with level-wise decoding for infrared small target detection
IF 3.5 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-07 DOI: 10.1016/j.cviu.2026.104636
Weiwei Duan, Luping Ji, Shengjia Chen, Jianghong Huang
Due to the low signal-to-noise ratio and weak visual contrast, infrared small targets are often submerged in the background. Therefore, it is crucial to preserve target information while extracting distinctive features that distinguish them from the background. However, existing methods generally rely on convolutions and transformers in isolation, which limits their ability to capture robust target features in complex scenes. To address this issue, we propose a new local–global feature collaborative learning (LGFC) framework that integrates the local spatial features with the global context of targets in a unified manner. Specifically, we develop an enhanced Gaussian-mask Vision Transformer group with Global Gaussian Attention and Local Window Attention to extract refined global features. The local coarse features obtained from the convolution encoder are then coordinated with the refined global features through Local–Global Collaborating. Moreover, we propose a level-wise decoding strategy with Cross-layer Feature Interaction to mitigate the information loss that occurs during decoding in deep networks. Additionally, we introduce a Coarse-to-Fine Refinement post-processing mechanism to improve the precision of target contours. Extensive experiments on three public datasets (NUAA-SIRST, IRSTD-1K and SIRST-AUG) demonstrate the superiority and generalization ability of the proposed LGFC framework for infrared small target detection, outperforming state-of-the-art methods by approximately 2.3% in F1-score on each dataset.
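For readers unfamiliar with the F1-score quoted above, the following is a minimal sketch of how the metric is commonly computed from detection counts. The function name and the choice of counting unit (pixels or targets) are assumptions made for illustration; the abstract does not state how the authors count true and false positives.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return F1 = 2 * TP / (2 * TP + FP + FN).

    tp, fp, fn are true positives, false positives and false negatives.
    Whether these are counted per pixel or per target is an assumption
    left to the caller; the abstract does not specify it.
    """
    denom = 2 * tp + fp + fn
    # With no positives at all, the score is conventionally reported as 0.
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical counts, not taken from the paper.
    print(f1_score(tp=8, fp=2, fn=2))
```

The form used here is algebraically the same as the harmonic mean of precision and recall, but it avoids dividing by zero when either of them is undefined.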
Computer Vision and Image Understanding, vol. 264, Article 104636
Citations: 0
CSNet: A content and structure-aware approach for color constancy
IF 3.5 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-07 DOI: 10.1016/j.cviu.2026.104638
Zhuo-Ming Du, Hong-An Li, Qian Yu, Wen-He Chen, Fei-long Han
Accurate estimation and correction of global illuminant color, known as color constancy, is crucial for computational photography and computer vision but remains challenging under complex lighting conditions. We propose CSNet, an end-to-end framework that improves color constancy through a novel content-guided feature fusion approach. The input image is first decomposed into three precomputed components: mean intensity, variation magnitude, and variation direction. These components are dynamically reweighted by the Content-Weighting Network (CWN), which generates spatially varying weight maps by leveraging both local and global image features. The reweighted components are fused via the Adaptive Fusion Module (AFM) to produce an HDR-like intermediate representation. This representation is then processed by the Illumination Prediction Network (IPN), which applies semantic-aware weighting to estimate the global illuminant color as an RGB triplet. Extensive experiments on standard benchmarks demonstrate that CSNet achieves state-of-the-art performance, offering robust and visually consistent results under diverse lighting conditions. These advantages make CSNet a powerful tool for applications such as automatic photo correction and augmented reality.
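Once a global illuminant has been estimated as an RGB triplet, the correction step is often done with a per-channel (von Kries style) gain. The sketch below illustrates that generic technique only; it is not CSNet's pipeline, and the function name, the green-channel normalization, and the sample values are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[float, float, float]


def correct_with_illuminant(pixels: Sequence[Pixel],
                            illuminant: Pixel) -> List[Pixel]:
    """Apply a diagonal (per-channel) correction for a global illuminant.

    Gains are normalized to the green channel, a common convention and an
    assumption here, so that green is left unchanged.
    """
    r, g, b = illuminant
    if min(r, g, b) <= 0:
        raise ValueError("illuminant channels must be positive")
    gains = (g / r, 1.0, g / b)
    # Scale every pixel channel by the matching gain.
    return [(px[0] * gains[0], px[1] * gains[1], px[2] * gains[2])
            for px in pixels]


if __name__ == "__main__":
    # Hypothetical gray patch seen under a reddish illuminant.
    print(correct_with_illuminant([(0.4, 0.5, 0.25)], (0.8, 1.0, 0.5)))
```

A surface that reflects the illuminant unchanged comes out neutral (equal R, G and B) after this correction, which is the property the estimate is meant to deliver.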
Computer Vision and Image Understanding, vol. 264, Article 104638
Citations: 0
A dual-channel model based on multi-feature fusion for face liveness detection
IF 3.5 Zone 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2026-01-07 DOI: 10.1016/j.cviu.2026.104635
Bowen Xu, Yaru Sui, Longxin Liu, Zhenlong Ma, Yunlong Shi, Wentong Li, Xiaoqiang Ji
Face liveness detection algorithms are widely used in anti-spoofing applications, which guarantee the accuracy and security of face recognition systems. However, with the continuous development of technologies such as 3D printing and artificial intelligence, traditional face liveness detection algorithms struggle to withstand spoofing attacks effectively. In this paper, we propose a multi-feature fusion algorithm using only facial video for face liveness detection. Initially, we design a dual-channel network named DC-Net. It can extract robust remote photoplethysmography signals directly from 5-second facial videos, as well as fine global texture features from the keyframes of the image sequence. Subsequently, a fusion module based on the attention mechanism is used to carry out feature-level fusion. Ultimately, we use the fully connected layer for binary classification. Our methodology was validated using the REPLAY-ATTACK dataset and the 3DMAD dataset, demonstrating that for printing attacks, screen replay attacks, and 3D mask attacks, our approach attained accuracies of 99.79% and 100% on the two datasets, respectively. Meanwhile, cross-dataset testing was conducted on the CASIA-FASD and HKBU-MARs V1+ datasets, achieving HTERs of 25.56% and 0.00%, respectively. This indicates that the algorithm has good accuracy and robustness in dealing with spoofing attacks in many different scenarios, which provides important ideas and technical support for the design and implementation of reliable face recognition systems.
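The HTER (half total error rate) reported for the cross-dataset tests is the mean of the false acceptance rate and the false rejection rate. The following is a minimal sketch of that standard definition; the function name, argument names and sample counts are hypothetical and not taken from the paper.

```python
def hter(false_accepts: int, total_attacks: int,
         false_rejects: int, total_genuine: int) -> float:
    """Half total error rate: (FAR + FRR) / 2.

    FAR = spoof attempts wrongly accepted / all spoof attempts.
    FRR = genuine attempts wrongly rejected / all genuine attempts.
    """
    if total_attacks <= 0 or total_genuine <= 0:
        raise ValueError("both classes need at least one sample")
    far = false_accepts / total_attacks
    frr = false_rejects / total_genuine
    return (far + frr) / 2


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    print(hter(false_accepts=10, total_attacks=100,
               false_rejects=5, total_genuine=50))
```

Because the two error rates are averaged with equal weight, HTER is not skewed by an imbalance between genuine and attack samples, which is one reason it is favored for cross-dataset comparisons.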
Computer Vision and Image Understanding, vol. 264, Article 104635
Citations: 0