
Latest Articles in Computer Vision and Image Understanding

SynTaskNet: A synergistic multi-task network for joint segmentation and classification of small anatomical structures in ultrasound imaging
IF 3.5 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-01 | Epub Date: 2025-12-18 | DOI: 10.1016/j.cviu.2025.104616
Abdulrhman H. Al-Jebrni , Saba Ghazanfar Ali , Bin Sheng , Huating Li , Xiao Lin , Ping Li , Younhyun Jung , Jinman Kim , Li Xu , Lixin Jiang , Jing Du
Segmenting small, low-contrast anatomical structures and classifying their pathological status in ultrasound (US) images remain challenging tasks in computer vision, especially under the noise and ambiguity inherent in real-world clinical data. Papillary thyroid microcarcinoma (PTMC), characterized by nodules ≤ 1.0 cm, exemplifies these challenges, where both precise segmentation and accurate lymph node metastasis (LNM) prediction are essential for informed clinical decisions. We propose SynTaskNet, a synergistic multi-task learning (MTL) architecture that jointly performs PTMC nodule segmentation and LNM classification from US images. Built upon a DenseNet201 backbone, SynTaskNet incorporates several specialized modules: a Coordinated Depth-wise Convolution (CDC) layer for enhancing spatial features, an Adaptive Context Block (ACB) for embedding contextual dependencies, and a Multi-scale Contextual Boundary Attention (MCBA) module to improve boundary localization in low-contrast regions. To strengthen task interaction, we introduce a Selective Enhancement Fusion (SEF) mechanism that hierarchically integrates features across three semantic levels, enabling effective information exchange between segmentation and classification branches. On top of this, we formulate a synergistic learning scheme wherein an Auxiliary Segmentation Map (ASM) generated by the segmentation decoder is injected into SEF’s third class-specific fusion path to guide LNM classification. In parallel, the predicted LNM label is concatenated with the third-path SEF output to refine the Final Segmentation Map (FSM), enabling bidirectional task reinforcement. Extensive evaluations on a dedicated PTMC US dataset demonstrate that SynTaskNet achieves state-of-the-art performance, with a Dice score of 93.0% for segmentation and a classification accuracy of 94.2% for LNM prediction, validating its clinical relevance and technical efficacy.
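As a rough illustration of the bidirectional coupling described above (not the authors' code), the sketch below shows an auxiliary segmentation map conditioning a classification head while the predicted label is broadcast back to refine the final segmentation; all shapes and module choices are assumptions.

```python
# Minimal sketch of bidirectional task coupling: the auxiliary segmentation
# map (ASM) conditions the LNM classifier, and the predicted label is fed
# back to refine the final segmentation map (FSM). Illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BidirectionalHeads(nn.Module):
    def __init__(self, feat_ch=256, num_classes=2):
        super().__init__()
        self.asm_head = nn.Conv2d(feat_ch, 1, kernel_size=1)            # auxiliary segmentation map
        self.cls_head = nn.Sequential(                                   # classifier sees fused feat + ASM
            nn.Conv2d(feat_ch + 1, feat_ch, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(feat_ch, num_classes))
        self.fsm_head = nn.Sequential(                                   # segmentation refined by label
            nn.Conv2d(feat_ch + num_classes, feat_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_ch, 1, 1))

    def forward(self, fused_feat):
        asm = torch.sigmoid(self.asm_head(fused_feat))                   # B x 1 x H x W
        logits = self.cls_head(torch.cat([fused_feat, asm], dim=1))      # B x num_classes
        label_map = F.softmax(logits, dim=1)[..., None, None].expand(
            -1, -1, *fused_feat.shape[-2:])                              # broadcast label over space
        fsm = torch.sigmoid(self.fsm_head(torch.cat([fused_feat, label_map], dim=1)))
        return asm, logits, fsm

heads = BidirectionalHeads()
asm, logits, fsm = heads(torch.randn(2, 256, 64, 64))
print(asm.shape, logits.shape, fsm.shape)
```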
Citations: 0
Question-guided multigranular visual augmentation for knowledge-based visual question answering
IF 3.5 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-01 | Epub Date: 2025-11-20 | DOI: 10.1016/j.cviu.2025.104569
Jing Liu , Lizong Zhang , Chong Mu , Guangxi Lu , Ben Zhang , Junsong Li
In knowledge-based visual question answering, most current research focuses on the integration of external knowledge with VQA systems. However, the extraction of visual features within knowledge-based VQA remains relatively unexplored. This is surprising since, even for the same image, answering different questions requires attention to different visual regions. In this paper, we propose a novel question-guided multigranular visual augmentation method for knowledge-based VQA tasks. Our method uses input questions to identify and focus on question-related regions within the image, which improves prediction quality. Specifically, our method first performs semantic embedding learning for questions at both the word level and the phrase level. To preserve rich visual information for QA, our method uses questions as a guide to extract question-related visual features. This is implemented by multiple convolution operations, in which the convolutional kernels are dynamically derived from the representations of the questions. By capturing visual information from diverse perspectives, our method extracts information at the word level, phrase level, and common level more comprehensively. Additionally, relevant knowledge is retrieved from a knowledge graph through entity linking and random walk techniques to respond to the question. A series of experiments are conducted on public knowledge-based VQA datasets to demonstrate the effectiveness of our model. The experimental results show that our method achieves state-of-the-art performance.
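A minimal sketch of the question-conditioned convolution idea, assuming kernels are generated from a pooled question embedding and applied per sample via a grouped convolution; this is an illustration, not the paper's implementation.

```python
# Minimal sketch: convolution kernels are generated from a question embedding
# and applied to image features, so the extracted visual features depend on
# the question being asked. Shapes are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class QuestionGuidedConv(nn.Module):
    def __init__(self, q_dim=512, in_ch=256, out_ch=256, k=3):
        super().__init__()
        self.in_ch, self.out_ch, self.k = in_ch, out_ch, k
        # maps a question vector to a full set of conv weights
        self.kernel_gen = nn.Linear(q_dim, out_ch * in_ch * k * k)

    def forward(self, img_feat, q_emb):
        # img_feat: B x C x H x W, q_emb: B x q_dim
        B, C, H, W = img_feat.shape
        w = self.kernel_gen(q_emb).view(B * self.out_ch, self.in_ch, self.k, self.k)
        # grouped-conv trick: fold the batch into channels so each sample
        # is convolved with its own question-specific kernels
        x = img_feat.reshape(1, B * C, H, W)
        out = F.conv2d(x, w, padding=self.k // 2, groups=B)
        return out.view(B, self.out_ch, H, W)

layer = QuestionGuidedConv()
feat = layer(torch.randn(2, 256, 14, 14), torch.randn(2, 512))
print(feat.shape)  # torch.Size([2, 256, 14, 14])
```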
Citations: 0
A vision-based framework and dataset for human behavior understanding in industrial assembly lines
IF 3.5 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-01 | Epub Date: 2025-12-06 | DOI: 10.1016/j.cviu.2025.104592
Konstantinos Papoutsakis , Nikolaos Bakalos , Athena Zacharia , Konstantinos Fragkoulis , Georgia Kapetadimitri , Maria Pateraki
This paper introduces a vision-based framework and dataset for capturing and understanding human behavior in industrial assembly lines, focusing on car door manufacturing. The framework leverages advanced computer vision techniques to estimate workers’ locations and 3D poses and analyze work postures, actions, and task progress. A key contribution is the introduction of the CarDA dataset, which contains domain-relevant assembly actions captured in a realistic setting to support the evaluation of the framework’s human pose and action analysis. The dataset comprises time-synchronized multi-camera RGB-D videos, motion capture data recorded in a real car manufacturing environment, and annotations for EAWS-based ergonomic risk scores and assembly activities. Experimental results demonstrate the effectiveness of the proposed approach in classifying worker postures and its robust performance in monitoring assembly task progress.
Citations: 0
Exploring visual language models for driver gaze estimation: A task-based approach to debugging AI
IF 3.5 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-01 | Epub Date: 2025-12-08 | DOI: 10.1016/j.cviu.2025.104593
Paola Natalia Cañas , Alejandro H. Artiles , Marcos Nieto , Igor Rodríguez
Visual Language Models (VLMs) have demonstrated superior context understanding and generalization across various tasks compared to models tailored for specific tasks. However, due to their complexity and limited information on their training processes, estimating their performance on specific tasks often requires exhaustive testing, which can be costly and may not account for edge cases. To leverage the zero-shot capabilities of VLMs in safety-critical applications like Driver Monitoring Systems, it is crucial to characterize their knowledge and abilities to ensure consistent performance. This research proposes a methodology to explore and gain a deeper understanding of the functioning of these models in driver’s gaze estimation. It involves detailed task decomposition, identification of necessary data knowledge and abilities (e.g., understanding gaze concepts), and exploration through targeted prompting strategies. Applying this methodology to several VLMs (Idefics2, Qwen2-VL, Moondream, GPT-4o) revealed significant limitations, including sensitivity to prompt phrasing, vocabulary mismatches, reliance on image-relative spatial frames, and difficulties inferring non-visible elements. The findings from this evaluation have highlighted specific areas for improvement and guided the development of more effective prompting and fine-tuning strategies, resulting in enhanced performance comparable with traditional CNN-based approaches. This research is also useful for initial model filtering, for selecting the best model among alternatives and for understanding the model’s limitations and expected behaviors, thereby increasing reliability.
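To make the prompt-sensitivity analysis concrete, a small illustrative sketch is given below; the `query_vlm` call, the gaze-zone vocabulary, and the phrasings are hypothetical placeholders rather than the paper's protocol.

```python
# Illustrative sketch only: probing prompt-phrasing sensitivity for driver
# gaze zones. `query_vlm` is a hypothetical stand-in for whatever VLM
# inference call is used (Idefics2, Qwen2-VL, Moondream, GPT-4o, ...).
GAZE_ZONES = ["road ahead", "rear-view mirror", "left mirror",
              "right mirror", "instrument cluster", "infotainment screen"]

PROMPT_VARIANTS = [
    "Where is the driver looking? Answer with one of: {zones}.",
    "Which region of the car is the driver's gaze directed at? Options: {zones}.",
    "Classify the driver's gaze target. Choose exactly one: {zones}.",
]

def build_prompts():
    zones = ", ".join(GAZE_ZONES)
    return [p.format(zones=zones) for p in PROMPT_VARIANTS]

def evaluate_phrasing(query_vlm, image, label):
    """Return per-variant correctness to expose sensitivity to prompt phrasing."""
    results = {}
    for prompt in build_prompts():
        answer = query_vlm(image=image, prompt=prompt)  # hypothetical call
        results[prompt] = label.lower() in answer.lower()
    return results
```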
Citations: 0
Constructing adaptive spatial-frequency interactive network with bi-directional adapter for generalizable face forgery detection
IF 3.5 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-01 | Epub Date: 2025-12-04 | DOI: 10.1016/j.cviu.2025.104599
Junchang Jing , Yanyan Lv , Ming Li , Dong Liu , Zhiyong Zhang
Although existing face forgery detection methods have demonstrated remarkable performance, they still suffer a significant performance drop when confronted with samples generated by unseen manipulation techniques. This poor generalization arises from detectors overfitting to specific datasets and failing to learn generalizable feature representations. To tackle this problem, we propose a novel adaptive spatial-frequency interactive network with a bi-directional adapter for generalizable face forgery detection. Specifically, we design an Adaptive Region Dynamic Convolution (ARDConv) module and an Adaptive Frequency Dynamic Filter (AFDF) module. The ARDConv module divides the spatial dimension into several regions based on the guided features of the input image and employs a multi-head cross-attention mechanism to dynamically generate filters, effectively focusing on subtle texture artifacts in the spatial domain. The AFDF module applies frequency decomposition and dynamic convolution kernels in the frequency domain, adaptively selecting frequency information to capture refined clues. Additionally, we present a dual-domain fusion module based on a Bi-directional Adapter (BAT) to transfer domain-specific feature information from one domain to another. The advantage of this module lies in its ability to enable efficient feature fusion by fine-tuning only minimal BAT parameters. Our method exhibits exceptional generalization capabilities in cross-dataset evaluation, surpassing the best competing approaches by 3.07% and 3.15% in AUC. Moreover, the proposed approach uses only 547K trainable parameters and 130M FLOPs, significantly reducing computational costs compared to other state-of-the-art face forgery detection methods. The code is released at https://github.com/lvyanyana/ASFI.
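A minimal sketch of an adaptive frequency filter in the spirit of the AFDF module, assuming a per-channel frequency mask predicted from globally pooled features; the exact decomposition and kernel design of the paper are not reproduced.

```python
# Minimal sketch: an input-conditioned mask is predicted over a fixed-size
# frequency grid and applied to the Fourier transform of the features,
# loosely following the idea of adaptively selecting frequency information.
import torch
import torch.nn as nn

class AdaptiveFrequencyFilter(nn.Module):
    def __init__(self, channels=64, h=32, w=32):
        super().__init__()
        self.h, self.w = h, w
        # learned gate over a fixed-size per-channel frequency grid
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, channels * 2), nn.ReLU(),
            nn.Linear(channels * 2, channels * h * w), nn.Sigmoid())

    def forward(self, x):
        B, C, H, W = x.shape
        freq = torch.fft.fft2(x, norm="ortho")                      # complex B x C x H x W
        mask = self.gate(x).view(B, C, self.h, self.w)
        mask = nn.functional.interpolate(mask, size=(H, W),
                                         mode="bilinear", align_corners=False)
        filtered = freq * mask                                      # real mask scales the spectrum
        return torch.fft.ifft2(filtered, norm="ortho").real

f = AdaptiveFrequencyFilter()
y = f(torch.randn(2, 64, 56, 56))
print(y.shape)
```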
Citations: 0
Label-informed knowledge integration: Advancing visual prompt for VLMs adaptation
IF 3.5 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-01 | Epub Date: 2025-12-18 | DOI: 10.1016/j.cviu.2025.104614
Yue Wu , Yunhong Wang , Guodong Wang , Jinjin Zhang , Yingjie Gao , Xiuguo Bao , Di Huang
Prompt tuning has emerged as a pivotal technique for adapting pre-trained vision-language models (VLMs) to a wide range of downstream tasks. Recent developments have introduced multimodal learnable prompts to construct task-specific classifiers. However, these methods often exhibit limited generalization to unseen classes, primarily due to fixed prompt designs that are tightly coupled with seen training data and lack adaptability to novel class distributions. To overcome this limitation, we propose Label-Informed Knowledge Integration (LIKI)—a novel framework that harnesses the robust generalizability of textual label semantics to guide the generation of adaptive visual prompts. Rather than directly mapping textual prompts into the visual domain, LIKI utilizes robust text embeddings as a knowledge source to inform the visual prompt optimization. Central to our method is a simple yet effective Label Semantic Integration (LSI) module, which dynamically incorporates knowledge from both seen and unseen labels into the visual prompts. This label-informed prompting strategy imbues the visual encoder with semantic awareness, thereby enhancing the generalization and discriminative capacity of VLMs across diverse scenarios. Extensive experiments demonstrate that LIKI consistently outperforms state-of-the-art approaches in base-to-novel generalization, cross-dataset transfer, and domain generalization tasks, offering a significant advancement in prompt-based VLM adaptation.
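A minimal sketch of the label-informed prompting idea, assuming learnable visual prompt tokens attend over frozen text embeddings of class names via cross-attention; the actual LSI module may differ.

```python
# Minimal sketch: visual prompt tokens query text embeddings of both seen and
# unseen class names, so label semantics inform the visual prompts that are
# later prepended to image patch tokens. Illustrative assumptions only.
import torch
import torch.nn as nn

class LabelSemanticIntegration(nn.Module):
    def __init__(self, dim=512, num_prompts=4, num_heads=8):
        super().__init__()
        self.visual_prompts = nn.Parameter(torch.randn(num_prompts, dim) * 0.02)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, label_text_emb):
        # label_text_emb: B x L x dim (frozen text-encoder embeddings of class names)
        B = label_text_emb.shape[0]
        q = self.visual_prompts.unsqueeze(0).expand(B, -1, -1)
        prompts, _ = self.cross_attn(q, label_text_emb, label_text_emb)
        return prompts  # B x num_prompts x dim

lsi = LabelSemanticIntegration()
out = lsi(torch.randn(2, 100, 512))
print(out.shape)  # torch.Size([2, 4, 512])
```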
Citations: 0
SGCNet: Silhouette Guided Cascaded Network for multi-modal image fusion
IF 3.5 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-01 | Epub Date: 2025-12-16 | DOI: 10.1016/j.cviu.2025.104603
Yuxuan Wang , Zhongwei Shen , Hui Li , Yuning Zhang , Zhenping Xia
To generate high-quality fused images, it is essential to effectively capture local detail information (e.g., texture) alongside global information (e.g., color blocks). However, conventional fusion techniques often fail to balance local and global information. This imbalance can lead to fused results that excessively favor either infrared or visible-light characteristics, compromising the contrast and detail of the fused image. To tackle this problem, we propose the Silhouette Guided Cascaded Network (SGCNet). The encoder of our method employs a Cascaded Dense Connection structure that integrates CNN- and Transformer-based encoders to extract both local and global features in a compatible manner. In the fusion stage, the silhouettes of the targets are extracted by a pretrained semantic segmentation model, which provides global spatial weighting for detailed features and guides the alignment of features across different modalities. Extensive experiments demonstrate that SGCNet outperforms existing fusion methods across a variety of tasks, including infrared-visible and medical image fusion, highlighting its technological advancements and broad practical application potential.
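A minimal sketch of silhouette-guided fusion, assuming the pretrained segmentation model's silhouette is used directly as a spatial weight between infrared and visible features; this is an illustrative simplification, not SGCNet's actual fusion stage.

```python
# Minimal sketch: the silhouette biases fusion toward infrared features on
# target regions and toward visible features elsewhere.
import torch

def silhouette_guided_fusion(ir_feat, vis_feat, silhouette):
    """ir_feat, vis_feat: B x C x H x W; silhouette: B x 1 x H x W in [0, 1]."""
    return silhouette * ir_feat + (1.0 - silhouette) * vis_feat

ir = torch.rand(1, 32, 128, 128)
vis = torch.rand(1, 32, 128, 128)
sil = (torch.rand(1, 1, 128, 128) > 0.5).float()
print(silhouette_guided_fusion(ir, vis, sil).shape)
```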
Citations: 0
A detector-free feature matching method with dual-frequency transformer
IF 3.5 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-01 | Epub Date: 2025-12-13 | DOI: 10.1016/j.cviu.2025.104597
Zhen Han , Ning Lv , Chen Chen , Li Cong , Chengbin Huang , Bin Wang
Detector-free methods have achieved notable progress in recent years, but the limited capacity of existing models to leverage multi-frequency features continues to constrain matching performance. To address this challenge, we propose a novel feature matching approach based on a dual-frequency Transformer model, which effectively exploits multi-level image information. The proposed architecture employs dual attention branches, specifically designed to capture high-frequency details and low-frequency structural features. The high-frequency attention branch incorporates a feature enhancement module to accentuate edge visual features, which play a pivotal role in matching tasks. In addition, a frequency-based loss function is designed to constrain the consistency and integrity of features in the frequency domain during the feature extraction process, effectively mitigating frequency feature distortion. The proposed method not only enhances the model’s ability to represent contextual features across different frequency components but also improves selective attention to reliable feature details. Experimental results demonstrate the proposed method achieves superior performance in multiple feature matching tasks.
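A minimal sketch of one plausible form of a frequency-based loss, assuming an L1 penalty between Fourier magnitude spectra of two feature maps; the paper's exact formulation is not given in the abstract.

```python
# Minimal sketch: penalize discrepancies between the Fourier magnitude
# spectra of extracted features and reference features, encouraging
# frequency-domain consistency during feature extraction.
import torch

def frequency_consistency_loss(feat, ref_feat):
    """L1 distance between Fourier magnitude spectra of two feature maps."""
    f1 = torch.fft.rfft2(feat, norm="ortho").abs()
    f2 = torch.fft.rfft2(ref_feat, norm="ortho").abs()
    return torch.mean(torch.abs(f1 - f2))

loss = frequency_consistency_loss(torch.randn(2, 64, 60, 80),
                                  torch.randn(2, 64, 60, 80))
print(loss.item())
```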
Citations: 0
Place recognition for visual assistive localization under challenging visual appearance variations
IF 3.5 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-01 | Epub Date: 2025-12-23 | DOI: 10.1016/j.cviu.2025.104623
Ruiqi Cheng , Hai-Miao Hu , Chongze Wang , Xuan Gong
Due to the complexity of real-world environments, self-localization remains a critical yet unresolved challenge for individuals with visual impairments during travel. Visual appearance variations in the context of assistive technology, such as season changes, illumination changes, viewpoint changes, and dynamic occlusions, significantly hinder the performance of place recognition. This paper proposes a novel assistive visual localization method to address these challenges. In order to extract landmark-related features from images with appearance variations, dual constraints of place classification and feature distillation are proposed based on large-scale place recognition and human matting datasets. Additionally, online sequential matching is employed for place recognition, leveraging the temporal consistency embedded in multi-frame sequences to further eliminate erroneous localization results. Evaluated on the large-scale SF-XL dataset augmented with human matting, the proposed image feature model achieves a 3% improvement in Recall@1 compared to state-of-the-art approaches using similar backbone architectures, indicating better image retrieval performance under assistive occlusion scenarios. More importantly, in real-world validation using self-collected assistive datasets, the proposed visual localization pipeline incorporating sequential matching achieves F1 scores over 0.85 and shows advantages over existing sequential place recognition methods. The implementation codes of the proposed algorithm, along with a real-world testing dataset for assistive localization, are released at https://github.com/chengricky/AssistivePlace.
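A minimal sketch of online sequential matching, assuming per-frame descriptors are compared against a database and retrieval scores are averaged over a short temporal window; the released pipeline may differ.

```python
# Minimal sketch: a place is accepted only when several consecutive query
# frames agree, which suppresses single-frame retrieval errors.
import numpy as np

def sequential_match(query_descs, db_descs, window=5, threshold=0.6):
    """query_descs: T x D (L2-normalized), db_descs: N x D (L2-normalized).
    Returns the database index supported by the last `window` frames, or -1."""
    sims = query_descs @ db_descs.T          # T x N cosine similarities
    recent = sims[-window:]                  # temporal window of recent frames
    seq_score = recent.mean(axis=0)          # consistency over the window
    best = int(np.argmax(seq_score))
    return best if seq_score[best] >= threshold else -1

q = np.random.randn(8, 256); q /= np.linalg.norm(q, axis=1, keepdims=True)
db = np.random.randn(1000, 256); db /= np.linalg.norm(db, axis=1, keepdims=True)
print(sequential_match(q, db))
```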
Citations: 0
Pay more attention to dark regions for faster shadow detection
IF 3.5 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2026-01-01 | Epub Date: 2025-11-27 | DOI: 10.1016/j.cviu.2025.104589
Xian-Tao Wu , Xiao-Diao Chen , Hongyu Chen , Wen Wu , Weiyin Ma , Haichuan Song
Deep learning-based shadow detection methods primarily focus on achieving higher accuracy, while often overlooking the importance of inference efficiency for downstream applications. This work attempts to reduce the number of processed patches during the feed-forward process and proposes a faster framework for shadow detection (namely FasterSD) based on the vision transformer. We found that most bright regions converge to a stable status even at early stages of the feed-forward process, revealing massive computational redundancy. From this observation, we introduce a token pausing strategy that locates these simple patches and pauses the refinement of their feature representations (i.e., tokens), enabling us to devote most of the computational resources to the remaining challenging patches. Specifically, we propose to use predicted posterior entropy as a proxy for prediction correctness, and design a random pausing scheme so that the model meets flexible runtime requirements by directly adjusting the pausing configuration without repeated training. Extensive experiments on three shadow detection benchmarks (i.e., SBU, ISTD, and UCF) demonstrate that FasterSD runs 12× faster than the state-of-the-art shadow detector with comparable performance. The code will be available at https://github.com/wuwen1994/FasterSD.
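A minimal sketch of entropy-gated token pausing, assuming a per-token shadow classifier whose low-entropy tokens stop being updated; note that this sketch only freezes paused tokens, whereas a real implementation would skip their computation to realize the speed-up.

```python
# Minimal sketch: tokens whose intermediate shadow / non-shadow prediction is
# already low-entropy are paused and keep their features in later blocks.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PausingViT(nn.Module):
    def __init__(self, dim=192, depth=6, heads=3, entropy_thresh=0.2):
        super().__init__()
        self.blocks = nn.ModuleList([
            nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
            for _ in range(depth)])
        self.token_head = nn.Linear(dim, 2)   # per-token shadow / non-shadow logits
        self.entropy_thresh = entropy_thresh

    def forward(self, tokens):
        active = torch.ones(tokens.shape[:2], dtype=torch.bool, device=tokens.device)
        for blk in self.blocks:
            updated = blk(tokens)
            # only active tokens receive the update; paused tokens keep old features
            # (a real implementation would skip their computation entirely)
            tokens = torch.where(active.unsqueeze(-1), updated, tokens)
            p = F.softmax(self.token_head(tokens), dim=-1)
            entropy = -(p * p.clamp_min(1e-8).log()).sum(-1)
            active = active & (entropy > self.entropy_thresh)
        return self.token_head(tokens), active

model = PausingViT()
logits, still_active = model(torch.randn(2, 196, 192))
print(logits.shape, still_active.float().mean().item())
```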
Citations: 0