
Computer Vision and Image Understanding: Latest Publications

SynTaskNet: A synergistic multi-task network for joint segmentation and classification of small anatomical structures in ultrasound imaging
IF 3.5, CAS Tier 3 (Computer Science), Q2 (Computer Science, Artificial Intelligence). Pub Date: 2025-12-18. DOI: 10.1016/j.cviu.2025.104616
Abdulrhman H. Al-Jebrni , Saba Ghazanfar Ali , Bin Sheng , Huating Li , Xiao Lin , Ping Li , Younhyun Jung , Jinman Kim , Li Xu , Lixin Jiang , Jing Du
Segmenting small, low-contrast anatomical structures and classifying their pathological status in ultrasound (US) images remain challenging tasks in computer vision, especially under the noise and ambiguity inherent in real-world clinical data. Papillary thyroid microcarcinoma (PTMC), characterized by nodules ≤1.0 cm, exemplifies these challenges, where both precise segmentation and accurate lymph node metastasis (LNM) prediction are essential for informed clinical decisions. We propose SynTaskNet, a synergistic multi-task learning (MTL) architecture that jointly performs PTMC nodule segmentation and LNM classification from US images. Built upon a DenseNet201 backbone, SynTaskNet incorporates several specialized modules: a Coordinated Depth-wise Convolution (CDC) layer for enhancing spatial features, an Adaptive Context Block (ACB) for embedding contextual dependencies, and a Multi-scale Contextual Boundary Attention (MCBA) module to improve boundary localization in low-contrast regions. To strengthen task interaction, we introduce a Selective Enhancement Fusion (SEF) mechanism that hierarchically integrates features across three semantic levels, enabling effective information exchange between segmentation and classification branches. On top of this, we formulate a synergistic learning scheme wherein an Auxiliary Segmentation Map (ASM) generated by the segmentation decoder is injected into SEF’s third class-specific fusion path to guide LNM classification. In parallel, the predicted LNM label is concatenated with the third-path SEF output to refine the Final Segmentation Map (FSM), enabling bidirectional task reinforcement. Extensive evaluations on a dedicated PTMC US dataset demonstrate that SynTaskNet achieves state-of-the-art performance, with a Dice score of 93.0% for segmentation and a classification accuracy of 94.2% for LNM prediction, validating its clinical relevance and technical efficacy.
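To make the bidirectional task coupling concrete, the sketch below shows one way an auxiliary segmentation map can guide a classification head while the predicted class is broadcast back to refine the final segmentation. It is a minimal illustration only; all module names, channel widths, and shapes are assumptions, not the authors' implementation.

```python
# Minimal sketch of bidirectional segmentation/classification coupling.
# All module names, channel sizes, and shapes are illustrative assumptions.
import torch
import torch.nn as nn

class BidirectionalTaskCoupling(nn.Module):
    def __init__(self, feat_ch=64, num_classes=2):
        super().__init__()
        # Head producing an Auxiliary Segmentation Map (ASM) from fused features.
        self.aux_seg_head = nn.Conv2d(feat_ch, 1, kernel_size=1)
        # Classification branch that consumes the fused features plus the ASM.
        self.cls_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(feat_ch + 1, num_classes),
        )
        # Refinement head: fused features concatenated with the predicted label map.
        self.final_seg_head = nn.Conv2d(feat_ch + num_classes, 1, kernel_size=1)

    def forward(self, fused_feat):
        # fused_feat: (B, C, H, W) output of a fusion path.
        asm = torch.sigmoid(self.aux_seg_head(fused_feat))            # (B, 1, H, W)
        # Inject the ASM into the classification input to guide the class prediction.
        cls_logits = self.cls_head(torch.cat([fused_feat, asm], dim=1))
        # Broadcast the predicted label over the spatial grid to refine the FSM.
        cls_prob = torch.softmax(cls_logits, dim=1)                   # (B, num_classes)
        cls_map = cls_prob[:, :, None, None].expand(-1, -1, *fused_feat.shape[2:])
        fsm = torch.sigmoid(self.final_seg_head(torch.cat([fused_feat, cls_map], dim=1)))
        return asm, cls_logits, fsm

if __name__ == "__main__":
    asm, logits, fsm = BidirectionalTaskCoupling()(torch.randn(2, 64, 56, 56))
    print(asm.shape, logits.shape, fsm.shape)
```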
Citations: 0
MFDiff: Diffusion probabilistic model for medical image segmentation with multi-scale features and frequency-aware attention
IF 3.5, CAS Tier 3 (Computer Science), Q2 (Computer Science, Artificial Intelligence). Pub Date: 2025-12-17. DOI: 10.1016/j.cviu.2025.104605
Xingli Zhang , Yameng Liu , Haiyang Yu , Zhihui Wang
Medical image segmentation serves as a critical technique in clinical applications such as disease diagnosis, surgical planning, and image-guided therapy, where segmentation accuracy directly impacts the precision of clinical decisions. However, existing methods still face significant challenges in handling inherent issues of medical images, including blurred boundaries, complex multi-scale structures, and difficulties in fine-grained feature representation. To address these challenges, this paper proposes a medical image segmentation method based on a diffusion probabilistic model, MFDiff, which aims to enhance multi-scale contextual awareness and fine-grained structural modeling capabilities. The method incorporates a frequency-aware attention fusion module that effectively strengthens the model’s ability to represent complex structures and ambiguous boundaries. Additionally, a multi-scale feature enhancement module is introduced to expand the receptive field while maintaining low computational cost, thereby improving the extraction and fusion of multi-scale features. Furthermore, an uncertainty-weighted majority voting fusion strategy is proposed to enhance the robustness and consistency of fused predictions from multiple sampling iterations. The proposed method was validated on five medical image segmentation datasets. Experimental results demonstrate that MFDiff outperforms current mainstream methods across all datasets, exhibiting strong generalization ability and robustness.
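The uncertainty-weighted voting idea can be illustrated with a few lines of tensor code: masks sampled over several diffusion runs vote on each pixel, with low-entropy (confident) samples weighted more heavily. The entropy-based weighting and all shapes are illustrative assumptions, not the paper's exact scheme.

```python
# Sketch of uncertainty-weighted majority voting over sampled segmentation masks.
import math
import torch

def fuse_sampled_masks(prob_maps: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """prob_maps: (S, B, 1, H, W) foreground probabilities from S sampling runs."""
    p = prob_maps.clamp(eps, 1 - eps)
    # Binary entropy per pixel: high entropy = uncertain prediction.
    entropy = -(p * p.log() + (1 - p) * (1 - p).log())
    # Confidence weight in [0, 1]: 1 minus normalized entropy.
    weights = 1.0 - entropy / math.log(2.0)
    # Each sample votes with its binarized mask, scaled by its confidence.
    votes = (p > 0.5).float() * weights
    fused = votes.sum(dim=0) / weights.sum(dim=0).clamp_min(eps)
    return (fused > 0.5).float()          # final binary mask, (B, 1, H, W)

if __name__ == "__main__":
    samples = torch.rand(5, 2, 1, 64, 64)  # 5 sampling iterations
    print(fuse_sampled_masks(samples).shape)
```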
Citations: 0
Generalized prompt-driven zero-shot domain adaptive segmentation with feature rectification and semantic modulation
IF 3.5, CAS Tier 3 (Computer Science), Q2 (Computer Science, Artificial Intelligence). Pub Date: 2025-12-17. DOI: 10.1016/j.cviu.2025.104615
Jinyi Li , Longyu Yang , Donghyun Kim , Kuniaki Saito , Kate Saenko , Stan Sclaroff , Xiaofeng Zhu , Ping Hu
Recent prompt-driven zero-shot adaptation methods offer a promising way to handle domain shifts in semantic segmentation by learning with features simulated from natural language prompts. However, these methods typically depend on a fixed set of predefined domain descriptions, which limits their capacity to generalize to previously undefined domains and often necessitates retraining when encountering novel environments. To address this challenge, we propose a Generalized Prompt-driven Zero-shot Domain Adaptive Segmentation framework that enables flexible and robust cross-domain segmentation by learning to map target domain features into the source domain space. This allows inference to be performed through a unified and well-optimized source model, without requiring target data-based or prompt-based retraining when encountering novel conditions. Our framework comprises two key modules: a Low-level Feature Rectification (LLFR) module that aligns visual styles using a historical source-style memory bank, and a High-level Semantic Modulation (HLSM) module that applies language-guided affine transformations to align high-level semantics. Together, these modules enable adaptive multi-level feature adaptation that maps target inputs into the source domain space, thus allowing the model to handle unseen domains effectively at test time. Extensive experiments on multiple zero-shot domain adaptation benchmarks are conducted, and the results show that our method consistently outperforms previous approaches.
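A minimal sketch of language-guided affine modulation in the spirit of the HLSM module is given below: a prompt embedding predicts per-channel scale and shift applied to visual features, FiLM-style. The embedding size and the use of a CLIP-like text encoder are assumptions for illustration, not the paper's exact design.

```python
# Sketch of language-guided per-channel affine modulation (FiLM-style).
import torch
import torch.nn as nn

class LanguageGuidedAffine(nn.Module):
    def __init__(self, text_dim=512, feat_ch=256):
        super().__init__()
        self.to_scale = nn.Linear(text_dim, feat_ch)
        self.to_shift = nn.Linear(text_dim, feat_ch)

    def forward(self, visual_feat, text_emb):
        # visual_feat: (B, C, H, W); text_emb: (B, text_dim) from a prompt encoder.
        gamma = self.to_scale(text_emb)[:, :, None, None]   # per-channel scale
        beta = self.to_shift(text_emb)[:, :, None, None]    # per-channel shift
        # Modulate high-level semantics toward the source-domain space.
        return (1 + gamma) * visual_feat + beta

if __name__ == "__main__":
    feats, prompt = torch.randn(2, 256, 32, 32), torch.randn(2, 512)
    print(LanguageGuidedAffine()(feats, prompt).shape)
```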
Citations: 0
FreqOR: Frequency-guided sampling initialization with attention enhancements for training-free object repositioning
IF 3.5, CAS Tier 3 (Computer Science), Q2 (Computer Science, Artificial Intelligence). Pub Date: 2025-12-15. DOI: 10.1016/j.cviu.2025.104610
Yuanxiang Fang, Jingyue Wang, Meiqing Wang, Shujie Zhang, Huimin Liu
Object repositioning in real images remains a challenging task. Existing approaches are typically built upon the DDIM inversion framework, whose sampling initialization tends to preserve strong layout priors in the latent space, thereby leading to object residuals or ghosting artifacts in the vacated region. Additionally, masking low-resolution self-attention maps often results in boundary misjudgments, which impair the inpainting capability. To address these limitations, we propose FreqOR, a training-free framework that integrates sampling initialization optimization with attention-level enhancements. For sampling initialization, high-frequency components of the inverted latent in the vacated region are suppressed to weaken inherited priors, thereby providing a cleaner sampling initialization. For attention enhancement, we incorporate two complementary strategies. The first is Resolution-Aligned Key–Value Interpolation, which achieves precise regional control by enabling pixel-wise masking of attention maps. The second is Query-Guided Consistency, which preserves the identity and texture consistency of the designated object by reusing inversion queries as priors during sampling. Integrated into the energy-based guidance framework, FreqOR is evaluated on the COCO-130 and VOC-100 datasets. The results demonstrate that it effectively suppresses residuals in the vacated region and enhances object consistency.
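The frequency-guided initialization can be sketched as a masked low-pass filter on the inverted latent: high-frequency components are removed only inside the vacated region. The cutoff radius and the hard radial mask below are illustrative assumptions, not the authors' exact procedure.

```python
# Sketch: suppress high-frequency content of a latent inside a masked region via FFT.
import torch

def suppress_high_freq(latent: torch.Tensor, region_mask: torch.Tensor,
                       cutoff: float = 0.25) -> torch.Tensor:
    """latent: (B, C, H, W); region_mask: (B, 1, H, W), 1 inside the vacated region."""
    B, C, H, W = latent.shape
    freq = torch.fft.fftshift(torch.fft.fft2(latent), dim=(-2, -1))
    # Radial low-pass mask: keep frequencies within `cutoff` of the half-spectrum radius.
    yy, xx = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    dist = ((yy - H / 2) ** 2 + (xx - W / 2) ** 2).sqrt()
    lowpass = (dist <= cutoff * min(H, W) / 2).to(latent.dtype)
    filtered = torch.fft.ifft2(torch.fft.ifftshift(freq * lowpass, dim=(-2, -1))).real
    # Replace the latent only inside the vacated region; keep the rest untouched.
    return region_mask * filtered + (1 - region_mask) * latent

if __name__ == "__main__":
    z = torch.randn(1, 4, 64, 64)
    m = torch.zeros(1, 1, 64, 64)
    m[..., 16:48, 16:48] = 1.0
    print(suppress_high_freq(z, m).shape)
```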
Citations: 0
AdaMulti: An adaptive cascaded multi-modal recognition framework for sports action analysis
IF 3.5, CAS Tier 3 (Computer Science), Q2 (Computer Science, Artificial Intelligence). Pub Date: 2025-12-15. DOI: 10.1016/j.cviu.2025.104604
Jianwei Li , Rui Cao , Haiqing Hu , Xiaomei Zhao , Pengju Zhang
Computer vision-based sports action analysis has emerged as a pivotal research domain, driving transformative applications including healthcare and sports analytics. While deep learning advancements have significantly improved automatic human action recognition and assessment, existing approaches typically rely exclusively on either RGB video streams or skeletal key points-each presenting unique advantages. RGB data offers rich contextual information and widespread accessibility, whereas skeleton data provides a compact representation ideal for direct pose analysis. To harness the complementary strengths of both modalities, we propose AdaMulti, an adaptive cascaded multi-modal framework for fine-grained human action analysis. Our novel approach integrates both RGB and skeleton data through two key innovations: (1) an intelligent policy network that dynamically selects the optimal modality (RGB or skeleton) for each frame, and (2) a cascaded recognition architecture that effectively fuses multi-modal features. We evaluate AdaMulti using a newly constructed multi-modal dataset derived from our 3D-Yoga project, comprising extensive yoga poses with detailed performance annotations. Experimental results demonstrate that AdaMulti outperforms single-modal methods by 17% and 32% in recognition accuracy. Furthermore, comparative studies on the public NTU-RGB+D 60 benchmark show that our method achieves a 0.6% higher accuracy than the state-of-the-art method, validating its effectiveness for complex action analysis tasks.
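A minimal sketch of per-frame modality selection is shown below: a small policy network scores each frame and picks either the RGB or the skeleton feature, with a Gumbel-softmax keeping the discrete choice differentiable during training. The Gumbel-softmax relaxation and all feature dimensions are assumptions, not necessarily the paper's training strategy.

```python
# Sketch of a per-frame modality-selection policy (RGB vs. skeleton).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityPolicy(nn.Module):
    def __init__(self, rgb_dim=512, skel_dim=256, hidden=128):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(rgb_dim + skel_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),                       # logits for [RGB, skeleton]
        )

    def forward(self, rgb_feat, skel_feat, tau=1.0):
        # rgb_feat: (B, T, rgb_dim); skel_feat: (B, T, skel_dim) per-frame features.
        logits = self.score(torch.cat([rgb_feat, skel_feat], dim=-1))
        choice = F.gumbel_softmax(logits, tau=tau, hard=True)  # (B, T, 2), one-hot
        # Zero-pad the skeleton feature to the RGB width so the branches can be mixed.
        skel_pad = F.pad(skel_feat, (0, rgb_feat.size(-1) - skel_feat.size(-1)))
        fused = choice[..., 0:1] * rgb_feat + choice[..., 1:2] * skel_pad
        return fused, choice

if __name__ == "__main__":
    rgb, skel = torch.randn(2, 16, 512), torch.randn(2, 16, 256)
    fused, choice = ModalityPolicy()(rgb, skel)
    print(fused.shape, choice.sum(dim=1)[0])  # frames routed to each modality
```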
Citations: 0
A detector-free feature matching method with dual-frequency transformer
IF 3.5, CAS Tier 3 (Computer Science), Q2 (Computer Science, Artificial Intelligence). Pub Date: 2025-12-13. DOI: 10.1016/j.cviu.2025.104597
Zhen Han , Ning Lv , Chen Chen , Li Cong , Chengbin Huang , Bin Wang
Detector-free methods have achieved notable progress in recent years, but the limited capacity of existing models to leverage multi-frequency features continues to constrain matching performance. To address this challenge, we propose a novel feature matching approach based on a dual-frequency Transformer model, which effectively exploits multi-level image information. The proposed architecture employs dual attention branches, specifically designed to capture high-frequency details and low-frequency structural features. The high-frequency attention branch incorporates a feature enhancement module to accentuate edge visual features, which play a pivotal role in matching tasks. In addition, a frequency-based loss function is designed to constrain the consistency and integrity of features in the frequency domain during the feature extraction process, effectively mitigating frequency feature distortion. The proposed method not only enhances the model’s ability to represent contextual features across different frequency components but also improves selective attention to reliable feature details. Experimental results demonstrate the proposed method achieves superior performance in multiple feature matching tasks.
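One simple way to realize a frequency-based consistency constraint is to compare the Fourier amplitude spectra of the two branches' features, as sketched below; the L1 amplitude comparison is an illustrative assumption rather than the paper's exact formulation.

```python
# Sketch of a frequency-domain consistency loss on feature maps.
import torch

def frequency_consistency_loss(feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
    """feat_a, feat_b: (B, C, H, W) feature maps to be kept consistent in frequency."""
    amp_a = torch.fft.fft2(feat_a).abs()     # amplitude spectrum of branch A
    amp_b = torch.fft.fft2(feat_b).abs()     # amplitude spectrum of branch B
    return (amp_a - amp_b).abs().mean()      # L1 distance between spectra

if __name__ == "__main__":
    a, b = torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32)
    print(frequency_consistency_loss(a, b).item())
```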
Citations: 0
Temporal prompt guided visual–text–object alignment for zero-shot video captioning
IF 3.5, CAS Tier 3 (Computer Science), Q2 (Computer Science, Artificial Intelligence). Pub Date: 2025-12-12. DOI: 10.1016/j.cviu.2025.104601
Ping Li , Tao Wang , Zeyu Pan
Video captioning generates a descriptive sentence for a video. Existing methods rely on a large number of annotated captions for training the model, but collecting so many captions is usually very expensive. This raises the challenge of how to generate video captions from unpaired videos and sentences, i.e., zero-shot video captioning. While some progress using Large Language Models (LLMs) has been made in zero-shot image captioning, it still fails to consider the temporal relations in the video domain, which may easily lead to incorrect verbs and nouns in sentences when LLM-based image methods are directly adapted to video. To address this problem, we propose the Temporal Prompt guided Visual–text–object Alignment (TPVA) approach for zero-shot video captioning. It consists of a temporal prompt guidance module and a visual–text–object alignment module. The former employs a pre-trained action recognition model to yield the action class as the key word of the temporal prompt, which guides the LLM to generate a text phrase containing the verb identifying the action. The latter implements both visual–text alignment and text–object alignment by computing their respective similarity scores, which allows the model to generate words that better reveal the video semantics. Experimental results on several benchmarks demonstrate the superiority of the proposed method in zero-shot video captioning. Code is available at https://github.com/mlvccn/TPVA_VidCap_ZeroShot.
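The alignment scoring can be illustrated as a weighted combination of visual–text and text–object cosine similarities over precomputed embeddings (e.g., from a CLIP-style encoder). All embedding shapes and the weighting factor below are assumptions, not the authors' implementation.

```python
# Sketch: rank candidate caption continuations by combined visual-text and
# text-object similarity over precomputed embeddings.
import torch
import torch.nn.functional as F

def rank_candidates(cand_emb: torch.Tensor, frame_emb: torch.Tensor,
                    object_emb: torch.Tensor, alpha: float = 0.5) -> torch.Tensor:
    """cand_emb: (K, D) candidate text embeddings; frame_emb: (T, D) frame
    embeddings; object_emb: (M, D) embeddings of detected object labels."""
    cand = F.normalize(cand_emb, dim=-1)
    frames = F.normalize(frame_emb, dim=-1)
    objects = F.normalize(object_emb, dim=-1)
    visual_text = (cand @ frames.T).mean(dim=-1)          # (K,) mean over frames
    text_object = (cand @ objects.T).max(dim=-1).values   # (K,) best-matching object
    return alpha * visual_text + (1 - alpha) * text_object  # combined ranking score

if __name__ == "__main__":
    scores = rank_candidates(torch.randn(5, 512), torch.randn(8, 512), torch.randn(3, 512))
    print(scores.argsort(descending=True))
```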
Citations: 0
Real-time habitat mapping with YOLOv8: A multi-threaded approach to biodiversity preservation
IF 3.5, CAS Tier 3 (Computer Science), Q2 (Computer Science, Artificial Intelligence). Pub Date: 2025-12-12. DOI: 10.1016/j.cviu.2025.104606
Oluwakemi Akinwehinmi , Alberto Tena , Javier Mora , Francesc Solsona , Pedro Arnau del Amo
This paper presents a robust system for real-time object detection and counting in ecological video streams. It is built on the YOLOv8 architecture integrated within a multi-threaded video processing pipeline. The system reduces latency and improves throughput by parallelizing object detection and preprocessing tasks, which leads it to outperform traditional single-threaded implementations in continuous video analysis.
The system also incorporates dynamic thresholding methods, fine-tuning, and data augmentation to enhance object detection reliability in dynamic natural environments. These mechanisms improve robustness to changing lighting, occlusions, and background complexity, common challenges in outdoor footage. The system is thoroughly evaluated through performance comparisons between multi-threaded and single-threaded implementations, environmental stress tests, and an ablation study.
Results demonstrate improved consistency in object detection and counting in dynamic environments, along with significant gains in processing speed. Designed for deployment on lightweight and low-power devices, the system is suitable for remote or resource-constrained settings.
While designed for biodiversity monitoring, the approach is applicable to other domains requiring efficient, real-time video analysis in unstructured environments.
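A minimal producer–consumer sketch of the multi-threaded design is shown below: one thread reads and preprocesses frames while another runs YOLOv8 detection, so capture and inference overlap. It assumes the ultralytics and opencv-python packages; the video path, weights file, and queue size are placeholders, not the paper's configuration.

```python
# Producer-consumer sketch: frame capture/preprocessing and YOLOv8 inference
# run in separate threads connected by a bounded queue.
import queue
import threading

import cv2
from ultralytics import YOLO

frame_queue = queue.Queue(maxsize=8)   # bounded buffer between threads

def producer(video_path: str) -> None:
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frame_queue.put(cv2.resize(frame, (640, 640)))   # preprocessing in this thread
    cap.release()
    frame_queue.put(None)                                 # sentinel: end of stream

def consumer() -> None:
    model = YOLO("yolov8n.pt")
    counts = 0
    while True:
        frame = frame_queue.get()
        if frame is None:
            break
        results = model(frame, verbose=False)             # detection in this thread
        counts += len(results[0].boxes)
    print("total detections:", counts)

if __name__ == "__main__":
    t1 = threading.Thread(target=producer, args=("habitat_clip.mp4",), daemon=True)
    t2 = threading.Thread(target=consumer, daemon=True)
    t1.start(); t2.start()
    t1.join(); t2.join()
```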
Citations: 0
Distractor suppression Siamese network with task-aware attention for visual tracking
IF 3.5, CAS Tier 3 (Computer Science), Q2 (Computer Science, Artificial Intelligence). Pub Date: 2025-12-11. DOI: 10.1016/j.cviu.2025.104607
Zhigang Liu , Fuyuan Xing , Hao Huang , Kexin Wang , Yuxuan Shao
Existing IoU-guided trackers suppress background distractors by weighting the classification scores with IoU predictions, which limits their effectiveness in complex tracking scenarios. In this paper, we propose a Distractor feature suppression Siamese network with Task-aware attention (SiamDT) for visual tracking. Firstly, we design a distractor feature suppression network that uses IoU scores to suppress distractor features in the classification feature, achieving distractor suppression at the feature level. At the same time, we design a task-aware attention network that reconstructs the cross-correlation feature by using a hybrid attention mechanism, which enhances the semantic representation capability of the features from the classification and regression branches across spatial and channel domains. Extensive experiments on benchmarks including OTB2013, OTB2015, UAV123, LaSOT, and GOT10k demonstrate that the proposed SiamDT achieves state-of-the-art tracking performance.
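The contrast between score-level IoU weighting and feature-level suppression can be sketched as below: the same IoU map either rescales the classification scores after the fact or gates the classification feature before scoring. Shapes and the sigmoid gating are illustrative assumptions, not the authors' implementation.

```python
# Sketch: score-level IoU weighting vs. feature-level distractor suppression.
import torch
import torch.nn as nn

class IoUFeatureGate(nn.Module):
    def __init__(self, feat_ch=256):
        super().__init__()
        self.cls_head = nn.Conv2d(feat_ch, 1, kernel_size=1)

    def forward(self, cls_feat, iou_map):
        # cls_feat: (B, C, H, W) classification feature;
        # iou_map: (B, 1, H, W) IoU predictions from the regression branch.
        gate = torch.sigmoid(iou_map)
        # Score-level weighting (existing trackers): rescale scores after the fact.
        score_level = torch.sigmoid(self.cls_head(cls_feat)) * gate
        # Feature-level suppression: gate the feature itself before scoring.
        feature_level = torch.sigmoid(self.cls_head(cls_feat * gate))
        return score_level, feature_level

if __name__ == "__main__":
    f, iou = torch.randn(1, 256, 25, 25), torch.randn(1, 1, 25, 25)
    s, fl = IoUFeatureGate()(f, iou)
    print(s.shape, fl.shape)
```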
Citations: 0
Exploring visual language models for driver gaze estimation: A task-based approach to debugging AI
IF 3.5, CAS Tier 3 (Computer Science), Q2 (Computer Science, Artificial Intelligence). Pub Date: 2025-12-08. DOI: 10.1016/j.cviu.2025.104593
Paola Natalia Cañas , Alejandro H. Artiles , Marcos Nieto , Igor Rodríguez
Visual Language Models (VLMs) have demonstrated superior context understanding and generalization across various tasks compared to models tailored for specific tasks. However, due to their complexity and limited information on their training processes, estimating their performance on specific tasks often requires exhaustive testing, which can be costly and may not account for edge cases. To leverage the zero-shot capabilities of VLMs in safety-critical applications like Driver Monitoring Systems, it is crucial to characterize their knowledge and abilities to ensure consistent performance. This research proposes a methodology to explore and gain a deeper understanding of the functioning of these models in driver’s gaze estimation. It involves detailed task decomposition, identification of necessary data knowledge and abilities (e.g., understanding gaze concepts), and exploration through targeted prompting strategies. Applying this methodology to several VLMs (Idefics2, Qwen2-VL, Moondream, GPT-4o) revealed significant limitations, including sensitivity to prompt phrasing, vocabulary mismatches, reliance on image-relative spatial frames, and difficulties inferring non-visible elements. The findings from this evaluation have highlighted specific areas for improvement and guided the development of more effective prompting and fine-tuning strategies, resulting in enhanced performance comparable with traditional CNN-based approaches. This research is also useful for initial model filtering, for selecting the best model among alternatives and for understanding the model’s limitations and expected behaviors, thereby increasing reliability.
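The kind of targeted prompting probe the paper describes can be sketched as a loop over alternative phrasings of the same gaze-zone question, comparing the answers to measure sensitivity to wording. The query_vlm function below is a hypothetical placeholder for whichever VLM client is being evaluated; it is not a real API.

```python
# Sketch of a prompt-sensitivity probe for a driver gaze-zone question.
from collections import Counter

# Alternative phrasings of the same gaze-zone question.
PROMPTS = [
    "Where is the driver looking? Answer with one of: road, mirror, dashboard, phone.",
    "Name the gaze zone of the driver (road / mirror / dashboard / phone).",
    "Which region holds the driver's attention: road, mirror, dashboard, or phone?",
]

def query_vlm(image_path: str, prompt: str) -> str:
    """Hypothetical stand-in for the VLM client under test (not a real API)."""
    return "road"  # placeholder answer so the sketch runs end-to-end

def probe_prompt_sensitivity(image_path: str) -> Counter:
    # Collect one answer per phrasing; disagreement across phrasings indicates
    # sensitivity to prompt wording rather than to the image content.
    answers = [query_vlm(image_path, p).strip().lower() for p in PROMPTS]
    return Counter(answers)

if __name__ == "__main__":
    print(probe_prompt_sensitivity("driver_frame_001.jpg"))
```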
Citations: 0