
Computer Vision and Image Understanding: Latest Articles

A dynamic hybrid network with attention and mamba for image captioning
IF 3.5 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-18 | DOI: 10.1016/j.cviu.2025.104617
Lulu Wang, Ruiji Xue, Zhengtao Yu, Ruoyu Zhang, Tongling Pan, Yingna Li
Image captioning (IC) is a pivotal cross-modal task that generates coherent textual descriptions for visual inputs, bridging the vision and language domains. Attention-based methods have significantly advanced the field. However, empirical observations indicate that attention mechanisms often allocate focus uniformly across the full spectrum of feature sequences, which inadvertently diminishes emphasis on long-range dependencies. Such remote elements nevertheless play a critical role in yielding captions of superior quality. Therefore, we pursued strategies that harmonize comprehensive feature representation with targeted prioritization of key signals, ultimately proposing the Dynamic Hybrid Network (DH-Net) to enhance caption quality. Specifically, following the encoder–decoder architecture, we propose a hybrid encoder (HE) that integrates attention mechanisms with Mamba blocks; these blocks complement the attention by leveraging Mamba's superior long-sequence modeling capabilities and enable a synergistic combination of local feature extraction and global context modeling. Additionally, we introduce a Feature Aggregation Module (FAM) into the decoder, which dynamically adapts multi-modal feature fusion to evolving decoding contexts, ensuring context-sensitive integration of heterogeneous features. Extensive evaluations on the MSCOCO and Flickr30k datasets demonstrate that DH-Net achieves state-of-the-art performance, significantly outperforming existing approaches in generating accurate and semantically rich captions. The implementation code is accessible via https://github.com/simple-boy/DH-Net.
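The hybrid encoder described above pairs attention with Mamba blocks. As a rough illustration of how such a pairing can be wired, the PyTorch sketch below combines multi-head self-attention with a gated long-sequence branch; the Mamba block is replaced by a simple depthwise-convolution stand-in, and all module names, dimensions, and the learned blending weight are assumptions rather than DH-Net's actual design.

```python
import torch
import torch.nn as nn

class MambaLikeBranch(nn.Module):
    """Stand-in for a Mamba block: a gated depthwise convolution over the
    sequence dimension, used only to show where long-range sequence modeling
    would plug in (not the real selective state-space block)."""
    def __init__(self, dim, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv1d(dim, dim, kernel_size, padding=kernel_size // 2, groups=dim)
        self.gate = nn.Linear(dim, dim)

    def forward(self, x):                        # x: (B, N, C)
        h = self.conv(x.transpose(1, 2)).transpose(1, 2)
        return h * torch.sigmoid(self.gate(x))   # gated long-range mixing

class HybridEncoderLayer(nn.Module):
    """Hypothetical hybrid layer: self-attention plus a long-sequence branch,
    fused by a learned residual blend (names are illustrative)."""
    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.long_branch = MambaLikeBranch(dim)
        self.alpha = nn.Parameter(torch.tensor(0.5))   # balance of the two paths

    def forward(self, x):                        # x: (B, N, C) region features
        a, _ = self.attn(self.norm1(x), self.norm1(x), self.norm1(x))
        m = self.long_branch(self.norm2(x))
        return x + self.alpha * a + (1 - self.alpha) * m

feats = torch.randn(2, 49, 512)                  # e.g. 7x7 grid features
print(HybridEncoderLayer(512)(feats).shape)      # torch.Size([2, 49, 512])
```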
Citations: 0
MFDiff: Diffusion probabilistic model for medical image segmentation with multi-scale features and frequency-aware attention
IF 3.5 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-17 | DOI: 10.1016/j.cviu.2025.104605
Xingli Zhang, Yameng Liu, Haiyang Yu, Zhihui Wang
Medical image segmentation serves as a critical technique in clinical applications such as disease diagnosis, surgical planning, and image-guided therapy, where segmentation accuracy directly impacts the precision of clinical decisions. However, existing methods still face significant challenges in handling inherent issues of medical images, including blurred boundaries, complex multi-scale structures, and difficulties in fine-grained feature representation. To address these challenges, this paper proposes a medical image segmentation method based on a diffusion probabilistic model, MFDiff, which aims to enhance multi-scale contextual awareness and fine-grained structural modeling capabilities. The method incorporates a frequency-aware attention fusion module that effectively strengthens the model’s ability to represent complex structures and ambiguous boundaries. Additionally, a multi-scale feature enhancement module is introduced to expand the receptive field while maintaining low computational cost, thereby improving the extraction and fusion of multi-scale features. Furthermore, an uncertainty-weighted majority voting fusion strategy is proposed to enhance the robustness and consistency of fused predictions from multiple sampling iterations. The proposed method was validated on five medical image segmentation datasets. Experimental results demonstrate that MFDiff outperforms current mainstream methods across all datasets, exhibiting strong generalization ability and robustness.
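The abstract mentions an uncertainty-weighted majority voting strategy over multiple sampling iterations. The sketch below shows one plausible reading: each sampled probability map votes with a weight that shrinks where it disagrees with the ensemble consensus. The weighting rule, tensor layout, and epsilon are assumptions, not MFDiff's actual formulation.

```python
import torch

def uncertainty_weighted_vote(prob_maps, eps=1e-6):
    """Fuse S sampled segmentation probability maps of shape (S, B, C, H, W)."""
    mean = prob_maps.mean(dim=0, keepdim=True)        # ensemble consensus
    dev = (prob_maps - mean).abs()                    # per-sample, per-pixel disagreement
    weights = 1.0 / (dev + eps)                       # near-consensus votes count more
    fused = (weights * prob_maps).sum(dim=0) / weights.sum(dim=0)
    return fused.argmax(dim=1)                        # (B, H, W) label map

samples = torch.rand(5, 2, 4, 64, 64).softmax(dim=2)  # 5 diffusion samples, 4 classes
print(uncertainty_weighted_vote(samples).shape)        # torch.Size([2, 64, 64])
```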
Citations: 0
Generalized prompt-driven zero-shot domain adaptive segmentation with feature rectification and semantic modulation
IF 3.5 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-17 | DOI: 10.1016/j.cviu.2025.104615
Jinyi Li, Longyu Yang, Donghyun Kim, Kuniaki Saito, Kate Saenko, Stan Sclaroff, Xiaofeng Zhu, Ping Hu
Recent prompt-driven zero-shot adaptation methods offer a promising way to handle domain shifts in semantic segmentation by learning with features simulated from natural language prompts. However, these methods typically depend on a fixed set of predefined domain descriptions, which limits their capacity to generalize to previously undefined domains and often necessitates retraining when encountering novel environments. To address this challenge, we propose a Generalized Prompt-driven Zero-shot Domain Adaptive Segmentation framework that enables flexible and robust cross-domain segmentation by learning to map target domain features into the source domain space. This allows inference to be performed through a unified and well-optimized source model, without requiring target data-based or prompt-based retraining when encountering novel conditions. Our framework comprises two key modules: a Low-level Feature Rectification (LLFR) module that aligns visual styles using a historical source-style memory bank, and a High-level Semantic Modulation (HLSM) module that applies language-guided affine transformations to align high-level semantics. Together, these modules enable adaptive multi-level feature adaptation that maps target inputs into the source domain space, thus allowing the model to handle unseen domains effectively at test time. Extensive experiments on multiple zero-shot domain adaptation benchmarks are conducted, and the results show that our method consistently outperforms previous approaches.
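The High-level Semantic Modulation module is described as applying language-guided affine transformations to high-level features. The sketch below shows a generic FiLM-style version in which a prompt embedding predicts per-channel scale and shift; the module name, dimensions, and residual form are assumptions rather than the paper's implementation.

```python
import torch
import torch.nn as nn

class LanguageGuidedAffine(nn.Module):
    """FiLM-style modulation: a text/prompt embedding predicts per-channel
    scale (gamma) and shift (beta) applied to high-level visual features."""
    def __init__(self, text_dim, feat_channels):
        super().__init__()
        self.to_affine = nn.Linear(text_dim, 2 * feat_channels)

    def forward(self, feats, text_emb):          # feats: (B, C, H, W); text_emb: (B, D)
        gamma, beta = self.to_affine(text_emb).chunk(2, dim=-1)
        gamma = gamma[:, :, None, None]
        beta = beta[:, :, None, None]
        return (1 + gamma) * feats + beta        # identity when gamma = beta = 0

feats = torch.randn(2, 256, 32, 32)
prompt = torch.randn(2, 512)                     # e.g. a CLIP-style text embedding
print(LanguageGuidedAffine(512, 256)(feats, prompt).shape)
```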
Citations: 0
SGCNet: Silhouette Guided Cascaded Network for multi-modal image fusion
IF 3.5 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-16 | DOI: 10.1016/j.cviu.2025.104603
Yuxuan Wang, Zhongwei Shen, Hui Li, Yuning Zhang, Zhenping Xia
For generating high-quality fused images in the field of image fusion, it is essential to effectively capture local detail information (e.g., texture) alongside global information (e.g., color blocks). However, conventional fusion techniques often fail to balance local and global information. This imbalance can lead to fused results that excessively favor either infrared or visible-light characteristics, compromising the contrast and detail of the fused image. To tackle this problem, we propose the Silhouette Guided Cascaded Network (SGCNet). The encoder of our method employs a Cascaded Dense Connection structure that integrates CNN- and Transformer-based encoders to extract both local and global features in a compatible manner. In the fusion stage, the silhouettes of the targets are extracted by a pretrained semantic segmentation model, which provides global spatial weighting for detailed features and guides the alignment of features across different modalities. Extensive experiments demonstrate that SGCNet outperforms existing fusion methods across a variety of tasks, including infrared–visible and medical image fusion, highlighting its technological advancements and broad practical application potential.
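The fusion stage is said to use silhouettes from a pretrained segmentation model as global spatial weights for detailed features. Below is a minimal sketch of silhouette-weighted fusion of infrared and visible feature maps; the masking rule, the 1x1 mixing convolution, and all names are illustrative assumptions rather than SGCNet itself.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SilhouetteGuidedFusion(nn.Module):
    """Fuse infrared and visible feature maps, boosting features inside the
    target silhouette before mixing them with a 1x1 convolution."""
    def __init__(self, channels):
        super().__init__()
        self.mix = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, feat_ir, feat_vis, silhouette):
        # silhouette: (B, 1, h, w) soft mask from a pretrained segmenter
        mask = F.interpolate(silhouette, size=feat_ir.shape[-2:], mode="bilinear",
                             align_corners=False)
        weighted_ir = feat_ir * (1 + mask)       # emphasize target regions
        weighted_vis = feat_vis * (1 + mask)
        return self.mix(torch.cat([weighted_ir, weighted_vis], dim=1))

ir, vis = torch.randn(1, 64, 80, 80), torch.randn(1, 64, 80, 80)
mask = torch.rand(1, 1, 320, 320)
print(SilhouetteGuidedFusion(64)(ir, vis, mask).shape)   # (1, 64, 80, 80)
```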
Citations: 0
Spatio-temporal side tuning pre-trained foundation models for video-based pedestrian attribute recognition
IF 3.5 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-16 | DOI: 10.1016/j.cviu.2025.104588
Xiao Wang, Qian Zhu, Jiandong Jin, Jun Zhu, Futian Wang, Bo Jiang, Yaowei Wang, Yonghong Tian
Pedestrian Attribute Recognition (PAR) models based on static images struggle to handle issues such as occlusion and motion blur, and recently proposed video-PAR models have not fully utilized the potential of larger models, resulting in sub-optimal performance. In this work, we propose a video-PAR framework that leverages temporal information by efficiently fine-tuning a multi-modal foundation model. Specifically, we cast video-based PAR as a vision-language fusion task, using CLIP for visual feature extraction and prompt engineering to convert attributes into sentences for text embedding. We introduce a spatiotemporal side-tuning strategy for parameter-efficient optimization and fuse visual and textual tokens via a Transformer for interactive learning. The enhanced tokens are used for final attribute prediction. Experiments on two video-PAR datasets validate the effectiveness of our method. The source code of this paper is available at https://github.com/Event-AHU/OpenPAR.
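The abstract describes prompt engineering that turns pedestrian attributes into sentences for text embedding, with visual and textual features later compared. The sketch below shows a generic attribute-to-prompt template plus a cosine-similarity prediction head; the attribute list, template wording, and stand-in embeddings are assumptions, and a real pipeline would obtain the embeddings from a CLIP-style encoder.

```python
import torch
import torch.nn.functional as F

# Hypothetical attribute list; the template is only a guess at how attributes
# could be converted into sentences for a CLIP-style text encoder.
ATTRIBUTES = ["wearing a hat", "carrying a backpack", "long hair", "wearing glasses"]
PROMPT_TEMPLATE = "a photo of a pedestrian {}."

def build_prompts(attributes):
    """Convert attribute labels into natural-language sentences."""
    return [PROMPT_TEMPLATE.format(a) for a in attributes]

def attribute_logits(video_feat, text_feats, scale=100.0):
    """Cosine similarity between a pooled video feature (D,) and per-attribute
    text embeddings (A, D), as one plausible prediction head."""
    v = F.normalize(video_feat, dim=-1)
    t = F.normalize(text_feats, dim=-1)
    return scale * t @ v                          # (A,) one logit per attribute

prompts = build_prompts(ATTRIBUTES)
print(prompts[0])                                 # "a photo of a pedestrian wearing a hat."

# Stand-in embeddings; in practice they would come from CLIP text/image encoders.
text_feats = torch.randn(len(ATTRIBUTES), 512)
video_feat = torch.randn(512)
print(attribute_logits(video_feat, text_feats).shape)   # torch.Size([4])
```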
Citations: 0
FreqOR: Frequency-guided sampling initialization with attention enhancements for training-free object repositioning
IF 3.5 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-15 | DOI: 10.1016/j.cviu.2025.104610
Yuanxiang Fang, Jingyue Wang, Meiqing Wang, Shujie Zhang, Huimin Liu
Object repositioning in real images remains a challenging task. Existing approaches are typically built upon the DDIM inversion framework, whose sampling initialization tends to preserve strong layout priors in the latent space, thereby leading to object residuals or ghosting artifacts in the vacated region. Additionally, masking low-resolution self-attention maps often results in boundary misjudgments, which impair the inpainting capability. To address these limitations, we propose FreqOR, a training-free framework that integrates sampling initialization optimization with attention-level enhancements. For sampling initialization, high-frequency components of the inverted latent in the vacated region are suppressed to weaken inherited priors, thereby providing a cleaner sampling initialization. For attention enhancement, we incorporate two complementary strategies. The first is Resolution-Aligned Key–Value Interpolation, which achieves precise regional control by enabling pixel-wise masking of attention maps. The second is Query-Guided Consistency, which preserves the identity and texture consistency of the designated object by reusing inversion queries as priors during sampling. Integrated into the energy-based guidance framework, FreqOR is evaluated on the COCO-130 and VOC-100 datasets. The results demonstrate that it effectively suppresses residuals in the vacated region and enhances object consistency.
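FreqOR is said to suppress high-frequency components of the inverted latent inside the vacated region before sampling. The sketch below applies an FFT low-pass filter only within a region mask; the cutoff value, blending scheme, and latent shape are assumptions and not the paper's exact procedure.

```python
import torch

def suppress_high_freq(latent, region_mask, cutoff=0.25):
    """Low-pass filter a latent (B, C, H, W) and use the filtered version only
    inside region_mask (B, 1, H, W); elsewhere the latent is kept as-is."""
    B, C, H, W = latent.shape
    fy = torch.fft.fftfreq(H, device=latent.device).abs()
    fx = torch.fft.fftfreq(W, device=latent.device).abs()
    keep = ((fy[:, None] <= cutoff) & (fx[None, :] <= cutoff)).float()  # (H, W)
    spec = torch.fft.fft2(latent)
    low = torch.fft.ifft2(spec * keep).real        # low-frequency reconstruction
    return region_mask * low + (1 - region_mask) * latent

z = torch.randn(1, 4, 64, 64)                      # e.g. an inverted diffusion latent
mask = torch.zeros(1, 1, 64, 64)
mask[..., 16:48, 16:48] = 1.0                      # vacated region
print(suppress_high_freq(z, mask).shape)           # torch.Size([1, 4, 64, 64])
```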
Citations: 0
AdaMulti: An adaptive cascaded multi-modal recognition framework for sports action analysis
IF 3.5 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-15 | DOI: 10.1016/j.cviu.2025.104604
Jianwei Li, Rui Cao, Haiqing Hu, Xiaomei Zhao, Pengju Zhang
Computer vision-based sports action analysis has emerged as a pivotal research domain, driving transformative applications including healthcare and sports analytics. While deep learning advancements have significantly improved automatic human action recognition and assessment, existing approaches typically rely exclusively on either RGB video streams or skeletal key points, each presenting unique advantages. RGB data offers rich contextual information and widespread accessibility, whereas skeleton data provides a compact representation ideal for direct pose analysis. To harness the complementary strengths of both modalities, we propose AdaMulti, an adaptive cascaded multi-modal framework for fine-grained human action analysis. Our approach integrates RGB and skeleton data through two key innovations: (1) an intelligent policy network that dynamically selects the optimal modality (RGB or skeleton) for each frame, and (2) a cascaded recognition architecture that effectively fuses multi-modal features. We evaluate AdaMulti on a newly constructed multi-modal dataset derived from our 3D-Yoga project, comprising extensive yoga poses with detailed performance annotations. Experimental results demonstrate that AdaMulti outperforms single-modal methods by 17% and 32% in recognition accuracy. Furthermore, comparative studies on the public NTU-RGB+D 60 benchmark show that our method achieves 0.6% higher accuracy than the state-of-the-art method, validating its effectiveness for complex action analysis tasks.
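The policy network is described as dynamically selecting the optimal modality per frame. The sketch below uses a Gumbel-softmax gate to pick between RGB and skeleton features frame by frame; the gating mechanism, projection, and dimensions are assumptions rather than AdaMulti's actual policy.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityPolicy(nn.Module):
    """Per-frame modality selection between RGB and skeleton features via a
    Gumbel-softmax gate (differentiable at train time)."""
    def __init__(self, rgb_dim, skel_dim, hidden=128):
        super().__init__()
        self.score = nn.Sequential(
            nn.Linear(rgb_dim + skel_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2))                       # logits for [rgb, skeleton]
        self.proj_skel = nn.Linear(skel_dim, rgb_dim)   # bring both to one width

    def forward(self, rgb, skel, tau=1.0):              # rgb: (B,T,Dr), skel: (B,T,Ds)
        logits = self.score(torch.cat([rgb, skel], dim=-1))
        gate = F.gumbel_softmax(logits, tau=tau, hard=True)   # (B, T, 2) one-hot
        skel_p = self.proj_skel(skel)
        fused = gate[..., 0:1] * rgb + gate[..., 1:2] * skel_p
        return fused, gate

rgb = torch.randn(2, 16, 512)
skel = torch.randn(2, 16, 256)
fused, gate = ModalityPolicy(512, 256)(rgb, skel)
print(fused.shape, gate[0, :4].argmax(-1))              # chosen modality per frame
```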
Citations: 0
A detector-free feature matching method with dual-frequency transformer
IF 3.5 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-13 | DOI: 10.1016/j.cviu.2025.104597
Zhen Han, Ning Lv, Chen Chen, Li Cong, Chengbin Huang, Bin Wang
Detector-free methods have achieved notable progress in recent years, but the limited capacity of existing models to leverage multi-frequency features continues to constrain matching performance. To address this challenge, we propose a novel feature matching approach based on a dual-frequency Transformer model, which effectively exploits multi-level image information. The proposed architecture employs dual attention branches specifically designed to capture high-frequency details and low-frequency structural features. The high-frequency attention branch incorporates a feature enhancement module to accentuate edge visual features, which play a pivotal role in matching tasks. In addition, a frequency-based loss function is designed to constrain the consistency and integrity of features in the frequency domain during feature extraction, effectively mitigating frequency feature distortion. The proposed method not only enhances the model's ability to represent contextual features across different frequency components but also improves selective attention to reliable feature details. Experimental results demonstrate that the proposed method achieves superior performance in multiple feature matching tasks.
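The frequency-based loss is said to constrain the consistency of features in the frequency domain. One plausible form, sketched below, compares the log-amplitude spectra of paired feature maps with an L1 penalty; the exact transform, normalization, and weighting used in the paper may differ.

```python
import torch

def frequency_consistency_loss(feat_a, feat_b, eps=1e-8):
    """L1 distance between log-amplitude spectra of two feature maps
    of shape (B, C, H, W), computed with a real 2-D FFT."""
    amp_a = torch.fft.rfft2(feat_a).abs()
    amp_b = torch.fft.rfft2(feat_b).abs()
    return (torch.log(amp_a + eps) - torch.log(amp_b + eps)).abs().mean()

fa = torch.randn(2, 64, 60, 80, requires_grad=True)
fb = torch.randn(2, 64, 60, 80)
loss = frequency_consistency_loss(fa, fb)
loss.backward()                                   # usable as a training penalty
print(float(loss))
```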
Citations: 0
Enhanced approach with edge feature guidance for LiDAR signal denoising
IF 3.5 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-12 | DOI: 10.1016/j.cviu.2025.104609
A. Anigo Merjora, P. Sardar Maran
This research addresses the challenging task of denoising Light Detection and Ranging (LiDAR) signals, with a specific focus on Rayleigh backscatter signals that are particularly vulnerable to noise due to atmospheric interference and sensor limitations. An enhanced shearlet wavelet U-Net with edge feature guidance is proposed: a deep learning framework that integrates the multi-directional, multi-scale decomposition capabilities of the shearlet wavelet transform with the powerful feature extraction and localization properties of the U-Net architecture. A key contribution of this research is the introduction of an edge feature guidance module within the U-Net, designed to preserve critical structural and edge details that are typically lost during denoising. The denoising process uses the shearlet transform to decompose the noisy input signal into different scales and orientations, which allows the model to better separate noise from signal and, more importantly, to make this distinction at multiple resolutions. The experimental assessments applied the proposed method to both synthetic and real-world atmospheric LiDAR datasets and compared it with several state-of-the-art denoising methods, including classical wavelet-based approaches and supervised deep-learning methods. Quantitative results indicate that our model yields, on average, a 28% higher signal-to-noise ratio (SNR) and a 31% larger improvement in mean squared error (MSE) than the baseline methods. Qualitative analysis shows that the proposed model retains small-scale atmospheric structures and edge continuity. Overall, the results indicate that the proposed method is effective for improving LiDAR signal quality across a wide range of applications in environmental monitoring and meteorology, where signal fidelity is critical.
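The reported gains are expressed as SNR and MSE improvements over baselines. The short sketch below shows how such metrics can be computed on a toy synthetic backscatter profile, with a crude moving-average filter standing in for the denoiser; the signal model, noise level, and units are illustrative assumptions only.

```python
import numpy as np

def mse(reference, estimate):
    """Mean squared error between a clean reference profile and an estimate."""
    return float(np.mean((reference - estimate) ** 2))

def snr_db(reference, estimate):
    """Signal-to-noise ratio in dB of the estimate with respect to the reference."""
    noise = reference - estimate
    return float(10.0 * np.log10(np.sum(reference ** 2) / (np.sum(noise ** 2) + 1e-12)))

# Toy synthetic backscatter-like decay profile with additive noise (illustrative only).
rng = np.random.default_rng(0)
r = np.linspace(1.0, 30.0, 2000)                  # range bins (km, assumed)
clean = np.exp(-r / 8.0) / r**2                   # idealized decay with range
noisy = clean + rng.normal(0.0, 2e-4, size=r.shape)
denoised = np.convolve(noisy, np.ones(25) / 25, mode="same")  # moving-average stand-in

print(f"MSE  noisy={mse(clean, noisy):.3e}  denoised={mse(clean, denoised):.3e}")
print(f"SNR  noisy={snr_db(clean, noisy):.1f} dB  denoised={snr_db(clean, denoised):.1f} dB")
```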
Citations: 0
Temporal prompt guided visual–text–object alignment for zero-shot video captioning
IF 3.5 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-12 | DOI: 10.1016/j.cviu.2025.104601
Ping Li, Tao Wang, Zeyu Pan
Video captioning generates a descriptive sentence for a video. Existing methods rely on large numbers of annotated captions to train the model, but collecting so many captions is usually very expensive. This raises the challenge of generating video captions from unpaired videos and sentences, i.e., zero-shot video captioning. While some progress has been made in zero-shot image captioning using Large Language Models (LLMs), these methods still fail to consider the temporal relations in the video domain; directly adapting LLM-based image methods to video can therefore easily produce incorrect verbs and nouns in the generated sentences. To address this problem, we propose the Temporal Prompt guided Visual–text–object Alignment (TPVA) approach for zero-shot video captioning. It consists of a temporal prompt guidance module and a visual–text–object alignment module. The former employs a pre-trained action recognition model to yield the action class as the key word of the temporal prompt, which guides the LLM to generate a text phrase containing the verb identifying the action. The latter implements both visual–text alignment and text–object alignment by computing their respective similarity scores, which allows the model to generate words that better reveal the video semantics. Experimental results on several benchmarks demonstrate the superiority of the proposed method in zero-shot video captioning. Code is available at https://github.com/mlvccn/TPVA_VidCap_ZeroShot.
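The alignment module is described as computing visual–text and text–object similarity scores to guide word generation. The sketch below combines the two cosine similarities into a single guidance score; the weights, pooling, and embedding sources are assumptions rather than TPVA's actual scoring.

```python
import torch
import torch.nn.functional as F

def alignment_score(text_emb, visual_emb, object_embs, w_visual=0.6, w_object=0.4):
    """Combine visual-text and text-object similarities into one guidance score
    for a candidate caption. Weights and pooling are illustrative choices."""
    t = F.normalize(text_emb, dim=-1)             # (D,) caption embedding
    v = F.normalize(visual_emb, dim=-1)           # (D,) pooled video embedding
    o = F.normalize(object_embs, dim=-1)          # (K, D) detected-object embeddings
    visual_text = torch.dot(t, v)
    text_object = (o @ t).max()                   # best-matching detected object
    return w_visual * visual_text + w_object * text_object

text = torch.randn(512)
video = torch.randn(512)
objects = torch.randn(5, 512)
print(float(alignment_score(text, video, objects)))
```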
Citations: 0