
Computer Vision and Image Understanding: Latest Publications

Exploring visual language models for driver gaze estimation: A task-based approach to debugging AI
IF 3.5 CAS Tier 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-12-08 DOI: 10.1016/j.cviu.2025.104593
Paola Natalia Cañas, Alejandro H. Artiles, Marcos Nieto, Igor Rodríguez
Visual Language Models (VLMs) have demonstrated superior context understanding and generalization across various tasks compared to models tailored for specific tasks. However, due to their complexity and limited information on their training processes, estimating their performance on specific tasks often requires exhaustive testing, which can be costly and may not account for edge cases. To leverage the zero-shot capabilities of VLMs in safety-critical applications like Driver Monitoring Systems, it is crucial to characterize their knowledge and abilities to ensure consistent performance. This research proposes a methodology to explore and gain a deeper understanding of the functioning of these models in driver’s gaze estimation. It involves detailed task decomposition, identification of necessary data knowledge and abilities (e.g., understanding gaze concepts), and exploration through targeted prompting strategies. Applying this methodology to several VLMs (Idefics2, Qwen2-VL, Moondream, GPT-4o) revealed significant limitations, including sensitivity to prompt phrasing, vocabulary mismatches, reliance on image-relative spatial frames, and difficulties inferring non-visible elements. The findings from this evaluation have highlighted specific areas for improvement and guided the development of more effective prompting and fine-tuning strategies, resulting in enhanced performance comparable with traditional CNN-based approaches. This research is also useful for initial model filtering, for selecting the best model among alternatives and for understanding the model’s limitations and expected behaviors, thereby increasing reliability.
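The task-decomposition idea described above lends itself to a simple probing harness. The sketch below is illustrative only: the `query_vlm` callable, the sub-task prompts, and the gaze-zone vocabulary are assumptions standing in for whichever VLM backend (Idefics2, Qwen2-VL, Moondream, GPT-4o) is under evaluation, not the authors' code.

```python
# Minimal sketch of a task-decomposition prompting probe for gaze-zone estimation.
# `query_vlm` is a hypothetical stand-in for whichever VLM backend is being tested;
# it is not part of the paper's implementation.
from typing import Callable, Dict, List

GAZE_ZONES: List[str] = [
    "road ahead", "rear-view mirror", "left mirror",
    "right mirror", "instrument cluster", "infotainment screen",
]

# Each sub-task probes one ability the full task depends on
# (face visibility, head orientation, final gaze zone).
SUB_TASK_PROMPTS: Dict[str, str] = {
    "face_visible": "Is the driver's face clearly visible? Answer yes or no.",
    "head_direction": "In which direction is the driver's head turned: left, right, up, down, or straight?",
    "gaze_zone": "Which of these zones is the driver looking at: " + ", ".join(GAZE_ZONES) + "? Answer with one zone only.",
}

def probe_model(image_path: str, query_vlm: Callable[[str, str], str]) -> Dict[str, str]:
    """Run the decomposed prompts against one frame and collect raw answers."""
    return {name: query_vlm(image_path, prompt) for name, prompt in SUB_TASK_PROMPTS.items()}

def normalize_zone(answer: str) -> str:
    """Map a free-form answer onto the closed vocabulary (vocabulary mismatch is
    one of the failure modes the paper reports)."""
    answer = answer.lower()
    for zone in GAZE_ZONES:
        if zone in answer:
            return zone
    return "unmatched"

if __name__ == "__main__":
    # Dummy backend so the sketch runs without any model installed.
    fake_vlm = lambda image, prompt: "The driver is looking at the left mirror."
    answers = probe_model("frame_000.jpg", fake_vlm)
    print(answers["gaze_zone"], "->", normalize_zone(answers["gaze_zone"]))
```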
Citations: 0
A vision-based framework and dataset for human behavior understanding in industrial assembly lines
IF 3.5 CAS Tier 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-12-06 DOI: 10.1016/j.cviu.2025.104592
Konstantinos Papoutsakis, Nikolaos Bakalos, Athena Zacharia, Konstantinos Fragkoulis, Georgia Kapetadimitri, Maria Pateraki
This paper introduces a vision-based framework and dataset for capturing and understanding human behavior in industrial assembly lines, focusing on car door manufacturing. The framework leverages advanced computer vision techniques to estimate workers’ locations and 3D poses and analyze work postures, actions, and task progress. A key contribution is the introduction of the CarDA dataset, which contains domain-relevant assembly actions captured in a realistic setting to support the analysis of the framework for human pose and action analysis. The dataset comprises time-synchronized multi-camera RGB-D videos, motion capture data recorded in a real car manufacturing environment, and annotations for EAWS-based ergonomic risk scores and assembly activities. Experimental results demonstrate the effectiveness of the proposed approach in classifying worker postures and robust performance in monitoring assembly task progress.
Citations: 0
FSATFusion: Frequency-Spatial Attention Transformer for infrared and visible image fusion
IF 3.5 CAS Tier 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-12-05 DOI: 10.1016/j.cviu.2025.104600
Tianpei Zhang, Jufeng Zhao, Yiming Zhu, Guangmang Cui, Yuhan Lyu
Infrared and visible image fusion (IVIF) is receiving increasing attention from both the research community and industry due to its excellent results in downstream applications. However, existing deep learning methods exhibit limitations in global feature modeling, balancing fusion performance with computational efficiency, and effectively leveraging frequency-domain information. To address these limitations, we propose an end-to-end fusion network named the Frequency-Spatial Attention Transformer Fusion Network (FSATFusion). FSATFusion contains the frequency-spatial attention Transformer (FSAT) module designed to effectively capture discriminative features from source images. The FSAT module includes a frequency-spatial attention mechanism (FSAM) capable of extracting significant features from feature maps. Additionally, we propose an improved Transformer module (ITM) to enhance the ability to extract global context information of the vanilla Transformer without incurring additional computational overhead. Across four public datasets (TNO, MSRS, RoadScene, and RGB–NIR), we conducted extensive qualitative comparisons and quantitative evaluations based on eight metrics against fourteen representative state-of-the-art fusion algorithms. Experimental results demonstrate that the proposed method outperforms state-of-the-art deep learning approaches (e.g., GANMcC, MDA, and EMMA) in terms of qualitative visual quality, objective metrics (e.g., achieving an average improvement of approximately 34% in MI, 5% in Qy, and 4% in VIF), as well as computational efficiency. Furthermore, the fused images generated by our method exhibit superior applicability and performance in downstream object detection tasks. Our code is available at https://github.com/Lmmh058/FSATFusion.
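The abstract does not spell out the FSAM layout, but the general pattern of pairing a frequency-domain gate with a spatial gate can be sketched as follows. Channel sizes, kernel sizes, and the fusion order are assumptions for illustration; the authors' implementation is at https://github.com/Lmmh058/FSATFusion.

```python
# Rough sketch of a frequency-spatial attention block in the spirit of FSAM.
# Layer sizes and the exact fusion are assumptions, not the authors' code.
import torch
import torch.nn as nn

class FrequencySpatialAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Frequency branch: a per-channel gate computed from the magnitude spectrum.
        self.freq_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Spatial branch: a 7x7 conv over pooled channel statistics.
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Channel attention derived in the frequency domain.
        mag = torch.abs(torch.fft.rfft2(x, norm="ortho"))   # B x C x H x (W//2+1)
        x_freq = x * self.freq_gate(mag)
        # Spatial attention from mean/max channel pooling.
        pooled = torch.cat([x_freq.mean(1, keepdim=True),
                            x_freq.amax(1, keepdim=True)], dim=1)
        return x_freq * self.spatial_gate(pooled)

if __name__ == "__main__":
    block = FrequencySpatialAttention(channels=32)
    feats = torch.randn(2, 32, 64, 64)
    print(block(feats).shape)  # torch.Size([2, 32, 64, 64])
```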
Citations: 0
Constructing adaptive spatial-frequency interactive network with bi-directional adapter for generalizable face forgery detection
IF 3.5 CAS Tier 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-12-04 DOI: 10.1016/j.cviu.2025.104599
Junchang Jing, Yanyan Lv, Ming Li, Dong Liu, Zhiyong Zhang
Although existing face forgery detection methods have demonstrated remarkable performance, they still suffer a significant performance drop when confronted with samples generated by unseen manipulation techniques. This poor generalization performance arises from the detectors overfitting to specific datasets and failing to learn generalizable feature representations. To tackle this problem, we propose a novel adaptive spatial-frequency interactive network with a bi-directional adapter for generalizable face forgery detection. Specifically, we design an Adaptive Region Dynamic Convolution (ARDConv) module and an Adaptive Frequency Dynamic Filter (AFDF) module. The ARDConv module divides the spatial dimension into several regions based on the guided features of the input image, and employs the multi-head cross-attention mechanism to dynamically generate filters, effectively focusing on subtle texture artifacts in the spatial domain. The AFDF module applies frequency decomposition and dynamic convolution kernels in the frequency domain, adaptively selecting frequency information to capture refined clues. Additionally, we present a dual-domain fusion module based on a Bi-directional Adapter (BAT) to transfer domain-specific feature information from one domain to another. The advantage of this module lies in its ability to enable efficient feature fusion by fine-tuning only minimal BAT parameters. Our method exhibits exceptional generalization capabilities in cross-dataset evaluation, outperforming the best competing approaches by 3.07% and 3.15% AUC improvements. Moreover, the proposed approach only utilizes 547K trainable parameters and 130M FLOPs, significantly reducing computational costs compared to other state-of-the-art face forgery detection methods. The code is released at https://github.com/lvyanyana/ASFI.
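As a rough illustration of the parameter-efficient fusion idea, the sketch below shows a generic bi-directional bottleneck adapter exchanging information between a spatial and a frequency branch. The bottleneck width, placement, and naming are assumptions, not the released ASFI code (https://github.com/lvyanyana/ASFI).

```python
# Illustrative bi-directional bottleneck adapter between two feature branches
# (spatial <-> frequency). Dimensions and placement are assumptions.
import torch
import torch.nn as nn

class BiDirectionalAdapter(nn.Module):
    """Two lightweight bottlenecks, one per transfer direction. Only these
    parameters would be fine-tuned, keeping the trainable footprint small."""
    def __init__(self, dim: int, bottleneck: int = 32):
        super().__init__()
        self.spatial_to_freq = nn.Sequential(
            nn.Linear(dim, bottleneck), nn.GELU(), nn.Linear(bottleneck, dim))
        self.freq_to_spatial = nn.Sequential(
            nn.Linear(dim, bottleneck), nn.GELU(), nn.Linear(bottleneck, dim))

    def forward(self, spatial_feat, freq_feat):
        # Residual injection of the other branch's information.
        return (spatial_feat + self.freq_to_spatial(freq_feat),
                freq_feat + self.spatial_to_freq(spatial_feat))

if __name__ == "__main__":
    adapter = BiDirectionalAdapter(dim=256)
    s, f = torch.randn(4, 196, 256), torch.randn(4, 196, 256)
    s2, f2 = adapter(s, f)
    print(s2.shape, f2.shape,
          sum(p.numel() for p in adapter.parameters()))  # small trainable count
```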
Citations: 0
LoTeR: Localized text prompt refinement for zero-shot referring image segmentation
IF 3.5 CAS Tier 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-12-04 DOI: 10.1016/j.cviu.2025.104596
Lei Zhang, Yongqiu Huang, Yingjun Du, Fang Lei, Zhiying Yang, Cees G.M. Snoek, Yehui Wang
This paper addresses the challenge of segmenting an object in an image based solely on a textual description, without requiring any training on specific object classes. In contrast to traditional methods that rely on generating numerous mask proposals, we introduce a novel patch-based approach. Our method computes the similarity between small image patches, extracted using a sliding window, and textual descriptions, producing a patch score map that identifies the regions most likely to contain the target object. This score map guides a segment-anything model to generate precise mask proposals. To further improve segmentation accuracy, we refine the textual prompts by generating detailed object descriptions using a multi-modal large language model. Our method’s effectiveness is validated through extensive experiments on the RefCOCO, RefCOCO+, and RefCOCOg datasets, where it outperforms state-of-the-art zero-shot referring image segmentation methods. Ablation studies confirm the key contributions of our patch-based segmentation and localized text prompt refinement, demonstrating their significant role in enhancing both precision and robustness.
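A minimal version of the patch-scoring step can be sketched with an off-the-shelf image-text model. The CLIP checkpoint, window size, and stride below are illustrative choices, not the paper's configuration; the best-scoring box would then seed a segment-anything prompt.

```python
# Sketch of the patch-scoring idea: slide a window over the image, score each
# crop against the referring expression, and keep the best location as a prompt
# for a segment-anything model. Checkpoint and window settings are assumptions.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def patch_score_map(image: Image.Image, text: str, win: int = 96, stride: int = 48):
    """Return (score, box) pairs for every sliding-window crop."""
    crops, boxes = [], []
    for top in range(0, image.height - win + 1, stride):
        for left in range(0, image.width - win + 1, stride):
            boxes.append((left, top, left + win, top + win))
            crops.append(image.crop(boxes[-1]))
    inputs = processor(text=[text], images=crops, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    scores = out.logits_per_text.squeeze(0)      # one similarity per crop
    return list(zip(scores.tolist(), boxes))

if __name__ == "__main__":
    img = Image.open("example.jpg").convert("RGB")   # illustrative image path
    scored = patch_score_map(img, "the person in the red jacket on the left")
    best_score, best_box = max(scored)
    print(best_score, best_box)  # best_box can seed a SAM point/box prompt
```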
Citations: 0
A configurable global context reconstruction hybrid detector for enhanced small object detection in UAV aerial imagery
IF 3.5 CAS Tier 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-12-04 DOI: 10.1016/j.cviu.2025.104598
Hongcheng Xue, Tong Gao, Zhan Tang, Yuantian Xia, Longhe Wang, Lin Li
To address the challenge of balancing detection accuracy and efficiency for small objects in complex aerial scenes, we propose a Configurable Global Context Reconstruction Hybrid Detector (GCRH) to enhance overall detection performance. The GCRH framework consists of three key components. First, the Efficient Re-parameterized Encoder (ERE) reduces the computational overhead of multi-head self-attention through re-parameterization while maintaining the integrity and independence of global–local feature interactions. Second, the Global-Aware Feature Pyramid Network (GAFPN) reconstructs and injects global contextual semantics, cascading selective feature fusion to distribute this semantic information across feature layers, thereby alleviating small-object feature degradation and cross-level semantic inconsistency. Finally, two configurable model variants are provided, allowing the control of high-resolution feature layers to balance detection accuracy and inference efficiency. Experiments on the VisDrone2019 and TinyPerson datasets demonstrate that GCRH achieves an effective trade-off between precision and efficiency, validating its applicability to small object detection in aerial imagery. The code is available at: https://github.com/Mundane-X/GCRH.
Citations: 0
Exploring joint embedding predictive architectures for pretraining convolutional neural networks
IF 3.5 CAS Tier 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-12-04 DOI: 10.1016/j.cviu.2025.104595
András Kalapos, Bálint Gyires-Tóth
Joint Embedding Predictive Architectures present a novel intermediate approach to visual self-supervised learning combining mechanisms from instance discrimination and masked modeling. CNN-JEPA adapts this approach to convolutional neural networks and demonstrates its computational efficiency and accuracy on image classification benchmarks. In this study, we investigate CNN-JEPA, adapt it for semantic segmentation, and propose a learning objective that improves image-level representation learning through a joint embedding predictive architecture. We conduct an extensive evaluation, comparing it with other SSL methods by analyzing data efficiency and computational demands across downstream classification and segmentation benchmarks. Our results show that its classification and segmentation accuracy outperforms similar masked modeling methods such as I-JEPA and SparK with a ResNet-50 or a similarly sized ViT-Small encoder. Furthermore, CNN-JEPA requires fewer computational resources during pretraining, demonstrates excellent data efficiency in data-limited downstream segmentation, and achieves competitive accuracy with successful instance discrimination-based SSL methods for pretraining encoders on ImageNet.
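For readers unfamiliar with the objective, the sketch below shows a minimal joint embedding predictive step: a context encoder sees a masked view and a predictor regresses the target encoder's embeddings of the hidden patches. The tiny encoders, patching, and masking scheme are stand-ins for illustration, not the CNN-JEPA architecture.

```python
# Minimal joint-embedding-predictive objective: predict the (frozen) target
# encoder's embeddings of masked patches from a masked context view.
# Encoders here are tiny stand-ins, not the CNN-JEPA backbones.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Conv2d(3, 64, 8, stride=8), nn.Flatten(2))  # B x 64 x N
target_encoder = copy.deepcopy(encoder)
for p in target_encoder.parameters():
    p.requires_grad_(False)
predictor = nn.Sequential(nn.Linear(64, 128), nn.GELU(), nn.Linear(128, 64))

def jepa_step(images: torch.Tensor, mask_ratio: float = 0.5) -> torch.Tensor:
    B = images.size(0)
    with torch.no_grad():
        targets = target_encoder(images).transpose(1, 2)          # B x N x 64
    n_patches = targets.size(1)
    mask = torch.rand(B, n_patches) < mask_ratio                  # True = hidden

    # Zero out the masked 8x8 patches in pixel space to form the context view.
    side = int(n_patches ** 0.5)
    patch_mask = mask.view(B, 1, side, side).float()
    patch_mask = F.interpolate(patch_mask, size=images.shape[-2:], mode="nearest")
    context = encoder(images * (1 - patch_mask)).transpose(1, 2)  # B x N x 64

    pred = predictor(context)
    return F.mse_loss(pred[mask], targets[mask])                  # predict hidden embeddings

if __name__ == "__main__":
    loss = jepa_step(torch.randn(4, 3, 64, 64))
    loss.backward()
    print(float(loss))
    # In practice the target encoder is updated as an EMA of the context encoder.
```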
Citations: 0
An end-to-end pipeline for team-aware, pose-aligned augmented reality in cycling broadcasts
IF 3.5 CAS Tier 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-12-04 DOI: 10.1016/j.cviu.2025.104602
Winter Clinckemaillie, Jelle Vanhaeverbeke, Maarten Slembrouck, Steven Verstockt
Advanced computer vision and machine learning technologies transform how we experience sports events. This work enriches helicopter footage of cycling races with dynamic, in-scene, pose-aligned augmented reality (AR) overlays (e.g., rider name, speed, wind direction) that remain visually attached to each rider. To achieve this, we propose a multi-stage pipeline: cyclists are first detected and tracked, followed by team recognition using a one-shot learning approach based on Siamese neural networks, which achieves a classification accuracy of 85% on a test set composed of unseen teams during training. This design allows easy adaptation and reuse across different races and seasons, enabling frequent jersey and team changes with minimal effort. We introduce a pose-based AR overlay that anchors rider labels to moving cyclists without fixed field landmarks or homography, enabling dynamic overlays in unconstrained cycling broadcasts. Real-time feasibility is demonstrated through runtime profiling and TensorRT optimizations. Finally, a user study evaluates the readability, informativeness, visual stability, and engagement of our AR-enhanced broadcasts. The combination of advanced computer vision, AR, and user-centered evaluation showcases new possibilities for improving live sports broadcasts, particularly in challenging environments like road cycling.
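The one-shot team-recognition step can be illustrated with a prototype-matching sketch: one labeled jersey crop per team defines a prototype, and new rider crops are assigned by cosine similarity, which is what makes swapping in a new team cheap. The embedding network below is a toy stand-in, not the pipeline's trained Siamese model.

```python
# Sketch of one-shot team recognition with a Siamese-style embedding network.
# The backbone and crop size are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

embedder = nn.Sequential(                      # tiny stand-in embedding network
    nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 128),
)

def embed(x: torch.Tensor) -> torch.Tensor:
    return F.normalize(embedder(x), dim=-1)

def build_prototypes(reference_crops: dict) -> dict:
    """One labeled jersey crop per team is enough to add or swap a team."""
    with torch.no_grad():
        return {team: embed(crop.unsqueeze(0)).squeeze(0)
                for team, crop in reference_crops.items()}

def classify(crop: torch.Tensor, prototypes: dict) -> str:
    with torch.no_grad():
        z = embed(crop.unsqueeze(0)).squeeze(0)
    # Nearest prototype by cosine similarity (embeddings are L2-normalized).
    return max(prototypes, key=lambda team: float(z @ prototypes[team]))

if __name__ == "__main__":
    refs = {"team_a": torch.randn(3, 128, 64), "team_b": torch.randn(3, 128, 64)}
    protos = build_prototypes(refs)
    print(classify(torch.randn(3, 128, 64), protos))
```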
Citations: 0
Multimodal driver behavior recognition based on frame-adaptive convolution and feature fusion
IF 3.5 CAS Tier 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-12-03 DOI: 10.1016/j.cviu.2025.104587
Jiafeng Li, Jiajun Sun, Ziqing Li, Jing Zhang, Li Zhuo
The identification of driver behavior plays a vital role in the autonomous driving systems of intelligent vehicles. However, the complexity of real-world driving scenarios presents significant challenges. Several existing approaches struggle to effectively exploit multimodal feature-level fusion and suffer from suboptimal temporal modeling, resulting in unsatisfactory performance. We introduce a new multimodal framework that combines RGB frames with skeletal data at the feature level, incorporating a frame-adaptive convolution mechanism to improve temporal modeling. Specifically, we first propose the local spatial attention enhancement module (LSAEM). This module refines RGB features using local spatial attention from skeletal features, prioritizing critical local regions and mitigating the negative effects of complex backgrounds in the RGB modality. Next, we introduce the heatmap enhancement module (HEM), which enriches skeletal features with contextual scene information from RGB heatmaps, thus addressing the lack of local scene context in skeletal data. Finally, we propose a frame-adaptive convolution mechanism that dynamically adjusts convolutional weights per frame, emphasizing key temporal frames and further strengthening the model’s temporal modeling capabilities. Extensive experiments on the Drive&Act dataset validate the efficacy of the presented approach, showing remarkable enhancements in recognition accuracy as compared to existing SOTA methods.
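One simple way to realize per-frame adaptive weighting is sketched below: a small gating head scores each frame and the scores modulate the temporal aggregation. The gating design is an assumption for illustration, not the paper's exact frame-adaptive convolution mechanism.

```python
# Illustration of per-frame adaptive weighting before temporal convolution.
# The gating head and aggregation are assumptions, not the paper's module.
import torch
import torch.nn as nn

class FrameAdaptiveConv(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.frame_gate = nn.Sequential(       # one scalar weight per frame
            nn.AdaptiveAvgPool2d(1), nn.Flatten(1),
            nn.Linear(channels, 1),
        )
        self.temporal_conv = nn.Conv1d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: B x T x C x H x W
        B, T, C, H, W = x.shape
        gates = self.frame_gate(x.reshape(B * T, C, H, W)).view(B, T)
        gates = torch.softmax(gates, dim=1).view(B, T, 1, 1, 1)   # emphasize key frames
        weighted = (x * gates).mean(dim=(3, 4))                   # B x T x C
        return self.temporal_conv(weighted.transpose(1, 2))       # B x C x T

if __name__ == "__main__":
    m = FrameAdaptiveConv(channels=64)
    clip = torch.randn(2, 16, 64, 56, 56)
    print(m(clip).shape)  # torch.Size([2, 64, 16])
```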
Citations: 0
Unsupervised multi-modal domain adaptation for RGB-T Semantic Segmentation
IF 3.5 CAS Tier 3 (Computer Science) Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2025-12-03 DOI: 10.1016/j.cviu.2025.104573
Zeyang Chen, Chunyu Lin, Yao Zhao, Tammam Tillo
This paper proposes an Unsupervised multi-modal domain adaptation approach for semantic segmentation of visible and thermal images. The method addresses the issue of data scarcity by transferring knowledge from existing semantic segmentation networks, thereby helping to avoid the high costs associated with data labeling. We take into account changes in temperature and light to reduce the intra-domain gap between visible and thermal images captured during the day and night. Additionally, we narrow the inter-domain gap between visible and thermal images using a self-distillation loss. Our approach allows for high-quality semantic segmentation without the need for annotations, even under challenging conditions such as nighttime and adverse weather. Experiments conducted on both visible and thermal benchmarks demonstrate the effectiveness of our method, quantitatively and qualitatively.
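A cross-modal self-distillation term of the general kind described here can be written in a few lines: thermal-branch predictions are pulled toward the network's own predictions on the paired visible frame. The temperature and KL formulation below are assumptions for illustration, not necessarily the paper's exact loss.

```python
# Sketch of a cross-modal self-distillation term for paired RGB/thermal frames.
# Temperature and KL form are assumptions, not the paper's exact objective.
import torch
import torch.nn.functional as F

def self_distillation_loss(logits_rgb: torch.Tensor,
                           logits_thermal: torch.Tensor,
                           temperature: float = 2.0) -> torch.Tensor:
    """logits_*: B x num_classes x H x W segmentation logits for paired frames."""
    with torch.no_grad():
        teacher = F.softmax(logits_rgb / temperature, dim=1)      # soft RGB targets
    student = F.log_softmax(logits_thermal / temperature, dim=1)
    # KL divergence, scaled by T^2 as in standard distillation.
    return F.kl_div(student, teacher, reduction="batchmean") * temperature ** 2

if __name__ == "__main__":
    rgb = torch.randn(2, 19, 64, 64)
    thr = torch.randn(2, 19, 64, 64, requires_grad=True)
    loss = self_distillation_loss(rgb, thr)
    loss.backward()
    print(float(loss))
```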
Citations: 0