
Latest articles in Computer Vision and Image Understanding

Multimodal driver behavior recognition based on frame-adaptive convolution and feature fusion
IF 3.5 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-03 | DOI: 10.1016/j.cviu.2025.104587
Jiafeng Li, Jiajun Sun, Ziqing Li, Jing Zhang, Li Zhuo
The identification of driver behavior plays a vital role in the autonomous driving systems of intelligent vehicles. However, the complexity of real-world driving scenarios presents significant challenges. Several existing approaches struggle to effectively exploit multimodal feature-level fusion and suffer from suboptimal temporal modeling, resulting in unsatisfactory performance. We introduce a new multimodal framework that combines RGB frames with skeletal data at the feature level, incorporating a frame-adaptive convolution mechanism to improve temporal modeling. Specifically, we first propose the local spatial attention enhancement module (LSAEM). This module refines RGB features using local spatial attention from skeletal features, prioritizing critical local regions and mitigating the negative effects of complex backgrounds in the RGB modality. Next, we introduce the heatmap enhancement module (HEM), which enriches skeletal features with contextual scene information from RGB heatmaps, thus addressing the lack of local scene context in skeletal data. Finally, we propose a frame-adaptive convolution mechanism that dynamically adjusts convolutional weights per frame, emphasizing key temporal frames and further strengthening the model’s temporal modeling capabilities. Extensive experiments on the Drive&Act dataset validate the efficacy of the presented approach, showing remarkable enhancements in recognition accuracy as compared to existing SOTA methods.
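The frame-adaptive convolution idea, dynamically re-weighting convolutional responses per frame, can be illustrated with a short PyTorch sketch. This is a minimal illustration, not the authors' implementation: the gating head, the shapes, and the shared Conv1d are assumptions chosen only to show how per-frame weights could emphasize key temporal frames.

```python
# Minimal sketch, assuming per-frame features of shape (B, T, C); the gating head and
# the shared temporal convolution are illustrative, not the paper's actual module.
import torch
import torch.nn as nn


class FrameAdaptiveConv1d(nn.Module):
    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        # Shared temporal convolution over the frame axis.
        self.temporal_conv = nn.Conv1d(channels, channels, kernel_size,
                                       padding=kernel_size // 2)
        # Small head predicting one modulation weight per frame (the "adaptive" part).
        self.frame_gate = nn.Sequential(
            nn.Linear(channels, channels // 4),
            nn.ReLU(inplace=True),
            nn.Linear(channels // 4, 1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, T, C) fused RGB/skeleton features, one vector per frame.
        gate = self.frame_gate(x)                  # (B, T, 1), emphasis per frame
        x = x * gate                               # re-weight frames before convolving
        x = self.temporal_conv(x.transpose(1, 2))  # (B, C, T)
        return x.transpose(1, 2)                   # back to (B, T, C)


if __name__ == "__main__":
    clips = torch.randn(2, 16, 256)                # 2 clips, 16 frames, 256-d features
    print(FrameAdaptiveConv1d(256)(clips).shape)   # torch.Size([2, 16, 256])
```

Here the gate stands in for the per-frame adjustment; the paper's mechanism operates on the convolution weights themselves, which the abstract does not detail.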
Citations: 0
Unsupervised multi-modal domain adaptation for RGB-T Semantic Segmentation
IF 3.5 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-03 | DOI: 10.1016/j.cviu.2025.104573
Zeyang Chen, Chunyu Lin, Yao Zhao, Tammam Tillo
This paper proposes an unsupervised multi-modal domain adaptation approach for semantic segmentation of visible and thermal images. The method addresses the issue of data scarcity by transferring knowledge from existing semantic segmentation networks, thereby helping to avoid the high costs associated with data labeling. We take into account changes in temperature and light to reduce the intra-domain gap between visible and thermal images captured during the day and night. Additionally, we narrow the inter-domain gap between visible and thermal images using a self-distillation loss. Our approach allows for high-quality semantic segmentation without the need for annotations, even under challenging conditions such as nighttime and adverse weather. Experiments conducted on both visible and thermal benchmarks demonstrate the effectiveness of our method, quantitatively and qualitatively.
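The self-distillation loss mentioned above can be sketched generically: a student network is trained to match the softened predictions of a (typically EMA-updated) teacher on unlabeled thermal or night-time images. The loss form, temperature, and EMA update below are common choices assumed for illustration, not the paper's exact objective.

```python
# Generic sketch, assuming (B, K, H, W) segmentation logits; the KL form, temperature,
# and EMA update are common defaults, not the exact loss used in the paper.
import copy

import torch
import torch.nn.functional as F


@torch.no_grad()
def ema_update(teacher: torch.nn.Module, student: torch.nn.Module, m: float = 0.999):
    # Exponential moving average of student weights keeps the teacher stable.
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(m).add_(ps, alpha=1.0 - m)


def self_distillation_loss(student_logits, teacher_logits, temperature: float = 2.0):
    # Match softened per-pixel class distributions with KL divergence.
    t = temperature
    log_p = F.log_softmax(student_logits / t, dim=1)
    q = F.softmax(teacher_logits / t, dim=1)
    return F.kl_div(log_p, q, reduction="batchmean") * (t * t)


if __name__ == "__main__":
    student = torch.nn.Conv2d(3, 19, kernel_size=1)   # stand-in segmentation head
    teacher = copy.deepcopy(student)
    thermal = torch.randn(2, 3, 64, 64)               # unlabeled night-time batch
    loss = self_distillation_loss(student(thermal), teacher(thermal).detach())
    loss.backward()
    ema_update(teacher, student)
    print(float(loss))
```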
Citations: 0
A modular augmented reality framework for real-time clinical data visualization and interaction
IF 3.5 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-02 | DOI: 10.1016/j.cviu.2025.104594
Lucia Cascone, Lucia Cimmino, Michele Nappi, Chiara Pero
This paper presents a modular augmented reality (AR) framework designed to support healthcare professionals in the real-time visualization and interaction with clinical data. The system integrates biometric patient identification, large language models (LLMs) for multimodal clinical data structuring, and ontology-driven AR overlays for anatomy-aware spatial projection. Unlike conventional systems, the framework enables immersive, context-aware visualization that improves both the accessibility and interpretability of medical information. The architecture is fully modular and mobile-compatible, allowing independent refinement of its core components. Patient identification is performed through facial recognition, while clinical documents are processed by a vision-language pipeline that standardizes heterogeneous records into structured data. Body-tracking technology anchors these parameters to the corresponding anatomical regions, supporting intuitive and dynamic interaction during consultations. The framework has been validated through a diabetology case study and a usability assessment with five clinicians, achieving a System Usability Scale (SUS) score of 73.0, which indicates good usability. Experimental results confirm the accuracy of biometric identification (97.1%). The LLM-based pipeline achieved an exact match accuracy of 98.0% for diagnosis extraction and 86.0% for treatment extraction from unstructured clinical images, confirming its reliability in structuring heterogeneous medical content. The system is released as open source to encourage reproducibility and collaborative development. Overall, this work contributes a flexible, clinician-oriented AR platform that combines biometric recognition, multimodal data processing, and interactive visualization to advance next-generation digital healthcare applications.
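As context for the reported usability result, the System Usability Scale score is computed from ten 1-to-5 questionnaire responses with a standard formula; the snippet below shows that generic calculation (it is not code from the paper), and a study-level score such as 73.0 is the mean over participants.

```python
# Standard SUS formula (not code from the paper): ten 1-5 responses, odd items score
# (response - 1), even items score (5 - response), total scaled by 2.5 to 0-100.
def sus_score(responses):
    assert len(responses) == 10 and all(1 <= r <= 5 for r in responses)
    odd = sum(r - 1 for r in responses[0::2])    # items 1, 3, 5, 7, 9
    even = sum(5 - r for r in responses[1::2])   # items 2, 4, 6, 8, 10
    return 2.5 * (odd + even)


if __name__ == "__main__":
    # Hypothetical answers from one clinician; a study score is averaged over raters.
    print(sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2]))   # 75.0
```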
Citations: 0
Open-vocabulary object detection for high-resolution remote sensing images
IF 3.5 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-12-01 | DOI: 10.1016/j.cviu.2025.104566
HuaDong Li
In high-resolution remote sensing interpretation, object detection is evolving from closed-set to open-set, i.e., generalizing traditional detection models to detect objects described by an open vocabulary. The rapid development of vision-language pre-training in recent years has made research on open-vocabulary detection (OVD) feasible, which is also considered a critical step in the transition from weak to strong artificial intelligence. However, limited by the scarcity of large-scale vision-language paired datasets, research on open-vocabulary detection for high-resolution remote sensing images (RS-OVD) significantly lags behind that on natural images. Additionally, the high scale variability of remote-sensing objects poses further challenges for open-vocabulary object detection. To address these challenges, we innovatively disentangle the generalization process into an object-level task transformation problem and a semantic expansion problem. Furthermore, we propose a Cascade Knowledge Distillation model that addresses these problems stage by stage. We evaluate our method on the DIOR and NWPU VHR-10 datasets. The experimental results demonstrate that the proposed method effectively generalizes the object detector to unknown categories.
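Knowledge distillation of the kind named above, transferring knowledge from a pre-trained vision-language teacher into a detector, is often implemented as an embedding-alignment loss on region features. The sketch below shows one generic variant under that assumption; the paper's cascade design and its stage-wise decomposition are not reproduced.

```python
# Generic embedding-alignment distillation, assuming frozen teacher embeddings for the
# same regions; the projection layer and cosine loss are illustrative choices.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DistillHead(nn.Module):
    def __init__(self, student_dim: int, teacher_dim: int):
        super().__init__()
        self.proj = nn.Linear(student_dim, teacher_dim)   # map detector RoI features

    def forward(self, student_feats: torch.Tensor, teacher_embeds: torch.Tensor):
        # student_feats: (N, student_dim) region features from the detector.
        # teacher_embeds: (N, teacher_dim) embeddings from a frozen VLM teacher.
        s = F.normalize(self.proj(student_feats), dim=-1)
        t = F.normalize(teacher_embeds, dim=-1)
        return (1.0 - (s * t).sum(dim=-1)).mean()         # cosine distillation loss


if __name__ == "__main__":
    head = DistillHead(student_dim=256, teacher_dim=512)
    loss = head(torch.randn(32, 256), torch.randn(32, 512))
    loss.backward()
    print(float(loss))
```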
Citations: 0
Human-in-the-loop adaptation in group activity feature learning for team sports video retrieval
IF 3.5 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-11-29 | DOI: 10.1016/j.cviu.2025.104577
Chihiro Nakatani, Hiroaki Kawashima, Norimichi Ukita
This paper proposes human-in-the-loop adaptation for Group Activity Feature Learning (GAFL) without group activity annotations. This human-in-the-loop adaptation is employed in a group-activity video retrieval framework to improve its retrieval performance. Our method initially pre-trains the GAF space based on the similarity of group activities in a self-supervised manner, unlike prior work that classifies videos into pre-defined group activity classes in a supervised learning manner. Our interactive fine-tuning process updates the GAF space to allow a user to better retrieve videos similar to query videos given by the user. In this fine-tuning, our proposed data-efficient video selection process provides several videos, which are selected from a video database, to the user in order to manually label these videos as positive or negative. These labeled videos are used to update (i.e., fine-tune) the GAF space, so that the positive and negative videos move closer to and farther away from the query videos through contrastive learning. Our comprehensive experimental results on two team sports datasets validate that our method significantly improves the retrieval performance. Ablation studies also demonstrate that several components in our human-in-the-loop adaptation contribute to the improvement of the retrieval performance. Code: https://github.com/chihina/GAFL-FINE-CVIU.
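The interactive fine-tuning step can be sketched as a contrastive objective over the user-labeled videos: positives are pulled toward the query's group-activity feature and negatives are pushed away. The InfoNCE-style formulation and temperature below are assumptions for illustration rather than the paper's exact loss; see the released code for the actual implementation.

```python
# InfoNCE-style sketch over user feedback, assuming precomputed group-activity
# features; temperature and the exact contrast structure are illustrative.
import torch
import torch.nn.functional as F


def feedback_contrastive_loss(query, positives, negatives, temperature: float = 0.1):
    # query: (D,) feature of the query video; positives: (P, D); negatives: (N, D).
    q = F.normalize(query, dim=-1)
    pos = F.normalize(positives, dim=-1)
    neg = F.normalize(negatives, dim=-1)
    pos_sim = pos @ q / temperature                       # (P,)
    neg_sim = neg @ q / temperature                       # (N,)
    # Each positive is contrasted against all user-labeled negatives.
    logits = torch.cat([pos_sim.unsqueeze(1),
                        neg_sim.expand(pos_sim.numel(), -1)], dim=1)
    targets = torch.zeros(pos_sim.numel(), dtype=torch.long)
    return F.cross_entropy(logits, targets)


if __name__ == "__main__":
    loss = feedback_contrastive_loss(torch.randn(128),
                                     torch.randn(4, 128),   # videos labeled positive
                                     torch.randn(16, 128))  # videos labeled negative
    print(float(loss))
```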
Citations: 0
Pay more attention to dark regions for faster shadow detection
IF 3.5 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-11-27 | DOI: 10.1016/j.cviu.2025.104589
Xian-Tao Wu, Xiao-Diao Chen, Hongyu Chen, Wen Wu, Weiyin Ma, Haichuan Song
Deep learning-based shadow detection methods primarily focus on achieving higher accuracy, while often overlooking the importance of inference efficiency for downstream applications. This work attempts to reduce the number of processed patches during the feed-forward process and proposes a faster framework for shadow detection (namely FasterSD) based on a vision transformer. We found that most bright regions converge to a stable state even at early stages of the feed-forward process, revealing massive computational redundancy. From this observation, we introduce a token pausing strategy that locates these simple patches and pauses the refinement of their feature representations (i.e., tokens), enabling us to devote most of the computational resources to the remaining challenging patches. Specifically, we propose to use predicted posterior entropy as a proxy for prediction correctness, and design a random pausing scheme to ensure that the model meets flexible runtime requirements by directly adjusting the pausing configuration without repeated training. Extensive experiments on three shadow detection benchmarks (i.e., SBU, ISTD, and UCF) demonstrate that our FasterSD can run 12× faster than the state-of-the-art shadow detector with comparable performance. The code will be available at https://github.com/wuwen1994/FasterSD.
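The entropy-based criterion is straightforward to sketch: each patch token gets a predicted posterior over shadow/non-shadow, and tokens whose posterior entropy is low are considered converged and can be paused. The shapes, threshold, and binary-posterior assumption below are illustrative; the released FasterSD code defines the actual schedule.

```python
# Sketch of the entropy criterion, assuming (B, N, 2) intermediate shadow/non-shadow
# logits per patch token; the threshold and the pausing schedule are illustrative.
import torch


def posterior_entropy(token_logits: torch.Tensor) -> torch.Tensor:
    # token_logits: (B, N, 2) per-patch predictions from an intermediate block.
    probs = token_logits.softmax(dim=-1)
    return -(probs * probs.clamp_min(1e-8).log()).sum(dim=-1)   # (B, N)


if __name__ == "__main__":
    logits = torch.randn(1, 196, 2)          # 14 x 14 patches, binary posterior each
    entropy = posterior_entropy(logits)
    active = entropy > 0.3                   # low-entropy patches are "paused"
    # Later blocks would refine only the active tokens; paused tokens keep their
    # current features, which is where the reported speed-up comes from.
    print(int(active.sum()), "of", logits.shape[1], "tokens remain active")
```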
Citations: 0
GL2T-Diff: Medical image translation via spatial-frequency fusion diffusion models
IF 3.5 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-11-27 | DOI: 10.1016/j.cviu.2025.104586
Dong Sui, Nanting Song, Xiao Tian, Han Zhou, Yacong Li, Maozu Guo, Kuanquan Wang, Gongning Luo
Diffusion Probabilistic Models (DPMs) are effective in medical image translation (MIT), but they tend to lose high-frequency details during the noise addition process, making it challenging to recover these details during the denoising process. This hinders the model’s ability to accurately preserve anatomical details during MIT tasks, which may ultimately affect the accuracy of diagnostic outcomes. To address this issue, we propose a diffusion model (GL2T-Diff) based on convolutional channel and Laplacian frequency attention mechanisms, which is designed to enhance MIT tasks by effectively preserving critical image features. We introduce two novel modules: the Global Channel Correlation Attention Module (GC2A Module) and the Laplacian Frequency Attention Module (LFA Module). The GC2A Module enhances the model’s ability to capture global dependencies between channels, while the LFA Module effectively retains high-frequency components, which are crucial for preserving anatomical structures. To leverage the complementary strengths of both GC2A Module and LFA Module, we propose the Laplacian Convolutional Attention with Phase-Amplitude Fusion (FusLCA), which facilitates effective integration of spatial and frequency domain features. Experimental results show that GL2T-Diff outperforms state-of-the-art (SOTA) methods, including those based on Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and other DPMs, across the BraTS-2021/2024, IXI, and Pelvic datasets. The code is available at https://github.com/puzzlesong8277/GL2T-Diff.
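High-frequency emphasis of the kind the LFA module targets can be approximated with a fixed Laplacian filter applied depthwise to feature maps. The sketch below uses that classical operator plus a simple gate as an assumption-laden illustration; the paper's LFA and FusLCA modules are more elaborate (see the linked repository).

```python
# Sketch of Laplacian high-frequency emphasis on (B, C, H, W) feature maps; the fixed
# 3x3 kernel and the sigmoid gate are assumptions, far simpler than LFA/FusLCA.
import torch
import torch.nn.functional as F

LAPLACIAN = torch.tensor([[0., 1., 0.],
                          [1., -4., 1.],
                          [0., 1., 0.]])


def laplacian_highpass(x: torch.Tensor) -> torch.Tensor:
    # Depthwise convolution with a shared Laplacian kernel extracts high frequencies.
    c = x.shape[1]
    kernel = LAPLACIAN.to(x).view(1, 1, 3, 3).repeat(c, 1, 1, 1)
    return F.conv2d(x, kernel, padding=1, groups=c)


def frequency_gate(x: torch.Tensor) -> torch.Tensor:
    # Add back detail-weighted features so edges and fine structures are emphasized.
    return x + x * torch.sigmoid(laplacian_highpass(x))


if __name__ == "__main__":
    feats = torch.randn(2, 64, 32, 32)
    print(frequency_gate(feats).shape)       # torch.Size([2, 64, 32, 32])
```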
Citations: 0
Spatio-temporal transformers for action unit classification with event cameras
IF 3.5 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-11-26 | DOI: 10.1016/j.cviu.2025.104578
Luca Cultrera, Federico Becattini, Lorenzo Berlincioni, Claudio Ferrari, Alberto Del Bimbo
Facial analysis plays a vital role in assistive technologies aimed at improving human–computer interaction, emotional well-being, and non-verbal communication monitoring. For more fine-grained tasks, however, standard sensors might not be up to the task: their latency makes it impossible to record and detect the micro-movements that carry a highly informative signal and are necessary for inferring the true emotions of a subject. Event cameras have been increasingly gaining interest as a possible solution to this and similar high-frame-rate tasks. In this paper we propose a novel spatio-temporal Vision Transformer model that uses Shifted Patch Tokenization (SPT) and Locality Self-Attention (LSA) to enhance the accuracy of Action Unit classification from event streams. We also address the lack of labeled event data in the literature, which can be considered a major cause of the existing gap between the maturity of RGB and neuromorphic vision models. In fact, gathering data is harder in the event domain since it cannot be crawled from the web, and labeling frames should take into account event aggregation rates and the fact that static parts might not be visible in certain frames. To this end, we present FACEMORPHIC, a temporally synchronized multimodal face dataset composed of both RGB videos and event streams. The dataset is annotated at the video level with facial Action Units and also contains streams collected with a variety of possible applications in mind, ranging from 3D shape estimation to lip-reading. We then show how temporal synchronization can allow effective neuromorphic face analysis without the need to manually annotate videos: we instead leverage cross-modal supervision, bridging the domain gap by representing face shapes in a 3D space. This makes our model suitable for real-world assistive scenarios, including privacy-preserving wearable systems and responsive social interaction monitoring. Our proposed model outperforms baseline methods by capturing spatial and temporal information, crucial for recognizing subtle facial micro-expressions.
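Shifted Patch Tokenization is a published technique (shift the input along the four diagonals by half a patch, concatenate with the original along channels, then patch-embed), so it can be sketched independently of this paper. The two-channel event-frame input and the Conv2d embedding below are assumptions; the paper's event preprocessing and the LSA branch are not shown.

```python
# Sketch of Shifted Patch Tokenization; the 2-channel event-frame input and the Conv2d
# patch embedding are assumptions, and Locality Self-Attention is omitted for brevity.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ShiftedPatchTokenization(nn.Module):
    def __init__(self, in_ch: int, embed_dim: int, patch: int = 16):
        super().__init__()
        self.patch = patch
        # Original + four diagonally shifted copies are stacked along channels.
        self.proj = nn.Conv2d(in_ch * 5, embed_dim, kernel_size=patch, stride=patch)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        s = self.patch // 2
        # Diagonal shifts via asymmetric padding (negative values crop the far side).
        shifts = [F.pad(x, (s, -s, s, -s)), F.pad(x, (-s, s, s, -s)),
                  F.pad(x, (s, -s, -s, s)), F.pad(x, (-s, s, -s, s))]
        x = torch.cat([x] + shifts, dim=1)           # (B, 5*C, H, W)
        x = self.proj(x)                             # (B, D, H/patch, W/patch)
        return x.flatten(2).transpose(1, 2)          # (B, N, D) token sequence


if __name__ == "__main__":
    events = torch.randn(1, 2, 224, 224)             # e.g. two polarity channels
    tokens = ShiftedPatchTokenization(in_ch=2, embed_dim=192)(events)
    print(tokens.shape)                              # torch.Size([1, 196, 192])
```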
Citations: 0
What2Keep: A communication-efficient collaborative perception framework for 3D detection via keeping valuable information
IF 3.5 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-11-26 | DOI: 10.1016/j.cviu.2025.104572
Hongkun Zhang, Yan Wu, Zhengbin Zhang
Collaborative perception has aroused significant attention in autonomous driving, as the ability to share information among Connected Autonomous Vehicles (CAVs) substantially enhances perception performance. However, collaborative perception faces critical challenges, among which limited communication bandwidth remains a fundamental bottleneck due to inherent constraints in current communication technologies. Bandwidth limitations can severely degrade transmitted information, leading to a sharp decline in perception performance. To address this issue, we propose What To Keep (What2Keep), a collaborative perception framework that dynamically adapts to communication bandwidth fluctuations. Our method aims to establish a consensus between vehicles, prioritizing the transmission of intermediate features that are most critical to the ego vehicle. The proposed framework offers two key advantages: (1) the consensus-based feature selection mechanism effectively incorporates different collaborative patterns as prior knowledge to help vehicles preserve the most valuable features, improving communication efficiency and enhancing model robustness against communication degradation; and (2) What2Keep employs a cross-vehicle fusion strategy that effectively aggregates cooperative perception information while exhibiting robustness against varying communication volume. Extensive experiments demonstrate the superior performance of our method on the OPV2V and V2XSet benchmarks, achieving state-of-the-art [email protected] scores of 83.57% and 77.78% respectively while maintaining approximately 20% relative improvement under severe bandwidth constraints (2^14 B). Our qualitative experiments explain the working mechanism of What2Keep. Code will be available at https://github.com/CHAMELENON/What2Keep.
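A bandwidth-aware feature selection step of the general kind described above can be sketched as ranking BEV cells by a confidence map and transmitting only as many as the byte budget allows. The budget-to-k conversion and the scatter-based reconstruction below are assumptions for illustration; What2Keep's consensus mechanism between vehicles is not reproduced here.

```python
# Sketch of confidence-ranked feature selection under a byte budget; the budget-to-k
# conversion and scatter-based reconstruction are assumptions, not What2Keep itself.
import torch


def select_features(bev_feat, confidence, byte_budget: int, bytes_per_cell: int):
    # bev_feat: (C, H, W) intermediate BEV features; confidence: (H, W) objectness map.
    C, H, W = bev_feat.shape
    k = min(H * W, byte_budget // bytes_per_cell)        # how many cells fit the budget
    idx = confidence.flatten().topk(k).indices           # keep the most valuable cells
    return bev_feat.flatten(1)[:, idx], idx              # (C, k) payload + positions


def reconstruct(feats, idx, shape):
    C, H, W = shape
    out = torch.zeros(C, H * W, dtype=feats.dtype)
    out[:, idx] = feats                                  # scatter back at the receiver
    return out.view(C, H, W)


if __name__ == "__main__":
    bev, conf = torch.randn(64, 100, 100), torch.rand(100, 100)
    payload, idx = select_features(bev, conf, byte_budget=2 ** 14,
                                   bytes_per_cell=64 * 4)   # 64 fp32 values per cell
    print(payload.shape, reconstruct(payload, idx, bev.shape).shape)
```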
Citations: 0
Transformer tracking with high-low frequency attention
IF 3.5 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2025-11-25 | DOI: 10.1016/j.cviu.2025.104563
Zhi Chen, Zhen Yu
Transformer-based trackers have achieved impressive performance due to their powerful global modeling capability. However, most existing methods employ vanilla attention modules, which treat template and search regions homogeneously and overlook the distinct characteristics of different frequency features—high-frequency components capture local details critical for target identification, while low-frequency components provide global structural context. To bridge this gap, we propose a novel Transformer architecture with High-low (Hi–Lo) frequency attention for visual object tracking. Specifically, a high-frequency attention module is applied to the template region to preserve fine-grained target details. Conversely, a low-frequency attention module processes the search region to efficiently capture global dependencies with reduced computational cost. Furthermore, we introduce a Global–Local Dual Interaction (GLDI) module to establish reciprocal feature enhancement between the template and search feature maps, effectively integrating multi-frequency information. Extensive experiments on six challenging benchmarks (LaSOT, GOT-10k, TrackingNet, UAV123, OTB100, and NFS) demonstrate that our method, named HiLoTT, achieves state-of-the-art performance while maintaining a real-time speed of 45 frames per second.
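The low-frequency half of a Hi-Lo attention scheme is commonly realized by letting full-resolution queries attend to average-pooled keys and values, which captures global structure at reduced cost. The sketch below shows that generic pattern; the head count, pooling size, and the pairing with a local high-frequency branch on the template are assumptions rather than HiLoTT's actual configuration.

```python
# Sketch of a low-frequency attention branch: queries at full resolution attend to
# average-pooled keys/values. Head count and pooling size are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LowFreqAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4, pool: int = 2):
        super().__init__()
        self.pool = pool
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor, h: int, w: int) -> torch.Tensor:
        # x: (B, N, C) search-region tokens arranged on an h x w grid (N = h * w).
        b, n, c = x.shape
        grid = x.transpose(1, 2).reshape(b, c, h, w)
        pooled = F.avg_pool2d(grid, self.pool)            # low-pass the keys/values
        kv = pooled.flatten(2).transpose(1, 2)            # (B, N / pool^2, C)
        out, _ = self.attn(query=x, key=kv, value=kv)     # global context, fewer keys
        return out


if __name__ == "__main__":
    x = torch.randn(2, 16 * 16, 256)                      # 16 x 16 token grid
    print(LowFreqAttention(dim=256)(x, 16, 16).shape)     # torch.Size([2, 256, 256])
```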
Citations: 0