
Computer Vision and Image Understanding: Latest Publications

A lightweight convolutional neural network-based feature extractor for visible images
IF 4.3 · CAS Tier 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-09-12 · DOI: 10.1016/j.cviu.2024.104157

Feature extraction networks (FENs), as the first stage in many computer vision tasks, play critical roles. Previous studies of FENs employed deeper and wider networks to attain higher accuracy, but their approaches were memory-inefficient and computationally intensive. Here, we present an accurate and lightweight feature extractor (RoShuNet) for visible images based on ShuffleNetV2. The improvements are threefold. To make ShuffleNetV2 compact without degrading its feature extraction ability, we propose an aggregated dual group convolutional module; to better aid the channel interflow process, we propose a γ-weighted shuffling module; to further reduce the complexity and size of the model, we introduce slimming strategies. Classification experiments demonstrate the state-of-the-art (SOTA) performance of RoShuNet, which improves accuracy while reducing model complexity and size compared to ShuffleNetV2. Generalization experiments verify that the proposed method also applies to feature extraction in semantic segmentation and multiple-object tracking scenarios, achieving accuracy comparable to that of other approaches while being more memory- and computation-efficient. Our method provides a novel perspective for designing lightweight models.
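The abstract gives no implementation details for the γ-weighted shuffling module. As a rough, hedged sketch of the general idea (a learnable per-channel weight applied before a ShuffleNetV2-style channel shuffle), the module name, group count, and tensor shapes below are assumptions rather than the authors' code:

```python
# Sketch only: a learnable per-channel weight (gamma) applied before a standard
# channel shuffle. The actual RoShuNet module may be defined differently.
import torch
import torch.nn as nn

class GammaWeightedShuffle(nn.Module):
    def __init__(self, channels: int, groups: int = 2):
        super().__init__()
        assert channels % groups == 0
        self.groups = groups
        self.gamma = nn.Parameter(torch.ones(channels))  # one weight per channel

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        x = x * self.gamma.view(1, c, 1, 1)              # weight each channel
        x = x.view(b, self.groups, c // self.groups, h, w)
        return x.transpose(1, 2).reshape(b, c, h, w)     # standard channel shuffle

feat = torch.randn(1, 64, 32, 32)
print(GammaWeightedShuffle(64)(feat).shape)  # torch.Size([1, 64, 32, 32])
```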

Citations: 0
LightSOD: Towards lightweight and efficient network for salient object detection
IF 4.3 · CAS Tier 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-09-12 · DOI: 10.1016/j.cviu.2024.104148

The recent emphasis has been on achieving rapid and precise detection of salient objects, which presents a challenge for resource-constrained edge devices because current models are too computationally demanding for deployment. Some recent research has prioritized inference speed over accuracy to address this issue. In response to the inherent trade-off between accuracy and efficiency, we introduce an innovative framework called LightSOD, with the primary objective of achieving a balance between precision and computational efficiency. LightSOD comprises several vital components, including the spatial-frequency boundary refinement module (SFBR), which utilizes the wavelet transform to restore lost spatial information and capture edge features from the spatial-frequency domain. Additionally, we introduce a cross-pyramid enhancement module (CPE), which utilizes adaptive kernels to capture multi-scale group-wise features in deep layers. We also introduce a group-wise semantic enhancement module (GSRM) to boost global semantic features in the topmost layer. Finally, we introduce a cross-aggregation module (CAM) to incorporate channel-wise features across layers, followed by a triple feature fusion (TFF) that aggregates features from coarse to fine levels. Through experiments on five datasets with various backbones, we demonstrate that LightSOD achieves competitive performance compared with heavyweight cutting-edge models while significantly reducing computational complexity.
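The SFBR module is described only at a high level. A minimal sketch of extracting high-frequency (edge-like) components with a single-level depthwise Haar decomposition is shown below; how the bands would be fused back into the network is an assumption left out here:

```python
# Sketch only: single-level Haar decomposition into a low-frequency band and
# three high-frequency (edge-like) bands, implemented as a depthwise conv.
import torch
import torch.nn.functional as F

def haar_decompose(x: torch.Tensor):
    """x: (b, c, h, w) with even h, w. Returns (LL band, concatenated LH/HL/HH bands)."""
    b, c, h, w = x.shape
    ll = torch.tensor([[0.5, 0.5], [0.5, 0.5]])
    lh = torch.tensor([[0.5, 0.5], [-0.5, -0.5]])
    hl = torch.tensor([[0.5, -0.5], [0.5, -0.5]])
    hh = torch.tensor([[0.5, -0.5], [-0.5, 0.5]])
    k = torch.stack([ll, lh, hl, hh]).unsqueeze(1)                # (4, 1, 2, 2)
    k = k.repeat(c, 1, 1, 1).to(dtype=x.dtype, device=x.device)   # depthwise kernels
    out = F.conv2d(x, k, stride=2, groups=c).view(b, c, 4, h // 2, w // 2)
    return out[:, :, 0], out[:, :, 1:].flatten(1, 2)              # (b, c, ...), (b, 3c, ...)

low, high = haar_decompose(torch.randn(2, 32, 64, 64))
print(low.shape, high.shape)  # torch.Size([2, 32, 32, 32]) torch.Size([2, 96, 32, 32])
```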

Citations: 0
Triple-Stream Commonsense Circulation Transformer Network for Image Captioning
IF 4.3 · CAS Tier 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-09-12 · DOI: 10.1016/j.cviu.2024.104165

Traditional image captioning methods have only a local perspective at the dataset level, allowing them to explore dispersed information within individual images. However, the lack of a global perspective prevents them from capturing common characteristics among similar images. To address this limitation, this paper introduces a novel Triple-stream Commonsense Circulating Transformer Network (TCCTN). It incorporates a contextual stream into the encoder, combining an enhanced channel stream and a spatial stream for comprehensive feature learning. The proposed commonsense-aware contextual attention (CCA) module queries commonsense contextual features from the dataset, obtaining global contextual association information by projecting grid features into the contextual space. The pure semantic channel attention (PSCA) module leverages the compressed spatial domain for channel pooling, focusing on the attention weights of pure channel features to capture inherent semantic features. The region spatial attention (RSA) module enhances spatial concepts in semantic learning by incorporating region position information. Furthermore, leveraging the complementary differences among the three features, TCCTN introduces a mixture-of-experts strategy to enhance the unique discriminative ability of each feature and promote their integration in textual feature learning. Extensive experiments on the MS-COCO dataset demonstrate the effectiveness of the contextual commonsense stream and the superior performance of TCCTN.
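The mixture-of-experts strategy for combining the three streams is named but not specified. A minimal sketch of one plausible gated combination of three pooled stream features follows; the gating network, pooling, and dimensions are assumptions:

```python
# Sketch only: a softmax gate weights three pooled stream features and sums them.
# The paper's actual expert routing and fusion may differ.
import torch
import torch.nn as nn

class ThreeStreamGate(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Linear(3 * dim, 3)  # one logit per stream

    def forward(self, ctx, chan, spat):
        # ctx, chan, spat: (batch, dim) pooled features from the three streams
        logits = self.gate(torch.cat([ctx, chan, spat], dim=-1))   # (batch, 3)
        weights = torch.softmax(logits, dim=-1)
        stacked = torch.stack([ctx, chan, spat], dim=1)            # (batch, 3, dim)
        return (weights.unsqueeze(-1) * stacked).sum(dim=1)        # (batch, dim)

gate = ThreeStreamGate(dim=512)
out = gate(torch.randn(4, 512), torch.randn(4, 512), torch.randn(4, 512))
print(out.shape)  # torch.Size([4, 512])
```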

Citations: 0
A convex Kullback–Leibler optimization for semi-supervised few-shot learning
IF 4.3 · CAS Tier 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-09-12 · DOI: 10.1016/j.cviu.2024.104152

Few-shot learning has achieved great success in many fields, thanks to its requirement of only a limited amount of labeled data. However, most state-of-the-art few-shot learning techniques employ transfer learning, which still requires massive labeled data to train a meta-learning system. To simulate the human learning mechanism, a deep few-shot learning model is proposed to learn from one or a few examples. In this paper, we first analyze representative semi-supervised few-shot learning methods and note that they tend to get stuck in local optima and neglect intra-class compactness. To address these issues, we propose a novel semi-supervised few-shot learning method with Convex Kullback–Leibler optimization, hereafter referred to as CKL, in which the KL divergence is employed to reach a globally optimal solution by optimizing a strictly convex function for clustering, while a sample selection strategy is employed to achieve intra-class compactness. During training, CKL is optimized iteratively via deep learning and the expectation–maximization algorithm. Extensive experiments have been conducted on three popular benchmark data sets; taking the miniImageNet data set as an example, the proposed CKL achieves 76.83% and 85.78% accuracy under the 5-way 1-shot and 5-way 5-shot settings, respectively. The experimental results show that this method significantly improves the classification ability on few-shot learning tasks and obtains state-of-the-art performance.
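The abstract describes the optimization only at a high level. The sketch below shows a generic EM-style soft-assignment loop over class prototypes, a common pattern in semi-supervised few-shot learning; it illustrates that pattern only and does not reproduce the paper's convex KL objective or its sample selection strategy:

```python
# Sketch only: generic EM-style prototype refinement with soft assignments of
# unlabeled queries. This is NOT the paper's CKL objective, just the pattern.
import torch

def refine_prototypes(support, support_labels, query, num_classes, iters=10, tau=10.0):
    # support: (Ns, d), support_labels: (Ns,) long, query: (Nq, d) unlabeled embeddings
    protos = torch.stack([support[support_labels == c].mean(0) for c in range(num_classes)])
    q = None
    for _ in range(iters):
        q = torch.softmax(-tau * torch.cdist(query, protos), dim=1)   # E-step: (Nq, C)
        num = protos.new_zeros(num_classes, support.shape[1])
        den = protos.new_zeros(num_classes, 1)
        num.index_add_(0, support_labels, support)                    # labeled contribution
        den.index_add_(0, support_labels, support.new_ones(len(support), 1))
        protos = (num + q.t() @ query) / (den + q.sum(0, keepdim=True).t())  # M-step
    return protos, q

support = torch.randn(25, 64)                  # 5-way 5-shot embeddings (illustrative)
labels = torch.arange(5).repeat_interleave(5)
query = torch.randn(75, 64)
protos, assignments = refine_prototypes(support, labels, query, num_classes=5)
print(protos.shape, assignments.shape)         # torch.Size([5, 64]) torch.Size([75, 5])
```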

Citations: 0
CAFNet: Context aligned fusion for depth completion
IF 4.3 · CAS Tier 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-09-11 · DOI: 10.1016/j.cviu.2024.104158

Depth completion aims at reconstructing a dense depth map from sparse depth input, frequently using color images as guidance. The sparse depth map lacks sufficient context for reconstructing focal contexts such as the shapes of objects. RGB images contain redundant context, including details useless for reconstruction, which reduces the efficiency of focal context extraction. The unaligned contextual information from these two modalities poses a challenge to focal context extraction and further fusion, as well as to the accuracy of depth completion. To optimize the utilization of multimodal contextual information, we explore a novel framework: the Context Aligned Fusion Network (CAFNet). CAFNet comprises two stages: the context-aligned stage and the full-scale stage. In the context-aligned stage, CAFNet downsamples the input RGB-D pairs to a scale at which multimodal contextual information is adequately aligned for feature extraction in two encoders and fusion in CF modules. In the full-scale stage, feature maps with fused multimodal context from the previous stage are upsampled to the original scale and subsequently fused with full-scale depth features by the GF module using a dynamic masked fusion strategy. Ultimately, accurate dense depth maps are reconstructed from the GF module's resulting features. Experiments conducted on indoor and outdoor benchmark datasets show that CAFNet produces results comparable to state-of-the-art methods while effectively reducing computational costs.
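The GF module's dynamic masked fusion strategy is not detailed. As a hedged illustration, the sketch below predicts a per-pixel mask from both inputs and blends depth features with fused RGB-D context; the mask head and shapes are assumptions:

```python
# Sketch only: a per-pixel mask predicted from both inputs selects, location by
# location, between depth features and fused RGB-D context features.
import torch
import torch.nn as nn

class DynamicMaskedFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.mask_head = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, depth_feat, context_feat):
        # depth_feat, context_feat: (b, channels, h, w) at the same resolution
        mask = self.mask_head(torch.cat([depth_feat, context_feat], dim=1))  # (b, 1, h, w)
        return mask * depth_feat + (1.0 - mask) * context_feat

fuse = DynamicMaskedFusion(channels=64)
print(fuse(torch.randn(2, 64, 48, 64), torch.randn(2, 64, 48, 64)).shape)  # (2, 64, 48, 64)
```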

Citations: 0
HBANet: A hybrid boundary-aware attention network for infrared and visible image fusion
IF 4.3 · CAS Tier 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-09-10 · DOI: 10.1016/j.cviu.2024.104161

Infrared and visible image fusion is an extensively investigated problem in infrared image processing, aiming to extract useful information from source images. However, the automatic fusion of these images presents a significant challenge due to the large domain difference and ambiguous boundaries. In this article, we propose a novel image fusion approach based on hybrid boundary-aware attention, termed HBANet, which models global dependencies across the image and leverages boundary-wise prior knowledge to supplement local details. Specifically, we design a novel mixed boundary-aware attention module that is capable of leveraging spatial information to the fullest extent and integrating long-range dependencies across different domains. To preserve the integrity of texture and structural information, we introduce a composite loss function that comprises structure, intensity, and variation losses. In our experiments on public datasets, our method outperforms state-of-the-art methods in terms of both visual and quantitative metrics. Furthermore, our approach exhibits strong generalization capability, achieving satisfactory results in CT and MRI image fusion tasks.
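The loss terms are named but not defined. The sketch below shows intensity and gradient ("variation") terms of the kind commonly used in infrared-visible fusion, assuming single-channel inputs; the paper's structure term (e.g. an SSIM-based loss) and the weightings are not reproduced here:

```python
# Sketch only: intensity term follows the brighter source, gradient term follows
# the sharper edges. The paper's structure loss and weights would still be needed.
import torch
import torch.nn.functional as F

def sobel_grad(x):
    # x: (b, 1, h, w) single-channel image tensor
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]],
                      device=x.device, dtype=x.dtype).view(1, 1, 3, 3)
    ky = kx.transpose(-1, -2)
    return F.conv2d(x, kx, padding=1).abs() + F.conv2d(x, ky, padding=1).abs()

def fusion_loss(fused, ir, vis, w_int=1.0, w_grad=1.0):
    loss_int = F.l1_loss(fused, torch.maximum(ir, vis))
    loss_grad = F.l1_loss(sobel_grad(fused), torch.maximum(sobel_grad(ir), sobel_grad(vis)))
    return w_int * loss_int + w_grad * loss_grad

fused, ir, vis = (torch.rand(2, 1, 64, 64) for _ in range(3))
print(fusion_loss(fused, ir, vis).item())
```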

Citations: 0
Multi-modal transformer with language modality distillation for early pedestrian action anticipation
IF 4.3 · CAS Tier 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-09-10 · DOI: 10.1016/j.cviu.2024.104144

Language-vision integration has become an increasingly popular research direction within the computer vision field. In recent years, there has been growing recognition of the importance of incorporating linguistic information into visual tasks, particularly in domains such as action anticipation. This integration allows anticipation models to leverage textual descriptions to gain deeper contextual understanding, leading to more accurate predictions. In this work, we focus on pedestrian action anticipation, where the objective is the early prediction of pedestrians' future actions in urban environments. Our method relies on a multi-modal transformer model that encodes past observations and produces predictions at different anticipation times, employing a learned mask technique to filter out redundancy in the observed frames. Instead of relying solely on visual cues extracted from images or videos, we explore the impact of integrating textual information to enrich the input modalities of our pedestrian action anticipation model. We investigate various techniques for generating descriptive captions corresponding to input images, aiming to enhance anticipation performance. Evaluation results on available public benchmarks demonstrate the effectiveness of our method in improving prediction performance at different anticipation times compared to previous works. Additionally, incorporating the language modality into our anticipation model yields a significant improvement, reaching a 29.5% increase in the F1 score at 1-second anticipation and a 16.66% increase at 4-second anticipation. These results underscore the potential of language-vision integration in advancing pedestrian action anticipation in complex urban environments.
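The learned mask technique is described only briefly. A hedged sketch of frame-level learned masking before a transformer encoder follows; the scorer, encoder configuration, and shapes are assumptions for illustration:

```python
# Sketch only: a tiny scorer down-weights redundant observation tokens before a
# standard transformer encoder. The paper's actual masking scheme may differ.
import torch
import torch.nn as nn

class MaskedObservationEncoder(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 4, layers: int = 2):
        super().__init__()
        self.scorer = nn.Sequential(nn.Linear(dim, dim // 4), nn.ReLU(), nn.Linear(dim // 4, 1))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, tokens):                      # tokens: (batch, frames, dim)
        gate = torch.sigmoid(self.scorer(tokens))   # (batch, frames, 1): soft frame mask
        return self.encoder(tokens * gate)          # redundant frames are down-weighted

model = MaskedObservationEncoder()
print(model(torch.randn(2, 16, 256)).shape)  # torch.Size([2, 16, 256])
```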

Citations: 0
Human–object interaction detection algorithm based on graph structure and improved cascade pyramid network
IF 4.3 · CAS Tier 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-09-07 · DOI: 10.1016/j.cviu.2024.104162

To address the insufficient use of human–object interaction (HOI) information and spatial location information in images, we propose a human–object interaction detection network based on a graph structure and an improved cascade pyramid. The network is composed of three branches: a graph branch, a human–object branch, and a human pose branch. In the graph branch, we propose a Graph-based Interactive Feature Generation Algorithm (GIFGA) to address the inadequate utilization of interaction information. GIFGA constructs an initial dense graph model by taking humans and objects as nodes and their interaction relationships as edges. Then, by traversing each node, the graph model is updated to generate the final interaction features. In the human pose branch, we propose an Improved Cascade Pyramid Network (ICPN) to tackle the underutilization of spatial location information. ICPN extracts human pose features and maps both the object bounding boxes and the extracted human pose maps onto the global feature map to capture the most discriminative interaction-related region features within the global context. Finally, the features from the three branches are fed into a Multi-Layer Perceptron (MLP) for fusion and then classified for recognition. Experimental results demonstrate that our network achieves mAP of 54.93% and 28.69% on the V-COCO and HICO-DET datasets, respectively.
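GIFGA is described at the level of nodes, edges, and node-wise updates. The sketch below shows one round of message passing over a dense graph of human and object features, in that general spirit; the edge definition, update rule, and traversal order of the paper are not reproduced:

```python
# Sketch only: one message-passing round over a fully connected human/object graph.
import torch
import torch.nn as nn

class DenseGraphUpdate(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.edge_mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())
        self.node_mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())

    def forward(self, nodes):                                   # nodes: (n, dim), humans + objects
        n, d = nodes.shape
        pairs = torch.cat([nodes.unsqueeze(1).expand(n, n, d),
                           nodes.unsqueeze(0).expand(n, n, d)], dim=-1)  # all node pairs
        messages = self.edge_mlp(pairs).mean(dim=1)             # aggregate incoming messages
        return self.node_mlp(torch.cat([nodes, messages], dim=-1))

update = DenseGraphUpdate(dim=128)
print(update(torch.randn(6, 128)).shape)  # torch.Size([6, 128]) for 6 human/object nodes
```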

Citations: 0
VIDF-Net: A Voxel-Image Dynamic Fusion method for 3D object detection
IF 4.3 · CAS Tier 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-09-07 · DOI: 10.1016/j.cviu.2024.104164

In recent years, multi-modal fusion methods have shown excellent performance in the field of 3D object detection; they select voxel centers and globally fuse them with image features across the scene. However, these approaches have two issues. First, the distribution of voxel density is highly heterogeneous due to the discrete volumes. Second, there are significant differences between image features and point cloud features, and global fusion does not take the correspondence between these two modalities into account, which leads to insufficient fusion. In this paper, we propose a new multi-modal fusion method named Voxel-Image Dynamic Fusion (VIDF). Specifically, VIDF-Net is composed of the Voxel Centroid Mapping module (VCM) and the Deformable Attention Fusion module (DAF). The Voxel Centroid Mapping module calculates the centroids of voxel features and maps them onto the image plane, which locates the position of voxel features more effectively. We then use the Deformable Attention Fusion module to dynamically calculate the offset of each voxel centroid from its image position and combine the two modalities. Furthermore, we propose a Region Proposal Network with Channel-Spatial Aggregate to combine channel and spatial attention maps for improved multi-scale feature interaction. We conduct extensive experiments on the KITTI dataset to demonstrate the outstanding performance of the proposed VIDF network. In particular, significant improvements are observed in the Hard categories of Cars and Pedestrians, which shows the effectiveness of our approach in dealing with complex scenarios.
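The Voxel Centroid Mapping step lends itself to a short illustration: average the points falling in each voxel, then project the centroids with a 3x4 camera matrix. The voxel indexing and projection convention below are assumptions, not the paper's implementation:

```python
# Sketch only: per-voxel point centroids projected to pixel coordinates.
import torch

def voxel_centroids_to_image(points, voxel_ids, num_voxels, proj):
    # points: (N, 3) xyz; voxel_ids: (N,) long voxel index per point; proj: (3, 4) camera matrix
    sums = points.new_zeros(num_voxels, 3).index_add_(0, voxel_ids, points)
    counts = points.new_zeros(num_voxels, 1).index_add_(0, voxel_ids, points.new_ones(len(points), 1))
    centroids = sums / counts.clamp(min=1)                                   # (V, 3)
    homo = torch.cat([centroids, centroids.new_ones(num_voxels, 1)], dim=1)  # (V, 4)
    uvw = homo @ proj.t()                                                    # (V, 3)
    return uvw[:, :2] / uvw[:, 2:3].clamp(min=1e-6)                          # (V, 2) pixel coords

pts = torch.rand(1000, 3) * 10
ids = torch.randint(0, 64, (1000,))
proj = torch.tensor([[700., 0., 320., 0.], [0., 700., 240., 0.], [0., 0., 1., 0.]])
print(voxel_centroids_to_image(pts, ids, 64, proj).shape)  # torch.Size([64, 2])
```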

Citations: 0
HAD-Net: An attention U-based network with hyper-scale shifted aggregating and max-diagonal sampling for medical image segmentation
IF 4.3 · CAS Tier 3 (Computer Science) · Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2024-09-07 · DOI: 10.1016/j.cviu.2024.104151

Objectives:

Accurate extraction of regions of interest (ROI) with variable shapes and scales is one of the primary challenges in medical image segmentation. Current U-based networks mostly aggregate multi-stage encoding outputs as an improved multi-scale skip connection. Although this design has been proven to provide scale diversity and contextual integrity, several intuitive limits remain: (i) the encoding outputs are simply resampled to the same size, which destroys fine-grained information, so the advantages of using multiple scales are not fully exploited; (ii) certain redundant information, proportional to the feature dimension size, is introduced and causes multi-stage interference; and (iii) the precision of information delivery relies on the up-sampling and down-sampling layers, but guidance on maintaining consistency of feature locations and trends between them is lacking.

Methods:

To improve this situation, this paper proposes a U-based CNN named HAD-Net, which assembles a new hyper-scale shifted aggregating module (HSAM) paradigm and progressive reusing attention (PRA) for skip connections, and employs a novel pair of dual-branch parameter-free sampling layers, i.e. max-diagonal pooling (MDP) and max-diagonal un-pooling (MDUP). The aggregating scheme additionally combines five subregions with certain offsets in the shallower stage, since the lower scale-down ratios of the subregions enrich the scales and the fine-grained context. The attention scheme contains a partial-to-global channel attention (PGCA) and a multi-scale reusing spatial attention (MRSA); it builds reusing connections internally and adjusts the focus toward more useful dimensions. Finally, MDP and MDUP are used in pairs to improve texture delivery and feature consistency, enhancing information retention and avoiding positional confusion (a paired pooling/un-pooling sketch appears after this abstract).

Results:

Compared to state-of-the-art networks, HAD-Net achieves comparable and even better performance, with Dice scores of 90.13%, 81.51%, and 75.43% for the three classes on BraTS20, 89.59% Dice and 98.56% AUC on Kvasir-SEG, and 82.17% Dice and 98.05% AUC on DRIVE.

Conclusions:

The HSAM+PRA+MDP+MDUP scheme proves to be a remarkable improvement and leaves room for further research.
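Max-diagonal pooling is not defined in the abstract, so its selection rule cannot be reproduced here. The sketch below only illustrates, with standard max pooling, the paired pooling/un-pooling pattern described in the Methods, where indices saved on the way down are reused on the way up to keep feature locations consistent:

```python
# Sketch only: paired pooling/un-pooling with shared indices, illustrating the
# MDP/MDUP pattern (the paper's diagonal selection rule is not shown).
import torch
import torch.nn.functional as F

x = torch.randn(1, 8, 32, 32)
pooled, idx = F.max_pool2d(x, kernel_size=2, return_indices=True)   # encoder side
restored = F.max_unpool2d(pooled, idx, kernel_size=2)                # decoder side
print(pooled.shape, restored.shape)  # torch.Size([1, 8, 16, 16]) torch.Size([1, 8, 32, 32])
```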

Citations: 0