
Journal of Visual Communication and Image Representation: Latest Publications

Accelerating inter-frame prediction in Versatile Video Coding via deep learning-based mode selection
IF 3.1 | CAS Tier 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-11-29 | DOI: 10.1016/j.jvcir.2025.104653
Xudong Zhang, Jing Chen, Huanqiang Zeng, Wenjie Xiang, Yuting Zuo
Compared to its predecessor HEVC, VVC utilizes the Quad-Tree plus Multitype Tree (QTMT) structure for partitioning Coding Units (CUs) and integrates a wider range of inter-frame prediction modes within its inter-frame coding framework. The incorporation of these innovative techniques enables VVC to achieve a substantial bitrate reduction of approximately 40% compared to HEVC. However, this efficiency boost is accompanied by a more than tenfold increase in encoding time. To accelerate the inter-frame prediction mode selection process, an FPMSN (Fast Prediction Mode Selection Network)-based method focusing on encoding acceleration during the non-partitioning mode testing phase is proposed in this paper. First, the execution results of the affine mode are collected as neural network input features. Next, FPMSN is proposed to extract critical information from multi-dimensional data and output the probabilities for each mode. Finally, multiple trade-off strategies are implemented to terminate low-probability mode candidates early.
Experimental results show that, under the Random Access (RA) configuration, the proposed method achieves a reduction in encoding time ranging from 3.22% to 19.3%, with a corresponding BDBR increase of only 0.12% to 1.363%, surpassing the performance of state-of-the-art methods.
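As a rough illustration of the early-termination idea described above, the following Python sketch thresholds network-predicted mode probabilities and keeps only likely candidates for the full rate-distortion search. The mode list, threshold value, and logits are hypothetical placeholders; the paper's actual FPMSN architecture and trade-off strategies are not reproduced here.

```python
import numpy as np

# Hypothetical candidate mode list; the real set of VVC inter modes tested by
# the encoder is richer and is not enumerated in the abstract.
INTER_MODES = ["merge", "affine", "amvp", "geo", "mmvd"]

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def select_modes_to_test(mode_logits, keep_prob=0.15):
    """Early-terminate low-probability modes: only modes whose predicted
    probability reaches keep_prob are passed on to the full RD-cost search."""
    probs = softmax(np.asarray(mode_logits, dtype=np.float64))
    kept = [m for m, p in zip(INTER_MODES, probs) if p >= keep_prob]
    if not kept:  # always test at least the single most likely mode
        kept = [INTER_MODES[int(probs.argmax())]]
    return kept, probs

if __name__ == "__main__":
    logits = [2.1, 0.3, 1.7, -0.5, -1.2]  # e.g. scores produced by a network for one CU
    kept, probs = select_modes_to_test(logits)
    print("mode probabilities:", np.round(probs, 3))
    print("modes kept for full RD search:", kept)
```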
{"title":"Accelerating inter-frame prediction in Versatile Video Coding via deep learning-based mode selection","authors":"Xudong Zhang ,&nbsp;Jing Chen ,&nbsp;Huanqiang Zeng ,&nbsp;Wenjie Xiang ,&nbsp;Yuting Zuo","doi":"10.1016/j.jvcir.2025.104653","DOIUrl":"10.1016/j.jvcir.2025.104653","url":null,"abstract":"<div><div>Compared to its predecessor HEVC, VVC utilizes the Quad-Tree plus Multitype Tree (QTMT) structure for partitioning Coding Units (CU) and integrates a wider range of inter-frame prediction modes within its inter-frame coding framework. The incorporation of these innovative techniques enables VVC to achieve a substantial bitrate reduction of approximately 40% compared to HEVC. However, this efficiency boost is accompanied by a more than tenfold increase in encoding time. To accelerate the inter-frame prediction mode selection process, a FPMSN (Fast Prediction Mode Selection Network)-based method focusing on encoding acceleration during the non-partitioning mode testing phase is proposed in this paper. First, the execution results of the affine mode are collected as neural network input features. Next, FPMSN is proposed to extract critical information from multi-dimensional data and output the probabilities for each mode. Finally, multiple trade-off strategies are implemented to early terminate low-probability mode candidates.</div><div>Experimental results show that, under the Random Access (RA) configuration, the proposed method achieves a reduction in encoding time ranging from 3.22% to 19.3%, with a corresponding BDBR increase of only 0.12% to 1.363%, surpassing the performance of state-of-the-art methods.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"115 ","pages":"Article 104653"},"PeriodicalIF":3.1,"publicationDate":"2025-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145694284","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
DAIRNet: Degradation-aware All-in-one Image Restoration Network with cross-channel feature interaction
IF 3.1 | CAS Tier 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-11-29 | DOI: 10.1016/j.jvcir.2025.104659
Amit Monga, Hemkant Nehete, Tharun Kumar Reddy Bollu, Balasubramanian Raman
Image restoration is a fundamental task in computer vision that recovers clean images from degraded inputs. However, preserving fine details and maintaining global structural consistency are challenging tasks. Traditional convolutional neural network (CNN)-based methods capture local features but fail to model long-range dependencies and often overlook small objects within similar backgrounds. Transformers, conversely, model global context effectively but lack local detail precision. To overcome these limitations, this paper proposes a Degradation-aware All-in-one Image Restoration Network that integrates both CNNs and Transformers. Beginning with a multiscale feature extraction block, the network captures diverse features across different resolutions, enhancing its ability to handle complex image structures. The features from the CNN encoder are subsequently passed through a Transformer decoder. Notably, an interleaved Transformer is applied to the features extracted by the CNN encoder, fostering cross-interaction between features and helping to propagate similar texture signals across the entire feature space, making them more distinguishable. These improved features are then concatenated with the transformer decoder blocks, with degradation-aware information serving as prompts, enriching the restoration process. On average, across various restoration tasks, DAIRNet surpasses the state-of-the-art PromptIR and AirNet methods by 0.76 dB and 1.62 dB, respectively. Specifically, it achieves gains of 1.74 dB in image deraining, 0.26 dB in high-noise level denoising, and 0.84 dB in image dehazing tasks as compared to PromptIR. Single-task benchmarks further confirm the model’s effectiveness and generalizability.
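The degradation-aware prompting described above could look roughly like the following PyTorch sketch, in which a bank of learnable prompt maps is weighted by a predicted degradation descriptor and concatenated channel-wise with incoming features. All module names, sizes, and the weighting scheme are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DegradationPrompt(nn.Module):
    """Weights a bank of learnable prompt maps by a predicted degradation
    descriptor and concatenates the result with the incoming features."""

    def __init__(self, channels=64, num_prompts=5, prompt_hw=16):
        super().__init__()
        self.prompts = nn.Parameter(torch.randn(num_prompts, channels, prompt_hw, prompt_hw))
        self.classifier = nn.Linear(channels, num_prompts)  # degradation-type weights

    def forward(self, feat):                                  # feat: (B, C, H, W)
        pooled = feat.mean(dim=(2, 3))                        # (B, C) global descriptor
        w = torch.softmax(self.classifier(pooled), dim=1)     # (B, num_prompts)
        prompt = torch.einsum("bn,nchw->bchw", w, self.prompts)
        prompt = F.interpolate(prompt, size=feat.shape[-2:], mode="bilinear",
                               align_corners=False)
        return torch.cat([feat, prompt], dim=1)               # channel-wise concatenation

if __name__ == "__main__":
    x = torch.randn(2, 64, 32, 32)                            # stand-in for encoder features
    print(DegradationPrompt(channels=64)(x).shape)            # torch.Size([2, 128, 32, 32])
```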
{"title":"DAIRNet: Degradation-aware All-in-one Image Restoration Network with cross-channel feature interaction","authors":"Amit Monga ,&nbsp;Hemkant Nehete ,&nbsp;Tharun Kumar Reddy Bollu ,&nbsp;Balasubramanian Raman","doi":"10.1016/j.jvcir.2025.104659","DOIUrl":"10.1016/j.jvcir.2025.104659","url":null,"abstract":"<div><div>Image restoration is a fundamental task in computer vision that recovers clean images from degraded inputs. However, preserving fine-details and maintaining global structural consistency are challenging tasks. Traditional convolutional neural network (CNN)-based methods capture local features but fail to model long-range dependencies and often overlook small objects within similar backgrounds. Transformers, conversely, model global context effectively but lack local detail precision. To overcome these limitations, this paper proposes a Degradation-aware All-in-one Image Restoration Network that integrates both CNNs and Transformers. Beginning with a multiscale feature extraction block, the network captures diverse features across different resolutions, enhancing its ability to handle complex image structures. The features from CNN encoder are subsequently passed through a Transformer decoder. Notably, an interleaved Transformer is applied to the features extracted by the CNN encoder, fostering cross-interaction between features and helping to propagate similar texture signals across the entire feature space, making them more distinguishable. These improved features are then concatenated with the transformer decoder blocks with degradation-aware information as prompts, enriching the restoration process. On average, across various restoration tasks, DAIRNet surpasses the state-of-the-art PromptIR and AirNet methods by 0.76 dB and 1.62 dB, respectively. Specifically, it achieves gains of 1.74 dB in image deraining, 0.26 dB in high-noise level denoising, and 0.84 dB in image dehazing tasks as compared to PromptIR. Single-task benchmarks further confirm the model’s effectiveness and generalizability.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"115 ","pages":"Article 104659"},"PeriodicalIF":3.1,"publicationDate":"2025-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145694283","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Enhancing 3D point cloud generation via Mamba-based time-varying denoising diffusion
IF 3.1 | CAS Tier 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-11-27 | DOI: 10.1016/j.jvcir.2025.104657
DaoPeng Zhang, Li Yu
3D point cloud generation plays a pivotal role in a wide range of applications, including robotics, medical imaging, autonomous driving, and virtual/augmented reality (VR/AR). However, generating high-quality point clouds remains highly challenging due to the irregularity and unordered nature of point cloud data. Existing Transformer-based generative models suffer from quadratic computational complexity, which limits their ability to capture global contextual dependencies and often leads to the loss of critical geometric information. To address these limitations, we propose a novel diffusion-based framework for point cloud generation that integrates the Mamba state-space model — known for its linear complexity and strong long-sequence modeling capability — with convolutional layers. Specifically, Mamba is employed to capture global structural dependencies across time steps, while the convolutional layers refine local geometric details. To effectively leverage the strengths of both components, we introduce a learnable masking mechanism that dynamically fuses global and local features at optimal time steps, thereby exploiting their complementary advantages. Extensive experiments demonstrate that our model outperforms previous point cloud generative approaches such as TIGER and PVD in terms of both quality and diversity. On the airplane category, our model achieves a 9.28% improvement in 1-NNA accuracy based on EMD compared to PVD, and a 1.72% improvement based on CD compared to TIGER. Compared with recent baseline models, our method consistently achieves significant gains across multiple evaluation metrics.
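A minimal sketch of the learnable, timestep-conditioned masking that fuses global and local features might look like the following; the sinusoidal time embedding, gate design, and tensor shapes are assumptions rather than the paper's exact components.

```python
import math
import torch
import torch.nn as nn

def timestep_embedding(t, dim):
    """Sinusoidal embedding of diffusion timesteps t: (B,) -> (B, dim), dim even."""
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half, dtype=torch.float32) / half)
    args = t.float()[:, None] * freqs[None, :]
    return torch.cat([torch.sin(args), torch.cos(args)], dim=1)

class TimeVaryingFusion(nn.Module):
    """Fuses global (state-space) and local (convolutional) per-point features
    with a mask in [0, 1] predicted from the diffusion timestep."""

    def __init__(self, feat_dim=128, time_dim=64):
        super().__init__()
        self.time_dim = time_dim
        self.gate = nn.Sequential(nn.Linear(time_dim, feat_dim), nn.SiLU(),
                                  nn.Linear(feat_dim, feat_dim), nn.Sigmoid())

    def forward(self, global_feat, local_feat, t):
        # global_feat, local_feat: (B, N, D); t: (B,) integer timesteps
        m = self.gate(timestep_embedding(t, self.time_dim)).unsqueeze(1)  # (B, 1, D)
        return m * global_feat + (1.0 - m) * local_feat

if __name__ == "__main__":
    B, N, D = 4, 2048, 128
    g, l = torch.randn(B, N, D), torch.randn(B, N, D)
    t = torch.randint(0, 1000, (B,))
    print(TimeVaryingFusion()(g, l, t).shape)                 # torch.Size([4, 2048, 128])
```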
{"title":"Enhancing 3D point cloud generation via Mamba-based time-varying denoising diffusion","authors":"DaoPeng Zhang,&nbsp;Li Yu","doi":"10.1016/j.jvcir.2025.104657","DOIUrl":"10.1016/j.jvcir.2025.104657","url":null,"abstract":"<div><div>3D point cloud generation plays a pivotal role in a wide range of applications, including robotics, medical imaging, autonomous driving, and virtual/augmented reality (VR/AR). However, generating high-quality point clouds remains highly challenging due to the irregularity and unordered nature of point cloud data. Existing Transformer-based generative models suffer from quadratic computational complexity, which limits their ability to capture global contextual dependencies and often leads to the loss of critical geometric information. To address these limitations, we propose a novel diffusion-based framework for point cloud generation that integrates the Mamba state-space model — known for its linear complexity and strong long-sequence modeling capability — with convolutional layers. Specifically, Mamba is employed to capture global structural dependencies across time steps, while the convolutional layers refine local geometric details. To effectively leverage the strengths of both components, we introduce a learnable masking mechanism that dynamically fuses global and local features at optimal time steps, thereby exploiting their complementary advantages. Extensive experiments demonstrate that our model outperforms previous point cloud generative approaches such as TIGER and PVD in terms of both quality and diversity. On the airplane category, our model achieves a 9.28% improvement in 1-NNA accuracy based on EMD compared to PVD, and a 1.72% improvement based on CD compared to TIGER. Compared with recent baseline models, our method consistently achieves significant gains across multiple evaluation metrics.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"115 ","pages":"Article 104657"},"PeriodicalIF":3.1,"publicationDate":"2025-11-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145610298","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
BIAN: Bidirectional interwoven attention network for retinal OCT image classification
IF 3.1 | CAS Tier 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-11-25 | DOI: 10.1016/j.jvcir.2025.104654
Ahmed Alasri, Zhixiang Chen, Yalong Xiao, Chengzhang Zhu, Abdulrahman Noman, Raeed Alsabri, Harrison Xiao Bai
Retinal diseases are a significant global health concern, requiring advanced diagnostic tools for early detection and treatment. Automated diagnosis of retinal diseases using deep learning can significantly enhance early detection and intervention efforts. However, conventional deep learning models that concentrate on localized perspectives often develop feature representations that lack sufficient semantic discriminative capability. Conversely, models that prioritize global semantic-level information may fail to capture essential, subtle local pathological features. To address this issue, we propose BIAN, a novel Bidirectional Interwoven Attention Network designed for the classification of retinal Optical Coherence Tomography (OCT) images. BIAN synergistically combines the strengths of Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) by integrating a ResNet architecture backbone with a ViT backbone through a bidirectional interwoven attention block. This network enables the model to effectively capture both local features and global contextual information. Specifically, the bidirectional interwoven attention block allows the ResNet and ViT components to attend to each other’s feature representations, enhancing the network’s overall learning capacity. We evaluated BIAN on both the OCTID and OCTDL datasets for retinal disease classification. The OCTID dataset includes conditions such as Age-related Macular Degeneration (AMD), Macular Hole (MH), Central Serous Retinopathy (CSR), etc., while OCTDL covers AMD, Diabetic Macular Edema (DME), Epiretinal Membrane (ERM), Retinal Vein Occlusion (RVO), etc. On OCTID, the proposed model achieved 95.7% accuracy for five-class classification, outperforming existing state-of-the-art models. On OCTDL, BIAN attained 94.7% accuracy, with consistently high F1-scores (95.6% on OCTID, 94.6% on OCTDL) and AUC values (99.3% and 99.0%, respectively). These results highlight the potential of BIAN as a robust network for retinal OCT image classification in medical applications.
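The bidirectional interweaving can be pictured as two cross-attention passes in which each branch queries the other; the sketch below is an illustrative approximation with assumed token counts and dimensions, not the published BIAN block.

```python
import torch
import torch.nn as nn

class BidirectionalCrossAttention(nn.Module):
    """Each branch queries the other branch's tokens, so CNN and ViT features
    are refined jointly rather than fused once at the end."""

    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.cnn_to_vit = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.vit_to_cnn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_c = nn.LayerNorm(dim)
        self.norm_v = nn.LayerNorm(dim)

    def forward(self, cnn_tokens, vit_tokens):
        c2v, _ = self.cnn_to_vit(query=cnn_tokens, key=vit_tokens, value=vit_tokens)
        v2c, _ = self.vit_to_cnn(query=vit_tokens, key=cnn_tokens, value=cnn_tokens)
        return self.norm_c(cnn_tokens + c2v), self.norm_v(vit_tokens + v2c)

if __name__ == "__main__":
    cnn_feat = torch.randn(2, 49, 256)   # e.g. a 7x7 ResNet map flattened into 49 tokens
    vit_feat = torch.randn(2, 197, 256)  # e.g. ViT patch tokens plus a class token
    c, v = BidirectionalCrossAttention()(cnn_feat, vit_feat)
    print(c.shape, v.shape)
```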
{"title":"BIAN: Bidirectional interwoven attention network for retinal OCT image classification","authors":"Ahmed Alasri ,&nbsp;Zhixiang Chen ,&nbsp;Yalong Xiao ,&nbsp;Chengzhang Zhu ,&nbsp;Abdulrahman Noman ,&nbsp;Raeed Alsabri ,&nbsp;Harrison Xiao Bai","doi":"10.1016/j.jvcir.2025.104654","DOIUrl":"10.1016/j.jvcir.2025.104654","url":null,"abstract":"<div><div>Retinal diseases are a significant global health concern, requiring advanced diagnostic tools for early detection and treatment. Automated diagnosis of retinal diseases using deep learning can significantly enhance early detection and intervention efforts. However, conventional deep learning models that concentrate on localized perspectives often develop feature representations that lack sufficient semantic discriminative capability. Conversely, models that prioritize global semantic-level information may fail to capture essential, subtle local pathological features. To address this issue, we propose BIAN, a novel Bidirectional Interwoven Attention Network designed for the classification of retinal Optical Coherence Tomography (OCT) images. BIAN synergistically combines the strengths of Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) by integrating a ResNet architecture backbone with a ViT backbone through a bidirectional interwoven attention block. This network enables the model to effectively capture both local features and global contextual information. Specifically, the bidirectional interwoven attention block allow the ResNet and ViT components to attend to each other’s feature representations, enhancing the network’s overall learning capacity. We evaluated BIAN on both the OCTID and OCTDL datasets for retinal disease classification. The OCTID dataset includes conditions such as Age-related Macular Degeneration (AMD), Macular Hole (MH), Central Serous Retinopathy (CSR), etc., while OCTDL covers AMD, Diabetic Macular Edema (DME), Epiretinal Membrane (ERM), Retinal Vein Occlusion (RVO), etc. On OCTID, the proposed model achieved 95.7% accuracy for five-class classification, outperforming existing state-of-the-art models. On OCTDL, BIAN attained 94.7% accuracy, with consistently high F1-scores (95.6% on OCTID, 94.6% on OCTDL) and AUC values (99.3% and 99.0%, respectively). These results highlight the potential of BIAN as a robust network for retinal OCT image classification in medical applications.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"115 ","pages":"Article 104654"},"PeriodicalIF":3.1,"publicationDate":"2025-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145748550","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
HySaM: An improved hybrid SAM and Mask R-CNN for underwater instance segmentation
IF 3.1 | CAS Tier 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-11-25 | DOI: 10.1016/j.jvcir.2025.104656
Xingfa Wang, Chengjun Chen, Chenggang Dai, Kunhua Liu, Mingxing Lin
Due to inherent absorption and scattering effects, underwater images often exhibit low visibility and significant color deviation. These issues hinder the extraction of discriminative features and adversely impact instance-level segmentation accuracy. To address these challenges, this study proposes a novel Hybrid SAM and Mask R-CNN framework for underwater instance segmentation, integrating the strong generalization capability of SAM with the structural decoding strength of Mask R-CNN. The powerful global modeling ability of SAM effectively mitigates the impact of underwater image degradation, thereby enabling more robust feature representation. Moreover, a novel underwater feature weighted enhancer is introduced in the framework to enhance multi-scale feature fusion and improve the detection of small and scale-varying objects in underwater environments. To provide benchmark data, a large-scale underwater instance segmentation dataset, UW10K, is also constructed, comprising 13,551 images and 22,968 annotated instances across 15 categories. Comprehensive experiments validate the superiority of the proposed model across various instance segmentation tasks. Specifically, it achieves precisions of 74.2 %, 40.5 %, and 70.6 % on UW10K, USIS10K, and WHU Building datasets, respectively. This study is expected to advance ocean exploration and fisheries, while providing valuable training samples for instance segmentation tasks. Datasets and codes are available at https://github.com/xfwang-qut/HySaM.
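A plausible minimal form of the weighted multi-scale enhancer is sketched below: learnable per-level weights rescale FPN-style maps before fusion. Channel counts and the softmax weighting are assumptions based only on the abstract's description.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedMultiScaleEnhancer(nn.Module):
    """Rescales pyramid levels to a common size, sums them with learnable
    softmax weights, and refines the fused map with a 3x3 convolution."""

    def __init__(self, channels=256, num_levels=4):
        super().__init__()
        self.level_logits = nn.Parameter(torch.zeros(num_levels))
        self.refine = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, feats):
        # feats: list of (B, C, Hi, Wi) maps from coarse to fine
        target = feats[-1].shape[-2:]
        w = torch.softmax(self.level_logits, dim=0)
        fused = sum(w[i] * F.interpolate(f, size=target, mode="bilinear", align_corners=False)
                    for i, f in enumerate(feats))
        return self.refine(fused)

if __name__ == "__main__":
    pyramid = [torch.randn(1, 256, s, s) for s in (16, 32, 64, 128)]
    print(WeightedMultiScaleEnhancer()(pyramid).shape)        # torch.Size([1, 256, 128, 128])
```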
{"title":"HySaM: An improved hybrid SAM and Mask R-CNN for underwater instance segmentation","authors":"Xingfa Wang ,&nbsp;Chengjun Chen ,&nbsp;Chenggang Dai ,&nbsp;Kunhua Liu ,&nbsp;Mingxing Lin","doi":"10.1016/j.jvcir.2025.104656","DOIUrl":"10.1016/j.jvcir.2025.104656","url":null,"abstract":"<div><div>Due to inherent absorption and scattering effects, underwater images often exhibit low visibility and significant color deviation. These issues hinder the extraction of discriminative features and adversely impact instance-level segmentation accuracy. To address these challenges, this study proposes a novel Hybrid SAM and Mask R-CNN framework for underwater instance segmentation, integrating the strong generalization capability of SAM with the structural decoding strength of Mask R-CNN. The powerful global modeling ability of SAM effectively mitigates the impact of underwater image degradation, thereby enabling more robust feature representation. Moreover, a novel underwater feature weighted enhancer is introduced in the framework to enhance multi-scale feature fusion and improve the detection of small and scale-varying objects in underwater environments. To provide benchmark data, a large-scale underwater instance segmentation dataset, UW10K, is also constructed, comprising 13,551 images and 22,968 annotated instances across 15 categories. Comprehensive experiments validate the superiority of the proposed model across various instance segmentation tasks. Specifically, it achieves precisions of 74.2 %, 40.5 %, and 70.6 % on UW10K, USIS10K, and WHU Building datasets, respectively. This study is expected to advance ocean exploration and fisheries, while providing valuable training samples for instance segmentation tasks. Datasets and codes are available at <span><span>https://github.com/xfwang-qut/HySaM</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"115 ","pages":"Article 104656"},"PeriodicalIF":3.1,"publicationDate":"2025-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145694273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Multi-scale interleaved transformer network for image deraining
IF 3.1 | CAS Tier 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-11-25 | DOI: 10.1016/j.jvcir.2025.104655
Yue Que, Chen Qiu, Hanqing Xiong, Xue Xia, Zhiwei Liu
Convolutional neural networks (CNNs) have demonstrated impressive performance in image deraining tasks. However, CNNs have a limited receptive field, which restricts their ability to adapt to spatial variations in the input. Recently, Transformers have demonstrated promising results in image deraining, surpassing CNNs in several cases. However, most existing methods leverage limited spatial input information through attribution analysis. In this paper, we investigate the construction of multi-scale feature representations within Transformers to fully exploit their potential in image deraining. We propose a multi-scale interleaved Transformer framework, which aims to reconstruct high-quality images by leveraging information across different scales, thereby enabling it to better capture the size and distribution of rain. In addition, we introduce a hybrid cross-attention mechanism to replace traditional feature fusion, facilitating global feature interaction and capturing complementary information across scales simultaneously. Our approach surpasses state-of-the-art methods in terms of image deraining performance on two types of benchmark datasets.
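The hybrid cross-attention that replaces plain feature fusion could be approximated as below, where tokens from one scale attend to tokens from another and are fused through a residual connection; dimensions and the single query direction are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CrossScaleAttention(nn.Module):
    """Tokens from the high-resolution branch attend to tokens from a
    lower-resolution branch; the result is fused through a residual connection
    rather than by concatenation."""

    def __init__(self, dim=96, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, fine, coarse):
        # fine: (B, Nf, D), coarse: (B, Nc, D)
        out, _ = self.attn(query=fine, key=coarse, value=coarse)
        return self.norm(fine + out)

if __name__ == "__main__":
    fine = torch.randn(2, 64 * 64, 96)
    coarse = torch.randn(2, 32 * 32, 96)
    print(CrossScaleAttention()(fine, coarse).shape)          # torch.Size([2, 4096, 96])
```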
{"title":"Multi-scale interleaved transformer network for image deraining","authors":"Yue Que ,&nbsp;Chen Qiu ,&nbsp;Hanqing Xiong ,&nbsp;Xue Xia ,&nbsp;Zhiwei Liu","doi":"10.1016/j.jvcir.2025.104655","DOIUrl":"10.1016/j.jvcir.2025.104655","url":null,"abstract":"<div><div>Convolutional neural networks (CNNs) have demonstrated impressive performance in image deraining tasks. However, CNNs have a limited receptive field, which restricts their ability to adapt to spatial variations in the input. Recently, Transformers have demonstrated promising results in image deraining, surpassing CNN in several cases. However, most existing methods leverage limited spatial input information through attribution analysis. In this paper, we investigated the construction of multi-scale feature representations within Transformers to fully exploit their potential in image deraining. We propose a multi-scale interleaved Transformer framework, which aims to reconstruct high-quality images by leveraging information across different scales, thereby enabling it to better capture the size and distribution of rain. In addition, we introduce a hybrid cross-attention mechanism to replace traditional feature fusion, facilitating global feature interaction and capturing complementary information across scales simultaneously. Our approach surpasses state-of-the-art methods in terms of image deraining performance on two types of benchmark datasets.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"115 ","pages":"Article 104655"},"PeriodicalIF":3.1,"publicationDate":"2025-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145694272","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
PatchNeRF: Patch-based Neural Radiance Fields for real time view synthesis in wide-scale scenes
IF 3.1 | CAS Tier 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-11-20 | DOI: 10.1016/j.jvcir.2025.104602
Ziyu Hu, Xiaoguang Jiang, Qiong Liu, Xin Ding
Recent methods based on Neural Radiance Fields (NeRFs) have excelled in real-time novel view synthesis for small-scale scenes but struggle with fast rendering for large-scale scenes. Achieving a balance in performance between small-scale and large-scale scenes has emerged as a challenging problem. To address this, we propose PatchNeRF, a patch-based NeRF representation for wide-scale scenes. PatchNeRF uses small 2D patches to fit surfaces, learning a 2D neural radiance field for local geometry and texture. To make the most of sampling patches and skip empty space, we propose strategies for initializing and progressively updating the patch structure, along with performing end-to-end training using both large and tiny MLPs. After training, we prebake the implicit 2D neural radiance fields as feature maps to accelerate the rendering process. Experiments demonstrate that our approach outperforms state-of-the-art methods in both small-scale and large-scale scenes, while achieving superior rendering speeds.
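To make the patch representation concrete, the sketch below intersects a ray with one planar patch, converts the hit point to patch-local (u, v) coordinates, and bilinearly samples a prebaked feature map. The patch parameterization and feature size are assumptions; the paper's full pipeline (patch initialization, progressive updates, MLP decoding) is not shown.

```python
import torch
import torch.nn.functional as F

def ray_patch_uv(origin, direction, center, axis_u, axis_v, half_size):
    """Intersect a ray with a square planar patch; return (hit, u, v) with
    u, v in [-1, 1] when the hit point lies inside the patch."""
    normal = torch.cross(axis_u, axis_v, dim=0)
    denom = torch.dot(direction, normal)
    if torch.abs(denom) < 1e-8:
        return False, None, None                      # ray parallel to the patch plane
    t = torch.dot(center - origin, normal) / denom
    if t <= 0:
        return False, None, None                      # patch is behind the ray origin
    p = origin + t * direction
    u = torch.dot(p - center, axis_u) / half_size
    v = torch.dot(p - center, axis_v) / half_size
    hit = bool((torch.abs(u) <= 1) and (torch.abs(v) <= 1))
    return hit, u, v

if __name__ == "__main__":
    baked = torch.randn(1, 8, 33, 33)                 # prebaked 8-channel patch feature map
    o = torch.tensor([0.0, 0.0, 0.0])
    d = torch.tensor([0.1, 0.05, 1.0]); d = d / d.norm()
    hit, u, v = ray_patch_uv(o, d,
                             center=torch.tensor([0.0, 0.0, 2.0]),
                             axis_u=torch.tensor([1.0, 0.0, 0.0]),
                             axis_v=torch.tensor([0.0, 1.0, 0.0]),
                             half_size=0.5)
    if hit:
        grid = torch.stack([u, v]).view(1, 1, 1, 2)   # grid_sample expects coords in [-1, 1]
        feat = F.grid_sample(baked, grid, align_corners=True)  # (1, 8, 1, 1)
        print(feat.squeeze().shape)                   # torch.Size([8])
```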
{"title":"PatchNeRF: Patch-based Neural Radiance Fields for real time view synthesis in wide-scale scenes","authors":"Ziyu Hu ,&nbsp;Xiaoguang Jiang ,&nbsp;Qiong Liu ,&nbsp;Xin Ding","doi":"10.1016/j.jvcir.2025.104602","DOIUrl":"10.1016/j.jvcir.2025.104602","url":null,"abstract":"<div><div>Recent methods based on Neural Radiance Fields (NeRFs) have excelled in real-time novel view synthesis for small-scale scenes but struggle with fast rendering for large-scale scenes. Achieving a balance in performance between small-scale and large-scale scenes has emerged as a challenging problem. To address this, we propose PatchNeRF, a patch-based NeRF representation for wide-scale scenes. PatchNeRF uses small 2D patches to fit surfaces, learning a 2D neural radiance field for local geometry and texture. To make the most of sampling patches and skip empty space, we propose strategies for initializing and progressively updating the patch structure, along with performing end-to-end training using both large and tiny MLPs. After training, we prebake the implicit 2D neural radiance fields as feature maps to accelerate the rendering process. Experiments demonstrate that our approach outperforms state-of-the-art methods in both small-scale and large-scale scenes, while achieving superior rendering speeds.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"115 ","pages":"Article 104602"},"PeriodicalIF":3.1,"publicationDate":"2025-11-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145610299","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Enhancing aesthetic image generation with reinforcement learning guided prompt optimization in stable diffusion
IF 3.1 | CAS Tier 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-11-15 | DOI: 10.1016/j.jvcir.2025.104641
Junyong You, Yuan Lin, Bin Hu
Generative models, e.g., stable diffusion, excel at producing compelling images but remain highly dependent on crafted prompts. Refining prompts for specific objectives, especially aesthetic quality, is time-consuming and inconsistent. We propose a novel approach that leverages LLMs to enhance the prompt refinement process for stable diffusion. First, we propose a model to predict aesthetic image quality, examining various aesthetic elements in spatial, channel, and color domains. Reinforcement learning is employed to refine the prompt, starting from a rudimentary version and iteratively improving it with the LLM’s assistance. This iterative process is guided by a policy network updating prompts based on interactions with the generated images, with a reward function measuring aesthetic improvement and adherence to the prompt. Our experimental results demonstrate that this method significantly boosts the visual quality of generated images when using these refined prompts. Beyond image synthesis, this approach provides a broader framework for improving prompts across diverse applications with the support of LLMs.
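The refinement loop can be pictured as a small REINFORCE-style policy over discrete prompt edits whose reward is the change in an aesthetic score; in the toy sketch below, the action set, the scorer, and the image generator are placeholders standing in for the LLM, stable diffusion, and the proposed aesthetic model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical discrete prompt-edit actions; the paper uses an LLM to rewrite
# prompts rather than a fixed phrase list.
ACTIONS = ["add 'golden hour lighting'", "add 'highly detailed'",
           "add 'rule-of-thirds composition'", "keep prompt unchanged"]

def aesthetic_score(prompt):
    # Placeholder for "generate an image with stable diffusion, then score it
    # with the aesthetic model"; here it just favors more descriptive prompts.
    return 0.02 * len(prompt.split()) + rng.normal(0.0, 0.01)

def refine(prompt, steps=200, lr=0.1):
    theta = np.zeros(len(ACTIONS))                    # policy logits over edit actions
    for _ in range(steps):
        probs = np.exp(theta - theta.max()); probs /= probs.sum()
        a = rng.choice(len(ACTIONS), p=probs)
        new_prompt = prompt
        if a != len(ACTIONS) - 1:
            phrase = ACTIONS[a].split("'")[1]
            if phrase not in prompt:
                new_prompt = prompt + ", " + phrase
        reward = aesthetic_score(new_prompt) - aesthetic_score(prompt)
        grad = -probs; grad[a] += 1.0                 # REINFORCE: gradient of log pi(a)
        theta += lr * reward * grad
        if reward > 0:
            prompt = new_prompt                       # keep edits that improved the score
    return prompt

if __name__ == "__main__":
    print(refine("a photo of a mountain lake"))
```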
{"title":"Enhancing aesthetic image generation with reinforcement learning guided prompt optimization in stable diffusion","authors":"Junyong You ,&nbsp;Yuan Lin ,&nbsp;Bin Hu","doi":"10.1016/j.jvcir.2025.104641","DOIUrl":"10.1016/j.jvcir.2025.104641","url":null,"abstract":"<div><div>Generative models, e.g., stable diffusion, excel at producing compelling images but remain highly dependent on crafted prompts. Refining prompts for specific objectives, especially aesthetic quality, is time-consuming and inconsistent. We propose a novel approach that leverages LLMs to enhance prompt refinement process for stable diffusion. First, we propose a model to predict aesthetic image quality, examining various aesthetic elements in spatial, channel, and color domains. Reinforcement learning is employed to refine the prompt, starting from a rudimentary version and iteratively improving them with LLM’s assistance. This iterative process is guided by a policy network updating prompts based on interactions with the generated images, with a reward function measuring aesthetic improvement and adherence to the prompt. Our experimental results demonstrate that this method significantly boosts the visual quality of generated images when using these refined prompts. Beyond image synthesis, this approach provides a broader framework for improving prompts across diverse applications with the support of LLMs.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"114 ","pages":"Article 104641"},"PeriodicalIF":3.1,"publicationDate":"2025-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145571650","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Vision-language tracking with attention-based optimization
IF 3.1 | CAS Tier 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-11-15 | DOI: 10.1016/j.jvcir.2025.104644
Shuo Hu, Tongtong Liu, Liyang Han, Run Xing
Most existing visual tracking methods typically employ image patches as target references and endeavor to enhance tracking performance by maximizing the utilization of visual information through various deep networks. However, due to the intrinsic limitations of visual information, the performance of the trackers significantly deteriorates when confronted with drastic target variations or complex background environments. To address these issues, we propose a vision-language multimodal fusion tracker for object tracking. Firstly, we use semantic information from language descriptions to compensate for the instability of visual information, and establish multimodal cross-relations through the fusion of visual and language features. Secondly, we propose an attention-based token screening mechanism that utilizes semantic-guided attention and masking operations to eliminate irrelevant search tokens devoid of target information, thereby enhancing both accuracy and efficiency. Furthermore, we optimize the localization head by introducing channel attention, which effectively improves the accuracy of target positioning. Extensive experiments conducted on the OTB99, LaSOT, and TNL2K datasets demonstrate the effectiveness of our proposed tracking method, achieving success rates of 71.2%, 69.5%, and 58.9%, respectively.
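A minimal version of the attention-based token screening might score search-region tokens against a pooled language feature and keep only the top-ranked ones, as sketched below; the keep ratio, pooling, and projections are assumptions, not the paper's exact mechanism.

```python
import torch
import torch.nn as nn

class TokenScreening(nn.Module):
    """Scores search-region tokens against a pooled language feature and keeps
    only the top-ranked fraction for the later tracking stages."""

    def __init__(self, dim=256, keep_ratio=0.5):
        super().__init__()
        self.keep_ratio = keep_ratio
        self.q = nn.Linear(dim, dim)  # projects the pooled language feature
        self.k = nn.Linear(dim, dim)  # projects each search-region token

    def forward(self, search_tokens, text_tokens):
        # search_tokens: (B, N, D), text_tokens: (B, T, D)
        q = self.q(text_tokens.mean(dim=1, keepdim=True))                  # (B, 1, D)
        k = self.k(search_tokens)                                          # (B, N, D)
        scores = (q @ k.transpose(1, 2)).squeeze(1) / k.shape[-1] ** 0.5   # (B, N)
        keep = max(1, int(search_tokens.shape[1] * self.keep_ratio))
        idx = scores.topk(keep, dim=1).indices                             # kept-token indices
        batch = torch.arange(search_tokens.shape[0]).unsqueeze(1)
        return search_tokens[batch, idx], idx                              # (B, keep, D)

if __name__ == "__main__":
    s = torch.randn(2, 256, 256)   # 256 search tokens of dimension 256
    t = torch.randn(2, 12, 256)    # 12 language tokens
    kept, idx = TokenScreening()(s, t)
    print(kept.shape)              # torch.Size([2, 128, 256])
```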
{"title":"Vision-language tracking with attention-based optimization","authors":"Shuo Hu ,&nbsp;Tongtong Liu ,&nbsp;Liyang Han ,&nbsp;Run Xing","doi":"10.1016/j.jvcir.2025.104644","DOIUrl":"10.1016/j.jvcir.2025.104644","url":null,"abstract":"<div><div>Most existing visual tracking methods typically employ image patches as target references and endeavor to enhance tracking performance by maximizing the utilization of visual information through various deep networks. However, due to the intrinsic limitations of visual information, the performance of the trackers significantly deteriorates when confronted with drastic target variations or complex background environments. To address these issues, we propose a vision-language multimodal fusion tracker for object tracking. Firstly, we use semantic information from language descriptions to compensate for the instability of visual information, and establish multimodal cross-relations through the fusion of visual and language features. Secondly, we propose an attention-based token screening mechanism that utilizes semantic-guided attention and masking operations to eliminate irrelevant search tokens devoid of target information, thereby enhancing both accuracy and efficiency. Furthermore, we optimize the localization head by introducing channel attention, which effectively improves the accuracy of target positioning. Extensive experiments conducted on the OTB99, LaSOT, and TNL2K datasets demonstrate the effectiveness of our proposed tracking method, achieving success rates of 71.2%, 69.5%, and 58.9%, respectively.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"114 ","pages":"Article 104644"},"PeriodicalIF":3.1,"publicationDate":"2025-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145571651","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Multi-modal deep facial expression recognition framework combining knowledge distillation and retrieval-augmented generation
IF 3.1 | CAS Tier 4 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2025-11-15 | DOI: 10.1016/j.jvcir.2025.104645
Beibei Jiang, Yu Zhou
In recent years, significant progress has been made in facial expression recognition (FER) methods based on deep learning. However, existing models still face challenges in terms of computational efficiency and generalization performance when dealing with diverse emotional expressions and complex environmental variations. Recently, large-scale vision-language pre-training models such as CLIP have achieved remarkable success in multi-modal learning. Their rich visual and textual representations offer valuable insights for downstream tasks. Consequently, transferring this knowledge to develop efficient and accurate FER systems has emerged as a key research direction. To this end, this paper proposes a novel model, termed Knowledge Distillation and Retrieval-Augmented Generation (KDRAG), which combines knowledge distillation and Retrieval-Augmented Generation (RAG) techniques to improve the efficiency and accuracy of FER. Through knowledge distillation, the teacher model (ViT-L/14) transfers its rich knowledge to the smaller student model (ViT-B/32). An additional linear projection layer is added to map the teacher model’s output features to the student model’s feature dimensions for feature alignment. Moreover, the RAG mechanism is developed to enhance the student model’s emotional understanding by retrieving text descriptions related to the input image. Additionally, this framework combines soft loss (from the teacher model’s knowledge) and hard loss (from the true targets of the labels) to enhance the model’s generalization ability. Extensive experimental results on multiple datasets demonstrate that the KDRAG framework can achieve significant improvements in accuracy and computational efficiency, providing new insights for real-time FER systems.
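The combined objective described above (soft distillation loss, hard label loss, and a linear projection aligning teacher and student features) can be sketched as follows; the feature dimensions (768 for ViT-L/14, 512 for ViT-B/32), temperature, and loss weights are assumed values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KDRAGLoss(nn.Module):
    """Soft distillation loss + hard label loss + feature alignment through a
    linear projection from the teacher's to the student's feature dimension."""

    def __init__(self, teacher_dim=768, student_dim=512, T=4.0, alpha=0.7, beta=0.1):
        super().__init__()
        self.proj = nn.Linear(teacher_dim, student_dim)
        self.T, self.alpha, self.beta = T, alpha, beta

    def forward(self, s_logits, t_logits, s_feat, t_feat, labels):
        soft = F.kl_div(F.log_softmax(s_logits / self.T, dim=1),
                        F.softmax(t_logits / self.T, dim=1),
                        reduction="batchmean") * (self.T ** 2)
        hard = F.cross_entropy(s_logits, labels)
        feat = F.mse_loss(s_feat, self.proj(t_feat))
        return self.alpha * soft + (1.0 - self.alpha) * hard + self.beta * feat

if __name__ == "__main__":
    B, C = 8, 7                                        # e.g. seven expression classes
    loss = KDRAGLoss()(torch.randn(B, C), torch.randn(B, C),
                       torch.randn(B, 512), torch.randn(B, 768),
                       torch.randint(0, C, (B,)))
    print(loss.item())
```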
{"title":"Multi-modal deep facial expression recognition framework combining knowledge distillation and retrieval-augmented generation","authors":"Beibei Jiang,&nbsp;Yu Zhou","doi":"10.1016/j.jvcir.2025.104645","DOIUrl":"10.1016/j.jvcir.2025.104645","url":null,"abstract":"<div><div>In recent years, significant progress has been made in facial expression recognition (FER) methods based on deep learning. However, existing models still face challenges in terms of computational efficiency and generalization performance when dealing with diverse emotional expressions and complex environmental variations. Recently, large-scale vision-language pre-training models such as CLIP have achieved remarkable success in multi-modal learning. Their rich visual and textual representations offer valuable insights for downstream tasks. Consequently, transferring the knowledge to develop efficient and accurate facial expression recognition (FER) systems has emerged as a key research direction. To the end, this paper proposes a novel model, termed Knowledge Distillation and Retrieval-Augmented Generation (KDRAG), which combines Distillation and Retrieval-Augmented Generation (RAG) techniques to improve the efficiency and accuracy of FER. Through knowledge distillation, the teacher model (ViT-L/14) transfers its rich knowledge to the smaller student model (ViT-B/32). An additional linear projection layer is added to map the teacher model’s output features to the student model’s feature dimensions for feature alignment. Moreover, the RAG mechanism is developed to enhance the emotional understanding of students by retrieving text descriptions related to the input image. Additionally, this framework combines soft loss (from the teacher model’s knowledge) and hard loss (from the true targets of the labels) to enhance the model’s generalization ability. Extensive experimental results on multiple datasets demonstrate that the KDRAG framework can achieve significant improvements in accuracy and computational efficiency, providing new insights for real-time FER systems.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"114 ","pages":"Article 104645"},"PeriodicalIF":3.1,"publicationDate":"2025-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145520897","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0