
International Journal of Computer Vision: Latest Publications

Basis Restricted Elastic Shape Analysis on the Space of Unregistered Surfaces
IF 19.5, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2024-10-30. DOI: 10.1007/s11263-024-02269-3
Emmanuel Hartman, Emery Pierson, Martin Bauer, Mohamed Daoudi, Nicolas Charon

This paper introduces a new framework for surface analysis derived from the general setting of elastic Riemannian metrics on shape spaces. Traditionally, those metrics are defined over the infinite dimensional manifold of immersed surfaces and satisfy specific invariance properties enabling the comparison of surfaces modulo shape preserving transformations such as reparametrizations. The specificity of our approach is to restrict the space of allowable transformations to predefined finite dimensional bases of deformation fields. These are estimated in a data-driven way so as to emulate specific types of surface transformations. This allows us to simplify the representation of the corresponding shape space to a finite dimensional latent space. However, in sharp contrast with methods involving, e.g., mesh autoencoders, the latent space is equipped with a non-Euclidean Riemannian metric inherited from the family of elastic metrics. We demonstrate how this model can be effectively implemented to perform a variety of tasks on surface meshes and, importantly, that it does not assume these to be pre-registered or even to have a consistent mesh structure. We specifically validate our approach on human body shape and pose data as well as human face and hand scans for problems such as shape registration, interpolation, motion transfer, or random pose generation.
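
As a rough, self-contained illustration of the basis-restriction idea (not the authors' implementation; the vertex count, basis, and latent metric matrix below are made-up placeholders), the following sketch projects a per-vertex deformation field onto a small deformation basis and measures it under a non-Euclidean quadratic form on the latent space.

```python
import numpy as np

# Hypothetical sizes: a mesh with V vertices and a learned basis of k deformation fields.
V, k = 500, 8
rng = np.random.default_rng(0)

# Basis of deformation fields, one row per basis element, stored as flattened (V*3,) vectors.
B = rng.standard_normal((k, V * 3))
B /= np.linalg.norm(B, axis=1, keepdims=True)          # normalize each basis field

# A symmetric positive-definite matrix standing in for the pulled-back elastic metric
# at a given latent point (in the paper this would depend on the underlying elastic metric).
A = rng.standard_normal((k, k))
G = A @ A.T + k * np.eye(k)

def project_to_latent(deformation):
    """Least-squares coordinates of a deformation field in the restricted basis."""
    coeffs, *_ = np.linalg.lstsq(B.T, deformation.reshape(-1), rcond=None)
    return coeffs

def latent_norm(coeffs):
    """Length of a latent tangent vector under the (non-Euclidean) latent metric G."""
    return float(np.sqrt(coeffs @ G @ coeffs))

deformation = rng.standard_normal((V, 3))               # an arbitrary per-vertex displacement
z = project_to_latent(deformation)
print("latent coordinates:", z.round(3))
print("elastic length of the step:", latent_norm(z))
```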

Citations: 0
A Memory-Assisted Knowledge Transferring Framework with Curriculum Anticipation for Weakly Supervised Online Activity Detection
IF 19.5, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2024-10-28. DOI: 10.1007/s11263-024-02279-1
Tianshan Liu, Kin-Man Lam, Bing-Kun Bao

As a crucial topic of high-level video understanding, weakly supervised online activity detection (WS-OAD) involves identifying the ongoing behaviors moment-to-moment in streaming videos, trained with solely cheap video-level annotations. It is essentially a challenging task, which requires addressing the entangled issues of the weakly supervised settings and online constraints. In this paper, we tackle the WS-OAD task from the knowledge-distillation (KD) perspective, which trains an online student detector to distill dual-level knowledge from a weakly supervised offline teacher model. To guarantee the completeness of knowledge transfer, we improve the vanilla KD framework from two aspects. First, we introduce an external memory bank to maintain the long-term activity prototypes, which serves as a bridge to align the activity semantics learned from the offline teacher and online student models. Second, to compensate for the missing context of the unseen near future, we leverage a curriculum learning paradigm to gradually train the online student detector to anticipate the future activity semantics. By dynamically scheduling the provided auxiliary future states, the online detector progressively distills contextual information from the offline model in an easy-to-hard curriculum. Extensive experimental results on three public datasets demonstrate the superiority of our proposed method over the competing methods.
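
A minimal sketch of the two ingredients described above, under assumed interfaces (the prototype layout, momentum value, and curriculum schedule are illustrative, not the paper's): a momentum-updated memory bank of per-class activity prototypes, and an easy-to-hard schedule that gradually withholds the auxiliary future frames from the online student.

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes, feat_dim = 20, 128

# External memory bank of long-term activity prototypes, one slot per class (assumed layout).
memory = rng.standard_normal((num_classes, feat_dim))
memory /= np.linalg.norm(memory, axis=1, keepdims=True)

def update_prototype(class_id, feature, momentum=0.9):
    """Momentum update of a class prototype with a newly distilled feature."""
    f = feature / np.linalg.norm(feature)
    memory[class_id] = momentum * memory[class_id] + (1.0 - momentum) * f
    memory[class_id] /= np.linalg.norm(memory[class_id])

def visible_future_frames(epoch, total_epochs, max_future=8):
    """Easy-to-hard curriculum: start with many auxiliary future frames, end with none."""
    keep = max_future * (1.0 - epoch / max(total_epochs - 1, 1))
    return int(round(keep))

# Example: the online student sees fewer and fewer future frames as training proceeds.
for epoch in [0, 5, 10, 15, 19]:
    print(f"epoch {epoch:2d}: {visible_future_frames(epoch, 20)} future frames provided")
update_prototype(3, rng.standard_normal(feat_dim))
```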

Citations: 0
Dynamic Attention Vision-Language Transformer Network for Person Re-identification
IF 19.5, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2024-10-26. DOI: 10.1007/s11263-024-02277-3
Guifang Zhang, Shijun Tan, Zhe Ji, Yuming Fang

Multimodal-based person re-identification (ReID) has garnered increasing attention in recent years. However, the integration of visual and textual information encounters significant challenges. Biases in feature integration are frequently observed in existing methods, resulting in suboptimal performance and restricted generalization across a spectrum of ReID tasks. At the same time, the domain gap between the datasets used for pretraining and the ReID datasets further degrades performance. In response to these challenges, we propose a dynamic attention vision-language transformer network for the ReID task. In this network, a novel image-text dynamic attention module (ITDA) is designed to promote unbiased feature integration by dynamically assigning the importance of image and text representations. Additionally, an adapter module is adopted to address the domain gap between pretraining datasets and ReID datasets. Our network can capture complex connections between visual and textual information and achieve satisfactory performance. We conducted numerous experiments on ReID benchmarks to demonstrate the efficacy of our proposed method. The experimental results show that our method achieves state-of-the-art performance, surpassing existing integration strategies. These findings underscore the critical role of unbiased dynamic feature integration in enhancing the capabilities of multimodal-based ReID models.
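
The gist of dynamically weighting image and text representations can be sketched as below; this is an illustrative stand-in for the ITDA module, with an assumed gate design and feature dimension rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn

class DynamicImageTextFusion(nn.Module):
    """Toy stand-in for an image-text dynamic attention module: a small gate predicts
    per-sample weights for the visual and textual embeddings before fusing them."""

    def __init__(self, dim=256):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 2))

    def forward(self, img_feat, txt_feat):
        # img_feat, txt_feat: (batch, dim)
        weights = torch.softmax(self.gate(torch.cat([img_feat, txt_feat], dim=-1)), dim=-1)
        fused = weights[:, :1] * img_feat + weights[:, 1:] * txt_feat
        return fused, weights

fusion = DynamicImageTextFusion(dim=256)
img, txt = torch.randn(4, 256), torch.randn(4, 256)
fused, w = fusion(img, txt)
print(fused.shape, w[0].tolist())   # torch.Size([4, 256]) and the per-sample weights
```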

Citations: 0
StyleAdapter: A Unified Stylized Image Generation Model
IF 19.5, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2024-10-25. DOI: 10.1007/s11263-024-02253-x
Zhouxia Wang, Xintao Wang, Liangbin Xie, Zhongang Qi, Ying Shan, Wenping Wang, Ping Luo

This work focuses on generating high-quality images that match both the style of reference images and the content of provided textual descriptions. Current leading algorithms, such as DreamBooth and LoRA, require fine-tuning for each style, leading to time-consuming and computationally expensive processes. In this work, we propose StyleAdapter, a unified stylized image generation model capable of producing a variety of stylized images that match both the content of a given prompt and the style of reference images, without the need for per-style fine-tuning. It introduces a two-path cross-attention (TPCA) module to separately process style information and the textual prompt, which cooperate with a semantic suppressing vision model (SSVM) to suppress the semantic content of style images. In this way, it can ensure that the prompt maintains control over the content of the generated images, while also mitigating the negative impact of semantic information in style references. This results in the content of the generated image adhering to the prompt, and its style aligning with the style references. Moreover, our StyleAdapter can be integrated with existing controllable synthesis methods, such as T2I-adapter and ControlNet, to attain a more controllable and stable generation process. Extensive experiments demonstrate the superiority of our method over previous works.
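
A hedged sketch of the two-path cross-attention idea follows: latent image tokens attend separately to prompt tokens and to style tokens, and the two paths are blended. The dimensions, head count, and blending weight are assumptions for illustration, not the released StyleAdapter code.

```python
import torch
import torch.nn as nn

class TwoPathCrossAttention(nn.Module):
    """Illustrative two-path cross-attention: image (latent) tokens attend separately to
    the textual prompt and to style-reference tokens, and the two paths are blended."""

    def __init__(self, dim=320, heads=8, style_weight=0.5):
        super().__init__()
        self.text_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.style_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.style_weight = style_weight

    def forward(self, latent_tokens, text_tokens, style_tokens):
        text_out, _ = self.text_attn(latent_tokens, text_tokens, text_tokens)
        style_out, _ = self.style_attn(latent_tokens, style_tokens, style_tokens)
        return latent_tokens + text_out + self.style_weight * style_out

tpca = TwoPathCrossAttention()
latents = torch.randn(2, 64, 320)    # (batch, latent tokens, dim)
prompt = torch.randn(2, 77, 320)     # text tokens
style = torch.randn(2, 16, 320)      # style-reference tokens
print(tpca(latents, prompt, style).shape)   # torch.Size([2, 64, 320])
```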

Citations: 0
Sample Correlation for Fingerprinting Deep Face Recognition
IF 19.5, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2024-10-25. DOI: 10.1007/s11263-024-02254-w
Jiyang Guan, Jian Liang, Yanbo Wang, Ran He

Face recognition has witnessed remarkable advancements in recent years, thanks to the development of deep learning techniques. However, an off-the-shelf face recognition model deployed as a commercial service could be stolen by model stealing attacks, posing great threats to the rights of the model owner. Model fingerprinting, as a model stealing detection method, aims to verify whether a suspect model is stolen from the victim model, and is attracting increasing attention. Previous methods typically utilize transferable adversarial examples as the model fingerprint, but such fingerprints are known to be sensitive to adversarial defense and transfer learning techniques. To address this issue, we instead consider the pairwise relationship between samples and propose a novel yet simple model stealing detection method based on SAmple Correlation (SAC). Specifically, we present SAC-JC, which selects JPEG compressed samples as model inputs and calculates the correlation matrix among their model outputs. Extensive results validate that SAC successfully defends against various model stealing attacks in deep face recognition, encompassing face verification and face emotion recognition, exhibiting the highest performance in terms of AUC, p-value and F1 score. Furthermore, we extend our evaluation of SAC-JC to object recognition datasets including Tiny-ImageNet and CIFAR10, which also demonstrates the superior performance of SAC-JC over previous methods. The code will be available at https://github.com/guanjiyang/SAC_JC.
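
The core of the correlation-based fingerprint is concrete enough to sketch: JPEG-compress a small probe set, collect each model's outputs on it, form a sample-correlation matrix per model, and compare the matrices. The toy below uses random vectors in place of real model outputs, and the decision threshold is left as a placeholder; it illustrates the idea rather than reproducing the released SAC-JC code.

```python
import io
import numpy as np
from PIL import Image

def jpeg_compress(img_array, quality=30):
    """Re-encode an RGB uint8 image through JPEG to obtain the augmented probe sample."""
    buf = io.BytesIO()
    Image.fromarray(img_array).save(buf, format="JPEG", quality=quality)
    return np.asarray(Image.open(buf))

def correlation_fingerprint(outputs):
    """Sample-correlation matrix of a model's outputs on the probe set (n_samples x dim)."""
    return np.corrcoef(outputs)

def fingerprint_distance(victim_outputs, suspect_outputs):
    """Mean absolute difference between the two correlation matrices; small values
    suggest the suspect model was derived from the victim (threshold is a placeholder)."""
    return np.abs(correlation_fingerprint(victim_outputs)
                  - correlation_fingerprint(suspect_outputs)).mean()

# Toy demonstration with random "embeddings" standing in for real model outputs.
rng = np.random.default_rng(0)
probe = rng.integers(0, 255, size=(8, 64, 64, 3), dtype=np.uint8)
probe = np.stack([jpeg_compress(x) for x in probe])          # JPEG-compressed probe set
victim = rng.standard_normal((8, 512))
stolen = victim + 0.05 * rng.standard_normal((8, 512))       # a near-copy of the victim
independent = rng.standard_normal((8, 512))
print("victim vs. stolen:     ", round(fingerprint_distance(victim, stolen), 4))
print("victim vs. independent:", round(fingerprint_distance(victim, independent), 4))
```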

Citations: 0
Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation
IF 19.5, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2024-10-24. DOI: 10.1007/s11263-024-02271-9
David Junhao Zhang, Jay Zhangjie Wu, Jia-Wei Liu, Rui Zhao, Lingmin Ran, Yuchao Gu, Difei Gao, Mike Zheng Shou

Significant advancements have been achieved in the realm of large-scale pre-trained text-to-video Diffusion Models (VDMs). However, previous methods either rely solely on pixel-based VDMs, which come with high computational costs, or on latent-based VDMs, which often struggle with precise text-video alignment. In this paper, we are the first to propose a hybrid model, dubbed Show-1, which marries pixel-based and latent-based VDMs for text-to-video generation. Our model first uses pixel-based VDMs to produce a low-resolution video with strong text-video correlation. After that, we propose a novel expert translation method that employs the latent-based VDMs to further upsample the low-resolution video to high resolution, which can also remove potential artifacts and corruptions from low-resolution videos. Compared to latent VDMs, Show-1 can produce high-quality videos with precise text-video alignment; compared to pixel VDMs, Show-1 is much more efficient (GPU memory usage during inference is 15 G vs. 72 G). Furthermore, our Show-1 model can be readily adapted for motion customization and video stylization applications through simple temporal attention layer finetuning. Our model achieves state-of-the-art performance on standard video generation benchmarks. Code of Show-1 is publicly available and more videos can be found here.
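
The two-stage composition can be sketched as a simple pipeline, shown below with placeholder callables standing in for the pixel-based and latent-based VDMs (the resolutions and the function signatures are assumptions, not the released Show-1 API).

```python
from typing import Callable
import numpy as np

def two_stage_text_to_video(prompt: str,
                            pixel_vdm: Callable[[str], np.ndarray],
                            latent_upsampler: Callable[[np.ndarray, str], np.ndarray]) -> np.ndarray:
    """Hybrid pipeline in the spirit of the paper: a pixel-space diffusion model produces a
    low-resolution, strongly text-aligned clip, and a latent-space expert upsamples it."""
    low_res = pixel_vdm(prompt)                    # e.g. (frames, 64, 40, 3): cheap but aligned
    high_res = latent_upsampler(low_res, prompt)   # latent expert adds resolution and detail
    return high_res

# Placeholder models so the sketch runs end to end; real VDMs would replace these.
fake_pixel_vdm = lambda prompt: np.zeros((8, 64, 40, 3), dtype=np.float32)
fake_upsampler = lambda video, prompt: np.repeat(np.repeat(video, 8, axis=1), 8, axis=2)

video = two_stage_text_to_video("a panda riding a bicycle", fake_pixel_vdm, fake_upsampler)
print(video.shape)   # (8, 512, 320, 3)
```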

Citations: 0
Neural Vector Fields for Implicit Surface Representation and Inference
IF 19.5, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2024-10-22. DOI: 10.1007/s11263-024-02251-z
Edoardo Mello Rella, Ajad Chhatkuli, Ender Konukoglu, Luc Van Gool

Neural implicit fields have recently shown increasing success in representing, learning, and analyzing 3D shapes. Signed distance fields and occupancy fields are still the preferred choice of implicit representations with well-studied properties, despite their restriction to closed surfaces. With neural networks, unsigned distance fields as well as several other variations and training principles have been proposed with the goal to represent all classes of shapes. In this paper, we develop a novel yet fundamental representation based on unit vectors in 3D space and call it Vector Field (VF). At each point in $\mathbb{R}^3$, VF is directed to the closest point on the surface. We theoretically demonstrate that VF can be easily transformed to surface density by computing the flux density. Unlike other standard representations, VF directly encodes an important physical property of the surface, its normal. We further show the advantages of VF representation, in learning open, closed, or multi-layered surfaces. We show that, thanks to the continuity property of the neural optimization with VF, a separate distance field becomes unnecessary for extracting surfaces from the implicit field via Marching Cubes. We compare our method on several datasets including ShapeNet, where the proposed new neural implicit field shows superior accuracy in representing any type of shape, outperforming other standard methods. Codes are available at https://github.com/edomel/ImplicitVF.
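
The VF supervision target is easy to make concrete: for any query point, it is the unit vector toward the nearest surface sample. The sketch below computes such targets for a sampled sphere with a KD-tree; it illustrates the representation only, not the authors' network or training code.

```python
import numpy as np
from scipy.spatial import cKDTree

def vector_field_targets(query_points, surface_points, eps=1e-9):
    """For each query point, return the unit vector pointing to its closest surface sample,
    i.e. the Vector Field (VF) value a network would be trained to predict."""
    tree = cKDTree(surface_points)
    _, idx = tree.query(query_points)
    diff = surface_points[idx] - query_points
    norms = np.linalg.norm(diff, axis=1, keepdims=True)
    return diff / np.maximum(norms, eps)

# Toy example: surface samples on a unit sphere, queries scattered in the volume.
rng = np.random.default_rng(0)
surface = rng.standard_normal((2000, 3))
surface /= np.linalg.norm(surface, axis=1, keepdims=True)
queries = rng.uniform(-1.5, 1.5, size=(5, 3))
vf = vector_field_targets(queries, surface)
print(np.round(vf, 3))        # unit vectors pointing toward the sphere
```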

Citations: 0
Learning Text-to-Video Retrieval from Image Captioning
IF 19.5, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2024-10-22. DOI: 10.1007/s11263-024-02202-8
Lucas Ventura, Cordelia Schmid, Gül Varol

We describe a protocol to study text-to-video retrieval training with unlabeled videos, where we assume (i) no access to labels for any videos, i.e., no access to the set of ground-truth captions, but (ii) access to labeled images in the form of text. Using image expert models is a realistic scenario given that annotating images is cheaper therefore scalable, in contrast to expensive video labeling schemes. Recently, zero-shot image experts such as CLIP have established a new strong baseline for video understanding tasks. In this paper, we make use of this progress and instantiate the image experts from two types of models: a text-to-image retrieval model to provide an initial backbone, and image captioning models to provide supervision signal into unlabeled videos. We show that automatically labeling video frames with image captioning allows text-to-video retrieval training. This process adapts the features to the target domain at no manual annotation cost, consequently outperforming the strong zero-shot CLIP baseline. During training, we sample captions from multiple video frames that best match the visual content, and perform a temporal pooling over frame representations by scoring frames according to their relevance to each caption. We conduct extensive ablations to provide insights and demonstrate the effectiveness of this simple framework by outperforming the CLIP zero-shot baselines on text-to-video retrieval on three standard datasets, namely ActivityNet, MSR-VTT, and MSVD. Code and models will be made publicly available.
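
The relevance-weighted temporal pooling step can be sketched directly: score each frame embedding against a caption embedding and pool the frames with softmax weights. The feature dimension and temperature below are placeholders, and random vectors stand in for CLIP features.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def caption_weighted_pooling(frame_feats, caption_feat, temperature=0.07):
    """Pool per-frame features into one video feature, weighting each frame by its
    (cosine) relevance to the caption, as in score-based temporal pooling."""
    f = frame_feats / np.linalg.norm(frame_feats, axis=1, keepdims=True)
    c = caption_feat / np.linalg.norm(caption_feat)
    scores = f @ c                                   # (num_frames,)
    weights = softmax(scores / temperature)
    return weights @ frame_feats, weights

rng = np.random.default_rng(0)
frames = rng.standard_normal((16, 512))              # e.g. image-encoder features per frame
caption = rng.standard_normal(512)                   # caption feature from the text encoder
video_feat, w = caption_weighted_pooling(frames, caption)
print(video_feat.shape, w.argmax())                  # (512,) and the most relevant frame
```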

Citations: 0
CogCartoon: Towards Practical Story Visualization
IF 19.5, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2024-10-21. DOI: 10.1007/s11263-024-02267-5
Zhongyang Zhu, Jie Tang

The state-of-the-art methods for story visualization demonstrate a significant demand for training data and storage, as well as limited flexibility in story presentation, thereby rendering them impractical for real-world applications. We introduce CogCartoon, a practical story visualization method based on pre-trained diffusion models. To alleviate dependence on data and storage, we propose an innovative strategy of character-plugin generation that can represent a specific character as a compact 316 KB plugin by using a few training samples. To facilitate enhanced flexibility, we employ a strategy of plugin-guided and layout-guided inference, enabling users to seamlessly incorporate new characters and custom layouts into the generated image results at their convenience. We have conducted comprehensive qualitative and quantitative studies, providing compelling evidence for the superiority of CogCartoon over existing methodologies. Moreover, CogCartoon demonstrates its power in tackling challenging tasks, including long story visualization and realistic style story visualization.

Citations: 0
AgMTR: Agent Mining Transformer for Few-Shot Segmentation in Remote Sensing
IF 19.5, CAS Tier 2 (Computer Science), Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE. Pub Date: 2024-10-21. DOI: 10.1007/s11263-024-02252-y
Hanbo Bi, Yingchao Feng, Yongqiang Mao, Jianning Pei, Wenhui Diao, Hongqi Wang, Xian Sun

Few-shot segmentation aims to segment the objects of interest in the query image with just a handful of labeled samples (i.e., support images). Previous schemes leverage the similarity between support-query pixel pairs to construct the pixel-level semantic correlation. However, in remote sensing scenarios with extreme intra-class variations and cluttered backgrounds, such pixel-level correlations may produce severe mismatches, resulting in semantic ambiguity between the query foreground (FG) and background (BG) pixels. To tackle this problem, we propose a novel Agent Mining Transformer, which adaptively mines a set of local-aware agents to construct agent-level semantic correlation. Compared with pixel-level semantics, the given agents are equipped with local-contextual information and possess a broader receptive field. At this point, different query pixels can selectively aggregate the fine-grained local semantics of different agents, thereby enhancing the semantic clarity between query FG and BG pixels. Concretely, the Agent Learning Encoder is first proposed to construct an optimal transport plan that assigns different agents to aggregate support semantics within different local regions. Then, to further optimize the agents, the Agent Aggregation Decoder and the Semantic Alignment Decoder are constructed to go beyond the limited support set by mining valuable class-specific semantics from unlabeled data sources and from the query image itself, respectively. Extensive experiments on the remote sensing benchmark iSAID indicate that the proposed method achieves state-of-the-art performance. Surprisingly, our method remains quite competitive when extended to more common natural scenarios, i.e., PASCAL-$5^i$ and COCO-$20^i$.
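
The shift from pixel-level to agent-level correlation can be illustrated with a small sketch in which every query pixel aggregates a handful of agent embeddings by cosine-similarity attention; the agent count, feature dimension, and temperature are assumptions rather than values from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def agent_level_aggregation(query_pixels, agents, temperature=0.1):
    """Each query pixel feature aggregates the local-aware agent embeddings, weighted by
    cosine similarity, replacing dense pixel-to-pixel support correlation."""
    q = query_pixels / np.linalg.norm(query_pixels, axis=1, keepdims=True)
    a = agents / np.linalg.norm(agents, axis=1, keepdims=True)
    attn = softmax(q @ a.T / temperature, axis=1)       # (num_pixels, num_agents)
    return attn @ agents, attn

rng = np.random.default_rng(0)
query = rng.standard_normal((4096, 256))                # flattened query feature map
agents = rng.standard_normal((5, 256))                  # a small set of mined agents
enhanced, attn = agent_level_aggregation(query, agents)
print(enhanced.shape, attn.sum(axis=1)[:3])             # (4096, 256), each row sums to 1
```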

Citations: 0