
arXiv - CS - Multimedia: Latest Publications

Efficient Low-Resolution Face Recognition via Bridge Distillation
Pub Date : 2024-09-18 DOI: arxiv-2409.11786
Shiming Ge, Shengwei Zhao, Chenyu Li, Yu Zhang, Jia Li
Face recognition in the wild is now advancing towards light-weight models, fast inference speed and resolution-adapted capability. In this paper, we propose a bridge distillation approach to turn a complex face model pretrained on private high-resolution faces into a light-weight one for low-resolution face recognition. In our approach, such a cross-dataset resolution-adapted knowledge transfer problem is solved via two-step distillation. In the first step, we conduct cross-dataset distillation to transfer the prior knowledge from private high-resolution faces to public high-resolution faces and generate compact and discriminative features. In the second step, the resolution-adapted distillation is conducted to further transfer the prior knowledge to synthetic low-resolution faces via multi-task learning. By learning low-resolution face representations and mimicking the adapted high-resolution knowledge, a light-weight student model can be constructed with high efficiency and promising accuracy in recognizing low-resolution faces. Experimental results show that the student model performs impressively in recognizing low-resolution faces with only 0.21M parameters and 0.057MB memory. Meanwhile, its speed reaches up to 14,705, ~934 and 763 faces per second on GPU, CPU and mobile phone, respectively.
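To make the two-step transfer concrete, the sketch below illustrates the second, resolution-adapted step as a multi-task objective: a light-weight student mimics adapted high-resolution teacher features on synthetic low-resolution faces while also learning identity classification. The `TinyStudent` network, the loss weighting and the tensor shapes are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of resolution-adapted distillation (step two of bridge
# distillation): a light-weight student mimics adapted high-resolution teacher
# features on synthetic low-resolution faces, while also learning to classify.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyStudent(nn.Module):
    """Stand-in for the light-weight low-resolution student backbone."""
    def __init__(self, feat_dim=128, num_ids=1000):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        self.head = nn.Linear(feat_dim, num_ids)

    def forward(self, x):
        feat = self.backbone(x)
        return feat, self.head(feat)

def distill_step(student, teacher_feat, lr_faces, labels, alpha=0.5):
    """One multi-task step: mimic adapted teacher features + identity loss."""
    feat, logits = student(lr_faces)
    mimic = F.mse_loss(feat, teacher_feat)      # knowledge transfer term
    cls = F.cross_entropy(logits, labels)       # recognition task term
    return alpha * mimic + (1 - alpha) * cls

# toy usage with random tensors standing in for a training batch
student = TinyStudent()
lr_faces = torch.randn(8, 3, 32, 32)            # synthetic low-res faces
teacher_feat = torch.randn(8, 128)              # adapted HR teacher features
labels = torch.randint(0, 1000, (8,))
loss = distill_step(student, teacher_feat, lr_faces, labels)
loss.backward()
```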
Citations: 0
MoRAG -- Multi-Fusion Retrieval Augmented Generation for Human Motion
Pub Date : 2024-09-18 DOI: arxiv-2409.12140
Kalakonda Sai Shashank, Shubh Maheshwari, Ravi Kiran Sarvadevabhatla
We introduce MoRAG, a novel multi-part fusion based retrieval-augmented generation strategy for text-based human motion generation. The method enhances motion diffusion models by leveraging additional knowledge obtained through an improved motion retrieval process. By effectively prompting large language models (LLMs), we address spelling errors and rephrasing issues in motion retrieval. Our approach utilizes a multi-part retrieval strategy to improve the generalizability of motion retrieval across the language space. We create diverse samples through the spatial composition of the retrieved motions. Furthermore, by utilizing low-level, part-specific motion information, we can construct motion samples for unseen text descriptions. Our experiments demonstrate that our framework can serve as a plug-and-play module, improving the performance of motion diffusion models. Code, pretrained models and sample videos will be made available at: https://motion-rag.github.io/
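The multi-part retrieval and spatial composition described above can be pictured with the toy sketch below: retrieve a candidate motion per body part by text similarity, then stitch the parts into one sample that would condition a motion diffusion model (not shown). The part list, database layout and `embed` placeholder are hypothetical, not MoRAG's released interfaces.

```python
# Schematic of multi-part retrieval-augmented motion generation: retrieve
# candidate motions per body part, then compose them spatially before
# conditioning a motion diffusion model (not shown). All stores and keys here
# are hypothetical placeholders.
import numpy as np

PARTS = ["torso", "arms", "legs"]
rng = np.random.default_rng(0)

def retrieve(part_db, query_text, top_k=1):
    """Nearest-neighbour lookup by cosine similarity of text embeddings."""
    q = part_db["embed"](query_text)
    denom = np.linalg.norm(part_db["keys"], axis=1) * np.linalg.norm(q) + 1e-8
    sims = part_db["keys"] @ q / denom
    idx = np.argsort(-sims)[:top_k]
    return [part_db["motions"][i] for i in idx]

def compose(part_motions, joints_per_part):
    """Spatially stitch per-part motions (frames x joints x 3) into one sample."""
    return np.concatenate([m[:, :j] for m, j in zip(part_motions, joints_per_part)], axis=1)

def toy_db(n=50, dim=64, joints=8, frames=60):
    """Random embeddings/motions standing in for a real per-part index."""
    return {
        "keys": rng.normal(size=(n, dim)),
        "motions": rng.normal(size=(n, frames, joints, 3)),
        "embed": lambda text: rng.normal(size=dim),  # placeholder text encoder
    }

dbs = {p: toy_db() for p in PARTS}
parts = [retrieve(dbs[p], "a person waves while walking")[0] for p in PARTS]
sample = compose(parts, joints_per_part=[8, 8, 8])
print(sample.shape)  # (60, 24, 3) composed motion used to guide diffusion
```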
Citations: 0
Vista3D: Unravel the 3D Darkside of a Single Image
Pub Date : 2024-09-18 DOI: arxiv-2409.12193
Qiuhong Shen, Xingyi Yang, Michael Bi Mi, Xinchao Wang
We embark on the age-old quest: unveiling the hidden dimensions of objects from mere glimpses of their visible parts. To address this, we present Vista3D, a framework that realizes swift and consistent 3D generation within a mere 5 minutes. At the heart of Vista3D lies a two-phase approach: the coarse phase and the fine phase. In the coarse phase, we rapidly generate initial geometry with Gaussian Splatting from a single image. In the fine phase, we extract a Signed Distance Function (SDF) directly from learned Gaussian Splatting, optimizing it with a differentiable isosurface representation. Furthermore, it elevates the quality of generation by using a disentangled representation with two independent implicit functions to capture both visible and obscured aspects of objects. Additionally, it harmonizes gradients from 2D diffusion priors with 3D-aware diffusion priors by angular diffusion prior composition. Through extensive evaluation, we demonstrate that Vista3D effectively sustains a balance between the consistency and diversity of the generated 3D objects. Demos and code will be available at https://github.com/florinshen/Vista3D.
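The disentangled representation mentioned above can be sketched as two independent implicit functions queried together, one trusted in visible regions and one in occluded ones. The network sizes, the visibility input and the convex blending rule below are illustrative assumptions, not Vista3D's actual formulation.

```python
# Minimal, hypothetical sketch of a disentangled implicit representation: two
# independent implicit functions, one biased toward the observed (visible) side
# and one toward the unobserved side, queried together for any 3D point.
import torch
import torch.nn as nn

def mlp(in_dim=3, hidden=64, out_dim=1):
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )

class DisentangledSDF(nn.Module):
    def __init__(self):
        super().__init__()
        self.visible = mlp()   # implicit function for the observed aspect
        self.occluded = mlp()  # implicit function for the obscured aspect

    def forward(self, points, visibility):
        """points: (N, 3); visibility: (N, 1) in [0, 1], e.g. from the input view."""
        s_vis = self.visible(points)
        s_occ = self.occluded(points)
        # convex blend: visible regions trust the first function, occluded the second
        return visibility * s_vis + (1.0 - visibility) * s_occ

model = DisentangledSDF()
pts = torch.rand(1024, 3) * 2 - 1          # query points in [-1, 1]^3
vis = (pts[:, :1] > 0).float()             # toy visibility mask
sdf = model(pts, vis)                      # (1024, 1) signed distances
```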
Citations: 0
NVLM: Open Frontier-Class Multimodal LLMs
Pub Date : 2024-09-17 DOI: arxiv-2409.11402
Wenliang Dai, Nayeon Lee, Boxin Wang, Zhuoling Yang, Zihan Liu, Jon Barker, Tuomas Rintamaki, Mohammad Shoeybi, Bryan Catanzaro, Wei Ping
We introduce NVLM 1.0, a family of frontier-class multimodal large language models (LLMs) that achieve state-of-the-art results on vision-language tasks, rivaling the leading proprietary models (e.g., GPT-4o) and open-access models (e.g., Llama 3-V 405B and InternVL 2). Remarkably, NVLM 1.0 shows improved text-only performance over its LLM backbone after multimodal training. In terms of model design, we perform a comprehensive comparison between decoder-only multimodal LLMs (e.g., LLaVA) and cross-attention-based models (e.g., Flamingo). Based on the strengths and weaknesses of both approaches, we propose a novel architecture that enhances both training efficiency and multimodal reasoning capabilities. Furthermore, we introduce a 1-D tile-tagging design for tile-based dynamic high-resolution images, which significantly boosts performance on multimodal reasoning and OCR-related tasks. Regarding training data, we meticulously curate and provide detailed information on our multimodal pretraining and supervised fine-tuning datasets. Our findings indicate that dataset quality and task diversity are more important than scale, even during the pretraining phase, across all architectures. Notably, we develop production-grade multimodality for the NVLM-1.0 models, enabling them to excel in vision-language tasks while maintaining and even improving text-only performance compared to their LLM backbones. To achieve this, we craft and integrate a high-quality text-only dataset into multimodal training, alongside a substantial amount of multimodal math and reasoning data, leading to enhanced math and coding capabilities across modalities. To advance research in the field, we are releasing the model weights and will open-source the code for the community: https://nvlm-project.github.io/.
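As a rough illustration of the tile-based dynamic high-resolution input, the sketch below splits an image into tiles, encodes each tile, and prepends a textual tile tag per tile so the LLM can tell tiles apart in a 1-D sequence. The tag strings, tile size and toy encoder are assumptions for illustration, not NVLM's exact recipe.

```python
# Illustrative sketch of 1-D tile tagging for dynamic high-resolution input:
# split an image into tiles, encode each tile, and prepend a tag token per
# tile before handing the interleaved sequence to the LLM.
import torch

def tile_image(img, tile=224):
    """img: (3, H, W) with H, W multiples of `tile` -> list of (3, tile, tile)."""
    _, H, W = img.shape
    return [img[:, r:r + tile, c:c + tile]
            for r in range(0, H, tile) for c in range(0, W, tile)]

def tag_and_encode(img, vision_encoder, tile=224):
    """Return interleaved [tag_string, tile_embeddings, ...] for the LLM input."""
    sequence = []
    for i, t in enumerate(tile_image(img, tile), start=1):
        sequence.append(f"<tile_{i}>")                    # hypothetical 1-D tile tag
        sequence.append(vision_encoder(t.unsqueeze(0)))   # (1, n_patches, d)
    return sequence

def toy_encoder(x):
    """Stand-in for a ViT: 16x16 patches -> random 32-dim embeddings."""
    n_patches = (x.shape[-1] // 16) * (x.shape[-2] // 16)
    return torch.randn(1, n_patches, 32)

img = torch.randn(3, 448, 448)             # dynamic high-res input (2x2 tiles)
seq = tag_and_encode(img, toy_encoder)
print([s if isinstance(s, str) else tuple(s.shape) for s in seq])
```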
Citations: 0
Enhancing Few-Shot Classification without Forgetting through Multi-Level Contrastive Constraints
Pub Date : 2024-09-17 DOI: arxiv-2409.11286
Bingzhi Chen, Haoming Zhou, Yishu Liu, Biqing Zeng, Jiahui Pan, Guangming Lu
Most recent few-shot learning approaches are based on meta-learning with episodic training. However, prior studies encounter two crucial problems: (1) the presence of inductive bias, and (2) the occurrence of catastrophic forgetting. In this paper, we propose a novel Multi-Level Contrastive Constraints (MLCC) framework that jointly integrates within-episode learning and across-episode learning into a unified interactive learning paradigm to solve these issues. Specifically, we employ a space-aware interaction modeling scheme to explore the correct inductive paradigms for each class between within-episode similarity/dis-similarity distributions. Additionally, with the aim of better utilizing former prior knowledge, a cross-stage distribution adaptation strategy is designed to align the across-episode distributions from different time stages, thus reducing the semantic gap between existing and past prediction distributions. Extensive experiments on multiple few-shot datasets demonstrate the consistent superiority of the MLCC approach over existing state-of-the-art baselines.
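A generic within-episode contrastive term in the spirit of the similarity/dis-similarity constraints is sketched below: same-class embeddings in an episode are pulled together and different-class ones pushed apart. The temperature and the exact loss form are illustrative assumptions, not the paper's precise MLCC objective.

```python
# Generic within-episode contrastive loss: pull same-class embeddings together,
# push different-class embeddings apart, over one sampled episode.
import torch
import torch.nn.functional as F

def episode_contrastive_loss(embeddings, labels, temperature=0.1):
    """embeddings: (N, d) episode features; labels: (N,) integer class ids."""
    z = F.normalize(embeddings, dim=1)
    sim = z @ z.t() / temperature                       # pairwise similarities
    mask_self = torch.eye(len(z), dtype=torch.bool)
    sim = sim.masked_fill(mask_self, float("-inf"))     # ignore self-pairs
    log_prob = F.log_softmax(sim, dim=1)
    pos = (labels[:, None] == labels[None, :]) & ~mask_self
    # average log-probability of positives for each anchor that has positives
    per_anchor = log_prob.masked_fill(~pos, 0.0).sum(1) / pos.sum(1).clamp(min=1)
    return -per_anchor[pos.any(1)].mean()

# toy episode: 5-way 4-shot features
feats = torch.randn(20, 64, requires_grad=True)
labels = torch.arange(5).repeat_interleave(4)
loss = episode_contrastive_loss(feats, labels)
loss.backward()
```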
Citations: 0
Less is More: A Simple yet Effective Token Reduction Method for Efficient Multi-modal LLMs
Pub Date : 2024-09-17 DOI: arxiv-2409.10994
Dingjie Song, Wenjun Wang, Shunian Chen, Xidong Wang, Michael Guan, Benyou Wang
The rapid advancement of Multimodal Large Language Models (MLLMs) has led to remarkable performances across various domains. However, this progress is accompanied by a substantial surge in the resource consumption of these models. We address this pressing issue by introducing a new approach, Token Reduction using CLIP Metric (TRIM), aimed at improving the efficiency of MLLMs without sacrificing their performance. Inspired by human attention patterns in Visual Question Answering (VQA) tasks, TRIM presents a fresh perspective on the selection and reduction of image tokens. The TRIM method has been extensively tested across 12 datasets, and the results demonstrate a significant reduction in computational overhead while maintaining a consistent level of performance. This research marks a critical stride in efficient MLLM development, promoting greater accessibility and sustainability of high-performing models.
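The core idea, scoring image tokens against the text query with a CLIP-style metric and keeping only the top-scoring ones, can be sketched as follows. The keep ratio and top-k selection rule are illustrative assumptions rather than TRIM's exact criterion.

```python
# Hypothetical sketch of CLIP-metric token reduction: score each image token by
# its similarity to the text query embedding and keep only the highest-scoring
# tokens before they enter the LLM.
import torch
import torch.nn.functional as F

def reduce_tokens(image_tokens, text_embedding, keep_ratio=0.25):
    """image_tokens: (N, d); text_embedding: (d,). Returns kept (K, d) tokens."""
    scores = F.cosine_similarity(image_tokens, text_embedding[None, :], dim=1)
    k = max(1, int(keep_ratio * image_tokens.shape[0]))
    keep = scores.topk(k).indices.sort().values     # keep original ordering
    return image_tokens[keep], keep

# toy inputs: 576 visual tokens (24x24 grid), one pooled text embedding
tokens = torch.randn(576, 1024)
text = torch.randn(1024)
kept, idx = reduce_tokens(tokens, text)
print(kept.shape)   # (144, 1024): ~75% fewer visual tokens reach the LLM
```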
Citations: 0
Benchmarking VLMs' Reasoning About Persuasive Atypical Images
Pub Date : 2024-09-16 DOI: arxiv-2409.10719
Sina Malakouti, Aysan Aghazadeh, Ashmit Khandelwal, Adriana Kovashka
Vision language models (VLMs) have shown strong zero-shot generalization across various tasks, especially when integrated with large language models (LLMs). However, their ability to comprehend rhetorical and persuasive visual media, such as advertisements, remains understudied. Ads often employ atypical imagery, using surprising object juxtapositions to convey shared properties. For example, Fig. 1 (e) shows a beer with a feather-like texture. This requires advanced reasoning to deduce that this atypical representation signifies the beer's lightness. We introduce three novel tasks, Multi-label Atypicality Classification, Atypicality Statement Retrieval, and Atypical Object Recognition, to benchmark VLMs' understanding of atypicality in persuasive images. We evaluate how well VLMs use atypicality to infer an ad's message and test their reasoning abilities by employing semantically challenging negatives. Finally, we pioneer atypicality-aware verbalization by extracting comprehensive image descriptions sensitive to atypical elements. Our findings reveal that: (1) VLMs lack advanced reasoning capabilities compared to LLMs; (2) simple, effective strategies can extract atypicality-aware information, leading to comprehensive image verbalization; (3) atypicality aids persuasive advertisement understanding. Code and data will be made available.
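For the statement-retrieval task with semantically challenging negatives, a minimal evaluation loop might look like the sketch below, where `score(image, text)` is a hypothetical stand-in for whatever VLM is being benchmarked and the toy data only shows the expected layout.

```python
# Illustrative evaluation loop for an atypicality statement retrieval task:
# given an ad image, one correct atypicality statement, and semantically
# challenging negatives, count how often the model ranks the correct one first.
import random

def retrieval_accuracy(samples, score):
    """samples: list of dicts with 'image', 'positive', 'negatives' (list of str)."""
    hits = 0
    for s in samples:
        candidates = [s["positive"]] + list(s["negatives"])
        random.shuffle(candidates)
        best = max(candidates, key=lambda text: score(s["image"], text))
        hits += int(best == s["positive"])
    return hits / len(samples)

# toy run with a random scorer, just to show the expected data layout
toy = [{"image": f"ad_{i}.jpg",
        "positive": "the beer is as light as a feather",
        "negatives": ["the beer is as heavy as a stone",
                      "the feather is as bitter as beer"]} for i in range(10)]
print(retrieval_accuracy(toy, score=lambda img, txt: random.random()))
```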
Citations: 0
Fit and Prune: Fast and Training-free Visual Token Pruning for Multi-modal Large Language Models
Pub Date : 2024-09-16 DOI: arxiv-2409.10197
Weihao Ye, Qiong Wu, Wenhao Lin, Yiyi Zhou
Recent progress in Multimodal Large Language Models (MLLMs) often uses large numbers of image tokens to compensate for the visual shortcomings of MLLMs, which not only exhibits obvious redundancy but also greatly exacerbates the already high computation. Token pruning is an effective solution for speeding up MLLMs, but when and how to drop tokens still remains a challenge. In this paper, we propose a novel and training-free approach for the effective visual token pruning of MLLMs, termed FitPrune, which can quickly produce a complete pruning recipe for MLLMs according to a pre-defined budget. Specifically, FitPrune considers token pruning as a statistical problem of the MLLM, and its objective is to find an optimal pruning scheme that minimizes the divergence of the attention distributions before and after pruning. In practice, FitPrune can be quickly accomplished based on the attention statistics from a small batch of inference data, avoiding the expensive trials of MLLMs. According to the pruning recipe, an MLLM can directly remove the redundant visual tokens of different examples during inference. To validate FitPrune, we apply it to a set of recent MLLMs, including LLaVA-1.5, LLaVA-HR and LLaVA-NEXT, and conduct extensive experiments on a set of benchmarks. The experimental results show that FitPrune can reduce the computational complexity to a large extent while retaining high performance, e.g., -54.9% FLOPs for LLaVA-NEXT with only a 0.5% accuracy drop. Notably, the pruning recipe can be obtained in about 5 minutes. Our code is available at https://github.com/ywh187/FitPrune.
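A simplified take on deriving a pruning recipe from attention statistics is sketched below: measure how much attention each visual token receives over a small calibration batch, then keep the smallest subset that fits the budget while retaining most of the attention mass. The greedy mass-coverage rule is an illustrative stand-in for FitPrune's exact divergence-minimizing procedure.

```python
# Simplified, hypothetical attention-statistics pruning: rank visual tokens by
# the attention mass they receive on a small calibration batch, then keep the
# top tokens allowed by a pre-defined compute budget.
import torch

def pruning_recipe(attn_maps, budget_ratio=0.45):
    """attn_maps: (batch, heads, queries, n_visual) attention over visual tokens.
    Returns indices of visual tokens to keep under the budget."""
    mass = attn_maps.mean(dim=(0, 1, 2))                  # (n_visual,) avg mass
    order = torch.argsort(mass, descending=True)
    n_keep = max(1, int(budget_ratio * mass.numel()))
    keep = order[:n_keep].sort().values                   # preserve token order
    retained = mass[keep].sum() / mass.sum()
    return keep, retained.item()

# toy calibration statistics: 8 samples, 16 heads, 32 text queries, 576 tokens
attn = torch.rand(8, 16, 32, 576).softmax(dim=-1)
keep, coverage = pruning_recipe(attn, budget_ratio=0.45)
print(len(keep), f"tokens kept, {coverage:.1%} of attention mass retained")
```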
Citations: 0
Multi-view Hypergraph-based Contrastive Learning Model for Cold-Start Micro-video Recommendation
Pub Date : 2024-09-15 DOI: arxiv-2409.09638
Sisuo Lyu, Xiuze Zhou, Xuming Hu
With the widespread use of mobile devices and the rapid growth of micro-video platforms such as TikTok and Kwai, the demand for personalized micro-video recommendation systems has significantly increased. Micro-videos typically contain diverse information, such as textual metadata, visual cues (e.g., cover images), and dynamic video content, significantly affecting user interaction and engagement patterns. However, most existing approaches often suffer from the problem of over-smoothing, which limits their ability to capture comprehensive interaction information effectively. Additionally, cold-start scenarios present ongoing challenges due to sparse interaction data and the underutilization of available interaction signals. To address these issues, we propose a Multi-view Hypergraph-based Contrastive learning model for cold-start micro-video Recommendation (MHCR). MHCR introduces a multi-view multimodal feature extraction layer to capture interaction signals from various perspectives and incorporates multi-view self-supervised learning tasks to provide additional supervisory signals. Through extensive experiments on two real-world datasets, we show that MHCR significantly outperforms existing video recommendation models and effectively mitigates cold-start challenges. Our code is available at https://anonymous.4open.science/r/MHCR-02EF.
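One of the multi-view self-supervised signals can be sketched as a cross-view contrastive loss: embeddings of the same micro-video from two views (say, a visual view and a textual/metadata view) are positives, and all other pairings in the batch are negatives. This generic InfoNCE form is an assumption for illustration and omits the hypergraph components.

```python
# Cross-view contrastive objective: align two view embeddings of the same item
# while contrasting against other items in the batch (symmetric InfoNCE).
import torch
import torch.nn.functional as F

def cross_view_infonce(view_a, view_b, temperature=0.2):
    """view_a, view_b: (B, d) embeddings of the same B items from two views."""
    a = F.normalize(view_a, dim=1)
    b = F.normalize(view_b, dim=1)
    logits = a @ b.t() / temperature          # (B, B); diagonal = positives
    targets = torch.arange(a.shape[0])
    # symmetric loss: match a -> b and b -> a
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

visual = torch.randn(32, 128, requires_grad=True)   # e.g. cover-image view
textual = torch.randn(32, 128, requires_grad=True)  # e.g. metadata view
loss = cross_view_infonce(visual, textual)
loss.backward()
```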
Citations: 0
SafeEar: Content Privacy-Preserving Audio Deepfake Detection
Pub Date : 2024-09-14 DOI: arxiv-2409.09272
Xinfeng Li, Kai Li, Yifan Zheng, Chen Yan, Xiaoyu Ji, Wenyuan Xu
Text-to-Speech (TTS) and Voice Conversion (VC) models have exhibited remarkable performance in generating realistic and natural audio. However, their dark side, audio deepfake, poses a significant threat to both society and individuals. Existing countermeasures largely focus on determining the genuineness of speech based on complete original audio recordings, which however often contain private content. This oversight may keep deepfake detection out of many applications, particularly in scenarios involving sensitive information like business secrets. In this paper, we propose SafeEar, a novel framework that aims to detect deepfake audios without relying on accessing the speech content within. Our key idea is to devise a neural audio codec into a novel decoupling model that well separates the semantic and acoustic information from audio samples, and to use only the acoustic information (e.g., prosody and timbre) for deepfake detection. In this way, no semantic content will be exposed to the detector. To overcome the challenge of identifying diverse deepfake audio without semantic clues, we enhance our deepfake detector with real-world codec augmentation. Extensive experiments conducted on four benchmark datasets demonstrate SafeEar's effectiveness in detecting various deepfake techniques with an equal error rate (EER) down to 2.02%. Simultaneously, it shields five-language speech content from being deciphered by both machine and human auditory analysis, demonstrated by word error rates (WERs) all above 93.93% and our user study. Furthermore, our benchmark constructed for anti-deepfake and anti-content recovery evaluation helps provide a basis for future research in the realms of audio privacy preservation and deepfake detection.
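The decoupling idea can be sketched as a detector that only ever sees the acoustic token stream from the codec, so the spoken content never reaches it. The codec interface, token vocabulary and classifier below are illustrative stand-ins, not the released SafeEar components.

```python
# Hypothetical sketch of content-free deepfake detection: a neural codec (not
# shown) splits each utterance into semantic and acoustic tokens; only the
# acoustic stream (prosody/timbre) is passed to the classifier.
import torch
import torch.nn as nn

class AcousticOnlyDetector(nn.Module):
    def __init__(self, codebook_size=1024, dim=128):
        super().__init__()
        self.embed = nn.Embedding(codebook_size, dim)   # acoustic token embeddings
        self.encoder = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, 2)                   # bona fide vs deepfake

    def forward(self, acoustic_tokens):
        """acoustic_tokens: (B, T) integer codes; semantic codes are never passed in."""
        x = self.embed(acoustic_tokens)
        _, h = self.encoder(x)
        return self.head(h[-1])                         # (B, 2) logits

# toy batch: 4 utterances, 200 acoustic frames each (content-free codes)
detector = AcousticOnlyDetector()
tokens = torch.randint(0, 1024, (4, 200))
logits = detector(tokens)
print(logits.shape)   # torch.Size([4, 2])
```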
Citations: 0