
arXiv - CS - Multimedia: Latest Publications

Turbo your multi-modal classification with contrastive learning
Pub Date : 2024-09-14 DOI: arxiv-2409.09282
Zhiyu Zhang, Da Liu, Shengqiang Liu, Anna Wang, Jie Gao, Yali Li
Contrastive learning has become one of the most impressive approaches for multi-modal representation learning. However, previous multi-modal works mainly focused on cross-modal understanding, ignoring in-modal contrastive learning, which limits the representation of each modality. In this paper, we propose a novel contrastive learning strategy, called Turbo, to promote multi-modal understanding by joint in-modal and cross-modal contrastive learning. Specifically, multi-modal data pairs are sent through the forward pass twice with different hidden dropout masks to get two different representations for each modality. With these representations, we obtain multiple in-modal and cross-modal contrastive objectives for training. Finally, we combine the self-supervised Turbo with the supervised multi-modal classification and demonstrate its effectiveness on two audio-text classification tasks, where state-of-the-art performance is achieved on a speech emotion recognition benchmark dataset.
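The dual-forward-pass trick described above is closely related to dropout-based augmentation in SimCSE. Below is a minimal sketch of the idea, assuming generic audio/text encoders and an InfoNCE objective; the encoders, batch layout, and temperature are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the dual-forward-pass contrastive idea (not the authors' code).
import torch
import torch.nn.functional as F

def info_nce(a, b, temperature=0.07):
    """Symmetric InfoNCE between two batches of L2-normalized embeddings."""
    a, b = F.normalize(a, dim=-1), F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature                      # (B, B) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

def turbo_style_losses(audio_encoder, text_encoder, audio, text_ids):
    # Two forward passes in train mode: dropout is active, so each pass uses a
    # different random dropout mask and yields a different "view" of the sample.
    audio_encoder.train(); text_encoder.train()
    a1, a2 = audio_encoder(audio), audio_encoder(audio)
    t1, t2 = text_encoder(text_ids), text_encoder(text_ids)

    in_modal = info_nce(a1, a2) + info_nce(t1, t2)        # in-modal objectives
    cross_modal = info_nce(a1, t1) + info_nce(a2, t2)     # cross-modal objectives
    return in_modal + cross_modal
```

In the paper this self-supervised term is combined with the supervised classification loss; the relative weighting is not specified in the abstract.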
Citations: 0
MHAD: Multimodal Home Activity Dataset with Multi-Angle Videos and Synchronized Physiological Signals
Pub Date : 2024-09-14 DOI: arxiv-2409.09366
Lei Yu, Jintao Fei, Xinyi Liu, Yang Yao, Jun Zhao, Guoxin Wang, Xin Li
Video-based physiology, exemplified by remote photoplethysmography (rPPG), extracts physiological signals such as pulse and respiration by analyzing subtle changes in video recordings. This non-contact, real-time monitoring method holds great potential for home settings. Despite the valuable contributions of public benchmark datasets to this technology, there is currently no dataset specifically designed for passive home monitoring. Existing datasets are often limited to close-up, static, frontal recordings and typically include only 1-2 physiological signals. To advance video-based physiology in real home settings, we introduce the MHAD dataset. It comprises 1,440 videos from 40 subjects, capturing 6 typical activities from 3 angles in a real home environment. Additionally, 5 physiological signals were recorded, making it a comprehensive video-based physiology dataset. MHAD is compatible with the rPPG-toolbox and has been validated using several unsupervised and supervised methods. Our dataset is publicly available at https://github.com/jdh-algo/MHAD-Dataset.
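As background for the rPPG signals this dataset targets, the sketch below shows a classic green-channel pulse estimate over a face region. It is a generic baseline, not the MHAD loading code or the rPPG-toolbox API; the video path, frame rate, and fixed ROI are assumptions.

```python
# Illustrative green-channel rPPG baseline; the video path, fps, and ROI are assumptions.
import cv2
import numpy as np
from scipy.signal import butter, filtfilt

def estimate_pulse_bpm(video_path, fps=30.0, roi=(100, 100, 200, 200)):
    x, y, w, h = roi
    trace = []
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # Mean green-channel intensity over a fixed, assumed face region (BGR index 1).
        trace.append(frame[y:y + h, x:x + w, 1].mean())
    cap.release()

    signal = np.asarray(trace) - np.mean(trace)
    # Band-pass 0.7-3.0 Hz (42-180 bpm), the usual heart-rate band.
    b, a = butter(3, [0.7 / (fps / 2), 3.0 / (fps / 2)], btype="band")
    filtered = filtfilt(b, a, signal)

    freqs = np.fft.rfftfreq(len(filtered), d=1.0 / fps)
    power = np.abs(np.fft.rfft(filtered)) ** 2
    return 60.0 * freqs[np.argmax(power)]   # dominant frequency in beats per minute
```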
Citations: 0
Prototypical Prompting for Text-to-image Person Re-identification
Pub Date : 2024-09-14 DOI: arxiv-2409.09427
Shuanglin Yan, Jun Liu, Neng Dong, Liyan Zhang, Jinhui Tang
In this paper, we study the problem of Text-to-Image Person Re-identification (TIReID), which aims to find images of the same identity described by a text sentence from a pool of candidate images. Benefiting from Vision-Language Pre-training, such as CLIP (Contrastive Language-Image Pretraining), TIReID techniques have achieved remarkable progress recently. However, most existing methods only focus on instance-level matching and ignore identity-level matching, which involves associating multiple images and texts belonging to the same person. In this paper, we propose a novel prototypical prompting framework (Propot) designed to simultaneously model instance-level and identity-level matching for TIReID. Our Propot transforms the identity-level matching problem into a prototype learning problem, aiming to learn identity-enriched prototypes. Specifically, Propot works by 'initialize, adapt, enrich, then aggregate'. We first use CLIP to generate high-quality initial prototypes. Then, we propose a domain-conditional prototypical prompting (DPP) module to adapt the prototypes to the TIReID task using task-related information. Further, we propose an instance-conditional prototypical prompting (IPP) module to update prototypes conditioned on intra-modal and inter-modal instances to ensure prototype diversity. Finally, we design an adaptive prototype aggregation module to aggregate these prototypes, generating final identity-enriched prototypes. With identity-enriched prototypes, we diffuse their rich identity information to instances through a prototype-to-instance contrastive loss to facilitate identity-level matching. Extensive experiments conducted on three benchmarks demonstrate the superiority of Propot compared to existing TIReID methods.
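A minimal sketch of what a prototype-to-instance contrastive loss can look like, assuming one prototype per identity and cross-entropy over instance-prototype similarities; the exact Propot formulation is not given in the abstract, and the shapes and temperature below are assumptions.

```python
# Sketch of a prototype-to-instance contrastive loss; the actual Propot loss may differ.
import torch
import torch.nn.functional as F

def prototype_instance_loss(features, identity_ids, prototypes, temperature=0.05):
    """
    features:     (B, D) instance embeddings (image or text)
    identity_ids: (B,)   index of each instance's identity
    prototypes:   (N, D) one identity-enriched prototype per identity
    """
    features = F.normalize(features, dim=-1)
    prototypes = F.normalize(prototypes, dim=-1)
    logits = features @ prototypes.t() / temperature   # (B, N) instance-to-prototype similarities
    # Each instance should be closest to its own identity's prototype.
    return F.cross_entropy(logits, identity_ids)
```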
Citations: 0
AI-Driven Virtual Teacher for Enhanced Educational Efficiency: Leveraging Large Pretrain Models for Autonomous Error Analysis and Correction
Pub Date : 2024-09-14 DOI: arxiv-2409.09403
Tianlong Xu, Yi-Fan Zhang, Zhendong Chu, Shen Wang, Qingsong Wen
Students frequently make mistakes while solving mathematical problems, and traditional error correction methods are both time-consuming and labor-intensive. This paper introduces an innovative Virtual AI Teacher system designed to autonomously analyze and correct student Errors (VATE). Leveraging advanced large language models (LLMs), the system uses student drafts as a primary source for error analysis, which enhances understanding of the student's learning process. It incorporates sophisticated prompt engineering and maintains an error pool to reduce computational overhead. The AI-driven system also features a real-time dialogue component for efficient student interaction. Our approach demonstrates significant advantages over traditional and machine learning-based error correction methods, including reduced educational costs, high scalability, and superior generalizability. The system has been deployed on the Squirrel AI learning platform for elementary mathematics education, where it achieves 78.3% accuracy in error analysis and shows a marked improvement in student learning efficiency. Satisfaction surveys indicate a strong positive reception, highlighting the system's potential to transform educational practices.
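A rough sketch of the error-pool idea, i.e., caching previous error analyses so the LLM is only called for unseen (problem, draft) pairs; the hashing scheme and the `call_llm` hook are hypothetical placeholders rather than the VATE implementation.

```python
# Sketch of an "error pool" used as a cache in front of an LLM call to reduce overhead.
# The keying scheme and `call_llm` are hypothetical placeholders, not the VATE code.
import hashlib

class ErrorPool:
    def __init__(self, call_llm):
        self.call_llm = call_llm          # function: prompt string -> analysis string
        self.pool = {}                    # cache key -> cached analysis

    @staticmethod
    def _key(problem, draft):
        text = (problem.strip() + "||" + draft.strip()).lower()
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def analyze(self, problem, student_draft):
        key = self._key(problem, student_draft)
        if key not in self.pool:          # unseen error: fall back to the LLM
            prompt = (f"Problem: {problem}\nStudent draft: {student_draft}\n"
                      f"Identify the error and explain the correction.")
            self.pool[key] = self.call_llm(prompt)
        return self.pool[key]
```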
Citations: 0
Multi-modal Speech Transformer Decoders: When Do Multiple Modalities Improve Accuracy?
Pub Date : 2024-09-13 DOI: arxiv-2409.09221
Yiwen Guan, Viet Anh Trinh, Vivek Voleti, Jacob Whitehill
Decoder-only discrete-token language models have recently achieved significant success in automatic speech recognition. However, systematic analyses of how different modalities impact performance in specific scenarios remain limited. In this paper, we investigate the effects of multiple modalities on recognition accuracy on both synthetic and real-world datasets. Our experiments suggest that: (1) Integrating more modalities can increase accuracy; in particular, our paper is, to our best knowledge, the first to show the benefit of combining audio, image context, and lip information; (2) Images as a supplementary modality for speech recognition provide the greatest benefit at moderate noise levels; moreover, they exhibit a different trend compared to inherently synchronized modalities like lip movements; (3) Performance improves on both synthetic and real-world datasets when the most relevant visual information is filtered as a preprocessing step.
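One common way to feed extra modalities into a decoder-only model is to project them into the token embedding space and prepend them as a prefix. The sketch below illustrates that generic recipe; the dimensions and module names are assumptions, and the paper's exact architecture may differ.

```python
# Generic sketch of prepending projected audio / image / lip features to a decoder-only
# language model as prefix embeddings. Dimensions and interfaces are assumptions.
import torch
import torch.nn as nn

class PrefixFusionDecoder(nn.Module):
    def __init__(self, decoder, d_model, d_audio, d_image, d_lip):
        super().__init__()
        self.decoder = decoder                       # any decoder-only LM over embeddings
        self.proj_audio = nn.Linear(d_audio, d_model)
        self.proj_image = nn.Linear(d_image, d_model)
        self.proj_lip = nn.Linear(d_lip, d_model)

    def forward(self, audio_feats, image_feats, lip_feats, token_embeds):
        # (B, T_a, d_audio), (B, T_i, d_image), (B, T_l, d_lip), (B, T_txt, d_model)
        prefix = torch.cat([self.proj_audio(audio_feats),
                            self.proj_image(image_feats),
                            self.proj_lip(lip_feats)], dim=1)
        # Text tokens attend to all modality prefixes through ordinary causal attention.
        return self.decoder(torch.cat([prefix, token_embeds], dim=1))
```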
Citations: 0
Improving Virtual Try-On with Garment-focused Diffusion Models
Pub Date : 2024-09-12 DOI: arxiv-2409.08258
Siqi Wan, Yehao Li, Jingwen Chen, Yingwei Pan, Ting Yao, Yang Cao, Tao Mei
Diffusion models have led to the revolutionizing of generative modeling in numerous image synthesis tasks. Nevertheless, it is not trivial to directly apply diffusion models for synthesizing an image of a target person wearing a given in-shop garment, i.e., the image-based virtual try-on (VTON) task. The difficulty originates from the aspect that the diffusion process should not only produce a holistically high-fidelity photorealistic image of the target person, but also locally preserve every appearance and texture detail of the given garment. To address this, we shape a new Diffusion model, namely GarDiff, which triggers the garment-focused diffusion process with amplified guidance of both basic visual appearance and detailed textures (i.e., high-frequency details) derived from the given garment. GarDiff first remoulds a pre-trained latent diffusion model with additional appearance priors derived from the CLIP and VAE encodings of the reference garment. Meanwhile, a novel garment-focused adapter is integrated into the UNet of the diffusion model, pursuing local fine-grained alignment with the visual appearance of the reference garment and human pose. We specifically design an appearance loss over the synthesized garment to enhance the crucial, high-frequency details. Extensive experiments on the VITON-HD and DressCode datasets demonstrate the superiority of our GarDiff when compared to state-of-the-art VTON approaches. Code is publicly available at: https://github.com/siqi0905/GarDiff/tree/master.
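A hedged sketch of an appearance loss restricted to the garment region with an added high-frequency (Laplacian) term, in the spirit of the loss described above; the actual GarDiff loss, weights, and masking are not specified in the abstract and are assumptions here.

```python
# Sketch of a garment-region appearance loss with a high-frequency (Laplacian) term.
# The exact GarDiff formulation may differ; weights and masks are assumptions.
import torch
import torch.nn.functional as F

_LAPLACIAN = torch.tensor([[0., 1., 0.],
                           [1., -4., 1.],
                           [0., 1., 0.]]).view(1, 1, 3, 3)

def high_freq(img):
    """Per-channel Laplacian response of a (B, C, H, W) image."""
    b, c, h, w = img.shape
    kernel = _LAPLACIAN.to(img.device, img.dtype).repeat(c, 1, 1, 1)
    return F.conv2d(img, kernel, padding=1, groups=c)   # depthwise Laplacian filtering

def garment_appearance_loss(pred, target, garment_mask, hf_weight=1.0):
    """pred/target: (B, 3, H, W); garment_mask: (B, 1, H, W) in {0, 1}."""
    pixel_term = F.l1_loss(pred * garment_mask, target * garment_mask)
    hf_term = F.l1_loss(high_freq(pred) * garment_mask,
                        high_freq(target) * garment_mask)
    return pixel_term + hf_weight * hf_term
```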
Citations: 0
Rethinking Prompting Strategies for Multi-Label Recognition with Partial Annotations
Pub Date : 2024-09-12 DOI: arxiv-2409.08381
Samyak Rawlekar, Shubhang Bhatnagar, Narendra Ahuja
Vision-language models (VLMs) like CLIP have been adapted for Multi-Label Recognition (MLR) with partial annotations by leveraging prompt learning, where positive and negative prompts are learned for each class to associate their embeddings with class presence or absence in the shared vision-text feature space. While this approach improves MLR performance by relying on VLM priors, we hypothesize that learning negative prompts may be suboptimal, as the datasets used to train VLMs lack image-caption pairs explicitly focusing on class absence. To analyze the impact of positive and negative prompt learning on MLR, we introduce PositiveCoOp and NegativeCoOp, where only one prompt is learned with VLM guidance while the other is replaced by an embedding vector learned directly in the shared feature space without relying on the text encoder. Through empirical analysis, we observe that negative prompts degrade MLR performance, and learning only positive prompts, combined with learned negative embeddings (PositiveCoOp), outperforms dual prompt learning approaches. Moreover, we quantify the performance benefits that prompt learning offers over a simple vision-features-only baseline, observing that the baseline displays strong performance comparable to the dual prompt learning approach (DualCoOp) when the proportion of missing labels is low, while requiring half the training compute and 16 times fewer parameters.
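A minimal sketch of the PositiveCoOp setup as described: a learnable positive context passed through a frozen text encoder, and a per-class negative embedding learned directly in the shared feature space. The encoder interface, context length, dimensions, and the two-way softmax readout are assumptions, not the authors' code.

```python
# Sketch of positive-prompt learning plus directly learned negative embeddings.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PositiveCoOpHead(nn.Module):
    def __init__(self, text_encoder, num_classes, ctx_len=16, ctx_dim=512, feat_dim=512):
        super().__init__()
        self.text_encoder = text_encoder              # frozen; maps context tokens -> feat_dim
        self.pos_context = nn.Parameter(torch.randn(num_classes, ctx_len, ctx_dim) * 0.02)
        # Negative embeddings live directly in the shared feature space (no text encoder).
        self.neg_embed = nn.Parameter(torch.randn(num_classes, feat_dim) * 0.02)

    def forward(self, image_feats, temperature=0.07):
        # image_feats: (B, feat_dim) from the frozen vision encoder
        pos = F.normalize(self.text_encoder(self.pos_context), dim=-1)   # (C, feat_dim)
        neg = F.normalize(self.neg_embed, dim=-1)                        # (C, feat_dim)
        img = F.normalize(image_feats, dim=-1)
        pos_logits = img @ pos.t() / temperature                         # evidence for presence
        neg_logits = img @ neg.t() / temperature                         # evidence for absence
        # Per-class probability of presence via a two-way softmax.
        return torch.softmax(torch.stack([neg_logits, pos_logits], dim=-1), dim=-1)[..., 1]
```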
Citations: 0
ComAlign: Compositional Alignment in Vision-Language Models
Pub Date : 2024-09-12 DOI: arxiv-2409.08206
Ali Abdollah, Amirmohammad Izadi, Armin Saghafian, Reza Vahidimajd, Mohammad Mozafari, Amirreza Mirzaei, Mohammadmahdi Samiei, Mahdieh Soleymani Baghshah
Vision-language models (VLMs) like CLIP have showcased a remarkable ability to extract transferable features for downstream tasks. Nonetheless, the training process of these models is usually based on a coarse-grained contrastive loss between the global embeddings of images and texts, which may lose the compositional structure of these modalities. Many recent studies have shown that VLMs lack compositional understanding, such as attribute binding and identifying object relationships. Although some recent methods have tried to achieve finer-level alignment, they either are not based on extracting meaningful components of proper granularity or don't properly utilize the modalities' correspondence (especially in image-text pairs with more ingredients). Addressing these limitations, we introduce Compositional Alignment (ComAlign), a fine-grained approach to discover more exact correspondence of text and image components using only weak supervision in the form of image-text pairs. Our methodology emphasizes that the compositional structure (including entities and relations) extracted from the text modality must also be retained in the image modality. To enforce correspondence of fine-grained concepts in image and text modalities, we train a lightweight network lying on top of existing visual and language encoders using a small dataset. The network is trained to align nodes and edges of the structure across the modalities. Experimental results on various VLMs and datasets demonstrate significant improvements in retrieval and compositional benchmarks, affirming the effectiveness of our plugin model.
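A simplified sketch of fine-grained component alignment in this spirit: each text component is matched to its best image region, and the aggregated similarity is trained contrastively over the batch. Edge (relation) alignment and the lightweight alignment network are omitted; shapes and the temperature are assumptions.

```python
# Simplified sketch of node-level alignment; not the ComAlign architecture itself.
import torch
import torch.nn.functional as F

def component_similarity(text_nodes, image_regions):
    """text_nodes: (B, Nt, D); image_regions: (B, Ni, D) -> (B, B) pairwise scores."""
    t = F.normalize(text_nodes, dim=-1)
    v = F.normalize(image_regions, dim=-1)
    # Similarity of every text node of sample i to every region of sample j.
    sim = torch.einsum("itd,jrd->ijtr", t, v)        # (B, B, Nt, Ni)
    # Max over regions (best match per text node), then mean over text nodes.
    return sim.max(dim=-1).values.mean(dim=-1)       # (B, B)

def fine_grained_contrastive_loss(text_nodes, image_regions, temperature=0.05):
    scores = component_similarity(text_nodes, image_regions) / temperature
    targets = torch.arange(scores.size(0), device=scores.device)
    return 0.5 * (F.cross_entropy(scores, targets) +
                  F.cross_entropy(scores.t(), targets))
```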
Citations: 0
MSMF: Multi-Scale Multi-Modal Fusion for Enhanced Stock Market Prediction
Pub Date : 2024-09-12 DOI: arxiv-2409.07855
Jiahao Qin
This paper presents MSMF (Multi-Scale Multi-Modal Fusion), a novel approach for enhanced stock market prediction. MSMF addresses key challenges in multi-modal stock analysis by integrating a modality completion encoder, multi-scale feature extraction, and an innovative fusion mechanism. Our model leverages blank learning and progressive fusion to balance complementarity and redundancy across modalities, while multi-scale alignment facilitates direct correlations between heterogeneous data types. We introduce Multi-Granularity Gates and a specialized architecture to optimize the integration of local and global information for different tasks. Additionally, a Task-targeted Prediction layer is employed to preserve both coarse and fine-grained features during fusion. Experimental results demonstrate that MSMF outperforms existing methods, achieving significant improvements in accuracy and reducing prediction errors across various stock market forecasting tasks. This research contributes valuable insights to the field of multi-modal financial analysis and offers a robust framework for enhanced market prediction.
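A small sketch of a learned gate fusing two modality streams at a single scale, to illustrate the gating idea at a high level. The real MSMF architecture (modality completion, multi-scale alignment, task-targeted prediction) is not specified in the abstract; the price/text modality split, dimensions, and gating form are assumptions.

```python
# Sketch of a learned gate fusing two modality features at one scale; not the MSMF code.
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, d_price, d_text, d_out):
        super().__init__()
        self.proj_price = nn.Linear(d_price, d_out)
        self.proj_text = nn.Linear(d_text, d_out)
        self.gate = nn.Sequential(nn.Linear(2 * d_out, d_out), nn.Sigmoid())

    def forward(self, price_feats, text_feats):
        p, t = self.proj_price(price_feats), self.proj_text(text_feats)
        g = self.gate(torch.cat([p, t], dim=-1))   # per-dimension mixing weights in (0, 1)
        return g * p + (1.0 - g) * t               # gated combination of the two modalities
```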
Citations: 0
Improving Text-guided Object Inpainting with Semantic Pre-inpainting
Pub Date : 2024-09-12 DOI: arxiv-2409.08260
Yifu Chen, Jingwen Chen, Yingwei Pan, Yehao Li, Ting Yao, Zhineng Chen, Tao Mei
Recent years have witnessed the success of large text-to-image diffusion models and their remarkable potential to generate high-quality images. The further pursuit of enhancing the editability of images has sparked significant interest in the downstream task of inpainting a novel object described by a text prompt within a designated region in the image. Nevertheless, the problem is not trivial from two aspects: 1) Solely relying on one single U-Net to align text prompt and visual object across all the denoising timesteps is insufficient to generate desired objects; 2) The controllability of object generation is not guaranteed in the intricate sampling space of the diffusion model. In this paper, we propose to decompose the typical single-stage object inpainting into two cascaded processes: 1) semantic pre-inpainting that infers the semantic features of desired objects in a multi-modal feature space; 2) high-fidelity object generation in diffusion latent space that pivots on such inpainted semantic features. To achieve this, we cascade a Transformer-based semantic inpainter and an object inpainting diffusion model, leading to a novel CAscaded Transformer-Diffusion (CAT-Diffusion) framework for text-guided object inpainting. Technically, the semantic inpainter is trained to predict the semantic features of the target object conditioned on the unmasked context and the text prompt. The outputs of the semantic inpainter then act as informative visual prompts to guide high-fidelity object generation through a reference adapter layer, leading to controllable object inpainting. Extensive evaluations on OpenImages-V6 and MSCOCO validate the superiority of CAT-Diffusion against state-of-the-art methods. Code is available at https://github.com/Nnn-s/CATdiffusion.
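A high-level sketch of the two-stage cascade: a semantic inpainter predicts the target object's semantic features from the unmasked context and prompt, and those features condition the diffusion sampler through an adapter. Every module interface here (`semantic_inpainter`, `reference_adapter`, `diffusion`) is a hypothetical placeholder standing in for the components named above, not the released code.

```python
# High-level sketch of the cascaded two-stage inpainting flow; all interfaces are placeholders.
import torch

@torch.no_grad()
def cascaded_inpaint(image, mask, prompt, semantic_inpainter, reference_adapter,
                     diffusion, num_steps=50):
    # Stage 1: semantic pre-inpainting -- predict semantic features of the desired
    # object conditioned on the unmasked context and the text prompt.
    semantic_feats = semantic_inpainter(image * (1 - mask), prompt)

    # Stage 2: object generation in diffusion latent space. The predicted semantic
    # features act as informative visual prompts injected via an adapter layer.
    visual_prompts = reference_adapter(semantic_feats)
    latents = torch.randn_like(diffusion.encode(image))
    for t in reversed(range(num_steps)):
        latents = diffusion.denoise_step(latents, t, text=prompt,
                                         visual_prompts=visual_prompts, mask=mask)
    return diffusion.decode(latents)
```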
Citations: 0