
arXiv - CS - Multimedia: Latest Publications

Exploring Rich Subjective Quality Information for Image Quality Assessment in the Wild
Pub Date : 2024-09-09 DOI: arxiv-2409.05540
Xiongkuo Min, Yixuan Gao, Yuqin Cao, Guangtao Zhai, Wenjun Zhang, Huifang Sun, Chang Wen Chen
Traditional in-the-wild image quality assessment (IQA) models are generally trained with the quality labels of mean opinion score (MOS), while missing the rich subjective quality information contained in the quality ratings, for example, the standard deviation of opinion scores (SOS) or even the distribution of opinion scores (DOS). In this paper, we propose a novel IQA method named RichIQA to explore the rich subjective rating information beyond MOS to predict image quality in the wild. RichIQA is characterized by two key novel designs: (1) a three-stage image quality prediction network which exploits the powerful feature representation capability of the Convolutional vision Transformer (CvT) and mimics the short-term and long-term memory mechanisms of the human brain; (2) a multi-label training strategy in which rich subjective quality information such as MOS, SOS and DOS is concurrently used to train the quality prediction network. Powered by these two novel designs, RichIQA is able to predict image quality in terms of a distribution, from which the mean image quality can subsequently be obtained. Extensive experimental results verify that the three-stage network is tailored to predicting rich quality information, while the multi-label training strategy can fully exploit the potential within subjective quality ratings and enhance the prediction performance and generalizability of the network. RichIQA outperforms state-of-the-art competitors on multiple large-scale in-the-wild IQA databases with rich subjective rating labels. The code of RichIQA will be made publicly available on GitHub.
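The multi-label idea lends itself to a compact sketch. The following is a minimal illustration (not the authors' released code) of how DOS, MOS and SOS supervision could be combined over a predicted distribution across score bins; the five-bin opinion scale, the KL/MSE loss choices and the unit loss weights are assumptions.

```python
import torch
import torch.nn.functional as F

def rich_quality_loss(pred_logits, dos_target, mos_target, sos_target,
                      bins=torch.linspace(1.0, 5.0, 5),
                      w_dos=1.0, w_mos=1.0, w_sos=1.0):
    """Combine DOS, MOS and SOS supervision for one batch.

    pred_logits: (B, K) unnormalized scores over K opinion-score bins.
    dos_target:  (B, K) empirical distribution of opinion scores.
    mos_target:  (B,)   mean opinion scores.
    sos_target:  (B,)   standard deviation of opinion scores.
    """
    pred_dos = torch.softmax(pred_logits, dim=-1)          # predicted distribution
    bins = bins.to(pred_logits.device)

    # Distribution-level supervision: KL divergence to the empirical DOS.
    loss_dos = F.kl_div(pred_dos.clamp_min(1e-8).log(), dos_target,
                        reduction="batchmean")

    # Mean and standard deviation implied by the predicted distribution.
    pred_mos = (pred_dos * bins).sum(dim=-1)
    pred_var = (pred_dos * (bins - pred_mos.unsqueeze(-1)) ** 2).sum(dim=-1)
    pred_sos = pred_var.clamp_min(1e-8).sqrt()

    loss_mos = F.mse_loss(pred_mos, mos_target)
    loss_sos = F.mse_loss(pred_sos, sos_target)
    return w_dos * loss_dos + w_mos * loss_mos + w_sos * loss_sos
```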
Citations: 0
Look One and More: Distilling Hybrid Order Relational Knowledge for Cross-Resolution Image Recognition
Pub Date : 2024-09-09 DOI: arxiv-2409.05384
Shiming Ge, Kangkai Zhang, Haolin Liu, Yingying Hua, Shengwei Zhao, Xin Jin, Hao Wen
In spite of the great success achieved by recent deep models in many image recognition tasks, directly applying them to recognize low-resolution images may suffer from low accuracy due to the loss of informative details during resolution degradation. However, these images are still recognizable to subjects who are familiar with the corresponding high-resolution ones. Inspired by that, we propose a teacher-student learning approach to facilitate low-resolution image recognition via hybrid order relational knowledge distillation. The approach comprises three streams: the teacher stream is pretrained to recognize high-resolution images with high accuracy, the student stream learns to identify low-resolution images by mimicking the teacher's behaviors, and an extra assistant stream is introduced as a bridge to help transfer knowledge from the teacher to the student. To extract sufficient knowledge for reducing the loss in accuracy, the learning of the student is supervised with multiple losses, which preserves the similarities in various order relational structures. In this way, the capability of recovering missing details of familiar low-resolution images can be effectively enhanced, leading to better knowledge transfer. Extensive experiments on metric learning, low-resolution image classification and low-resolution face recognition tasks show the effectiveness of our approach while using reduced models.
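The core of relational distillation is that the student preserves the teacher's pairwise feature structure rather than matching individual logits. The sketch below is a generic second-order relational KD loss with an optional assistant target, loosely mirroring the bridge stream described above; the exact losses, orders and the assistant's role in the paper may differ, so treat this as an assumption-laden illustration.

```python
import torch
import torch.nn.functional as F

def pairwise_relations(feats):
    """L2-normalized features -> (B, B) cosine-similarity matrix."""
    feats = F.normalize(feats, dim=-1)
    return feats @ feats.t()

def relational_kd_loss(student_feats, teacher_feats,
                       assistant_feats=None, w_assist=0.5):
    """Match the student's pairwise relational structure to the teacher's.

    An optional assistant provides an intermediate target; the MSE objective
    and the 0.5 assistant weight are assumptions for this sketch.
    """
    t_rel = pairwise_relations(teacher_feats).detach()
    s_rel = pairwise_relations(student_feats)
    loss = F.mse_loss(s_rel, t_rel)                # second-order (pairwise) relations
    if assistant_feats is not None:
        a_rel = pairwise_relations(assistant_feats).detach()
        loss = loss + w_assist * F.mse_loss(s_rel, a_rel)
    return loss
```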
Citations: 0
REVISION: A Roadmap on Adaptive Video Streaming Optimization
Pub Date : 2024-09-09 DOI: arxiv-2409.06051
Farzad Tashtarian, Christian Timmerer
Due to the soaring popularity of video applications and the consequent rise in video traffic on the Internet, technologies like HTTP Adaptive Streaming (HAS) are crucial for delivering high Quality of Experience (QoE) to consumers. HAS technology enables video players on consumer devices to enhance viewer engagement by dynamically adapting video content quality based on network conditions. This is especially relevant for consumer electronics as it ensures an optimized viewing experience across a variety of devices, from smartphones to smart TVs. This paper introduces REVISION, an efficient roadmap designed to enhance adaptive video streaming, a core feature of modern consumer electronics. The REVISION optimization triangle highlights three essential aspects for improving streaming: Objective, Input Space, and Action Domain. Additionally, REVISION proposes a novel layer-based architecture tailored to refine video streaming systems, comprising Application, Control and Management, and Resource layers. Each layer is designed to optimize different components of the streaming process, which is directly linked to the performance and efficiency of consumer devices. By adopting the principles of REVISION, manufacturers and developers can significantly improve the streaming capabilities of consumer electronics, thereby enriching the consumer's multimedia experience and accommodating the increasing demand for high-quality, real-time video content. This approach addresses the complexities of today's diverse video streaming ecosystem and paves the way for future advancements in consumer technology.
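For readers unfamiliar with HAS, the adaptation loop that REVISION targets boils down to a per-segment bitrate decision driven by measured throughput and buffer state. The heuristic below is a generic illustration of that loop, not part of REVISION; the safety margin and buffer threshold are arbitrary assumptions.

```python
def choose_bitrate(available_kbps, throughput_history_kbps, buffer_s,
                   safety=0.8, low_buffer_s=5.0):
    """Pick the highest representation the recent throughput can sustain.

    A deliberately simple heuristic: take the harmonic mean of recent
    throughput samples, apply a safety margin, and fall back to the lowest
    bitrate when the playback buffer runs low. Thresholds are illustrative.
    """
    if buffer_s < low_buffer_s or not throughput_history_kbps:
        return min(available_kbps)
    harmonic = len(throughput_history_kbps) / sum(
        1.0 / t for t in throughput_history_kbps)
    budget = safety * harmonic
    feasible = [b for b in sorted(available_kbps) if b <= budget]
    return feasible[-1] if feasible else min(available_kbps)

# Example: ~3 Mbps measured throughput with a 0.8 safety margin and a healthy
# buffer selects the 1200 kbps rung.
print(choose_bitrate([500, 1200, 2500, 5000], [3100, 2900, 3300], buffer_s=12.0))
```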
Citations: 0
A Survey of Multimodal Composite Editing and Retrieval
Pub Date : 2024-09-09 DOI: arxiv-2409.05405
Suyan Li, Fuxiang Huang, Lei Zhang
In the real world, where information is abundant and diverse across different modalities, understanding and utilizing various data types to improve retrieval systems is a key focus of research. Multimodal composite retrieval integrates diverse modalities such as text, image and audio to provide more accurate, personalized, and contextually relevant results. To facilitate a deeper understanding of this promising direction, this survey explores multimodal composite editing and retrieval in depth, covering image-text composite editing, image-text composite retrieval, and other multimodal composite retrieval. In this survey, we systematically organize the application scenarios, methods, benchmarks, experiments, and future directions. Multimodal learning is a hot topic in the large-model era, and surveys on multimodal learning and vision-language models with transformers have been published in the PAMI journal. To the best of our knowledge, this survey is the first comprehensive review of the literature on multimodal composite retrieval, offering a timely complement to existing reviews on multimodal fusion. To help readers quickly track this field, we build a project page for this survey, which can be found at https://github.com/fuxianghuang1/Multimodal-Composite-Editing-and-Retrieval.
Citations: 0
CustomContrast: A Multilevel Contrastive Perspective For Subject-Driven Text-to-Image Customization
Pub Date : 2024-09-09 DOI: arxiv-2409.05606
Nan Chen, Mengqi Huang, Zhuowei Chen, Yang Zheng, Lei Zhang, Zhendong Mao
Subject-driven text-to-image (T2I) customization has drawn significant interest in academia and industry. This task enables pre-trained models to generate novel images based on unique subjects. Existing studies adopt a self-reconstructive perspective, focusing on capturing all details of a single image, which misconstrues the specific image's irrelevant attributes (e.g., view, pose, and background) as the subject's intrinsic attributes. This misconstruction leads to both overfitting and underfitting of irrelevant and intrinsic attributes of the subject, i.e., these attributes are over-represented or under-represented simultaneously, causing a trade-off between similarity and controllability. In this study, we argue that an ideal subject representation can be achieved from a cross-differential perspective, i.e., decoupling subject intrinsic attributes from irrelevant attributes via contrastive learning, which allows the model to focus more on intrinsic attributes through intra-consistency (features of the same subject are spatially closer) and inter-distinctiveness (features of different subjects have pronounced differences). Specifically, we propose CustomContrast, a novel framework which includes a Multilevel Contrastive Learning (MCL) paradigm and a Multimodal Feature Injection (MFI) Encoder. The MCL paradigm is used to extract intrinsic features of subjects from high-level semantics to low-level appearance through crossmodal semantic contrastive learning and multiscale appearance contrastive learning. To facilitate contrastive learning, we introduce the MFI encoder to capture cross-modal representations. Extensive experiments show the effectiveness of CustomContrast in subject similarity and text controllability.
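The intra-consistency/inter-distinctiveness objective can be grounded with a standard supervised contrastive loss over subject labels. The sketch below is such a generic formulation and is not the paper's MCL implementation; the temperature value and the flat batching over views are assumptions.

```python
import torch
import torch.nn.functional as F

def subject_contrastive_loss(features, subject_ids, temperature=0.07):
    """Pull features of the same subject together, push different subjects apart.

    features:    (N, D) embeddings, e.g. several views of several subjects.
    subject_ids: (N,)   integer subject labels.
    """
    z = F.normalize(features, dim=-1)
    sim = z @ z.t() / temperature                            # (N, N) similarities
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    pos_mask = (subject_ids.unsqueeze(0) == subject_ids.unsqueeze(1)) & ~self_mask

    sim = sim.masked_fill(self_mask, float("-inf"))          # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_counts = pos_mask.sum(dim=1).clamp_min(1)
    loss = -(log_prob * pos_mask).sum(dim=1) / pos_counts    # mean log-prob over positives
    return loss.mean()
```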
Citations: 0
KAN-Based Fusion of Dual-Domain for Audio-Driven Facial Landmarks Generation
Pub Date : 2024-09-09 DOI: arxiv-2409.05330
Hoang-Son Vo-Thanh, Quang-Vinh Nguyen, Soo-Hyung Kim
Audio-driven talking face generation is a widely researched topic due to its high applicability. Reconstructing a talking face from audio significantly contributes to fields such as education, healthcare, online conversations, virtual assistants, and virtual reality. Early studies often focused solely on changing the mouth movements, which resulted in outcomes with limited practical applications. Recently, researchers have proposed a new approach of constructing the entire face, including face pose, neck, and shoulders. To achieve this, they need to generate the face through landmarks. However, creating stable landmarks that align well with the audio is a challenge. In this paper, we propose the KFusion of Dual-Domain model, a robust model that generates landmarks from audio. We separate the audio into two distinct domains to learn emotional information and facial context, then use a fusion mechanism based on the KAN model. Our model demonstrates high efficiency compared to recent models. This will lay the groundwork for the development of the audio-driven talking face generation problem in the future.
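A minimal sketch of the dual-domain layout is given below: two audio branches feed a fusion head that regresses facial landmarks. Note that the fusion block here is an ordinary MLP standing in for the paper's KAN-based fusion, and the GRU branches, feature sizes and 68-landmark output are all assumptions for illustration only.

```python
import torch
import torch.nn as nn

class DualDomainLandmarkNet(nn.Module):
    """Two audio branches (emotion / facial context) fused to predict landmarks."""

    def __init__(self, audio_dim=80, hidden=256, n_landmarks=68):
        super().__init__()
        self.emotion_branch = nn.GRU(audio_dim, hidden, batch_first=True)
        self.context_branch = nn.GRU(audio_dim, hidden, batch_first=True)
        self.fusion = nn.Sequential(               # stand-in for the KAN-based fusion
            nn.Linear(2 * hidden, hidden), nn.GELU(),
            nn.Linear(hidden, n_landmarks * 2),     # (x, y) per landmark
        )

    def forward(self, audio_feats):                 # (B, T, audio_dim)
        emo, _ = self.emotion_branch(audio_feats)   # emotional-domain features
        ctx, _ = self.context_branch(audio_feats)   # facial-context features
        fused = torch.cat([emo, ctx], dim=-1)       # (B, T, 2*hidden)
        out = self.fusion(fused)                    # (B, T, n_landmarks*2)
        return out.view(audio_feats.size(0), audio_feats.size(1), -1, 2)
```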
Citations: 0
A CLIP-based siamese approach for meme classification
Pub Date : 2024-09-09 DOI: arxiv-2409.05772
Javier Huertas-Tato, Christos Koutlis, Symeon Papadopoulos, David Camacho, Ioannis Kompatsiaris
Memes are an increasingly prevalent element of online discourse in social networks, especially among young audiences. They carry ideas and messages that range from humorous to hateful, and are widely consumed. Their potentially high impact requires adequate means of control to moderate their use at large scale. In this work, we propose SimCLIP, a deep learning-based architecture for cross-modal understanding of memes, leveraging a pre-trained CLIP encoder to produce context-aware embeddings and a Siamese fusion technique to capture the interactions between text and image. We perform extensive experimentation on seven meme classification tasks across six datasets. We establish a new state of the art in Memotion7k with a 7.25% relative F1-score improvement, and achieve super-human performance on Harm-P with a 13.73% F1-score improvement. Our approach demonstrates the potential for compact meme classification models, enabling accurate and efficient meme monitoring. We share our code at https://github.com/jahuerta92/meme-classification-simclip
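A Siamese-style fusion over frozen CLIP embeddings can be sketched as follows; the interaction features (concatenation, elementwise product and absolute difference) are a common choice and an assumption here rather than the paper's exact head, and the CLIP encoding step itself is omitted.

```python
import torch
import torch.nn as nn

class SiameseMemeClassifier(nn.Module):
    """Classify memes from paired CLIP image/text embeddings.

    Embeddings are assumed to come from a frozen CLIP encoder (not shown);
    embed_dim=512 matches common CLIP variants but is an assumption.
    """

    def __init__(self, embed_dim=512, hidden=256, n_classes=2):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(4 * embed_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, img_emb, txt_emb):             # each (B, embed_dim)
        fused = torch.cat(
            [img_emb, txt_emb, img_emb * txt_emb, (img_emb - txt_emb).abs()],
            dim=-1)                                  # Siamese interaction features
        return self.head(fused)
```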
Citations: 0
Educational Virtual Field Trips based on Social VR and 360° Spaces
Pub Date : 2024-09-09 DOI: arxiv-2409.05496
Surya Kalvakolu, Heinrich Söbke, Jannicke Baalsrud Hauge, Eckhard Kraft
Virtual field trips (VFTs) have proven to be valuable learning tools. Such applications are mostly based on 360° technology and can be characterized, in technological terms, as single-user applications. In contrast, Social VR applications are characterized by multi-user capability and user-specific avatars. From a learning perspective, the concepts of collaborative learning and embodiment have long been proposed as conducive to learning. Both concepts might be supported using Social VR. However, little is currently known about the use of Social VR for VFTs. Accordingly, the research questions are to what extent VFTs can be implemented in Social VR environments and how these Social VR-based VFTs are perceived by learners. This article presents an evaluation study on the development and evaluation of a VFT environment using the Social VR platform Mozilla Hubs. It describes the design decisions made to create the environment and evaluation results from a mixed-method study (N=16) using a questionnaire and focus group discussions. The study highlighted the opportunities offered by Social VR-based VFTs but also revealed several challenges that need to be addressed to realize the potential of Social VR-based VFTs for regular use in education.
Citations: 0
Visual Grounding with Multi-modal Conditional Adaptation
Pub Date : 2024-09-08 DOI: arxiv-2409.04999
Ruilin Yao, Shengwu Xiong, Yichen Zhao, Yi Rong
Visual grounding is the task of locating objects specified by natural language expressions. Existing methods extend generic object detection frameworks to tackle this task. They typically extract visual and textual features separately using independent visual and textual encoders, then fuse these features in a multi-modal decoder for final prediction. However, visual grounding presents unique challenges. It often involves locating objects with different text descriptions within the same image. Existing methods struggle with this task because the independent visual encoder produces identical visual features for the same image, limiting detection performance. Some recent approaches propose various language-guided visual encoders to address this issue, but they mostly rely solely on textual information and require sophisticated designs. In this paper, we introduce Multi-modal Conditional Adaptation (MMCA), which enables the visual encoder to adaptively update weights, directing its focus towards text-relevant regions. Specifically, we first integrate information from different modalities to obtain multi-modal embeddings. Then we utilize a set of weighting coefficients, generated from the multimodal embeddings, to reorganize the weight update matrices and apply them to the visual encoder of the visual grounding model. Extensive experiments on four widely used datasets demonstrate that MMCA achieves significant improvements and state-of-the-art results. Ablation experiments further demonstrate the lightweight nature and efficiency of our method. Our source code is available at: https://github.com/Mr-Bigworth/MMCA.
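One way to read "weighting coefficients reorganize the weight update matrices" is as a context-conditioned mixture of low-rank updates applied on top of a frozen linear layer. The sketch below follows that reading; the low-rank parameterization, the expert count and the softmax mixing are assumptions rather than the authors' implementation.

```python
import torch
import torch.nn as nn

class ConditionallyAdaptedLinear(nn.Module):
    """A linear layer whose weight update is modulated by multimodal context.

    A bank of low-rank update matrices is mixed by coefficients predicted
    from a fused text/vision embedding; sizes are illustrative assumptions.
    """

    def __init__(self, dim, ctx_dim, n_experts=4, rank=8):
        super().__init__()
        self.base = nn.Linear(dim, dim)
        self.A = nn.Parameter(torch.randn(n_experts, dim, rank) * 0.02)
        self.B = nn.Parameter(torch.zeros(n_experts, rank, dim))
        self.coef = nn.Linear(ctx_dim, n_experts)   # coefficients from multimodal context

    def forward(self, x, context):                  # x: (B, dim), context: (B, ctx_dim)
        alpha = torch.softmax(self.coef(context), dim=-1)              # (B, E)
        delta_w = torch.einsum("edr,erk->edk", self.A, self.B)         # (E, dim, dim)
        delta_w = torch.einsum("be,edk->bdk", alpha, delta_w)          # per-sample update
        return self.base(x) + torch.einsum("bd,bdk->bk", x, delta_w)
```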
Citations: 0
POINTS: Improving Your Vision-language Model with Affordable Strategies
Pub Date : 2024-09-07 DOI: arxiv-2409.04828
Yuan Liu, Zhongyin Zhao, Ziyuan Zhuang, Le Tian, Xiao Zhou, Jie Zhou
In recent years, vision-language models have made significant strides, excelling in tasks like optical character recognition and geometric problem-solving. However, several critical issues remain: 1) Proprietary models often lack transparency about their architectures, while open-source models need more detailed ablations of their training strategies. 2) Pre-training data in open-source works is under-explored, with datasets added empirically, making the process cumbersome. 3) Fine-tuning often focuses on adding datasets, leading to diminishing returns. To address these issues, we propose the following contributions: 1) We trained a robust baseline model using the latest advancements in vision-language models, introducing effective improvements and conducting comprehensive ablation and validation for each technique. 2) Inspired by recent work on large language models, we filtered pre-training data using perplexity, selecting the lowest perplexity data for training. This approach allowed us to train on a curated 1M dataset, achieving competitive performance. 3) During visual instruction tuning, we used model soup on different datasets when adding more datasets yielded marginal improvements. These innovations resulted in a 9B parameter model that performs competitively with state-of-the-art models. Our strategies are efficient and lightweight, making them easily adoptable by the community.
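Both perplexity-based data filtering and model soup are standard enough to sketch generically; in the snippet below, `perplexity_fn` is an assumed callable backed by some existing language model, and the 20% keep ratio is an illustrative choice, not the paper's setting.

```python
import torch

def filter_by_perplexity(samples, perplexity_fn, keep_ratio=0.2):
    """Keep the lowest-perplexity fraction of a candidate pre-training set.

    perplexity_fn(sample) -> float is assumed to score one text sample with
    an existing language model.
    """
    scored = sorted(samples, key=perplexity_fn)          # ascending perplexity
    return scored[: max(1, int(len(scored) * keep_ratio))]

def model_soup(state_dicts):
    """Uniformly average the weights of several fine-tuned checkpoints."""
    soup = {}
    for key in state_dicts[0]:
        soup[key] = torch.stack(
            [sd[key].float() for sd in state_dicts]).mean(dim=0)
    return soup
```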
Citations: 0