
Latest Publications in IEEE Transactions on Multimedia

Frequency-Guided Spatial Adaptation for Camouflaged Object Detection
IF 8.4 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS · Pub Date: 2025-01-17 · DOI: 10.1109/TMM.2024.3521681 · Vol. 27, pp. 72-83
Shizhou Zhang;Dexuan Kong;Yinghui Xing;Yue Lu;Lingyan Ran;Guoqiang Liang;Hexu Wang;Yanning Zhang
Camouflaged object detection (COD) aims to segment camouflaged objects whose appearance is very similar to that of the surrounding environment. Recent research has shown that enhancing the feature representation with frequency information can greatly alleviate the ambiguity between foreground objects and the background. With the emergence of vision foundation models such as InternImage and the Segment Anything Model, adapting a pretrained model to COD tasks with a lightweight adapter module is a novel and promising research direction. Existing adapter modules, however, mainly focus on feature adaptation in the spatial domain. In this paper, we propose a novel frequency-guided spatial adaptation method for the COD task. Specifically, we transform the input features of the adapter into the frequency domain. By grouping and interacting with the frequency components located within non-overlapping circles of the spectrogram, different frequency components are dynamically enhanced or weakened, so that the intensity of image details and contour features is adaptively adjusted. At the same time, the features that help distinguish objects from the background are highlighted, indirectly indicating the position and shape of the camouflaged object. We conduct extensive experiments on four widely adopted benchmark datasets, and the proposed method outperforms 26 state-of-the-art methods by large margins. Code will be released.
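To make the frequency-grouping idea concrete, below is a minimal PyTorch sketch under stated assumptions: the shifted 2-D spectrum is partitioned into non-overlapping radial bands (one simple reading of the "non-overlapping circles"), each band is scaled by a learnable gain, and the result is added back as a residual. The class name, band count, and residual form are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class FrequencyGuidedAdapter(nn.Module):
    """Sketch: re-weight radial frequency bands of adapter input features.

    Hypothetical re-implementation of the idea described in the abstract:
    features are moved to the frequency domain, components are grouped by
    their distance from the spectrum center, and each group is scaled by a
    learnable gain before transforming back to the spatial domain.
    """

    def __init__(self, channels: int, num_bands: int = 4):
        super().__init__()
        self.num_bands = num_bands
        # One learnable gain per (channel, band); initialised to identity.
        self.band_gains = nn.Parameter(torch.ones(channels, num_bands))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # 2-D FFT, shifted so the zero frequency sits at the spectrum center.
        freq = torch.fft.fftshift(torch.fft.fft2(x, norm="ortho"), dim=(-2, -1))

        # Radial distance of every frequency bin from the center, in [0, 1].
        yy, xx = torch.meshgrid(
            torch.linspace(-1, 1, h, device=x.device),
            torch.linspace(-1, 1, w, device=x.device),
            indexing="ij",
        )
        radius = torch.sqrt(xx ** 2 + yy ** 2) / (2 ** 0.5)

        # Assign each bin to one of `num_bands` non-overlapping radial groups.
        band_idx = torch.clamp((radius * self.num_bands).long(), max=self.num_bands - 1)
        gains = self.band_gains[:, band_idx]           # (c, h, w)
        freq = freq * gains.unsqueeze(0)               # broadcast over the batch

        # Back to the spatial domain; keep the real part as the adapted feature.
        out = torch.fft.ifft2(torch.fft.ifftshift(freq, dim=(-2, -1)), norm="ortho").real
        return x + out                                 # residual adaptation

# Usage sketch
feats = torch.randn(2, 64, 32, 32)
adapted = FrequencyGuidedAdapter(64)(feats)
```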
Citations: 0
Cross-Scatter Sparse Dictionary Pair Learning for Cross-Domain Classification
IF 8.4 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS · Pub Date: 2025-01-13 · DOI: 10.1109/TMM.2024.3521731 · Vol. 27, pp. 371-384
Lin Jiang;Jigang Wu;Shuping Zhao;Jiaxing Li
In cross-domain recognition tasks, the divergent distributions of data acquired from various domains degrade the effectiveness of knowledge transfer. In practice, cross-domain data also contain a massive amount of redundant information, which often disturbs the training of cross-domain classifiers. To address these issues and obtain efficient domain-invariant knowledge, this paper proposes a novel cross-domain classification method, named cross-scatter sparse dictionary pair learning (CSSDL). First, a pair of dictionaries is learned in a common subspace, in which the marginal distribution divergence between the cross-domain data is mitigated and domain-invariant information can be efficiently extracted. Then, a cross-scatter discriminant term is proposed to decrease the distance between cross-domain data belonging to the same class. This term guarantees that data derived from the same class can be aligned and that the conditional distribution divergence is mitigated. In addition, a flexible label regression method is introduced to match the feature representation and label information in the label space, after which a discriminative and transferable feature representation can be obtained. Moreover, two sparse constraints are introduced to maintain the sparse characteristics of the feature representation. Extensive experimental results on public datasets demonstrate the effectiveness of the proposed CSSDL approach.
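As an illustration of how the ingredients named in the abstract typically fit together, here is a schematic objective that simply assembles them. The symbols (synthesis dictionary D, analysis dictionary P, label-regression matrix W) and the abstract cross-scatter term are assumptions; the paper's exact formulation may differ.

```latex
% Schematic composition of the terms named in the abstract (illustrative only).
% D: synthesis dictionary, P: analysis dictionary (the learned pair),
% X_s, X_t: source/target data, X = [X_s, X_t], Y: labels, W: label-regression matrix,
% S_cross: a cross-scatter term that shrinks distances between same-class cross-domain data.
\min_{D,\,P,\,W}\;
    \|X_s - D P X_s\|_F^2 + \|X_t - D P X_t\|_F^2           % dictionary-pair reconstruction in the common subspace
  + \lambda_1\, \mathcal{S}_{\mathrm{cross}}(P X_s,\, P X_t) % cross-scatter discriminant term
  + \lambda_2\, \|Y - W P X\|_F^2                            % flexible label regression in the label space
  + \lambda_3 \bigl( \|P X_s\|_1 + \|P X_t\|_1 \bigr)        % the two sparse constraints
```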
Citations: 0
DPStyler: Dynamic PromptStyler for Source-Free Domain Generalization
IF 8.4 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS · Pub Date: 2025-01-06 · DOI: 10.1109/TMM.2024.3521671 · Vol. 27, pp. 120-132
Yunlong Tang;Yuxuan Wan;Lei Qi;Xin Geng
Source-Free Domain Generalization (SFDG) aims to develop a model that works on unseen target domains without relying on any source domain. Research in SFDG primarily builds upon the existing knowledge of large-scale vision-language models and utilizes the pre-trained model's joint vision-language space to simulate style transfer across domains, thus eliminating the dependency on source-domain images. However, how to efficiently simulate rich and diverse styles using text prompts, and how to extract domain-invariant information useful for classification from encoder features that contain both semantic and style information, remain directions that merit improvement. In this paper, we introduce Dynamic PromptStyler (DPStyler), comprising Style Generation and Style Removal modules to address these issues. The Style Generation module refreshes all styles at every training epoch, while the Style Removal module eliminates variations in the encoder's output features caused by input styles. Moreover, since the Style Generation module, which generates style word vectors via random sampling or style mixing, makes the model sensitive to input text prompts, we introduce a model ensemble method to mitigate this sensitivity. Extensive experiments demonstrate that our framework outperforms state-of-the-art methods on benchmark datasets.
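A minimal sketch of the two modules follows, assuming CLIP-style 512-dimensional text embeddings. The refresh rule (random re-sampling mixed with convex style mixing) and the statistic-normalisation form of style removal are illustrative guesses at the described behaviour, not the released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StyleBank(nn.Module):
    """Sketch of 'Style Generation': keep K pseudo style word vectors in the
    text-embedding space and refresh all of them every epoch, either by random
    re-sampling or by mixing existing styles (names/dims are assumptions)."""

    def __init__(self, num_styles: int = 80, embed_dim: int = 512):
        super().__init__()
        self.register_buffer("styles", torch.randn(num_styles, embed_dim))

    @torch.no_grad()
    def refresh(self, mix_prob: float = 0.5):
        k, d = self.styles.shape
        resampled = torch.randn(k, d, device=self.styles.device)
        # Style mixing: convex combination of two randomly paired existing styles.
        perm = torch.randperm(k, device=self.styles.device)
        lam = torch.rand(k, 1, device=self.styles.device)
        mixed = lam * self.styles + (1 - lam) * self.styles[perm]
        use_mix = torch.rand(k, 1, device=self.styles.device) < mix_prob
        self.styles.copy_(torch.where(use_mix, mixed, resampled))

class StyleRemoval(nn.Module):
    """Sketch of 'Style Removal': suppress style-induced variation in encoder
    outputs, here via simple feature-statistic normalisation and re-projection."""

    def __init__(self, embed_dim: int = 512):
        super().__init__()
        self.proj = nn.Linear(embed_dim, embed_dim)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        feats = (feats - feats.mean(dim=-1, keepdim=True)) / (feats.std(dim=-1, keepdim=True) + 1e-6)
        return F.normalize(self.proj(feats), dim=-1)

# Per-epoch usage sketch.
bank, remover = StyleBank(), StyleRemoval()
for epoch in range(3):
    bank.refresh()                      # every style is regenerated each epoch
    fake_feats = torch.randn(4, 512)    # stand-in for encoder outputs
    clean = remover(fake_feats)
```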
Citations: 0
List of Reviewers
IF 8.4 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS · Pub Date: 2025-01-03 · DOI: 10.1109/TMM.2024.3501532 · Vol. 26, pp. 11428-11439
{"title":"List of Reviewers","authors":"","doi":"10.1109/TMM.2024.3501532","DOIUrl":"https://doi.org/10.1109/TMM.2024.3501532","url":null,"abstract":"","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"26 ","pages":"11428-11439"},"PeriodicalIF":8.4,"publicationDate":"2025-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10823085","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142918218","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Dual Semantic Reconstruction Network for Weakly Supervised Temporal Sentence Grounding
IF 8.4 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS · Pub Date: 2025-01-01 · DOI: 10.1109/TMM.2024.3521676 · Vol. 27, pp. 95-107
Kefan Tang;Lihuo He;Nannan Wang;Xinbo Gao
Weakly supervised temporal sentence grounding aims to identify semantically relevant video moments in an untrimmed video corresponding to a given sentence query without exact timestamps. Neuropsychology research indicates that the way the human brain handles information varies based on the grammatical categories of words, highlighting the importance of separately considering nouns and verbs. However, current methodologies primarily utilize pre-extracted video features to reconstruct randomly masked queries, neglecting the distinction between grammatical classes. This oversight could hinder forming meaningful connections between linguistic elements and the corresponding components in the video. To address this limitation, this paper introduces the dual semantic reconstruction network (DSRN) model. DSRN processes video features by distinctly correlating object features with nouns and motion features with verbs, thereby mimicking the human brain's parsing mechanism. It begins with a feature disentanglement module that separately extracts object-aware and motion-aware features from video content. Then, in a dual-branch structure, these disentangled features are used to generate separate proposals for objects and motions through two dedicated proposal generation modules. A consistency constraint is proposed to ensure a high level of agreement between the boundaries of object-related and motion-related proposals. Subsequently, the DSRN independently reconstructs masked nouns and verbs from the sentence queries using the generated proposals. Finally, an integration block is applied to synthesize the two types of proposals, distinguishing between positive and negative instances through contrastive learning. Experiments on the Charades-STA and ActivityNet Captions datasets demonstrate that the proposed method achieves state-of-the-art performance.
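A small sketch of the disentanglement and consistency pieces follows, assuming simple MLP heads and (start, end) proposal boundaries in [0, 1]. Module names and the smooth-L1 form of the consistency constraint are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureDisentangle(nn.Module):
    """Sketch: split video features into object-aware and motion-aware streams
    with two hypothetical MLP heads (the paper's architecture may differ)."""

    def __init__(self, dim: int = 256):
        super().__init__()
        self.object_head = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.motion_head = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, video_feats):                  # (batch, time, dim)
        return self.object_head(video_feats), self.motion_head(video_feats)

def proposal_consistency_loss(obj_bounds, mot_bounds):
    """Consistency-constraint sketch: the (start, end) boundaries predicted by
    the object branch and the motion branch should agree for the same query."""
    return F.smooth_l1_loss(obj_bounds, mot_bounds)

# Usage sketch with stand-in proposal boundaries.
feats = torch.randn(2, 64, 256)
obj_f, mot_f = FeatureDisentangle()(feats)
obj_bounds = torch.sigmoid(torch.randn(2, 2))        # (start, end) in [0, 1]
mot_bounds = torch.sigmoid(torch.randn(2, 2))
loss = proposal_consistency_loss(obj_bounds, mot_bounds)
```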
Citations: 0
Disaggregation Distillation for Person Search
IF 8.4 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS · Pub Date: 2024-12-30 · DOI: 10.1109/TMM.2024.3521732 · Vol. 27, pp. 158-170
Yizhen Jia;Rong Quan;Haiyan Chen;Jiamei Liu;Yichao Yan;Song Bai;Jie Qin
Person search is a challenging task in computer vision and multimedia understanding, which aims at localizing and identifying target individuals in realistic scenes. State-of-the-art models achieve remarkable success but suffer from heavy computation and inefficient inference, making them impractical in most real-world applications. A promising approach to tackle this dilemma is to compress person search models with knowledge distillation (KD). Previous KD-based person search methods typically distill the knowledge from the re-identification (re-id) branch, completely overlooking the useful knowledge from the detection branch. In addition, we show that the imbalance between person and background regions in feature maps has a negative impact on the distillation process. To this end, we propose a novel KD-based approach, namely Disaggregation Distillation for Person Search (DDPS), which disaggregates the distillation process and the feature maps, respectively. Firstly, the distillation process is disaggregated into two task-oriented sub-processes, i.e., detection distillation and re-id distillation, to help the student learn both accurate localization capability and discriminative person embeddings. Secondly, we disaggregate each feature map into person and background regions and distill these two regions independently to alleviate the imbalance problem. More concretely, three types of distillation modules, i.e., logit distillation (LD), correlation distillation (CD), and disaggregation feature distillation (DFD), are designed to transfer comprehensive information from the teacher to the student. Note that such a simple yet effective distillation scheme can be readily applied to both homogeneous and heterogeneous teacher-student combinations. We conduct extensive experiments on two person search benchmarks, where the results demonstrate that, surprisingly, our DDPS enables the student model to surpass the performance of the corresponding teacher model, even achieving comparable results with general person search models.
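The region-disaggregation idea can be sketched as a masked feature-distillation loss, shown below. The per-region area normalisation and the loss weights are assumptions rather than the paper's DFD module.

```python
import torch

def disaggregated_feature_distillation(student_feat, teacher_feat, person_mask,
                                        w_person: float = 1.0, w_background: float = 1.0):
    """Sketch of region-disaggregated feature distillation: the feature map is
    split into person and background regions with a binary mask and each region
    is distilled separately, so the (small) person area is not drowned out by
    the (large) background area. Weights and normalisation are assumptions.

    student_feat, teacher_feat: (B, C, H, W); person_mask: (B, 1, H, W) in {0, 1}.
    """
    diff = (student_feat - teacher_feat) ** 2          # element-wise distillation error
    bg_mask = 1.0 - person_mask

    # Normalise each region by its own area so both contribute comparably.
    person_loss = (diff * person_mask).sum() / person_mask.sum().clamp(min=1.0)
    background_loss = (diff * bg_mask).sum() / bg_mask.sum().clamp(min=1.0)
    return w_person * person_loss + w_background * background_loss

# Usage sketch with stand-in features and a random person-region mask.
s = torch.randn(2, 128, 32, 32)
t = torch.randn(2, 128, 32, 32)
mask = (torch.rand(2, 1, 32, 32) > 0.8).float()
loss = disaggregated_feature_distillation(s, t, mask)
```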
Citations: 0
IEEE Transactions on Multimedia Publication Information
IF 8.4 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS · Pub Date: 2024-12-27 · DOI: 10.1109/TMM.2024.3444988 · Vol. 26, p. C2
{"title":"IEEE Transactions on Multimedia Publication Information","authors":"","doi":"10.1109/TMM.2024.3444988","DOIUrl":"https://doi.org/10.1109/TMM.2024.3444988","url":null,"abstract":"","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"26 ","pages":"C2-C2"},"PeriodicalIF":8.4,"publicationDate":"2024-12-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10817140","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142890347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
GPT4Ego: Unleashing the Potential of Pre-Trained Models for Zero-Shot Egocentric Action Recognition
IF 8.4 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS · Pub Date: 2024-12-27 · DOI: 10.1109/TMM.2024.3521658 · Vol. 27, pp. 401-413
Guangzhao Dai;Xiangbo Shu;Wenhao Wu;Rui Yan;Jiachao Zhang
Vision-Language Models (VLMs), pre-trained on large-scale datasets, have shown impressive performance in various visual recognition tasks. This advancement paves the way for notable performance in Zero-Shot Egocentric Action Recognition (ZS-EAR), which requires VLMs to recognize actions zero-shot from first-person videos rich in realistic human-environment interactions. Typically, VLMs handle ZS-EAR as a global video-text matching task, which often leads to suboptimal alignment of visual and linguistic knowledge. We propose a refined approach for ZS-EAR using VLMs, emphasizing fine-grained concept-description alignment that capitalizes on the rich semantic and contextual details in egocentric videos. In this work, we introduce a straightforward yet remarkably potent VLM framework, aka GPT4Ego, designed to enhance the fine-grained alignment of concept and description between vision and language. Specifically, we first propose a new Ego-oriented Text Prompting (EgoTP♠) scheme, which effectively prompts action-related text-contextual semantics by evolving word-level class names into sentence-level contextual descriptions with ChatGPT, using well-designed chain-of-thought textual prompts. Moreover, we design a new Ego-oriented Visual Parsing (EgoVP♣) strategy that learns action-related vision-contextual semantics by refining global-level images into part-level contextual concepts with the help of SAM. Extensive experiments demonstrate that GPT4Ego significantly outperforms existing VLMs on three large-scale egocentric video benchmarks, i.e., EPIC-KITCHENS-100 (33.2%, ↑+9.4), EGTEA (39.6%, ↑+5.5), and CharadesEgo (31.5%, ↑+2.6). In addition, benefiting from the novel mechanism of fine-grained concept and description alignment, GPT4Ego can evolve sustainably with the advancement of ever-growing pre-trained foundation models. We hope this work encourages the egocentric community to investigate pre-trained vision-language models further.
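A toy sketch of the zero-shot pipeline follows: a chain-of-thought style prompt that asks an LLM to expand a bare class name into a contextual description, followed by cosine-similarity video-text matching. The prompt wording and the stand-in embeddings are hypothetical; a real pipeline would encode the generated descriptions and the SAM-refined video crops with a VLM.

```python
import torch
import torch.nn.functional as F

def build_egotp_prompt(class_name: str) -> str:
    """Sketch of the EgoTP idea: ask an LLM to evolve a word-level class name
    into a sentence-level, context-rich description via a chain-of-thought
    prompt. The wording below is illustrative, not the paper's actual prompt."""
    return (
        f"Let's think step by step about the egocentric action '{class_name}'.\n"
        "1) Which objects and hands are visible from a first-person view?\n"
        "2) What motion does the camera wearer perform?\n"
        "3) Summarise the action as one detailed sentence describing the scene."
    )

def zero_shot_scores(video_embed: torch.Tensor, class_text_embeds: torch.Tensor) -> torch.Tensor:
    """Zero-shot recognition as video-text matching: cosine similarity between
    one video embedding (D,) and per-class description embeddings (K, D)."""
    v = F.normalize(video_embed, dim=-1)
    t = F.normalize(class_text_embeds, dim=-1)
    return t @ v                          # (K,) similarities; argmax = predicted class

# Usage sketch with stand-in embeddings.
prompt = build_egotp_prompt("cut tomato")
video_embed = torch.randn(512)
class_text_embeds = torch.randn(100, 512)
pred = zero_shot_scores(video_embed, class_text_embeds).argmax().item()
```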
Citations: 0
Auxiliary Representation Guided Network for Visible-Infrared Person Re-Identification
IF 8.4 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS · Pub Date: 2024-12-25 · DOI: 10.1109/TMM.2024.3521773 · Vol. 27, pp. 340-355
Mengzan Qi;Sixian Chan;Chen Hang;Guixu Zhang;Tieyong Zeng;Zhi Li
Visible-Infrared Person Re-identification aims to retrieve images of specific identities across modalities. To relieve the large cross-modality discrepancy, researchers introduce an auxiliary modality within the image space to assist modality-invariant representation learning. However, the inherent quality of the generated auxiliary images is hard to constrain, which leads to a bottleneck in retrieval performance. In this paper, we propose a novel Auxiliary Representation Guided Network (ARGN) to explore the potential of auxiliary representations, which are directly generated within the modality-shared embedding space. In contrast to the original visible and infrared representations, which contain information solely from their respective modalities, these auxiliary representations integrate cross-modality information by fusing both modalities. In our framework, we utilize these auxiliary representations as modality guidance to reduce the cross-modality discrepancy. First, we propose a High-quality Auxiliary Representation Learning (HARL) framework to generate identity-consistent auxiliary representations. The primary objective of HARL is to ensure that auxiliary representations capture diverse modality information from both modalities while preserving identity-related discrimination. Second, guided by these auxiliary representations, we design an Auxiliary Representation Guided Constraint (ARGC) to optimize the modality-shared embedding space. By incorporating this constraint, the modality-shared embedding space achieves enhanced intra-identity compactness and inter-identity separability, further improving retrieval performance. In addition, to improve the robustness of our framework against modality variation, we introduce a Part-based Adaptive Gaussian Module (PAGM) to adaptively extract discriminative information across modalities. Finally, extensive experiments demonstrate the superiority of our method over state-of-the-art approaches on three VI-ReID datasets.
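A compact sketch of how an auxiliary representation can be fused in the shared embedding space and used as guidance is given below. The convex fusion and the cosine-based guidance loss are illustrative choices, not the paper's HARL/ARGC formulation.

```python
import torch
import torch.nn.functional as F

def make_auxiliary_representation(vis_feat, ir_feat, alpha: float = 0.5):
    """Sketch of the auxiliary-representation idea: fuse visible and infrared
    embeddings of the same identity directly in the shared embedding space
    (a simple convex fusion here; the paper's generation scheme may differ)."""
    return F.normalize(alpha * vis_feat + (1 - alpha) * ir_feat, dim=-1)

def auxiliary_guided_loss(vis_feat, ir_feat):
    """Guidance sketch: pull both modality embeddings of an identity towards
    their fused auxiliary representation to shrink the cross-modality gap."""
    aux = make_auxiliary_representation(vis_feat, ir_feat).detach()
    vis = F.normalize(vis_feat, dim=-1)
    ir = F.normalize(ir_feat, dim=-1)
    # Maximise cosine similarity to the auxiliary anchor for both modalities.
    return (2.0 - (vis * aux).sum(-1) - (ir * aux).sum(-1)).mean()

# Usage sketch: a batch of paired visible/infrared embeddings of the same IDs.
vis = torch.randn(8, 256)
ir = torch.randn(8, 256)
loss = auxiliary_guided_loss(vis, ir)
```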
Citations: 0
Focus Entirety and Perceive Environment for Arbitrary-Shaped Text Detection
IF 8.4 · CAS Tier 1 (Computer Science) · Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS · Pub Date: 2024-12-25 · DOI: 10.1109/TMM.2024.3521797 · Vol. 27, pp. 287-299
Xu Han;Junyu Gao;Chuang Yang;Yuan Yuan;Qi Wang
Due to the diversity of scene text in font, color, shape, and size, accurately and efficiently detecting text remains a formidable challenge. Among the various detection approaches, segmentation-based approaches have emerged as prominent contenders owing to their flexible pixel-level predictions. However, these methods typically model text instances in a bottom-up manner, which is highly susceptible to noise. In addition, pixels are predicted in isolation, without pixel-feature interaction, which also degrades detection performance. To alleviate these problems, we propose a multi-information-level arbitrary-shaped text detector consisting of a focus entirety module (FEM) and a perceive environment module (PEM). The former extracts instance-level features and adopts a top-down scheme to model texts, reducing the influence of noise. Specifically, it assigns consistent entirety information to pixels within the same instance to improve their cohesion. In addition, it emphasizes scale information, enabling the model to distinguish texts of varying scales effectively. The latter extracts region-level information and encourages the model to focus on the distribution of positive samples in the vicinity of a pixel, thereby perceiving environment information. It treats kernel pixels as positive samples and helps the model differentiate text and kernel features. Extensive experiments demonstrate the FEM's ability to efficiently support the model in handling texts of different scales and confirm that the PEM can assist in perceiving pixels more accurately by focusing on pixel vicinities. Comparisons show the proposed model outperforms existing state-of-the-art approaches on four public datasets.
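A minimal sketch of the "perceive environment" idea: a box filter estimates, for every pixel, how densely kernel (positive) pixels populate its neighbourhood, and this ratio re-weights a per-pixel loss. The window size and the weighting scheme are assumptions, not the paper's exact PEM design.

```python
import torch
import torch.nn.functional as F

def local_positive_ratio(kernel_mask: torch.Tensor, window: int = 9) -> torch.Tensor:
    """For every pixel, measure how densely text-kernel (positive) pixels
    populate its neighbourhood with a simple box filter.
    kernel_mask: (B, 1, H, W) with 1 on kernel pixels."""
    return F.avg_pool2d(kernel_mask, kernel_size=window, stride=1, padding=window // 2)

def environment_weighted_bce(pred_logits, kernel_mask, window: int = 9):
    """Weight the per-pixel classification loss by the local environment, so
    isolated noisy responses far from any kernel region are down-weighted
    (an illustrative use of the ratio map, not the paper's exact loss)."""
    ratio = local_positive_ratio(kernel_mask, window)
    weight = 0.5 + ratio                                 # emphasise kernel-dense vicinities
    return F.binary_cross_entropy_with_logits(pred_logits, kernel_mask, weight=weight)

# Usage sketch with stand-in logits and kernel labels.
pred = torch.randn(2, 1, 64, 64)
mask = (torch.rand(2, 1, 64, 64) > 0.9).float()
loss = environment_weighted_bce(pred, mask)
```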
Citations: 0