
Computer Vision and Image Understanding: Latest Publications

Bidirectional temporal and frame-segment attention for sparse action segmentation of figure skating
IF 4.3 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-09-26 | DOI: 10.1016/j.cviu.2024.104186
Yanchao Liu, Xina Cheng, Yuan Li, Takeshi Ikenaga
Temporal action segmentation is a task for understanding human activities in long-term videos. Most efforts have focused on dense-frame actions, which rely on strong correlations between frames. In the figure skating scene, however, technical actions appear only sparsely in the video. This brings a new challenge: a large amount of redundant temporal information leads to weak frame correlation. To this end, we propose a Bidirectional Temporal and Frame-Segment Attention Module (FSAM). Specifically, we propose an additional reverse-temporal input stream to enhance frame correlation, learned by fusing bidirectional temporal features. In addition, the proposed FSAM contains a multi-stage segment-aware GCN and a decoder interaction module, aiming to learn the correlation between segment features across time domains and to integrate embeddings between frame and segment representations. To evaluate our approach, we propose the Figure Skating Sparse Action Segmentation (FSSAS) dataset, which comprises 100 samples from Olympic figure skating finals and semi-finals, covering more than 50 different male and female athletes. Extensive experiments show that our method achieves an accuracy of 87.75 and an edit score of 90.18 on the FSSAS dataset.
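The abstract does not spell out the FSAM internals, so the following is only a minimal sketch of the bidirectional-temporal idea it describes: frame features are passed through a shared temporal encoder in forward and reversed order, and the two streams are fused. All module and variable names here are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of the bidirectional-temporal idea (not the paper's implementation):
# frame features are encoded in forward and reversed temporal order with a shared
# encoder, then fused along the channel dimension.
import torch
import torch.nn as nn


class BidirectionalTemporalFusion(nn.Module):
    def __init__(self, dim: int = 256, kernel_size: int = 9):
        super().__init__()
        # Shared temporal convolution applied to both directions.
        self.temporal = nn.Conv1d(dim, dim, kernel_size, padding=kernel_size // 2)
        self.fuse = nn.Conv1d(2 * dim, dim, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, dim, frames) frame-level features.
        fwd = self.temporal(x)
        # Reverse the temporal axis, encode, then flip back so frames align.
        bwd = torch.flip(self.temporal(torch.flip(x, dims=[2])), dims=[2])
        return self.fuse(torch.cat([fwd, bwd], dim=1))


if __name__ == "__main__":
    feats = torch.randn(2, 256, 1200)           # 1200 frames of a routine
    fused = BidirectionalTemporalFusion()(feats)
    print(fused.shape)                          # torch.Size([2, 256, 1200])
```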
Citations: 0
For a semiotic AI: Bridging computer vision and visual semiotics for computational observation of large scale facial image archives
IF 4.3 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-09-26 | DOI: 10.1016/j.cviu.2024.104187
Lia Morra, Antonio Santangelo, Pietro Basci, Luca Piano, Fabio Garcea, Fabrizio Lamberti, Massimo Leone
Social networks are creating a digital world in which the cognitive, emotional, and pragmatic value of the imagery of human faces and bodies is arguably changing. However, researchers in the digital humanities are often ill-equipped to study these phenomena at scale. This work presents FRESCO (Face Representation in E-Societies through Computational Observation), a framework designed to explore the socio-cultural implications of images on social media platforms at scale. FRESCO deconstructs images into numerical and categorical variables using state-of-the-art computer vision techniques, aligning with the principles of visual semiotics. The framework analyzes images across three levels: the plastic level, encompassing fundamental visual features like lines and colors; the figurative level, representing specific entities or concepts; and the enunciation level, which focuses particularly on constructing the point of view of the spectator and observer. These levels are analyzed to discern deeper narrative layers within the imagery. Experimental validation confirms the reliability and utility of FRESCO, and we assess its consistency and precision across two public datasets. Subsequently, we introduce the FRESCO score, a metric derived from the framework’s output that serves as a reliable measure of similarity in image content.
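The FRESCO score itself is not defined in the abstract; the sketch below shows only one plausible way to turn the framework's numerical and categorical variables into a single similarity measure (cosine similarity blended with label overlap). The class, fields, and weighting are assumptions for illustration, not the paper's metric.

```python
# Illustrative only: blend a cosine similarity over numerical variables with a
# Jaccard overlap over categorical labels into one [0, 1] image-similarity score.
from dataclasses import dataclass, field

import numpy as np


@dataclass
class ImageDescription:
    numerical: np.ndarray                           # e.g. plastic-level features (lines, colors)
    categorical: set = field(default_factory=set)   # e.g. figurative-level entities


def similarity_score(a: ImageDescription, b: ImageDescription, w: float = 0.5) -> float:
    """Weighted blend of numerical and categorical agreement."""
    cos = float(np.dot(a.numerical, b.numerical) /
                (np.linalg.norm(a.numerical) * np.linalg.norm(b.numerical) + 1e-8))
    cos = (cos + 1.0) / 2.0                         # map cosine from [-1, 1] to [0, 1]
    union = a.categorical | b.categorical
    jac = len(a.categorical & b.categorical) / len(union) if union else 1.0
    return w * cos + (1.0 - w) * jac


img1 = ImageDescription(np.array([0.8, 0.1, 0.3]), {"face", "frontal gaze"})
img2 = ImageDescription(np.array([0.7, 0.2, 0.4]), {"face", "profile"})
print(round(similarity_score(img1, img2), 3))
```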
Citations: 0
M-adapter: Multi-level image-to-video adaptation for video action recognition
IF 4.3 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-09-25 | DOI: 10.1016/j.cviu.2024.104150
Rongchang Li, Tianyang Xu, Xiao-Jun Wu, Linze Li, Xiao Yang, Zhongwei Shen, Josef Kittler
With the growing size of visual foundation models, training video models from scratch has become costly and challenging. Recent attempts focus on transferring frozen pre-trained Image Models (PIMs) to the video domain by tuning inserted learnable parameters such as adapters and prompts. However, these methods still require saving PIM activations for gradient computation, which limits the GPU memory savings. In this paper, we propose a novel parallel branch that adapts the multi-level outputs of the frozen PIM for action recognition. It avoids passing gradients through the PIM and thus naturally has a much lower GPU memory footprint. The proposed adaptation branch consists of hierarchically combined multi-level output adapters (M-adapters), each comprising a fusion module and a temporal module. This design absorbs the discrepancies between the pre-training task and the target task at a lower training cost. We show that when using larger models or in scenarios with higher demands on temporal modelling, the proposed method outperforms full-parameter tuning. Finally, despite tuning far fewer parameters, our method achieves performance superior or comparable to current state-of-the-art methods.
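As a rough illustration of why a parallel branch avoids backbone gradients, the sketch below detaches multi-level features from a frozen stand-in backbone and trains only small adapters on top. The backbone, adapter shapes, and head here are hypothetical, not the paper's M-adapter design.

```python
# Sketch: intermediate outputs of a frozen pre-trained model are detached and
# adapted by small trainable modules, so no activations are kept for backprop
# inside the backbone.
import torch
import torch.nn as nn


class TinyFrozenBackbone(nn.Module):
    """Stand-in for a frozen pre-trained image model with multi-level outputs."""
    def __init__(self):
        super().__init__()
        self.stage1 = nn.Conv2d(3, 32, 3, stride=2, padding=1)
        self.stage2 = nn.Conv2d(32, 64, 3, stride=2, padding=1)
        for p in self.parameters():
            p.requires_grad_(False)

    def forward(self, x):
        f1 = self.stage1(x)
        f2 = self.stage2(f1)
        return [f1, f2]


class AdapterBranch(nn.Module):
    """Trainable branch consuming detached multi-level backbone features."""
    def __init__(self, dims=(32, 64), num_classes=10):
        super().__init__()
        self.adapters = nn.ModuleList([nn.Conv2d(d, 64, 1) for d in dims])
        self.head = nn.Linear(64, num_classes)

    def forward(self, feats):
        pooled = [a(f).mean(dim=(2, 3)) for a, f in zip(self.adapters, feats)]
        return self.head(sum(pooled))


backbone, branch = TinyFrozenBackbone(), AdapterBranch()
frames = torch.randn(4, 3, 64, 64)            # a few video frames
with torch.no_grad():                         # no backbone activations stored
    feats = backbone(frames)
logits = branch([f.detach() for f in feats])  # gradients stop at the adapters
print(logits.shape)                           # torch.Size([4, 10])
```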
Citations: 0
Spatial attention for human-centric visual understanding: An Information Bottleneck method
IF 4.3 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-09-24 | DOI: 10.1016/j.cviu.2024.104180
Qiuxia Lai, Yongwei Nie, Yu Li, Hanqiu Sun, Qiang Xu
The selective visual attention mechanism in the Human Visual System (HVS) restricts the amount of information that reaches human visual awareness, allowing the brain to perceive high-fidelity natural scenes in real-time with limited computational cost. This selectivity acts as an “Information Bottleneck (IB)” that balances information compression and predictive accuracy. However, such information constraints are rarely explored in the attention mechanism for deep neural networks (DNNs). This paper introduces an IB-inspired spatial attention module for DNNs, which generates an attention map by minimizing the mutual information (MI) between the attentive content and the input while maximizing that between the attentive content and the output. We develop this IB-inspired attention mechanism based on a novel graphical model and explore various implementations of the framework. We show that our approach can yield attention maps that neatly highlight the regions of interest while suppressing the backgrounds, and are interpretable for the decision-making of the DNNs. To validate the effectiveness of the proposed IB-inspired attention mechanism, we apply it to various computer vision tasks including image classification, fine-grained recognition, cross-domain classification, semantic segmentation, and object detection. Extensive experiments demonstrate that it bootstraps standard DNN structures quantitatively and qualitatively for these tasks.
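In standard Information Bottleneck notation, with X the input, Y the target output, and Z the attentive content produced by the spatial attention map, the trade-off described above can be written as follows (a textbook formulation, not necessarily the paper's exact variational objective):

```latex
% Textbook IB trade-off (illustrative): compress the input while keeping what is
% predictive of the output; Z is the input masked by the learned attention map.
\min_{\theta}\; \mathcal{L}_{\mathrm{IB}}(\theta) \;=\; I(Z;X) \;-\; \beta\, I(Z;Y),
\qquad Z \;=\; A_{\theta}(X) \odot X
```

Here A_θ(X) denotes the spatial attention map and β > 0 balances compression of the input against predictive accuracy.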
Citations: 0
Multimodality-guided Visual-Caption Semantic Enhancement
IF 4.3 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-09-23 | DOI: 10.1016/j.cviu.2024.104139
Nan Che, Jiang Liu, Fei Yu, Lechao Cheng, Yuxuan Wang, Yuehua Li, Chenrui Liu
Video captions generated from a single modality, e.g., video clips alone, often suffer from insufficient event discovery and inadequate scene description. This paper therefore aims to improve caption quality by addressing these issues through the integration of multi-modal information. Specifically, we first construct a multi-modal dataset and introduce triplet annotations of video, audio, and text, fostering a comprehensive exploration of the associations between different modalities. Building upon this, we propose to explore the collaborative perception of audio and visual concepts, incorporating audio-visual perception priors to mitigate inaccuracies and incompleteness in captions on vision-based benchmarks. To achieve this, we extract effective semantic features from the visual and auditory modalities, bridge the semantic gap between audio-visual modalities and text, and form a more precise knowledge-graph-based multimodal coherence checking and information pruning mechanism. Exhaustive experiments demonstrate that the proposed approach surpasses existing methods and generalizes well with the assistance of ChatGPT.
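The coherence-checking mechanism is only named in the abstract, so the following is a heavily simplified, hypothetical sketch of such a check: audio and visual embeddings are projected into a shared space, fused, and caption candidates that disagree with the fused evidence are pruned. Dimensions, projections, and the threshold are all assumptions, not the authors' pipeline.

```python
# Hypothetical coherence check: fuse projected audio/visual evidence and keep
# only caption embeddings that are sufficiently similar to it.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CoherenceChecker(nn.Module):
    def __init__(self, audio_dim=128, visual_dim=512, text_dim=256):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, text_dim)
        self.visual_proj = nn.Linear(visual_dim, text_dim)

    def forward(self, audio, visual, caption_embs, threshold=0.3):
        # Fuse the two modalities into one piece of audio-visual evidence.
        evidence = F.normalize(self.audio_proj(audio) + self.visual_proj(visual), dim=-1)
        captions = F.normalize(caption_embs, dim=-1)
        scores = captions @ evidence          # cosine similarity per caption candidate
        keep = scores > threshold             # prune incoherent candidates
        return scores, keep


checker = CoherenceChecker()
scores, keep = checker(torch.randn(128), torch.randn(512), torch.randn(5, 256))
print(scores.shape, keep.tolist())
```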
Citations: 0
Bridging the gap between object detection in close-up and high-resolution wide shots
IF 4.3 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-09-23 | DOI: 10.1016/j.cviu.2024.104181
Wenxi Li, Yuchen Guo, Jilai Zheng, Haozhe Lin, Chao Ma, Lu Fang, Xiaokang Yang
Recent years have seen a significant rise in gigapixel-level image/video capture systems and benchmarks with high-resolution wide (HRW) shots. Different from close-up shots as in MS COCO, the higher resolution and wider field of view raise new research and application problems, such as how to perform accurate and efficient object detection with such large inputs on low-power edge devices like UAVs. HRW shots pose several unique challenges. (1) Sparse information: the objects of interest cover little of the image area. (2) Varying scale: object scale changes by 10 to 100× within a single image. (3) Incomplete objects: the sliding-window strategy used to handle the large input leads to truncated objects at window edges. (4) Multi-scale information: it is unclear how to use multi-scale information in training and inference. Consequently, directly applying a close-up detector is inaccurate and inefficient. In this paper, we systematically investigate this problem and bridge the gap between object detection in close-up and HRW shots by introducing a novel sparse architecture that can be integrated with common networks such as ConvNets and Transformers. It leverages alternative sparse learning to complementarily fuse coarse-grained and fine-grained features to (1) adaptively extract valuable information from (2) different object scales. We also propose a novel Cross-window Non-Maximum Suppression (C-NMS) algorithm to (3) improve box merging across windows. Furthermore, we propose a (4) simple yet effective multi-scale training and inference strategy to improve accuracy. Experiments on two benchmarks with HRW shots, PANDA and DOTA-v1.0, demonstrate that our methods significantly improve accuracy (by up to 5.8%) and speed (by up to 3×) over state-of-the-art methods, for both ConvNet- and Transformer-based detectors, on edge devices. Our code is open-sourced and available at https://github.com/liwenxi/SparseFormer.
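The exact C-NMS rule is not given in the abstract; the sketch below shows only the standard cross-window merge it improves upon: window-local boxes are shifted into global image coordinates and suppressed jointly with plain greedy NMS. It is meant as context for what box merging across windows involves, not as the paper's algorithm.

```python
# Baseline cross-window box merge: shift each window's detections into global
# coordinates, then run a single greedy NMS over all of them.
import numpy as np


def nms(boxes: np.ndarray, scores: np.ndarray, iou_thr: float = 0.5) -> list:
    """Plain greedy NMS; boxes are (N, 4) as [x1, y1, x2, y2]."""
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_o = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                 (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + area_o - inter + 1e-9)
        order = order[1:][iou <= iou_thr]
    return keep


def merge_windows(window_dets, iou_thr=0.5):
    """window_dets: list of (offset_xy, boxes, scores) from each sliding window."""
    all_boxes, all_scores = [], []
    for (ox, oy), boxes, scores in window_dets:
        all_boxes.append(boxes + np.array([ox, oy, ox, oy]))  # window -> global coords
        all_scores.append(scores)
    boxes = np.concatenate(all_boxes)
    scores = np.concatenate(all_scores)
    keep = nms(boxes, scores, iou_thr)
    return boxes[keep], scores[keep]


dets = [((0, 0), np.array([[10., 10., 50., 50.]]), np.array([0.9])),
        ((40, 0), np.array([[0., 12., 12., 48.]]), np.array([0.6]))]  # truncated at edge
print(merge_windows(dets))
```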
Citations: 0
Deformable surface reconstruction via Riemannian metric preservation
IF 4.3 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-09-19 | DOI: 10.1016/j.cviu.2024.104155
Oriol Barbany, Adrià Colomé, Carme Torras
Estimating the pose of an object from a monocular image is a fundamental inverse problem in computer vision. Due to its ill-posed nature, solving it requires incorporating deformation priors. In practice, many materials do not perceptibly shrink or stretch when manipulated, constituting a reliable and well-known prior. Mathematically, this translates to the preservation of the Riemannian metric. Neural networks offer the perfect playground for solving the surface reconstruction problem, as they can approximate surfaces with arbitrary precision and allow the computation of differential-geometry quantities. This paper presents an approach for inferring continuous deformable surfaces from a sequence of images, which is benchmarked against several techniques and achieves state-of-the-art performance without the need for offline training. As a method that performs per-frame optimization, it can refine its estimates, unlike approaches based on a single inference step. Despite enforcing differential-geometry constraints at each update, our approach is the fastest of all the tested optimization-based methods.
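For readers unfamiliar with the inextensibility prior, the sketch below shows how metric preservation can be expressed on a discretized surface: the first fundamental form is computed by finite differences, and a loss penalizes deviation from the template's metric. This illustrates the prior only, under an assumed grid parameterization; it is not the paper's reconstruction pipeline.

```python
# Metric-preservation (inextensibility) prior on a discretized surface: compare
# the first fundamental form (E, F, G) of the estimate against the template's.
import torch


def first_fundamental_form(surface: torch.Tensor) -> torch.Tensor:
    """surface: (H, W, 3) grid of 3D points; returns (H-1, W-1, 3) with (E, F, G)."""
    du = surface[1:, :-1] - surface[:-1, :-1]   # forward difference along u
    dv = surface[:-1, 1:] - surface[:-1, :-1]   # forward difference along v
    E = (du * du).sum(-1)
    F = (du * dv).sum(-1)
    G = (dv * dv).sum(-1)
    return torch.stack([E, F, G], dim=-1)


def metric_preservation_loss(deformed: torch.Tensor, template: torch.Tensor) -> torch.Tensor:
    """Penalize changes of the Riemannian metric between template and estimate."""
    return ((first_fundamental_form(deformed) -
             first_fundamental_form(template)) ** 2).mean()


# Flat template vs. an isometric "rolled" deformation of the same grid.
u, v = torch.meshgrid(torch.linspace(0, 1, 20), torch.linspace(0, 1, 20), indexing="ij")
template = torch.stack([u, v, torch.zeros_like(u)], dim=-1)
rolled = torch.stack([torch.sin(u), v, 1 - torch.cos(u)], dim=-1)  # bends without stretching
print(metric_preservation_loss(rolled, template).item())           # close to zero
```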
Citations: 0
LCMA-Net: A light cross-modal attention network for streamer re-identification in live video
IF 4.3 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-09-19 | DOI: 10.1016/j.cviu.2024.104183
Jiacheng Yao, Jing Zhang, Hui Zhang, Li Zhuo
With the rapid expansion of the we-media industry, streamers have increasingly incorporated inappropriate content into live videos to attract traffic and pursue profit. Blacklisted streamers often forge their identities or switch platforms to continue streaming, causing significant harm to the online environment. Consequently, streamer re-identification (re-ID) has become of paramount importance. Streamer biometrics in live videos exhibit multimodal characteristics, including voiceprints, faces, and spatiotemporal information, which complement each other. Therefore, we propose a light cross-modal attention network (LCMA-Net) for streamer re-ID in live videos. First, the voiceprint, face, and spatiotemporal features of the streamer are extracted by RawNet-SA, Π-Net, and STDA-ResNeXt3D, respectively. We then design a light cross-modal pooling attention (LCMPA) module, which, combined with a multilayer perceptron (MLP), aligns and concatenates the different modality features into multimodal features within the LCMA-Net. Finally, the streamer is re-identified by measuring the similarity between these multimodal features. Five experiments were conducted on the StreamerReID dataset, and the results demonstrate that the proposed method achieves competitive performance. The dataset and code are available at https://github.com/BJUT-AIVBD/LCMA-Net.
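The LCMPA module is not described in detail in the abstract, so the following is a generic attention-pooling fusion over the three modality embeddings (voiceprint, face, spatiotemporal) followed by an MLP, shown only to illustrate the overall flow; feature dimensions and layer choices are assumptions.

```python
# Generic attention-pooling fusion of three modality embeddings into a single
# re-ID embedding (illustrative stand-in for the paper's LCMPA + MLP).
import torch
import torch.nn as nn


class AttentionPoolingFusion(nn.Module):
    def __init__(self, dims=(192, 512, 256), embed_dim=256):
        super().__init__()
        self.proj = nn.ModuleList([nn.Linear(d, embed_dim) for d in dims])
        self.score = nn.Linear(embed_dim, 1)          # per-modality attention score
        self.mlp = nn.Sequential(nn.Linear(embed_dim, embed_dim), nn.ReLU(),
                                 nn.Linear(embed_dim, embed_dim))

    def forward(self, voice, face, spatiotemporal):
        tokens = torch.stack([p(x) for p, x in zip(self.proj, (voice, face, spatiotemporal))],
                             dim=1)                   # (batch, 3, embed_dim)
        weights = torch.softmax(self.score(tokens), dim=1)
        fused = (weights * tokens).sum(dim=1)         # weighted pooling across modalities
        return self.mlp(fused)                        # multimodal re-ID embedding


fusion = AttentionPoolingFusion()
emb = fusion(torch.randn(2, 192), torch.randn(2, 512), torch.randn(2, 256))
print(emb.shape)                                      # torch.Size([2, 256])
# Re-identification then reduces to measuring (e.g. cosine) similarity between embeddings.
```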
Citations: 0
Specular highlight removal using Quaternion transformer
IF 4.3 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-09-19 | DOI: 10.1016/j.cviu.2024.104179
The Van Le, Jin Young Lee
Specular highlight removal is a very important issue, because specular highlight reflections in images with illumination changes can severely degrade various computer vision and image processing tasks. Many state-of-the-art networks for specular removal use convolutional neural networks (CNNs), which cannot learn global context effectively; they capture spatial information while overlooking the 3D structural correlation information of an RGB image. To address this problem, we introduce a specular highlight removal network based on a Quaternion transformer (QformerSHR), which employs a transformer architecture built on the Quaternion representation. In particular, a depth-wise separable Quaternion convolutional layer (DSQConv) is proposed to enhance the computational performance of QformerSHR while efficiently preserving the structural correlation of an RGB image by utilizing the Quaternion representation. In addition, a Quaternion transformer block (QTB) based on DSQConv learns global context. As a result, QformerSHR, consisting of DSQConv and QTB, performs specular removal on natural and text image datasets effectively. Experimental results demonstrate that it is significantly more effective than state-of-the-art specular removal networks in terms of both quantitative performance and subjective quality.
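To make the Quaternion representation concrete, the sketch below implements a plain Quaternion convolution via the Hamilton product, which is the mechanism that preserves cross-channel structural correlation; the paper's DSQConv additionally factorizes this depth-wise/point-wise, which is omitted here for brevity.

```python
# Plain Quaternion convolution: channels are split into 4 components (r, i, j, k)
# and mixed through the Hamilton product, so one set of weight banks is reused
# across components and cross-channel structure is preserved.
import torch
import torch.nn as nn
import torch.nn.functional as F


class QuaternionConv2d(nn.Module):
    def __init__(self, in_channels: int, out_channels: int, kernel_size: int = 3, padding: int = 1):
        super().__init__()
        assert in_channels % 4 == 0 and out_channels % 4 == 0
        shape = (out_channels // 4, in_channels // 4, kernel_size, kernel_size)
        self.wr = nn.Parameter(torch.randn(shape) * 0.02)
        self.wi = nn.Parameter(torch.randn(shape) * 0.02)
        self.wj = nn.Parameter(torch.randn(shape) * 0.02)
        self.wk = nn.Parameter(torch.randn(shape) * 0.02)
        self.padding = padding

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        wr, wi, wj, wk = self.wr, self.wi, self.wj, self.wk
        # Rows of the Hamilton product W ⊗ x, laid out over the 4 input blocks.
        w_r = torch.cat([wr, -wi, -wj, -wk], dim=1)
        w_i = torch.cat([wi,  wr, -wk,  wj], dim=1)
        w_j = torch.cat([wj,  wk,  wr, -wi], dim=1)
        w_k = torch.cat([wk, -wj,  wi,  wr], dim=1)
        weight = torch.cat([w_r, w_i, w_j, w_k], dim=0)
        return F.conv2d(x, weight, padding=self.padding)


qconv = QuaternionConv2d(4, 8)
out = qconv(torch.randn(1, 4, 32, 32))   # e.g. an RGB image padded with a zero channel
print(out.shape)                         # torch.Size([1, 8, 32, 32])
```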
Citations: 0
Estimating optical flow: A comprehensive review of the state of the art
IF 4.3 | CAS Tier 3, Computer Science | Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-09-16 | DOI: 10.1016/j.cviu.2024.104160
Andrea Alfarano, Luca Maiano, Lorenzo Papa, Irene Amerini
Optical flow estimation is a crucial task in computer vision that provides low-level motion information. Despite recent advances, real-world applications still present significant challenges. This survey provides an overview of optical flow techniques and their applications, covering both classical frameworks and the latest AI-based techniques. In doing so, we highlight the limitations of current benchmarks and metrics, underscoring the need for more representative datasets and comprehensive evaluation methods. The survey also highlights the importance of integrating industry knowledge and adopting training practices optimized for deep-learning-based models. By addressing these issues, future research can aid the development of robust and efficient optical flow methods that effectively address real-world scenarios.
Citations: 0