
Latest Publications in IET Computer Vision

GR-Former: Graph-reinforcement transformer for skeleton-based driver action recognition
IF 1.5 | CAS Tier 4 (Computer Science) | Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-07-10 | DOI: 10.1049/cvi2.12298
Zhuoyan Xu, Jingke Xu

In in-vehicle driving scenarios, composite action recognition is crucial for improving safety and understanding the driver's intention. Due to spatial constraints and occlusion factors, the driver's range of motion is limited, resulting in similar action patterns that are difficult to differentiate. Additionally, collecting skeleton data that characterise the full human posture is difficult, posing further challenges for action recognition. To address these problems, a novel Graph-Reinforcement Transformer (GR-Former) model is proposed. Using limited skeleton data as input, the authors' model introduces graph structure information to directionally reinforce the self-attention mechanism and dynamically learns and aggregates features between joints at multiple levels, constructing a richer feature vector space that enhances expressiveness and recognition accuracy. On the Drive & Act dataset for composite action recognition, the authors' work uses only human upper-body skeleton data yet achieves state-of-the-art performance compared with existing methods. With complete human skeleton data, the model also achieves excellent recognition accuracy on the NTU RGB+D and NTU RGB+D 120 datasets, demonstrating the strong generalisability of GR-Former. Overall, the authors' work provides a new and effective solution for driver action recognition in in-vehicle scenarios.
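
As a rough illustration of the graph-reinforcement idea in this abstract, the sketch below adds an adjacency-derived, learnable bias to standard self-attention logits so that physically connected joints reinforce each other. It is not the authors' implementation; the single-head layout, the bias formulation and the shapes (13 upper-body joints, 64-dimensional features) are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class GraphReinforcedAttention(nn.Module):
    """Self-attention whose logits are biased by the skeleton graph (illustrative sketch)."""
    def __init__(self, dim, num_joints):
        super().__init__()
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        # learnable per-pair bias, gated by the skeleton adjacency matrix
        self.graph_bias = nn.Parameter(torch.zeros(num_joints, num_joints))
        self.scale = dim ** -0.5

    def forward(self, x, adjacency):
        # x: (batch, joints, dim); adjacency: (joints, joints) binary matrix
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        attn = (q @ k.transpose(-2, -1)) * self.scale
        attn = attn + adjacency * self.graph_bias   # reinforce connected joints
        attn = attn.softmax(dim=-1)
        return self.proj(attn @ v)

x = torch.randn(2, 13, 64)        # upper-body joints only (assumed count)
adj = torch.eye(13)               # placeholder adjacency matrix
out = GraphReinforcedAttention(64, 13)(x, adj)
print(out.shape)                  # torch.Size([2, 13, 64])
```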

Citations: 0
Multi-scale skeleton simplification graph convolutional network for skeleton-based action recognition
IF 1.5 | CAS Tier 4 (Computer Science) | Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-07-08 | DOI: 10.1049/cvi2.12300
Fan Zhang, Ding Chongyang, Kai Liu, Liu Hongjin

Human action recognition based on graph convolutional networks (GCNs) is one of the hotspots in computer vision. However, previous methods generally rely on handcrafted graphs, which limits the effectiveness of the model in characterising the connections between indirectly connected joints. This limitation leads to weakened connections when joints are separated by long distances. To address this issue, the authors propose a skeleton simplification method that reduces the number of joints and the distance between joints by merging adjacent joints into simplified joints. A group convolutional block is devised to extract the internal features of the simplified joints. Additionally, the authors enhance the method by introducing multi-scale modelling, which maps inputs into sequences across various levels of simplification. Combined with spatial-temporal graph convolution, a multi-scale skeleton simplification GCN for skeleton-based action recognition (M3S-GCN) is proposed to fuse multi-scale skeleton sequences and model the connections between joints. Finally, M3S-GCN is evaluated on five benchmarks from the NTU RGB+D 60 (C-Sub, C-View), NTU RGB+D 120 (X-Sub, X-Set) and NW-UCLA datasets. Experimental results show that the authors' M3S-GCN achieves state-of-the-art performance with accuracies of 93.0%, 97.0% and 91.2% on the C-Sub, C-View and X-Set benchmarks, which validates the effectiveness of the method.
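
To make the skeleton-simplification step concrete, here is a minimal sketch in which adjacent joints are averaged into "simplified joints", shortening graph distances before graph convolution. The merge groups and tensor layout are illustrative assumptions, not the groupings used in the paper.

```python
import torch

def simplify_skeleton(x, merge_groups):
    """Average each group of adjacent joints into one simplified joint.
    x: (batch, frames, joints, channels); merge_groups: list of joint-index lists."""
    merged = [x[:, :, g, :].mean(dim=2) for g in merge_groups]
    return torch.stack(merged, dim=2)   # (batch, frames, len(merge_groups), channels)

x = torch.randn(4, 64, 25, 3)           # NTU-style 25-joint skeleton sequences
groups = [[0, 1, 20], [2, 3], [4, 5, 6], [7, 21, 22], [8, 9, 10],
          [11, 23, 24], [12, 13], [14, 15], [16, 17], [18, 19]]   # assumed groupings
coarse = simplify_skeleton(x, groups)   # one coarser scale for multi-scale modelling
print(coarse.shape)                     # torch.Size([4, 64, 10, 3])
```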

Citations: 0
Recognition of European mammals and birds in camera trap images using deep neural networks
IF 1.5 | CAS Tier 4 (Computer Science) | Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-07-03 | DOI: 10.1049/cvi2.12294
Daniel Schneider, Kim Lindner, Markus Vogelbacher, Hicham Bellafkir, Nina Farwig, Bernd Freisleben

Most machine learning methods for animal recognition in camera trap images are limited to mammal identification and group birds into a single class. Machine learning methods for visually discriminating birds, in turn, cannot discriminate between mammals and are not designed for camera trap images. The authors present deep neural network models to recognise both mammals and bird species in camera trap images. They train neural network models for species classification as well as for predicting the animal taxonomy, that is, genus, family, order, group, and class names. Different neural network architectures, including ResNet, EfficientNetV2, Vision Transformer, Swin Transformer, and ConvNeXt, are compared for these tasks. Furthermore, the authors investigate approaches to overcome various challenges associated with camera trap image analysis. The authors’ best species classification models achieve a mean average precision (mAP) of 97.91% on a validation data set and mAPs of 90.39% and 82.77% on test data sets recorded in forests in Germany and Poland, respectively. Their best taxonomic classification models reach a validation mAP of 97.18% and mAPs of 94.23% and 79.92% on the two test data sets, respectively.
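
One way to realise the joint species-plus-taxonomy prediction the abstract describes is a shared backbone with one classification head per taxonomic level. The sketch below is a generic example of that pattern, not the authors' models; the ResNet-50 backbone (one of the architectures they compare) and all class counts are placeholders.

```python
import torch
import torch.nn as nn
import torchvision.models as models

# hypothetical class counts per taxonomic level
heads = {"species": 80, "genus": 60, "family": 40, "order": 20, "group": 8, "class": 2}

class TaxonomicClassifier(nn.Module):
    def __init__(self, head_sizes):
        super().__init__()
        backbone = models.resnet50(weights=None)     # torchvision >= 0.13 API
        feat_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()                  # keep the pooled features
        self.backbone = backbone
        self.heads = nn.ModuleDict({k: nn.Linear(feat_dim, n) for k, n in head_sizes.items()})

    def forward(self, images):
        feats = self.backbone(images)
        return {k: head(feats) for k, head in self.heads.items()}

model = TaxonomicClassifier(heads)
logits = model(torch.randn(2, 3, 224, 224))
print({k: v.shape for k, v in logits.items()})
```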

Citations: 0
Self-supervised multi-view clustering in computer vision: A survey
IF 1.5 | CAS Tier 4 (Computer Science) | Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-07-02 | DOI: 10.1049/cvi2.12299
Jiatai Wang, Zhiwei Xu, Xuewen Yang, Hailong Li, Bo Li, Xuying Meng

In recent years, multi-view clustering (MVC) has had significant implications in the fields of cross-modal representation learning and data-driven decision-making. Its main objective is to cluster samples into distinct groups by leveraging consistency and complementary information among multiple views. However, the field of computer vision has witnessed the evolution of contrastive learning, and self-supervised learning has made substantial research progress. Consequently, self-supervised learning is progressively becoming dominant in MVC methods. It involves designing proxy tasks to extract supervisory information from image and video data, thereby guiding the clustering process. Despite the rapid development of self-supervised MVC, there is currently no comprehensive survey analysing and summarising the current state of research progress. Hence, the authors aim to explore the emergence of self-supervised MVC by discussing the reasons and advantages behind it. Additionally, the internal connections and classifications of common datasets, data issues, representation learning methods, and self-supervised learning methods are investigated. The authors not only introduce the mechanisms for each category of methods, but also provide illustrative examples of their applications. Finally, some open problems are identified for further investigation and development.
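
As a concrete example of the kind of self-supervised proxy task such surveys cover, the snippet below shows a cross-view contrastive (InfoNCE-style) loss that pulls the embeddings of two views of the same sample together. It is purely illustrative and not tied to any particular method in the survey; the temperature and embedding sizes are arbitrary.

```python
import torch
import torch.nn.functional as F

def cross_view_infonce(z1, z2, temperature=0.5):
    """Contrastive alignment of two views: positives lie on the diagonal."""
    # z1, z2: (batch, dim) embeddings of two views of the same samples
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature       # (batch, batch) similarity matrix
    targets = torch.arange(z1.size(0))       # sample i in view 1 matches sample i in view 2
    return F.cross_entropy(logits, targets)

loss = cross_view_infonce(torch.randn(8, 128), torch.randn(8, 128))
print(loss.item())
```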

Citations: 0
Fusing crops representation into snippet via mutual learning for weakly supervised surveillance anomaly detection
IF 1.5 | CAS Tier 4 (Computer Science) | Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-07-02 | DOI: 10.1049/cvi2.12289
Bohua Zhang, Jianru Xue

In recent years, the challenge of detecting anomalies in real-world surveillance videos using weakly supervised data has emerged. Traditional methods, utilising multi-instance learning (MIL) with video snippets, struggle with background noise and tend to overlook subtle anomalies. To tackle this, the authors propose a novel approach that crops snippets to create multiple instances with less noise, evaluates them separately and then fuses these evaluations for more precise anomaly detection. This method, however, leads to higher computational demands, especially during inference. Addressing this, the authors' solution employs mutual learning to guide snippet feature training using these low-noise crops. The authors integrate multi-instance learning (MIL) for the primary task with snippets as inputs and multiple-multiple instance learning (MMIL) for an auxiliary task with crops during training. The authors' approach ensures consistent multi-instance results in both tasks and incorporates a temporal activation mutual learning module (TAML) for aligning temporal anomaly activations between snippets and crops, improving the overall quality of snippet representations. Additionally, a snippet feature discrimination enhancement module (SFDE) refines the snippet features further. Tested across various datasets, the authors' method shows remarkable performance, notably achieving a frame-level AUC of 85.78% on the UCF-Crime dataset, while reducing computational costs.
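
A hedged sketch of how the snippet/crop mutual-learning objective could be assembled: a top-k multi-instance loss on snippet scores plus a KL term aligning snippet temporal activations with fused crop activations. The function name `taml_loss`, the value of k and the loss weighting are assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def topk_mil_loss(scores, labels, k=3):
    """Video-level MIL loss: average the k highest snippet scores per video."""
    # scores: (batch, T) per-snippet anomaly scores in [0, 1]; labels: (batch,) video labels
    video_scores = scores.topk(k, dim=1).values.mean(dim=1)
    return F.binary_cross_entropy(video_scores, labels.float())

def taml_loss(snippet_scores, crop_scores):
    """Align snippet temporal activations with the fused crop activations (KL divergence)."""
    fused = crop_scores.mean(dim=1)                  # (batch, crops, T) -> (batch, T)
    p = F.log_softmax(snippet_scores, dim=1)
    q = F.softmax(fused, dim=1)
    return F.kl_div(p, q, reduction="batchmean")

snip = torch.rand(2, 32)            # snippet branch scores
crops = torch.rand(2, 4, 32)        # scores from 4 crops per snippet sequence
labels = torch.tensor([1, 0])
total = topk_mil_loss(snip, labels) + 0.1 * taml_loss(snip, crops)
```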

Citations: 0
FastFaceCLIP: A lightweight text-driven high-quality face image manipulation
IF 1.5 | CAS Tier 4 (Computer Science) | Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-07-02 | DOI: 10.1049/cvi2.12295
Jiaqi Ren, Junping Qin, Qianli Ma, Yin Cao

Although many new methods have emerged for text-driven image manipulation, the large computational power required for model training makes their training process slow. Additionally, these methods consume a considerable amount of video random access memory (VRAM) during training. When generating high-resolution images, VRAM resources are often insufficient, making it impossible to produce high-resolution output. Nevertheless, recent advancements in Vision Transformers (ViTs) have demonstrated their image classification and recognition capabilities. Unlike traditional Convolutional Neural Network based methods, ViTs have a Transformer-based architecture and leverage attention mechanisms to capture comprehensive global information; their inherent long-range dependencies enable an enhanced global understanding of images, thus extracting more robust features and achieving comparable results with reduced computational load. The adaptability of ViTs to text-driven image manipulation was investigated. Specifically, existing image generation methods were refined and the FastFaceCLIP method was proposed by combining the image-text semantic alignment function of the pre-trained CLIP model with the high-resolution image generation function of the proposed FastFace. Additionally, the Multi-Axis Nested Transformer module was incorporated for advanced feature extraction from the latent space, generating higher-resolution images that are further enhanced using the Real-ESRGAN algorithm. Finally, extensive face-manipulation tests on the CelebA-HQ dataset compare the proposed method with other related schemes, demonstrating that FastFaceCLIP effectively generates semantically accurate, visually realistic, and clear images using fewer parameters and less time.
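
The CLIP image-text alignment that FastFaceCLIP builds on can be illustrated with the open-source openai/CLIP package: the edited image embedding is pushed towards the target text embedding via cosine similarity. This is a generic sketch of that alignment step, not the proposed method; the prompt and the "loss = 1 - similarity" form are assumptions.

```python
import torch
import clip  # https://github.com/openai/CLIP

device = "cpu"                                   # keep the sketch simple and fp32
model, preprocess = clip.load("ViT-B/32", device=device)

def clip_alignment_loss(image_batch, prompt):
    """Lower loss means the images better match the text description."""
    # image_batch: (B, 3, 224, 224), expected to be CLIP-normalised
    text = clip.tokenize([prompt]).to(device)
    img_feat = model.encode_image(image_batch)
    txt_feat = model.encode_text(text)
    img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
    txt_feat = txt_feat / txt_feat.norm(dim=-1, keepdim=True)
    return 1.0 - (img_feat @ txt_feat.t()).mean()   # 1 - mean cosine similarity

loss = clip_alignment_loss(torch.randn(2, 3, 224, 224), "a face with blond hair")
```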

Citations: 0
DualAD: Dual adversarial network for image anomaly detection⋆
IF 1.5 | CAS Tier 4 (Computer Science) | Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-06-25 | DOI: 10.1049/cvi2.12297
Yonghao Wan, Aimin Feng

Anomaly detection, also known as outlier detection, is critical in domains such as network security, intrusion detection, and fraud detection. One popular approach is to use autoencoders, which are trained to reconstruct the input by minimising the neural network's reconstruction error. However, these methods usually suffer from a trade-off between normal reconstruction fidelity and abnormal reconstruction distinguishability, which degrades performance. The authors find that this trade-off can be better mitigated by imposing constraints on the latent space of images. To this end, the authors propose a new Dual Adversarial Network (DualAD) that consists of a Feature Constraint (FC) module and a reconstruction module. The method incorporates the FC module during reconstruction training to impose constraints on the latent space of images, thereby yielding feature representations more conducive to anomaly detection. Additionally, the authors employ dual adversarial learning to model the distribution of normal data. On the one hand, adversarial learning is implemented during the reconstruction process to obtain higher-quality reconstruction samples, preventing blurred reconstructions from harming model performance. On the other hand, the authors utilise adversarial training of the FC module and the reconstruction module to achieve superior feature representation, making anomalies more distinguishable at the feature level. During the inference phase, the authors perform anomaly detection simultaneously in the pixel and latent spaces to identify abnormal patterns more comprehensively. Experiments on three datasets, CIFAR10, MNIST, and FashionMNIST, demonstrate the validity of the authors' work. Results show that constraints on the latent space together with adversarial learning can improve detection performance.
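
To illustrate the two-part objective described above, the sketch below combines a pixel-space reconstruction loss with an adversarial constraint on the latent space, in which a discriminator pushes encoder outputs towards a unit-Gaussian prior. The tiny MLP architectures, the choice of prior and the 0.1 weighting are assumptions rather than the paper's design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 128), nn.ReLU(), nn.Linear(128, 32))
decoder = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 28 * 28))
latent_disc = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 1))  # feature-constraint critic

bce = nn.BCEWithLogitsLoss()
x = torch.rand(16, 1, 28, 28)               # a batch of normal training images

z = encoder(x)
recon = decoder(z).view_as(x)
recon_loss = F.mse_loss(recon, x)           # pixel-space reconstruction term

# Feature-constraint step: discriminator separates prior samples from encoder latents,
# while the encoder tries to make its latents indistinguishable from the prior.
prior = torch.randn_like(z)
d_loss = bce(latent_disc(prior), torch.ones(16, 1)) + bce(latent_disc(z.detach()), torch.zeros(16, 1))
g_loss = recon_loss + 0.1 * bce(latent_disc(z), torch.ones(16, 1))
```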

{"title":"DualAD: Dual adversarial network for image anomaly detection⋆","authors":"Yonghao Wan,&nbsp;Aimin Feng","doi":"10.1049/cvi2.12297","DOIUrl":"https://doi.org/10.1049/cvi2.12297","url":null,"abstract":"<p>Anomaly Detection, also known as outlier detection, is critical in domains such as network security, intrusion detection, and fraud detection. One popular approach to anomaly detection is using autoencoders, which are trained to reconstruct input by minimising reconstruction error with the neural network. However, these methods usually suffer from the trade-off between normal reconstruction fidelity and abnormal reconstruction distinguishability, which damages the performance. The authors find that the above trade-off can be better mitigated by imposing constraints on the latent space of images. To this end, the authors propose a new Dual Adversarial Network (DualAD) that consists of a Feature Constraint (FC) module and a reconstruction module. The method incorporates the FC module during the reconstruction training process to impose constraints on the latent space of images, thereby yielding feature representations more conducive to anomaly detection. Additionally, the authors employ dual adversarial learning to model the distribution of normal data. On the one hand, adversarial learning was implemented during the reconstruction process to obtain higher-quality reconstruction samples, thereby preventing the effects of blurred image reconstructions on model performance. On the other hand, the authors utilise adversarial training of the FC module and the reconstruction module to achieve superior feature representation, making anomalies more distinguishable at the feature level. During the inference phase, the authors perform anomaly detection simultaneously in the pixel and latent spaces to identify abnormal patterns more comprehensively. Experiments on three data sets CIFAR10, MNIST, and FashionMNIST demonstrate the validity of the authors’ work. Results show that constraints on the latent space and adversarial learning can improve detection performance.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"18 8","pages":"1138-1148"},"PeriodicalIF":1.5,"publicationDate":"2024-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12297","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143253263","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
SAM-Y: Attention-enhanced hazardous vehicle object detection algorithm
IF 1.5 | CAS Tier 4 (Computer Science) | Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-06-17 | DOI: 10.1049/cvi2.12293
Shanshan Wang, Bushi Liu, Pengcheng Zhu, Xianchun Meng, Bolun Chen, Wei Shao, Liqing Chen

Vehicle transportation of hazardous chemicals is one of the important mobile hazards in modern logistics, and its unsafe factors pose serious threats to people's lives, property and environmental safety. Although current object detection algorithms have been applied to the detection of hazardous chemical vehicles, detection remains difficult against complex backgrounds owing to the complexity of the transportation environment and the small size and low resolution of the vehicle targets. To solve these problems, the authors propose an improved algorithm based on YOLOv5 to enhance the detection accuracy and efficiency for hazardous chemical vehicles. Firstly, to better capture the details and semantic information of hazardous chemical vehicles, the algorithm introduces a receptive field expansion block into the backbone network, resolving the mismatch between the receptive field of the detector and the target object and improving the model's ability to capture detailed information about these vehicles. Secondly, to improve the model's ability to express the characteristics of hazardous chemical vehicles, the authors introduce a separable attention mechanism in the multi-scale target detection stage and enhance the model's prediction ability by coherently combining the object detection head and the attention mechanism across the scale-aware feature layer, the spatial-aware location and the task-aware output channel. Experimental results show that the improved model significantly surpasses the baseline model in terms of accuracy and achieves more accurate object detection, while also achieving faster inference.
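
A minimal sketch of a receptive-field expansion block of the kind described for the backbone: parallel dilated 3x3 convolutions whose outputs are concatenated and fused by a 1x1 convolution, enlarging the receptive field without changing the spatial size. The dilation rates and channel counts are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class ReceptiveFieldBlock(nn.Module):
    """Parallel dilated convolutions to expand the effective receptive field."""
    def __init__(self, channels, dilations=(1, 2, 3)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d) for d in dilations
        ])
        self.fuse = nn.Conv2d(channels * len(dilations), channels, 1)

    def forward(self, x):
        return self.fuse(torch.cat([branch(x) for branch in self.branches], dim=1))

feat = torch.randn(1, 256, 40, 40)       # an assumed backbone feature map
out = ReceptiveFieldBlock(256)(feat)      # same spatial size, larger receptive field
print(out.shape)                          # torch.Size([1, 256, 40, 40])
```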

{"title":"SAM-Y: Attention-enhanced hazardous vehicle object detection algorithm","authors":"Shanshan Wang,&nbsp;Bushi Liu,&nbsp;Pengcheng Zhu,&nbsp;Xianchun Meng,&nbsp;Bolun Chen,&nbsp;Wei Shao,&nbsp;Liqing Chen","doi":"10.1049/cvi2.12293","DOIUrl":"https://doi.org/10.1049/cvi2.12293","url":null,"abstract":"<p>Vehicle transportation of hazardous chemicals is one of the important mobile hazards in modern logistics, and its unsafe factors bring serious threats to people's lives, property and environmental safety. Although the current object detection algorithm has certain applications in the detection of hazardous chemical vehicles, due to the complexity of the transportation environment, the small size and low resolution of the vehicle target etc., object detection becomes more difficult in the face of a complex background. In order to solve these problems, the authors propose an improved algorithm based on YOLOv5 to enhance the detection accuracy and efficiency of hazardous chemical vehicles. Firstly, in order to better capture the details and semantic information of hazardous chemical vehicles, the algorithm solves the problem of mismatch between the receptive field of the detector and the target object by introducing the receptive field expansion block into the backbone network, so as to improve the ability of the model to capture the detailed information of hazardous chemical vehicles. Secondly, in order to improve the ability of the model to express the characteristics of hazardous chemical vehicles, the authors introduce a separable attention mechanism in the multi-scale target detection stage, and enhances the prediction ability of the model by combining the object detection head and attention mechanism coherently in the feature layer of scale perception, the spatial location of spatial perception and the output channel of task perception. Experimental results show that the improved model significantly surpasses the baseline model in terms of accuracy and achieves more accurate object detection. At the same time, the model also has a certain improvement in inference speed and achieves faster inference ability.</p>","PeriodicalId":56304,"journal":{"name":"IET Computer Vision","volume":"18 8","pages":"1149-1161"},"PeriodicalIF":1.5,"publicationDate":"2024-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1049/cvi2.12293","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143252870","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
PSANet: Automatic colourisation using position-spatial attention for natural images
IF 1.5 | CAS Tier 4 (Computer Science) | Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-06-16 | DOI: 10.1049/cvi2.12291
Peng-Jie Zhu, Yuan-Yuan Pu, Qiuxia Yang, Siqi Li, Zheng-Peng Zhao, Hao Wu, Dan Xu

Due to the richness of natural image semantics, natural image colourisation is a challenging problem. Existing methods often suffer from semantic confusion due to insufficient semantic understanding, resulting in unreasonable colour assignments, especially at the edges of objects. This phenomenon is referred to as colour bleeding. The authors have found that using the self-attention mechanism benefits the model's understanding and recognition of object semantics. However, this leads to another problem in colourisation, namely dull colour. With this in mind, a Position-Spatial Attention Network (PSANet) is proposed to address both colour bleeding and dull colour. Firstly, a novel attention module called the position-spatial attention module (PSAM) is introduced. Through the proposed PSAM module, the model enhances the semantic understanding of images while solving the dull colour problem caused by self-attention. Then, to further prevent colour bleeding on object boundaries, a gradient-aware loss is proposed. Lastly, the colour bleeding phenomenon is further improved by the combined effect of the gradient-aware loss and an edge-aware loss. Experimental results show that this method can largely reduce colour bleeding while maintaining good perceptual quality.
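
The gradient-aware loss can be sketched as an L1 penalty on the difference between the spatial gradients of the predicted and ground-truth colour channels, which concentrates supervision on object boundaries. The exact formulation in the paper may differ; this is only an assumed, simplified version.

```python
import torch
import torch.nn.functional as F

def gradient_aware_loss(pred, target):
    """L1 distance between horizontal/vertical finite-difference gradients."""
    # pred, target: (B, C, H, W) predicted and ground-truth colour channels
    dx_p = pred[..., :, 1:] - pred[..., :, :-1]
    dy_p = pred[..., 1:, :] - pred[..., :-1, :]
    dx_t = target[..., :, 1:] - target[..., :, :-1]
    dy_t = target[..., 1:, :] - target[..., :-1, :]
    return F.l1_loss(dx_p, dx_t) + F.l1_loss(dy_p, dy_t)

loss = gradient_aware_loss(torch.rand(2, 2, 64, 64), torch.rand(2, 2, 64, 64))
print(loss.item())
```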

Citations: 0
Knowledge distillation of face recognition via attention cosine similarity review
IF 1.5 | CAS Tier 4 (Computer Science) | Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2024-05-31 | DOI: 10.1049/cvi2.12288
Zhuo Wang, SuWen Zhao, WanYi Guo

Deep learning-based face recognition models have demonstrated remarkable performance in benchmark tests, and knowledge distillation technology has frequently been used to obtain high-precision real-time face recognition models specifically designed for mobile and embedded devices. However, in recent years, knowledge distillation methods for face recognition, which mainly focus on feature or logit distillation techniques, have neglected the attention mechanism, which plays an important role in the domain of neural networks. An attention cosine similarity knowledge distillation method with an innovative cross-stage connection review path, which unites the attention mechanism with the review knowledge distillation method, is proposed. This method transfers the attention map obtained from the teacher network to the student through a cross-stage connection path. The efficacy and excellence of the proposed algorithm are demonstrated in popular benchmark tests.
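
A compact sketch of distilling attention with a cosine-similarity objective: each student attention map is flattened, normalised and pushed towards the corresponding teacher map, with the stage pairing standing in for the cross-stage review path. Map shapes and the averaging over stages are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def attention_cosine_loss(student_maps, teacher_maps):
    """Average (1 - cosine similarity) between paired student/teacher attention maps."""
    loss = 0.0
    for s, t in zip(student_maps, teacher_maps):
        s = F.normalize(s.flatten(1), dim=1)            # (B, H*W)
        t = F.normalize(t.flatten(1), dim=1)
        loss = loss + (1.0 - (s * t).sum(dim=1)).mean()  # 1 - cosine similarity per sample
    return loss / len(student_maps)

# attention maps from two (assumed) paired stages of the teacher and student
student = [torch.rand(4, 14, 14), torch.rand(4, 7, 7)]
teacher = [torch.rand(4, 14, 14), torch.rand(4, 7, 7)]
loss = attention_cosine_loss(student, teacher)
print(loss.item())
```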

Citations: 0