
Latest Publications in IEEE Transactions on Multimedia

HMS2Net: Heterogeneous Multimodal State Space Network via CLIP for Dynamic Scene Classification in Livestreaming
IF 9.7 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2026-01-12 DOI: 10.1109/TMM.2025.3632629
Wensheng Li;Jing Zhang;Li Zhuo;Qi Tian
Livestreaming platforms attract countless daily active users, making online content regulation imperative. The complex and diverse multimodal content elements in dynamic livestreaming scenes pose a great challenge to video content understanding. Dynamic scene classification is one of the basic tasks of video content understanding, and contrastive language-image pre-training (CLIP) has proven successful for it. Building on this success, we propose a heterogeneous multimodal state space network (HMS2Net) for dynamic scene classification in livestreaming via CLIP. (1) To fully and efficiently mine the dynamic scene elements in livestreaming, we design a heterogeneous teacher-student Transformer (HT-SFormer) with CLIP to extract multimodal features in an energy-efficient unified pipeline; (2) To cope with possible information conflicts in heterogeneous feature fusion, we introduce a cross-modal adaptive feature filter and fusion (CMAF) module that generates more complete information complementarity by adjusting the multimodal feature composition; (3) For temporal context-awareness of dynamic scenes, we establish a dynamic state space memory (DSSM) structure that captures the correlation of multimodal data between neighboring video frames. A series of comparative experiments are conducted on the publicly available datasets DAVIS, Mini-kinetics, HMDB51, and the self-built BJUT-LCD. Our HMS2Net produces competitive results of 71.09%, 95.40%, 53.64%, and 82.36%, respectively, demonstrating its effectiveness and superiority for dynamic scene classification in livestreaming.
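The paper's implementation is not shown on this page; purely as an illustration of the kind of adaptive filter-and-fusion step the CMAF module describes, here is a minimal PyTorch sketch of gated fusion between two heterogeneous feature streams. The class name, feature dimensions, and sigmoid-gate design are assumptions for illustration, not the authors' architecture.

```python
import torch
import torch.nn as nn

class GatedCrossModalFusion(nn.Module):
    """Toy stand-in for a cross-modal adaptive filter-and-fusion block:
    each modality is projected to a shared width, a learned gate decides
    per-channel how much of each stream to keep, and the gated streams
    are blended into one fused representation."""

    def __init__(self, dim_a: int, dim_b: int, dim_fused: int = 512):
        super().__init__()
        self.proj_a = nn.Linear(dim_a, dim_fused)
        self.proj_b = nn.Linear(dim_b, dim_fused)
        # The gate looks at both modalities jointly before weighting them.
        self.gate = nn.Sequential(
            nn.Linear(2 * dim_fused, dim_fused),
            nn.Sigmoid(),
        )

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
        a = self.proj_a(feat_a)                  # (B, dim_fused)
        b = self.proj_b(feat_b)                  # (B, dim_fused)
        g = self.gate(torch.cat([a, b], dim=-1))
        return g * a + (1.0 - g) * b             # adaptive, channel-wise blend


if __name__ == "__main__":
    visual = torch.randn(4, 768)   # e.g., CLIP image features
    textual = torch.randn(4, 512)  # e.g., CLIP text features
    fused = GatedCrossModalFusion(768, 512)(visual, textual)
    print(fused.shape)             # torch.Size([4, 512])
```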
Citations: 0
2025 Reviewers List
IF 9.7 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2025-12-30 DOI: 10.1109/TMM.2025.3642659
{"title":"2025 Reviewers List","authors":"","doi":"10.1109/TMM.2025.3642659","DOIUrl":"https://doi.org/10.1109/TMM.2025.3642659","url":null,"abstract":"","PeriodicalId":13273,"journal":{"name":"IEEE Transactions on Multimedia","volume":"27 ","pages":"9931-9940"},"PeriodicalIF":9.7,"publicationDate":"2025-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11318415","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145879969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Long-Tailed Continual Learning for Visual Food Recognition
IF 9.7 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2025-12-03 DOI: 10.1109/TMM.2025.3632640
Jiangpeng He;Xiaoyan Zhang;Luotao Lin;Jack Ma;Heather A. Eicher-Miller;Fengqing Zhu
Deep learning-based food recognition has made significant progress in predicting food types from eating occasion images. However, two key challenges hinder real-world deployment: (1) continuously learning new food classes without forgetting previously learned ones, and (2) handling the long-tailed distribution of food images, in which a few classes are common and many more are rare. To address these challenges, food recognition methods should focus on long-tailed continual learning. In this work, we introduce a dataset that encompasses 186 American foods along with comprehensive annotations. We also introduce three new benchmark datasets, VFN186-LT, VFN186-INSULIN and VFN186-T2D, which reflect real-world food consumption for healthy populations, insulin takers, and individuals with type 2 diabetes who do not take insulin. We propose a novel end-to-end framework that improves generalization on instance-rare food classes, using a knowledge distillation-based predictor to avoid misalignment of representations during continual learning. Additionally, we introduce an augmentation technique that integrates class activation maps (CAM) and CutMix to improve generalization on instance-rare food classes. Our method, evaluated on Food101-LT, VFN-LT, VFN186-LT, VFN186-INSULIN, and VFN186-T2DM, shows significant improvements over existing methods. An ablation study highlights further performance enhancements, demonstrating its potential for real-world food recognition applications.
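The exact CAM-guided augmentation is not given here; the sketch below covers only the standard CutMix half in PyTorch, pasting a random box from a shuffled copy of the batch and mixing labels by area. In the paper the box placement is guided by class activation maps, which this uniform sampling does not do; the function name and beta parameter are illustrative assumptions.

```python
import torch

def cutmix(images: torch.Tensor, labels: torch.Tensor, alpha: float = 1.0):
    """Plain CutMix: paste a random rectangle from a shuffled copy of the
    batch into each image, and mix the labels by the pasted area.
    images: (B, C, H, W); labels: (B,) integer class ids."""
    B, _, H, W = images.shape
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(B)

    # Box size follows the usual sqrt(1 - lam) rule.
    cut_h, cut_w = int(H * (1 - lam) ** 0.5), int(W * (1 - lam) ** 0.5)
    cy, cx = torch.randint(H, (1,)).item(), torch.randint(W, (1,)).item()
    y1, y2 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, H)
    x1, x2 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, W)

    mixed = images.clone()
    mixed[:, :, y1:y2, x1:x2] = images[perm, :, y1:y2, x1:x2]
    # Effective mixing ratio after clipping the box to the image.
    lam_adj = 1.0 - (y2 - y1) * (x2 - x1) / (H * W)
    return mixed, labels, labels[perm], lam_adj


if __name__ == "__main__":
    x, y = torch.randn(8, 3, 224, 224), torch.randint(0, 186, (8,))
    mixed, y_a, y_b, lam = cutmix(x, y)
    # Training loss would be: lam * CE(logits, y_a) + (1 - lam) * CE(logits, y_b)
    print(mixed.shape, lam)
```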
Citations: 0
SSPD: Spatial-Spectral Prior Decoupling Model for Spectral Snapshot Compressive Imaging
IF 9.7 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2025-12-03 DOI: 10.1109/TMM.2025.3638016
Lizhu Liu;Yaonan Wang;Yurong Chen;Jiwen Lu;Hui Zhang
Coded aperture snapshot spectral imaging (CASSI) captures 3D hyperspectral images (HSIs) in a single shot by encoding incident light into 2D measurements. However, recovering the original hyperspectral data from these measurements is a severely ill-posed inverse problem due to significant information loss during compression. Recent deep learning methods, especially deep unfolding networks, have demonstrated promising reconstruction results by embedding learnable priors into iterative optimization frameworks. However, most existing approaches use a single network to jointly estimate spatial and spectral priors, limiting their ability to handle the distinct properties of HSIs. To overcome this limitation, we propose the Spatial-Spectral Prior Decoupling Model (SSPD), which reformulates HSI reconstruction as a prior absorption problem, enabling independent modeling of spatial and spectral priors with specialized network architectures. To achieve this, we design two attention mechanisms tailored for hyperspectral data: one for capturing spatial correlations and another for preserving spectral signatures. Additionally, we develop a hybrid loss function that combines convergence constraints and cross-prior interactions, ensuring accurate prior fusion and stable reconstruction. Experiments on synthetic and real-world datasets confirm that SSPD outperforms existing methods in spectral snapshot compressive imaging.
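As a loose illustration of prior decoupling inside a deep unfolding network (not the authors' SSPD modules), the sketch below performs one unfolded iteration: a gradient step on the data-fidelity term for a generic linear sensing operator, followed by separate spatial and spectral prior networks whose outputs are averaged. The toy sensing operator, the depthwise/pointwise convolution split, and the step size are assumptions.

```python
import torch
import torch.nn as nn

class DecoupledPriorStep(nn.Module):
    """One unfolded iteration: x <- x - eta * A^T(Ax - y), then pass the
    estimate through a spatial prior net (per-band 2D convs) and a spectral
    prior net (1x1 convs mixing bands), and average the two outputs."""

    def __init__(self, bands: int):
        super().__init__()
        self.eta = nn.Parameter(torch.tensor(0.5))
        self.spatial_prior = nn.Sequential(
            nn.Conv2d(bands, bands, 3, padding=1, groups=bands),  # per-band, spatial only
            nn.ReLU(),
            nn.Conv2d(bands, bands, 3, padding=1, groups=bands),
        )
        self.spectral_prior = nn.Sequential(
            nn.Conv2d(bands, bands, 1),  # mixes only along the band axis
            nn.ReLU(),
            nn.Conv2d(bands, bands, 1),
        )

    def forward(self, x, y, A, At):
        # Data-fidelity gradient step for measurements y = A(x_true).
        x = x - self.eta * At(A(x) - y)
        # Decoupled priors applied to the same intermediate estimate.
        return 0.5 * (self.spatial_prior(x) + self.spectral_prior(x))


if __name__ == "__main__":
    bands, H, W = 28, 64, 64
    # Toy sensing operator: per-band random masks summed over bands.
    mask = torch.rand(bands, H, W)
    A = lambda x: (x * mask).sum(dim=1, keepdim=True)   # (B, 1, H, W)
    At = lambda r: r * mask                             # adjoint of A
    x0 = torch.zeros(2, bands, H, W)
    y = A(torch.rand(2, bands, H, W))
    step = DecoupledPriorStep(bands)
    print(step(x0, y, A, At).shape)  # torch.Size([2, 28, 64, 64])
```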
Citations: 0
Long-Short Match for Lost Control in UAV Multi-Object Tracking
IF 9.7 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2025-11-18 DOI: 10.1109/TMM.2025.3632642
Zizhuang Zou;Mao Ye;Luping Ji;Lihua Zhou;Song Tang;Yan Gan;Shuai Li
Multi-Object Tracking (MOT) for Uncrewed Aerial Vehicles (UAVs) aims to continuously and stably detect and track objects in videos captured by UAVs. In existing MOT tracking-by-detection schemes, a tracker with a fixed step size is typically employed, and a fixed length of past tracking information is input to the tracker to guide position prediction. However, the limited prediction range of a single-scale tracker leads to frequent tracking losses, and limited historical information also reduces tracking accuracy. To address these limitations, we propose a novel Long-Short Match (LSMTrack) tracking method. The key idea is to use long and short trackers and maintain a long-term motion state to improve tracking performance, thus reducing the likelihood of entering the lost status. To this end, a new Mamba-based tracker and a long-short match strategy are proposed. The long and short trackers share the same Mamba-based architecture. Unlike previous Mamba-based approaches, the proposed tracker maintains a long-term state while updating the state and making position predictions at each time step, so we call it a step Mamba tracker. Meanwhile, we devise a long-short match strategy at the inference stage to integrate the long and short trackers, and design a lost control operation that updates the long-term states using historical state values. In this way, both the matching probability and the inference efficiency are guaranteed. Experimental results on two UAV MOT datasets confirm state-of-the-art performance. Specifically, the best results are achieved on the two popular tracking evaluation metrics, MOTA and IDF1.
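The authors' Mamba-based tracker is not reproduced here; the snippet below only sketches the generic matching step that a long-short prediction scheme could plug into: constant-velocity predictions at a short and a long horizon, IoU costs taking the better of the two, and Hungarian assignment. The horizon lengths, the max-of-two-scores rule, and the IoU gate are assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """IoU of two [x1, y1, x2, y2] boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def match(tracks, detections, short=1, long=5, min_iou=0.3):
    """tracks: list of dicts with 'box' and 'velocity' (length-4 arrays);
    detections: list of boxes. Returns (matches, unmatched_tracks, unmatched_dets)."""
    if not tracks or not detections:
        return [], list(range(len(tracks))), list(range(len(detections)))
    cost = np.zeros((len(tracks), len(detections)))
    for i, t in enumerate(tracks):
        pred_s = t["box"] + short * t["velocity"]   # short-horizon prediction
        pred_l = t["box"] + long * t["velocity"]    # long-horizon prediction
        for j, d in enumerate(detections):
            # Keep whichever prediction agrees better with this detection.
            cost[i, j] = 1.0 - max(iou(pred_s, d), iou(pred_l, d))
    rows, cols = linear_sum_assignment(cost)
    matches = [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= 1.0 - min_iou]
    matched_r = {r for r, _ in matches}
    matched_c = {c for _, c in matches}
    return (matches,
            [i for i in range(len(tracks)) if i not in matched_r],
            [j for j in range(len(detections)) if j not in matched_c])

if __name__ == "__main__":
    tracks = [{"box": np.array([0., 0., 10., 10.]),
               "velocity": np.array([2., 0., 2., 0.])}]
    detections = [np.array([9., 0., 19., 10.])]
    print(match(tracks, detections))
```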
Citations: 0
Feature Dispersion Adaptation With Pre-Pooling Prototype for Continual Image Classification
IF 9.7 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2025-11-17 DOI: 10.1109/TMM.2025.3632692
Wuxuan Shi;Mang Ye;Wei Yu;Bo Du
Catastrophic forgetting, the degradation of knowledge about previously seen classes when learning new concepts from a shifting data stream, is a pitfall faced by neural network learning in open environments. Recent research on continual image classification usually relies on storing samples or prototypes to resist this forgetting. We find that while acquiring knowledge of new classes, the features of old classes gradually disperse, which leads to confusion between class features and makes them difficult to discriminate. Coping with feature dispersion is therefore a key consideration in resisting catastrophic forgetting, one that has been neglected in previous works. To this end, we address this issue from two perspectives. First, we propose a dispersing feature generation mechanism, which generates pseudo-features based on the pre-pooling prototypes of the old classes to simulate feature dispersion and remind the classifier to adjust the decision boundary. Second, we design a consistent alignment constraint that alleviates the severity of feature dispersion by maintaining consistency in the hidden states of different depths when aligning the current model with the previous model. Extensive experimental results on various benchmarks show the superiority of our proposed method.
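As a rough sketch of generating pseudo-features from stored pre-pooling prototypes (not the paper's exact dispersing feature generation mechanism), the code below keeps a per-class mean and standard deviation of pre-pooling feature maps and samples noisy copies to rehearse old classes. The Gaussian perturbation and all shapes are assumptions.

```python
import torch

class PrototypeReplay:
    """Store, for each old class, the mean and std of its pre-pooling
    feature maps (C, H, W); later sample noisy copies of the mean as
    pseudo-features so the classifier keeps seeing old-class statistics."""

    def __init__(self):
        self.mean = {}  # class_id -> (C, H, W) tensor
        self.std = {}

    def register(self, class_id: int, feature_maps: torch.Tensor):
        # feature_maps: (N, C, H, W) pre-pooling features of one class.
        self.mean[class_id] = feature_maps.mean(dim=0)
        self.std[class_id] = feature_maps.std(dim=0)

    def sample(self, class_id: int, n: int) -> torch.Tensor:
        mu, sigma = self.mean[class_id], self.std[class_id]
        noise = torch.randn(n, *mu.shape)
        return mu.unsqueeze(0) + noise * sigma.unsqueeze(0)  # (n, C, H, W)


if __name__ == "__main__":
    replay = PrototypeReplay()
    replay.register(class_id=3, feature_maps=torch.randn(50, 512, 7, 7))
    pseudo = replay.sample(class_id=3, n=8)   # dispersed pseudo-features
    pooled = pseudo.mean(dim=(2, 3))          # global average pooling
    print(pooled.shape)                       # torch.Size([8, 512])
```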
Citations: 0
UniAlign: A Universal Cross-Modality Knowledge Alignment Framework for Fine-Grained Action Recognition
IF 9.7 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2025-11-14 DOI: 10.1109/TMM.2025.3632670
Yihan Wang;Baoli Sun;Haojie Li;Xinzhu Ma;Zhihui Wang;Zhiyong Wang
The key to fine-grained video action recognition is identifying subtle differences between action categories. Relying solely on visual features supervised by action labels makes it challenging to characterize robust and discriminative action dynamics from videos. With significant advancements in human pose estimation and the powerful capabilities of Vision-Language Models (VLMs), obtaining reliable and cost-free human pose data and textual semantics has become increasingly feasible, enabling their effective use in fine-grained action recognition. However, the inherent disparities in feature representations across different modalities necessitate a robust alignment strategy to achieve optimal fusion. To address this, we propose a universal cross-modality knowledge alignment framework, namely UniAlign, to transfer the knowledge from such pre-trained multi-modal models into action recognition models. Specifically, UniAlign introduces two additional branches to extract pose features and textual semantics with the pre-trained pose encoder and VLM. To align the action-relevant cues among video features, pose features, and textual semantics, we propose a Cross-Modality Similarity Aggregation module (CMSA) that utilizes the importance of different modal cues while aggregating cross-modal similarities. Additionally, we adopt a fine-tuning mechanism similar to Exponential Moving Average (EMA) to refine the textual semantics, ensuring that the semantic representations encoded by VLMs are preserved while being optimized towards the specific task preferences. Extensive experiments on widely used fine-grained action recognition benchmarks (e.g., FineGym, NTURGB-D, Diving48) and coarse-grained K400 dataset demonstrate the effectiveness of the proposed UniAlign method.
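The CMSA formulation is not reproduced on this page; in its spirit, the sketch below combines cosine-similarity logits from several modalities against class text embeddings using softmax-normalized, learnable per-modality weights. The weight parameterization, temperature, and dimensions are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimilarityAggregation(nn.Module):
    """Combine per-modality cosine-similarity logits (e.g., video-vs-text and
    pose-vs-text) with learnable importance weights."""

    def __init__(self, num_modalities: int, temperature: float = 0.07):
        super().__init__()
        self.weights = nn.Parameter(torch.zeros(num_modalities))
        self.temperature = temperature

    def forward(self, modality_feats, class_text_feats):
        # modality_feats: list of (B, D) tensors; class_text_feats: (K, D).
        text = F.normalize(class_text_feats, dim=-1)
        w = torch.softmax(self.weights, dim=0)   # importance of each modality
        logits = 0.0
        for wi, feat in zip(w, modality_feats):
            feat = F.normalize(feat, dim=-1)
            logits = logits + wi * (feat @ text.t()) / self.temperature
        return logits  # (B, K) aggregated class scores


if __name__ == "__main__":
    video, pose = torch.randn(4, 512), torch.randn(4, 512)
    text = torch.randn(99, 512)  # e.g., one prompt embedding per action class
    agg = SimilarityAggregation(num_modalities=2)
    print(agg([video, pose], text).shape)  # torch.Size([4, 99])
```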
Citations: 0
Unsupervised Point Cloud Reconstruction via Recurrent Multi-Step Moving Strategy
IF 9.7 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2025-11-14 DOI: 10.1109/TMM.2025.3632687
Zheng Liu;Jianjun Zhang;Ming Zhang;Runze Ke;Chengcheng Yu;Ligang Liu
Point cloud reconstruction is a key ingredient in geometry modeling, computer graphics, and 3D vision. In this paper, we propose a novel unsupervised learning method called the Recurrent Multi-Step Moving Strategy, which progressively moves query points toward the underlying surface to accurately learn unsigned distance fields (UDFs) for point cloud reconstruction. Specifically, we design a recurrent network for UDF estimation that integrates a multi-step strategy for query movement. This model treats query movement as a trajectory prediction process, establishing dependencies between the current query move decision and the previous path, thus utilizing temporal information to improve UDF estimation accuracy. Further, we design distance and gradient regularization losses to ensure the precision, consistency, and continuity of the estimated UDFs. Extensive evaluations, comparisons, and ablation studies show the superiority of our method over competing approaches in terms of reconstruction accuracy and generality. Our unsupervised reconstruction method outperforms many supervised techniques and demonstrates efficacy across diverse scenarios, including single-object, indoor, and outdoor benchmarks.
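As a generic illustration of moving query points toward the surface of a learned unsigned distance field (not the paper's recurrent multi-step network), the code below uses a small MLP as the UDF, obtains the field gradient at the queries via autograd, and repeatedly steps each query against the normalized gradient by the predicted distance. The network shape and step count are assumptions.

```python
import torch
import torch.nn as nn

class TinyUDF(nn.Module):
    """Placeholder unsigned distance field: a small MLP mapping xyz -> distance."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, 1), nn.Softplus(),  # distances are non-negative
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)


def move_queries(udf: nn.Module, queries: torch.Tensor, steps: int = 4):
    """Move each query q by  q <- q - d(q) * grad d(q) / ||grad d(q)||,
    which walks toward the zero level set of the unsigned distance field."""
    q = queries.clone()
    for _ in range(steps):
        q = q.detach().requires_grad_(True)
        d = udf(q)                                 # (N,)
        grad = torch.autograd.grad(d.sum(), q)[0]  # (N, 3)
        direction = grad / (grad.norm(dim=-1, keepdim=True) + 1e-8)
        q = q - d.unsqueeze(-1) * direction
    return q.detach()


if __name__ == "__main__":
    udf = TinyUDF()
    queries = torch.rand(1024, 3) * 2 - 1   # random points in [-1, 1]^3
    moved = move_queries(udf, queries)
    print(moved.shape)                      # torch.Size([1024, 3])
```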
Citations: 0
Purified Zero-Shot Sketch-Based Image Retrieval
IF 9.7 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2025-11-14 DOI: 10.1109/TMM.2025.3632682
Yang Zhou;Jingru Yang;Jin Wang;Kaixiang Huang;Guodong Lu;Shengfeng He
Sketches, an emerging alternative to natural language in multimedia systems, are characterized by sparse visual cues such as simple strokes, which differ significantly from natural images containing complex elements such as background, foreground, and texture. This misalignment poses substantial challenges for zero-shot sketch-based image retrieval (ZS-SBIR). Prior approaches match sketches to full images and tend to overlook redundant elements in natural images, leading to model distraction and semantic ambiguity. To address this issue, we introduce a distraction-agnostic framework, purified cross-domain matching (PuXIM), which operates on a straightforward principle: masking and matching. We devise a visual-cross-linguistic (VxL) sampler that generates linguistic masks based on semantic labels to obscure semantically irrelevant image features. Our novel contribution is the concept of purified masked matching (PMM), which comprises two processes: (1) reconstruction, which compels the image encoder to reconstruct the masked image feature, and (2) interaction, which involves a transformer decoder that processes both sketch and masked image features to investigate cross-domain relationships for effective matching. Evaluated on the TU-Berlin, Sketchy, and QuickDraw datasets, PuXIM sets new performance benchmarks. Importantly, the distraction-agnostic nature of the matching process makes PuXIM more conducive to training, enabling efficient adaptation to zero-shot scenarios with reduced data requirements and lower data quality.
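The snippet below is only a toy rendering of the masking-and-matching principle: image patch features are scored against a label-text embedding, low-relevance patches are zeroed out as the "linguistic mask", and the pooled masked feature is matched to the sketch embedding by cosine similarity. The feature sources, the thresholding rule, and all dimensions are assumptions rather than PuXIM's actual VxL sampler or PMM heads.

```python
import torch
import torch.nn.functional as F

def masked_match(patch_feats, sketch_feat, label_text_feat, keep_threshold=0.2):
    """patch_feats: (P, D) image patch embeddings; sketch_feat: (D,);
    label_text_feat: (D,) embedding of the semantic label.
    Returns (cosine match score, boolean patch mask)."""
    patches = F.normalize(patch_feats, dim=-1)
    text = F.normalize(label_text_feat, dim=-1)

    # Keep only patches whose similarity to the label text is high enough.
    relevance = patches @ text                   # (P,)
    keep = relevance > keep_threshold
    masked = patch_feats * keep.unsqueeze(-1)    # zero out distracting patches

    # Pool the surviving patches and compare with the sketch embedding.
    pooled = masked.sum(dim=0) / keep.sum().clamp(min=1)
    score = F.cosine_similarity(pooled, sketch_feat, dim=0)
    return score, keep


if __name__ == "__main__":
    patches = torch.randn(196, 512)   # e.g., 14x14 ViT patch tokens
    sketch = torch.randn(512)
    label = torch.randn(512)
    score, keep = masked_match(patches, sketch, label)
    print(float(score), int(keep.sum()))
```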
Citations: 0
Real-Scene Image Dehazing via Laplacian Pyramid-Based Conditional Diffusion Model
IF 9.7 CAS Tier 1 (Computer Science) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2025-11-14 DOI: 10.1109/TMM.2025.3632694
Yongzhen Wang;Jie Sun;Heng Liu;Xiao-Ping Zhang;Mingqiang Wei
Recent diffusion models have demonstrated exceptional efficacy across various image restoration tasks, but they still suffer from long inference times and substantial computational resource consumption. To address these challenges, we present LPCDiff, a novel Laplacian Pyramid-based Conditional Diffusion model designed for real-scene image dehazing. LPCDiff leverages the Laplacian pyramid decomposition to decouple the input image into two components: a low-resolution low-pass image and high-frequency residuals. These components are subsequently reconstructed through a diffusion model and a well-designed high-frequency residual recovery module. With this strategy, LPCDiff can substantially accelerate inference and reduce computational cost without sacrificing image fidelity. In addition, the framework empowers the model to capture intrinsic high-frequency details and low-frequency structural information within the image, resulting in sharper and more realistic haze-free outputs. Moreover, to extract more valuable information from the limited training data, we introduce a low-frequency refinement module to further enhance the intricate details of the final dehazed images. Through extensive experimentation, our method significantly outperforms 12 state-of-the-art approaches on three real-world and one synthetic image dehazing benchmarks.
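To make the decomposition concrete, here is a minimal Laplacian-pyramid split with exact reconstruction in PyTorch, using average pooling for downsampling and bilinear interpolation for upsampling; LPCDiff's diffusion model and residual-recovery module would operate on the components produced by such a split. The pooling and interpolation choices and the number of levels are assumptions.

```python
import torch
import torch.nn.functional as F

def laplacian_split(image: torch.Tensor, levels: int = 2):
    """Split (B, C, H, W) into a low-resolution low-pass image plus a list of
    high-frequency residuals, one per level (finest first)."""
    residuals, current = [], image
    for _ in range(levels):
        low = F.avg_pool2d(current, kernel_size=2)
        up = F.interpolate(low, size=current.shape[-2:], mode="bilinear",
                           align_corners=False)
        residuals.append(current - up)   # high-frequency detail at this scale
        current = low
    return current, residuals            # low-pass base + residual pyramid


def laplacian_merge(low: torch.Tensor, residuals):
    """Invert laplacian_split exactly by upsampling and adding residuals back."""
    current = low
    for res in reversed(residuals):
        current = F.interpolate(current, size=res.shape[-2:], mode="bilinear",
                                align_corners=False) + res
    return current


if __name__ == "__main__":
    hazy = torch.rand(1, 3, 256, 256)
    low, residuals = laplacian_split(hazy, levels=2)
    rebuilt = laplacian_merge(low, residuals)
    print(low.shape, torch.allclose(rebuilt, hazy, atol=1e-5))
```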
Citations: 0