
Multimedia Systems: Latest Publications

Gateinst: instance segmentation with multi-scale gated-enhanced queries in transformer decoder
IF 3.9 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-08-20 | DOI: 10.1007/s00530-024-01438-1
Chih-Wei Lin, Ye Lin, Shangtai Zhou, Lirong Zhu

Recently, a popular query-based end-to-end framework has been used for instance segmentation. However, queries are updated based on individual layers or scales of feature maps at each stage of Transformer decoding, which prevents them from gathering sufficient multi-scale feature information. Querying these features may therefore yield inconsistent information due to disparities among feature maps, leading to erroneous updates. This study proposes a new network called GateInst, which employs a dual-path auto-select mechanism based on gate structures to overcome these issues. First, we design a block-wise multi-scale feature fusion module that combines features of different scales while maintaining low computational cost. Second, we introduce a gated-enhanced queries Transformer decoder that uses a gating mechanism to filter and merge the queries generated at different stages, compensating for inaccuracies in query updates. GateInst addresses the issue of insufficient feature information and compensates for cumulative errors in queries. Experiments show that GateInst achieves significant gains of 8.4 AP and 5.5 AP_{50} over Mask2Former on the self-collected Tree Species Instance Dataset and performs well against both non-Mask2Former-like and Mask2Former-like networks on the self-collected and public COCO datasets, with only a small additional computational cost and fast convergence. Code and models are available at https://github.com/FAFU-IMLab/GateInst.
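The gated merging of queries from different stages can be sketched generically. This is not GateInst's actual decoder; it is a minimal numpy illustration of one gated merge step, in which a learned gate (the random `W` and `b` below stand in for trained parameters) interpolates elementwise between the previous-stage and current-stage queries:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                                 # embedding dimension (illustrative)
q_prev = rng.normal(size=(5, d))      # queries from the previous decoder stage
q_new = rng.normal(size=(5, d))       # queries produced at the current stage

# hypothetical gate parameters (trained jointly in a real network)
W = rng.normal(size=(2 * d, d))
b = np.zeros(d)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# the gate decides, per dimension, how much of the new query to keep
gate = sigmoid(np.concatenate([q_prev, q_new], axis=-1) @ W + b)
q_merged = gate * q_new + (1.0 - gate) * q_prev

print(q_merged.shape)  # (5, 8)
```

Because the gate lies in (0, 1), each merged entry is an elementwise convex combination of the two query sets, so erroneous updates can be damped rather than accumulated.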

SiamS3C: spatial-channel cross-correlation for visual tracking with centerness-guided regression
IF 3.9 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-08-20 | DOI: 10.1007/s00530-024-01450-5
Jianming Zhang, Wentao Chen, Yufan He, Li-Dan Kuang, Arun Kumar Sangaiah

Visual object tracking can be divided into object classification and bounding-box regression tasks, but sharing a single correlation map between them leads to inaccuracy. Siamese trackers compute the correlation map by a cross-correlation operation with high computational cost, and performing this operation either on channels or in the spatial domain results in weak perception of global information. In addition, some Siamese trackers with a centerness branch ignore the association between the centerness branch and the bounding-box regression branch. To alleviate these problems, we propose a visual object tracker based on Spatial-Channel Cross-Correlation and Centerness-Guided Regression. First, we propose a spatial-channel cross-correlation module (SC3M) that combines the search-region feature and the template feature both on channels and in the spatial domain, suppressing the interference of distractors. As a lightweight module, SC3M computes dual independent correlation maps that are fed to different subnetworks. Second, we propose a centerness-guided regression subnetwork consisting of a centerness branch and a bounding-box regression branch. The centerness guides the whole regression subnetwork to strengthen the association of the two branches and further suppress low-quality predicted bounding boxes. Third, we have conducted extensive experiments on five challenging benchmarks: GOT-10k, VOT2018, TrackingNet, OTB100 and UAV123. The results show the excellent performance of our tracker, which meets real-time requirements at 48.52 fps.
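Depthwise (channel-wise) cross-correlation is the standard building block such Siamese trackers start from: each template channel slides over the matching search channel. The naive numpy sketch below shows that generic operation (not SC3M itself; shapes and data are illustrative):

```python
import numpy as np

def depthwise_xcorr(search, template):
    """Channel-wise cross-correlation: each template channel is slid over
    the matching search channel with valid padding, as in Siamese trackers."""
    C, H, W = search.shape
    c, h, w = template.shape
    assert C == c, "search and template must share the channel count"
    out = np.zeros((C, H - h + 1, W - w + 1))
    for ch in range(C):
        for i in range(H - h + 1):
            for j in range(W - w + 1):
                out[ch, i, j] = np.sum(search[ch, i:i+h, j:j+w] * template[ch])
    return out

rng = np.random.default_rng(1)
search = rng.normal(size=(4, 16, 16))    # search-region feature map
template = rng.normal(size=(4, 6, 6))    # template feature map
corr = depthwise_xcorr(search, template)
print(corr.shape)  # (4, 11, 11)
```

A peak in a correlation channel marks where that channel of the template best matches the search region; real trackers implement this as a grouped convolution for speed.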

3D model watermarking using surface integrals of generated random vector fields
IF 3.9 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-08-20 | DOI: 10.1007/s00530-024-01455-0
Luke Vandenberghe, Chris Joslin

We propose a new semi-blind, semi-fragile watermarking algorithm for authenticating triangulated 3D models using the surface integrals of generated random vector fields. Watermark data is embedded into the flux of a vector field across the model's surface, and the vertices are shifted via gradient-based optimization to obtain the modified flux values. The watermark can be extracted by recomputing the surface integrals and compared using correlation measures. The algorithm is invariant to Euclidean transformations, including rotations and translations, reduces distortion, and achieves improved robustness to additive noise.
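The flux of a vector field through a triangulated surface can be approximated with one-point quadrature per triangle. The sketch below shows that generic surface integral (not the paper's embedding procedure), sanity-checked with a constant field through a unit square:

```python
import numpy as np

def triangle_flux(verts, faces, field):
    """Approximate the flux of `field` through a triangle mesh by
    evaluating the field at each face centroid (one-point quadrature)."""
    total = 0.0
    for f in faces:
        a, b, c = verts[list(f)]
        n = np.cross(b - a, c - a) / 2.0   # area-weighted face normal
        centroid = (a + b + c) / 3.0
        total += np.dot(field(centroid), n)
    return total

# unit square in the z = 0 plane split into two triangles, field F = (0, 0, 1)
verts = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]], dtype=float)
faces = [(0, 1, 2), (0, 2, 3)]
flux = triangle_flux(verts, faces, lambda p: np.array([0.0, 0.0, 1.0]))
print(flux)  # 1.0 (flux = area of the square for a unit normal field)
```

Shifting a vertex changes the normals and centroids of its incident triangles, which is what lets a gradient-based optimizer steer the flux toward a target watermark value.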

Anomaly detection in surveillance videos using Transformer with margin learning
IF 3.9 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-08-16 | DOI: 10.1007/s00530-024-01443-4
Dicong Wang, Kaijun Wu

Weakly supervised video anomaly detection (WSVAD) is a challenging and actively researched problem in image and video processing. Prior WSVAD studies typically formulate it as a multiple-instance learning (MIL) problem. However, quite a few of these methods concentrate primarily on time periods when anomalies occur discernibly. To recognize anomalous events, they rely solely on detecting significant changes in appearance or motion, ignoring the temporal completeness or continuity that anomalous events possess by nature. In addition, they disregard the subtle correlations at the transitional boundaries between normal and abnormal states. Therefore, we propose a weakly supervised learning approach based on a Transformer with margin learning for video anomaly detection. Specifically, our network effectively captures temporal changes around the occurrence of anomalies by exploiting Transformer blocks, which are adept at capturing long-range dependencies in anomalous events. Second, to tackle hard cases, i.e., normal events with high similarity to anomalous events, we employ a hard score memory. Its purpose is to store the anomaly scores of hard samples, enabling iterative optimization training on those hard instances. Additionally, to bolster the discriminative capability of the model at the score level, we utilize pseudo-labels for anomalous events to provide supplementary support in detection. Experiments were conducted on two large-scale datasets, the ShanghaiTech dataset and the UCF-Crime dataset, and achieved highly favorable results, demonstrating that the proposed method is sensitive to anomalous events while performing competitively against state-of-the-art methods.
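The MIL formulation mentioned above is commonly trained with a ranking objective that pushes the top-scoring snippet of an abnormal video above the top-scoring snippet of a normal video. The sketch below shows that generic hinge-style loss (the margin value and score arrays are illustrative, not the paper's exact objective):

```python
import numpy as np

def mil_margin_loss(scores_abn, scores_nrm, margin=1.0):
    """Hinge-style MIL ranking objective: the top anomaly score in an
    abnormal video should exceed the top score in a normal video by
    at least `margin`."""
    return max(0.0, margin - np.max(scores_abn) + np.max(scores_nrm))

abn = np.array([0.1, 0.9, 0.3])    # snippet scores from an abnormal video
nrm = np.array([0.2, 0.1, 0.25])   # snippet scores from a normal video
print(mil_margin_loss(abn, nrm))   # ~= 0.35, i.e. 1.0 - 0.9 + 0.25
```

A hard score memory in this setting would keep the highest-scoring normal snippets (the `nrm` entries closest to `abn` scores) so later iterations keep optimizing against exactly those confusing cases.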

Remote sensing image cloud removal based on multi-scale spatial information perception
IF 3.9 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-08-16 | DOI: 10.1007/s00530-024-01442-5
Aozhe Dou, Yang Hao, Weifeng Liu, Liangliang Li, Zhenzhong Wang, Baodi Liu

Remote sensing imagery is indispensable in diverse domains, including geographic information systems, climate monitoring, agricultural planning, and disaster management. Nonetheless, cloud cover can drastically degrade the utility and quality of these images. Current deep learning-based cloud removal methods rely on convolutional neural networks that extract features at a single scale, which can overlook both detailed and global information, resulting in suboptimal cloud removal performance. To overcome these challenges, we develop a cloud removal method that leverages multi-scale spatial information perception. Our technique employs convolution kernels of various sizes, enabling the integration of global semantic information and local detail information. An attention mechanism enhances this process by targeting key areas within the images and dynamically adjusting channel weights to improve feature reconstruction. We compared our method with currently popular cloud removal methods across three datasets; the results show that it improves metrics such as PSNR, SSIM, and cosine similarity, verifying its effectiveness in cloud removal.
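Channel attention of the kind described ("dynamically adjusting channel weights") is often realized as a squeeze-and-excite bottleneck. The numpy sketch below shows that generic pattern with random stand-in weights `w1` and `w2`, not the paper's actual module:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """SE-style channel reweighting: squeeze by global average pooling,
    excite through a small bottleneck, then rescale each channel."""
    squeeze = feat.mean(axis=(1, 2))                     # (C,)
    excite = sigmoid(np.maximum(squeeze @ w1, 0) @ w2)   # (C,) in (0, 1)
    return feat * excite[:, None, None]

rng = np.random.default_rng(2)
feat = rng.normal(size=(8, 4, 4))        # a (channels, H, W) feature map
w1 = rng.normal(size=(8, 2))             # bottleneck down-projection
w2 = rng.normal(size=(2, 8))             # bottleneck up-projection
out = channel_attention(feat, w1, w2)
print(out.shape)  # (8, 4, 4)
```

Since the excitation weights lie in (0, 1), the module can only attenuate channels; in a trained network the weights learn to suppress cloud-dominated channels while preserving informative ones.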

Exploiting multi-level consistency learning for source-free domain adaptation
IF 3.9 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-08-16 | DOI: 10.1007/s00530-024-01444-3
Jihong Ouyang, Zhengjie Zhang, Qingyi Meng, Ximing Li, Jinjin Chi

Due to data privacy concerns, a more practical task known as Source-free Unsupervised Domain Adaptation (SFUDA) has gained significant attention recently. SFUDA adapts a pre-trained source model to the target domain without access to the source domain data. Existing SFUDA methods typically rely on per-class cluster structure to refine labels. However, these clusters often contain samples with different ground-truth labels, leading to label noise. To address this issue, we propose a novel Multi-level Consistency Learning (MLCL) method. MLCL focuses on learning discriminative class-wise target feature representations, resulting in more accurate cluster structures. Specifically, at the inter-domain level, we construct pseudo-source domain data based on an entropy criterion. We align each pseudo-labeled target-domain sample with its corresponding pseudo-source-domain prototype by introducing a prototype contrastive loss. This loss ensures that our model learns discriminative class-wise feature representations effectively. At the intra-domain level, we enforce consistency among different views of the same image by employing consistency-based self-training, which further enhances the feature representation ability of our model. Additionally, we apply information maximization regularization to facilitate target sample clustering and promote diversity. Extensive experiments conducted on four classification benchmark datasets demonstrate the superior performance of the proposed MLCL method. The code is here.
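A prototype contrastive loss of the kind described can be written as InfoNCE over class prototypes: the feature is pulled toward its pseudo-label's prototype and pushed from the rest. The sketch below is a generic numpy version (the temperature `tau` and all tensors are illustrative, not MLCL's exact formulation):

```python
import numpy as np

def proto_contrastive_loss(feat, protos, label, tau=0.1):
    """InfoNCE over class prototypes: minimize the negative log-probability
    of the pseudo-labeled class under a softmax of cosine similarities."""
    feat = feat / np.linalg.norm(feat)
    protos = protos / np.linalg.norm(protos, axis=1, keepdims=True)
    logits = protos @ feat / tau
    logits -= logits.max()                      # numerical stability
    p = np.exp(logits) / np.exp(logits).sum()
    return -np.log(p[label])

rng = np.random.default_rng(3)
protos = rng.normal(size=(5, 16))               # one prototype per class
feat = protos[2] + 0.05 * rng.normal(size=16)   # feature near class 2
loss = proto_contrastive_loss(feat, protos, label=2)
print(loss)
```

Because `feat` is close to the class-2 prototype, the loss for `label=2` is small, whereas assigning any other label yields a larger penalty.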

Integrate encryption of multiple images based on a new hyperchaotic system and Baker map
IF 3.9 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-08-14 | DOI: 10.1007/s00530-024-01449-y
Xingbin Liu

Image encryption serves as a crucial means to safeguard information against unauthorized access during both transmission and storage. This paper introduces an integrated encryption algorithm tailored for multiple images, leveraging a novel hyperchaotic system and the Baker map to enlarge the key space and enhance security. The methodology follows a permutation-diffusion framework, employing sequences derived from the hyperchaotic system for both the permutation and diffusion operations. Initially, the multiple images are intermixed and consolidated into a single image. Subsequently, the Baker map is employed to further scramble this amalgamated image, thereby extending the scrambling period. Ultimately, the ciphertext image is generated through forward-backward diffusion applied to the pixel sequence of the Zigzag-scanned image. Experimental findings substantiate the high security of the proposed scheme, demonstrating resilience against diverse threats.
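The discretized Baker map used for scrambling is a bijective pixel permutation. The sketch below implements the simple two-strip textbook form for an even-sized square image (a generic discretization, not necessarily the paper's parameterization):

```python
import numpy as np

def baker_map(img):
    """One iteration of the discretized Baker map (two equal vertical
    strips) on an N x N array with N even: a bijective pixel permutation."""
    N = img.shape[0]
    out = np.empty_like(img)
    for y in range(N):
        for x in range(N):
            if x < N // 2:
                nx, ny = 2 * x + y % 2, y // 2
            else:
                nx, ny = 2 * (x - N // 2) + y % 2, y // 2 + N // 2
            out[ny, nx] = img[y, x]
    return out

img = np.arange(16).reshape(4, 4)
scrambled = baker_map(img)
print(sorted(scrambled.ravel()) == list(range(16)))  # True (it is a permutation)
```

Iterating the map lengthens the scrambling period, and because the map is bijective the receiver can invert it exactly during decryption.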

3D human pose estimation method based on multi-constrained dilated convolutions
IF 3.9 | CAS Tier 3 (Computer Science) | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS | Pub Date: 2024-08-14 | DOI: 10.1007/s00530-024-01441-6
Huaijun Wang, Bingqian Bai, Junhuai Li, Hui Ke, Wei Xiang

In recent years, research on 2D-to-3D human pose estimation methods has gained increasing attention. However, challenges such as depth ambiguity and self-occlusion still need to be addressed. To address these problems, we propose a 3D human pose estimation method based on multi-constrained dilated convolutions. The approach uses a local constraint based on graph convolution and a global constraint based on a fully connected network, and utilizes a dilated temporal convolution network to capture long-term temporal correlations of human poses. Taking 2D joint coordinate sequences as input, the local constraint module constructs cross-joint and equipotential connections for the human skeleton. The global constraint module encodes global semantic information about posture. Finally, the constraint modules and the temporal correlation of human posture are alternately connected to achieve 3D human pose estimation. The method was validated on the public datasets Human3.6M and MPI-INF-3DHP, and the results show that it effectively reduces the error of 3D human pose estimation and demonstrates a certain degree of generalization ability.
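A dilated temporal convolution spaces its kernel taps `dilation` frames apart, so the temporal receptive field grows without adding parameters. The sketch below is a minimal 1-D numpy version applied to a toy coordinate track (illustrative only; the paper's network stacks many such layers with learned kernels):

```python
import numpy as np

def dilated_conv1d(seq, kernel, dilation):
    """Valid-padding 1-D dilated convolution along the time axis:
    kernel taps are spaced `dilation` frames apart."""
    T, k = len(seq), len(kernel)
    span = (k - 1) * dilation          # temporal extent covered by the kernel
    return np.array([
        sum(kernel[j] * seq[t + j * dilation] for j in range(k))
        for t in range(T - span)
    ])

seq = np.arange(10, dtype=float)       # a toy 1-D "joint coordinate" track
out = dilated_conv1d(seq, kernel=[1.0, 1.0, 1.0], dilation=2)
print(out)  # moving sum over frames t, t+2, t+4
```

With kernel size 3 and dilation 2, each output aggregates frames two steps apart, covering five frames of context per layer; stacking layers with growing dilation yields the long-term correlations the abstract refers to.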

近年来,从二维到三维人体姿态估计方法的研究越来越受到关注。然而,这些方法仍需解决深度模糊和自闭塞等问题。为了解决这些问题,我们提出了一种基于多约束扩张卷积的三维人体姿态估计方法。这种方法包括使用基于图卷积的局部约束和基于全连接网络的全局约束。它还利用扩张时间卷积网络来捕捉人体姿势的长期时间相关性。局部约束模块以二维关节坐标序列为输入,构建人体骨骼的交叉关节和等电位连接。全局约束模块对姿势的全局语义信息进行编码。最后,约束模块和人体姿势的时间相关性交替连接,实现三维人体姿势估计。该方法在公共数据集 Human3.6M 和 MPI-INF-3DHP 上进行了验证,结果表明所提出的方法有效地减少了三维人体姿态估计的误差,并表现出一定的泛化能力。
{"title":"3D human pose estimation method based on multi-constrained dilated convolutions","authors":"Huaijun Wang, Bingqian Bai, Junhuai Li, Hui Ke, Wei Xiang","doi":"10.1007/s00530-024-01441-6","DOIUrl":"https://doi.org/10.1007/s00530-024-01441-6","url":null,"abstract":"<p>In recent years, research on 2D to 3D human pose estimation methods has gained increasing attention. However, these methods, such as depth ambiguity and self-occlusion, still need to be addressed. To address these problems, we propose a 3D human pose estimation method based on multi-constrained dilated convolutions. This approach involves using a local constraint based on graph convolution and a global constraint based on a fully connected network. It also utilizes a dilated temporal convolution network to capture long-term temporal correlations of human poses. Taking 2D joint coordinate sequences as input, the local constraint module constructs cross-joint and equipotential connections for the human skeleton. The global constraint module encodes global semantic information about posture. Finally, the constraint modules and the temporal correlation of human posture are alternately connected to achieve 3D human posture estimation. 
The method was validated on the public datasets Human3.6M and MPI-INF-3DHP, and the results show that the proposed method effectively reduces the error in 3D human pose estimation and demonstrates a certain degree of generalization ability.</p>","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":"258 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142210657","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
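To make the temporal-modelling idea concrete, here is a minimal, hypothetical sketch of a dilated 1-D convolution over a sequence of per-frame pose features; the function names, kernel, and dilation schedule are illustrative assumptions, not the paper's actual network.

```python
# Hedged sketch: a dilated 1-D temporal convolution, illustrating how
# dilation widens the receptive field over a pose sequence. All names
# and parameters here are illustrative, not taken from the paper.

def dilated_conv1d(seq, kernel, dilation):
    """Valid (no-padding) 1-D convolution of `seq` with `kernel` at the given dilation."""
    k = len(kernel)
    span = (k - 1) * dilation + 1  # frames covered by one application of the kernel
    return [
        sum(kernel[i] * seq[t + i * dilation] for i in range(k))
        for t in range(len(seq) - span + 1)
    ]

def receptive_field(kernel_size, dilations):
    """Total number of input frames seen by a stack of dilated layers."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf
```

With kernel size 3 and dilations 1, 2, 4, a three-layer stack already covers 15 frames, which is the mechanism that lets such networks capture long-term temporal correlations at low cost.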
Exploring multi-dimensional interests for session-based recommendation
IF 3.9 Tier 3 Computer Science Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2024-08-13 DOI: 10.1007/s00530-024-01437-2
Yuhan Yang, Jing Sun, Guojia An

Session-based recommendation (SBR) aims to recommend the next item a user will click by mining the user's interaction sequence in the current session. It has recently received widespread attention due to its strong privacy-protection capabilities. However, existing SBR methods have the following limitations: (1) session sequences contain noisy information; (2) it is challenging to simultaneously model both the long-term stable and the dynamically changing interests of users; (3) the internal relationships between different interest representations are often neglected. To address these issues, we propose EMDI, an Exploring Multi-Dimensional Interests model for session-based recommendation, which attempts to predict more accurate and complete user intentions from multiple dimensions of user interest. Specifically, EMDI comprises three parts: (1) the interest enhancement module filters noise and strengthens the interest expressions in a user's behavior sequence, providing high-quality item embeddings; (2) the interest mining module separately mines a user's multi-dimensional interests, including static interests, local dynamic interests, and global dynamic interests, to capture the user's tendencies in each dimension; (3) the interest fusion module dynamically aggregates interest representations from different dimensions through a novel multi-layer gated fusion network, so that the implicit associations between them can be captured. Extensive experimental results show that EMDI performs significantly better than other state-of-the-art methods.

Citations: 0
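The multi-layer gated fusion idea — letting a learned gate decide how much of each interest representation to keep — can be sketched as follows; the scalar weights and the two-vector setup are assumptions for illustration, not EMDI's actual architecture.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_fuse(a, b, w_a, w_b, bias):
    """Fuse two interest vectors with an elementwise sigmoid gate.

    `a` and `b` stand for two interest representations (e.g. static vs.
    dynamic interests); the scalar weights are illustrative stand-ins for
    learned parameters. Per element: g = sigmoid(w_a*x + w_b*y + bias),
    output g*x + (1-g)*y.
    """
    out = []
    for x, y in zip(a, b):
        g = sigmoid(w_a * x + w_b * y + bias)
        out.append(g * x + (1.0 - g) * y)
    return out
```

A gate near 1 passes the first representation through almost unchanged, while a gate near 0.5 averages the two; this is what lets a gated fusion adapt per dimension instead of using one fixed mixing ratio.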
PS-YOLO: a small object detector based on efficient convolution and multi-scale feature fusion
IF 3.9 Tier 3 Computer Science Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2024-08-13 DOI: 10.1007/s00530-024-01447-0
Shifeng Peng, Xin Fan, Shengwei Tian, Long Yu

Compared with general object detection, research on small-object detection has progressed slowly, mainly because appropriate features must be learned from the limited information small objects carry, compounded by difficulties such as information loss during the forward propagation of neural networks. To address this, this paper proposes an object detector named PS-YOLO that: (1) reconstructs the C2f module to reduce the weakening or loss of small-object features during the deep stacking of the backbone network; (2) optimizes neck feature fusion with the PD module, which fuses features of different levels and sizes to improve the model's multi-scale feature-fusion capability; (3) designs the multi-channel aggregate receptive field (MCARF) module for downsampling, extending the image receptive field and recognizing more local information. Experimental results on three public datasets show that the algorithm achieves satisfactory accuracy, precision, and recall.

Citations: 0
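The neck-fusion step — combining feature maps of different spatial sizes — can be illustrated with a minimal upsample-and-add sketch; this is a generic multi-scale fusion scheme assumed for illustration, not the exact PD module.

```python
# Hedged sketch: fuse a coarse feature map into a fine one by
# nearest-neighbour upsampling followed by elementwise addition.
# The scheme and names are illustrative, not the paper's PD module.

def upsample_nearest(grid, factor):
    """Nearest-neighbour upsampling of a 2-D list-of-lists by an integer factor."""
    out = []
    for row in grid:
        wide = [v for v in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out

def fuse_add(fine, coarse):
    """Upsample `coarse` to `fine`'s resolution, then add elementwise."""
    factor = len(fine) // len(coarse)
    up = upsample_nearest(coarse, factor)
    return [[f + u for f, u in zip(frow, urow)] for frow, urow in zip(fine, up)]
```

Summation after upsampling lets high-resolution positions inherit the coarse map's semantics while keeping their own fine detail, which is the usual motivation for this kind of multi-scale neck design.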