A survey of multimodal federated learning: background, applications, and perspectives
Pub Date: 2024-07-29, DOI: 10.1007/s00530-024-01422-9
Hao Pan, Xiaoli Zhao, Lipeng He, Yicong Shi, Xiaogang Lin
Multimodal Federated Learning (MMFL) is a machine learning paradigm that extends traditional Federated Learning (FL) to support collaborative training of local models on data from multiple modalities. With vast amounts of multimodal data being generated and stored by the internet, sensors, and mobile devices, and with artificial intelligence models iterating rapidly, the demand for multimodal models is growing quickly. Although FL has been widely studied in recent years, most existing research has been conducted in unimodal settings. With the aim of inspiring further applications and research within the MMFL paradigm, we conduct a comprehensive review of the progress and challenges across various aspects of state-of-the-art MMFL. Specifically, we analyze the research motivation for MMFL, propose a new classification of existing work, discuss the available datasets and application scenarios, and offer perspectives on the opportunities and challenges that MMFL faces.
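For readers new to the FL paradigm that MMFL extends, below is a minimal sketch of FedAvg-style server-side aggregation, the baseline most federated (including multimodal) methods build on; the function and variable names are illustrative and not taken from the paper.

```python
import torch

def fedavg_aggregate(client_states, client_sizes):
    """Weighted average of client model state_dicts (FedAvg-style sketch).

    client_states: list of dicts mapping parameter name -> torch.Tensor
    client_sizes:  list of local dataset sizes used as aggregation weights
    MMFL methods typically extend this by aggregating per-modality encoders.
    """
    total = float(sum(client_sizes))
    global_state = {}
    for name in client_states[0]:
        # weight each client's parameter tensor by its share of the total data
        weighted = [state[name].float() * (n / total)
                    for state, n in zip(client_states, client_sizes)]
        global_state[name] = torch.stack(weighted).sum(dim=0)
    return global_state
```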
GAN-based image steganography by exploiting transform domain knowledge with deep networks
Pub Date: 2024-07-29, DOI: 10.1007/s00530-024-01427-4
Xiao Li, Liquan Chen, Jianchang Lai, Zhangjie Fu, Suhui Liu
Image steganography secures the transmission of secret information by hiding it within routine multimedia transmission. When images are generated with a Generative Adversarial Network (GAN), the embedding and recovery of secret bits can rely entirely on deep networks, relieving much manual design effort. However, existing GAN-based methods typically design their deep networks by adapting generic deep learning structures to image steganography. These structures lack feature extraction that is effective for steganography, resulting in low imperceptibility. To address this problem, we propose GAN-based image steganography that exploits transform domain knowledge with deep networks, called EStegTGANs. Unlike existing GAN-based methods, we explicitly introduce transform domain knowledge via the Discrete Wavelet Transform (DWT) and its inverse (IDWT) in the deep networks, ensuring that each network operates on DWT features. Specifically, the encoder embeds secrets and generates stego images with explicit DWT and IDWT operations, while the decoder recovers secrets and the discriminator evaluates feature distributions with the explicit DWT. Using traditional DWT and IDWT, we first propose EStegTGAN-coe, which directly adopts the DWT coefficients of pixels for embedding and recovery. To create more feature redundancy for secrets, we instead extract DWT features from the intermediate features of the deep networks, yielding EStegTGAN-DWT with traditional DWT and IDWT. To rely entirely on deep networks without traditional filters, we further design convolutional DWT and IDWT operations that perform the same feature transformation as the traditional ones, and replace the traditional operations in EStegTGAN-DWT with these convolutional counterparts. Comprehensive experimental results demonstrate that our proposals significantly improve imperceptibility, and that the designed convolutional DWT and IDWT are more effective than traditional DWT and IDWT at distinguishing the high-frequency characteristics of images for steganography.
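The abstract above describes expressing the DWT as a convolution inside the network. Below is a minimal sketch, assuming a one-level Haar transform, of how a fixed stride-2 grouped convolution can reproduce the traditional DWT sub-bands; the class name and sub-band layout are illustrative, not the paper's implementation.

```python
import torch
import torch.nn as nn

class HaarDWT(nn.Module):
    """One-level 2D Haar DWT expressed as a fixed stride-2 grouped convolution.

    Input (B, C, H, W) -> output (B, 4*C, H/2, W/2): LL, LH, HL, HH sub-bands per channel.
    """
    def __init__(self, channels):
        super().__init__()
        ll = torch.tensor([[0.5, 0.5], [0.5, 0.5]])
        lh = torch.tensor([[0.5, 0.5], [-0.5, -0.5]])
        hl = torch.tensor([[0.5, -0.5], [0.5, -0.5]])
        hh = torch.tensor([[0.5, -0.5], [-0.5, 0.5]])
        bank = torch.stack([ll, lh, hl, hh]).unsqueeze(1)   # (4, 1, 2, 2)
        weight = bank.repeat(channels, 1, 1, 1)             # (4*C, 1, 2, 2), one bank per channel
        self.conv = nn.Conv2d(channels, 4 * channels, kernel_size=2,
                              stride=2, groups=channels, bias=False)
        self.conv.weight = nn.Parameter(weight, requires_grad=False)  # fixed wavelet filters

    def forward(self, x):
        return self.conv(x)
```

A learnable variant would simply leave `requires_grad=True`, which is one way a "convolutional DWT" can be trained end-to-end while starting from the traditional transform.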
Coordinate-aligned multi-camera collaboration for active multi-object tracking
Zeyu Fang, Jian Zhao, Mingyu Yang, Zhenbo Lu, Wengang Zhou, Houqiang Li
Pub Date: 2024-07-29, DOI: 10.1007/s00530-024-01420-x
Active Multi-Object Tracking (AMOT) is a task in which cameras are controlled by a centralized system that adjusts their poses automatically and collaboratively so as to maximize the coverage of targets in their shared visual field. In AMOT, each camera receives only partial information from its own observation, which may mislead it into taking locally optimal actions. In addition, the global goal, i.e., maximum coverage of objects, is hard to optimize directly. To address these issues, we propose a coordinate-aligned multi-camera collaboration system for AMOT. In our approach, we regard each camera as an agent and address AMOT with a multi-agent reinforcement learning solution. To represent each agent's observation, we first identify the targets in the camera view with an image detector and then align the targets' coordinates via an inverse projection transformation. We define each agent's reward based on both global coverage and four individual reward terms. The agents' action policy is derived from a value-based Q-network. To the best of our knowledge, we are the first to study the AMOT task. To train and evaluate our system, we build a virtual yet credible 3D environment, named "Soccer Court", to mimic real-world AMOT scenarios. The experimental results show that our system outperforms the baseline and existing methods in various settings, including on real-world datasets.
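The coordinate alignment step above maps per-camera detections into one shared frame. Below is a minimal sketch of such an inverse projection onto the ground plane, assuming a calibrated pinhole camera; the function name and calling convention are illustrative, not the paper's code.

```python
import numpy as np

def pixel_to_ground(u, v, K, R, t):
    """Back-project a pixel (e.g. the bottom-center of a detection box) onto the
    ground plane z = 0, assuming intrinsics K (3x3), rotation R (3x3), translation t (3,).

    Returns (x, y) world coordinates shared by all cameras, so detections from
    different views can be aligned in one coordinate frame.
    """
    H = K @ np.column_stack((R[:, 0], R[:, 1], t))   # homography: ground plane -> image
    p = np.linalg.inv(H) @ np.array([u, v, 1.0])     # image -> ground plane (homogeneous)
    return p[0] / p[2], p[1] / p[2]
```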
SAM-guided contrast based self-training for source-free cross-domain semantic segmentation
Pub Date: 2024-07-26, DOI: 10.1007/s00530-024-01426-5
Qinghua Ren, Ke Hou, Yongzhao Zhan, Chen Wang
Traditional domain adaptive semantic segmentation methods typically assume access to source domain data during training, a paradigm known as source-access domain adaptation for semantic segmentation (SASS). To address data privacy concerns in real-world applications, source-free domain adaptation for semantic segmentation (SFSS) has recently been studied, eliminating the need for direct access to source data. Most SFSS methods primarily use pseudo-labels to regularize the model in either the label space or the feature space. Inspired by the Segment Anything Model (SAM), we propose SAM-guided contrast-based pseudo-label learning for SFSS. Unlike previous methods that rely heavily on noisy pseudo-labels, we leverage the class-agnostic segmentation masks generated by SAM as prior knowledge to construct positive and negative sample pairs, which allows us to shape the feature space directly with contrastive learning. This design ensures the reliable construction of contrastive samples and exploits both intra-class and intra-instance diversity. Our framework is built on a vanilla teacher-student architecture for online pseudo-label learning, so the SFSS model can be jointly regularized in both the feature and label spaces in an end-to-end manner. Extensive experiments demonstrate that our method achieves competitive performance on two challenging SFSS tasks.
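One way to picture the SAM-guided pair construction described above: treat each SAM mask as a region, pool one prototype per region, and contrast pixels against prototypes. A minimal sketch under that assumption follows; the paper's exact pair construction and loss may differ.

```python
import torch
import torch.nn.functional as F

def sam_guided_contrast(feat, sam_masks, temperature=0.1):
    """Pixel-to-prototype contrastive loss guided by class-agnostic SAM masks (sketch).

    feat:      (C, H, W) feature map from the segmentation network
    sam_masks: (M, H, W) boolean masks produced by SAM for one image
    Each pixel is pulled toward the prototype of the SAM segment covering it and
    pushed away from prototypes of the other segments.
    """
    C, H, W = feat.shape
    feat = F.normalize(feat.view(C, -1), dim=0)                  # (C, H*W), unit-norm per pixel
    masks = sam_masks.view(sam_masks.shape[0], -1).float()       # (M, H*W)
    protos = masks @ feat.t() / masks.sum(1, keepdim=True).clamp(min=1)
    protos = F.normalize(protos, dim=1)                          # (M, C) region prototypes
    logits = protos @ feat / temperature                         # (M, H*W) pixel-prototype similarity
    labels = masks.argmax(dim=0)                                 # first covering segment per pixel
    valid = masks.sum(dim=0) > 0                                 # ignore pixels not covered by any mask
    return F.cross_entropy(logits.t()[valid], labels[valid])
```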
RA-RevGAN: region-aware reversible adversarial example generation network for privacy-preserving applications
Jiacheng Zhao, Xiuming Zhao, Zhihua Gan, Xiuli Chai, Tianfeng Ma, Zhen Chen
Pub Date: 2024-07-26, DOI: 10.1007/s00530-024-01425-6
The rise of online sharing platforms has given people diverse and convenient ways to share images. However, these images contain a substantial amount of sensitive user information, which can easily be captured by malicious neural networks. To ensure the secure use of authorized protected data, reversible adversarial attack techniques have emerged. Existing algorithms for generating adversarial examples do not strike a good balance between visibility and attack capability, and the network oscillations that arise during training degrade the quality of the final examples. To address these shortcomings, we propose a novel reversible adversarial network based on generative adversarial networks (RA-RevGAN). The generator produces noise that maps features into image perturbations, while a region selection module confines these perturbations to the specific areas that most affect classification. Furthermore, a robust attack mechanism is integrated into the discriminator to stabilize training by optimizing convergence speed and minimizing time cost. Extensive experiments demonstrate that the proposed method achieves a high image generation rate, strong attack capability, and superior visual quality while maintaining high classification accuracy after image restoration.
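Below is a minimal sketch of the region-confined perturbation idea described above, assuming the generator output is bounded and gated by a region mask before being added to the image; the names and the budget value are illustrative, not the paper's settings.

```python
import torch

def apply_region_perturbation(image, noise, region_mask, epsilon=8.0 / 255.0):
    """Confine a generated perturbation to classification-critical regions (sketch).

    image:       (B, 3, H, W) in [0, 1]
    noise:       (B, 3, H, W) raw generator output
    region_mask: (B, 1, H, W) in [0, 1], 1 = area allowed to be perturbed
    epsilon:     per-pixel perturbation budget (assumed value)
    """
    delta = epsilon * torch.tanh(noise) * region_mask   # bounded, masked perturbation
    adv = (image + delta).clamp(0.0, 1.0)               # keep a valid image
    return adv, delta                                   # delta can be stored for reversal
```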
Application of CLIP for efficient zero-shot learning
Pub Date: 2024-07-26, DOI: 10.1007/s00530-024-01414-9
Hairui Yang, Ning Wang, Haojie Li, Lei Wang, Zhihui Wang
Zero-shot learning (ZSL) addresses the challenging task of recognizing classes absent during training. Existing methodologies focus on transferring knowledge from known to unknown categories by formulating a correlation between the visual and semantic spaces. However, these methods are constrained by the limited discriminability of visual features and the incompleteness of semantic representations. To alleviate these limitations, we propose a novel Collaborative learning Framework for Zero-Shot Learning (CFZSL), which integrates the CLIP architecture into a fundamental zero-shot learner. Specifically, the foundational zero-shot learning model extracts visual features through a set of CNNs and maps them into a domain-specific semantic space, while the CLIP image encoder extracts visual features carrying universal semantics. In this way, the CFZSL framework obtains visual features that are discriminative for both domain-specific and domain-agnostic semantics. Additionally, a more comprehensive semantic space is explored by combining the latent feature space learned by CLIP with the domain-specific semantic space. Notably, we use only the pre-trained parameters of the CLIP model, avoiding the high training cost and potential overfitting associated with fine-tuning. The proposed framework has a simple structure and is trained solely with classification and triplet loss functions. Extensive experiments on three widely recognized benchmark datasets, AwA2, CUB, and SUN, confirm the effectiveness and superiority of our approach.
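A minimal sketch of how a frozen CLIP image feature might be combined with a CNN feature and scored against class semantic embeddings; the fusion-by-addition, projection dimensions, and class names are assumptions for illustration, not the paper's exact design. Training would use the classification and triplet losses mentioned in the abstract.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CFZSLHead(nn.Module):
    """Fuse a CNN feature with a frozen CLIP image feature and score classes by
    cosine similarity to class semantic embeddings (e.g. attribute vectors). Sketch only."""
    def __init__(self, cnn_dim, clip_dim, sem_dim):
        super().__init__()
        self.proj_cnn = nn.Linear(cnn_dim, sem_dim)    # domain-specific branch
        self.proj_clip = nn.Linear(clip_dim, sem_dim)  # domain-agnostic (CLIP) branch

    def forward(self, cnn_feat, clip_feat, class_embeds):
        # clip_feat is assumed to come from a frozen CLIP image encoder
        v = F.normalize(self.proj_cnn(cnn_feat) + self.proj_clip(clip_feat), dim=-1)
        s = F.normalize(class_embeds, dim=-1)          # (num_classes, sem_dim)
        return v @ s.t()                               # (batch, num_classes) compatibility scores
```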
CMLCNet: medical image segmentation network based on convolution capsule encoder and multi-scale local co-occurrence
Pub Date: 2024-07-26, DOI: 10.1007/s00530-024-01430-9
Chendong Qin, Yongxiong Wang, Jiapeng Zhang
Medical images have low contrast and blurred boundaries between different tissues or between tissues and lesions. Because labeling medical images is laborious and requires expert knowledge, labeled data are expensive or simply unavailable. UNet has achieved great success in medical image segmentation, but the pooling layers used in downsampling tend to discard important information such as location, and the locality of the convolution operation makes it difficult to learn global and long-range semantic interactions. The usual remedies are collecting more data or enhancing the training data through augmentation; however, obtaining large medical datasets is difficult, and augmentation can increase the training burden. In this work, we propose a 2D medical image segmentation network with a convolutional capsule encoder and a multi-scale local co-occurrence module. To extract more local detail and contextual information, the capsule encoder learns information about the target location and the relationship between parts and the whole. Multi-scale features are fused by a new attention mechanism that captures global information to selectively emphasize salient, task-relevant features while suppressing background noise; this attention mechanism also preserves information that would otherwise be discarded by the network's pooling layers. In addition, a multi-scale local co-occurrence algorithm is proposed to better learn the context and dependencies between different regions of an image. Experimental results on the Liver, ISIC, and BraTS2019 datasets show that our network is superior to UNet and other previous medical image segmentation networks under the same experimental conditions.
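A minimal sketch of one way to fuse multi-scale features with a global-context channel gate, in the spirit of the attention mechanism described above; the SE-style gate and layer layout are assumptions for illustration, not the paper's module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleAttentionFusion(nn.Module):
    """Fuse feature maps from several scales with a channel gate computed from
    global context, emphasizing salient channels and damping background noise. Sketch only."""
    def __init__(self, channels, num_scales, reduction=4):
        super().__init__()
        total = channels * num_scales
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                  # global context per channel
            nn.Conv2d(total, total // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(total // reduction, total, 1),
            nn.Sigmoid(),
        )
        self.fuse = nn.Conv2d(total, channels, 1)

    def forward(self, feats):
        size = feats[0].shape[-2:]
        feats = [F.interpolate(f, size=size, mode="bilinear", align_corners=False)
                 for f in feats]                      # bring every scale to the finest resolution
        x = torch.cat(feats, dim=1)
        x = x * self.gate(x)                          # channel-wise re-weighting
        return self.fuse(x)
```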
TrafficTrack: rethinking the motion and appearance cue for multi-vehicle tracking in traffic monitoring
Pub Date: 2024-07-25, DOI: 10.1007/s00530-024-01407-8
Hui Cai, Haifeng Lin, Dapeng Liu
Analyzing traffic flow from traffic monitoring data is an essential component of intelligent transportation systems. In most traffic scenarios vehicles are the primary targets, so multi-object tracking of vehicles in traffic monitoring is a critical subject. In view of current difficulties such as complex road conditions, numerous occlusions, and similar vehicle appearances, we propose a detection-based multi-object vehicle tracking algorithm that combines motion and appearance cues. First, to improve motion prediction accuracy, we propose a Kalman filter that adaptively updates its noise according to the motion matching cost and the detection confidence score, combined with an exponential transformation and residuals. We then propose a combined distance that exploits both motion and appearance cues. Finally, we present a trajectory recovery strategy to handle unmatched trajectories and detections. Experimental results on the UA-DETRAC dataset demonstrate that the method achieves excellent tracking performance for vehicle tracking in traffic monitoring views, meeting the practical demands of complex traffic scenarios.
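A minimal sketch of a combined motion-and-appearance cost of the kind described above; the blending scheme and the alpha value are illustrative assumptions, not the paper's exact formulation. The resulting matrix would typically be passed to scipy.optimize.linear_sum_assignment to obtain track-detection matches.

```python
import numpy as np

def combined_cost(motion_cost, track_embs, det_embs, alpha=0.7):
    """Blend a motion cost matrix with an appearance (cosine) cost matrix (sketch).

    motion_cost: (T, D) e.g. 1 - IoU between predicted track boxes and detections
    track_embs:  (T, E) L2-normalized track appearance embeddings
    det_embs:    (D, E) L2-normalized detection appearance embeddings
    alpha:       weight on the motion cue (assumed value)
    """
    appearance_cost = 1.0 - track_embs @ det_embs.T   # (T, D), low when appearances match
    return alpha * motion_cost + (1.0 - alpha) * appearance_cost
```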
Fs-yolo: fire-smoke detection based on improved YOLOv7
Dongmei Wang, Ying Qian, Jingyi Lu, Peng Wang, Zhongrui Hu, Yongkang Chai
Pub Date: 2024-07-24, DOI: 10.1007/s00530-024-01359-z
Fire has emerged as a major threat to the Earth's ecological equilibrium and to human well-being, making fire detection and alert systems essential. Public fire datasets containing examples of fire and smoke in real-world situations are scarce, and existing techniques for recognizing objects in fire smoke are imprecise and unreliable when identifying small objects. We built a dual dataset to evaluate how well a model handles these difficulties and introduce FS-YOLO, a new fire detection model with improved accuracy. Training YOLOv7 can lead to overfitting because of its large number of parameters and the limited fire detection object categories, and YOLOv7 struggles to recognize small, dense objects during feature extraction, resulting in missed detections. We therefore enhance the Swin Transformer module to decrease local feature interdependence, obtain a wider range of parameters, and handle features at several levels; these improvements strengthen the model's robustness and the network's ability to recognize dense, tiny objects. Efficient channel attention is incorporated to reduce false fire detections: localizing the region of interest and extracting meaningful information helps the model identify pertinent areas and minimize false detections. The proposal also uses fire-smoke and real-fire-smoke datasets, the latter of which simulates real-world conditions with occlusions, lens blur, and motion blur to test the model's robustness and adaptability in complex situations. On the two datasets, the mAP of FS-YOLO improves on YOLOv7 by 6.4% and 5.4%, respectively. In the robustness-check experiments, the mAP of FS-YOLO is 4.1% and 3.1% higher than that of the current SOTA models YOLOv8s and DINO.
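The efficient channel attention mentioned above is commonly realized as a 1D convolution over pooled channel descriptors; below is a minimal sketch of that standard ECA formulation (whether FS-YOLO uses exactly this variant is an assumption).

```python
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient Channel Attention: a small 1D convolution over channel descriptors,
    so channels interact locally without any dimensionality reduction."""
    def __init__(self, k_size=3):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k_size, padding=k_size // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):                                  # x: (B, C, H, W)
        y = self.pool(x)                                   # (B, C, 1, 1) channel descriptors
        y = self.conv(y.squeeze(-1).transpose(1, 2))       # (B, 1, C): 1D conv across channels
        y = self.sigmoid(y.transpose(1, 2).unsqueeze(-1))  # back to (B, C, 1, 1)
        return x * y                                       # re-weight channels
```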
Insulator defect detection based on BaS-YOLOv5
Pub Date: 2024-07-23, DOI: 10.1007/s00530-024-01413-w
Yu Zhang, Yinke Dou, Kai Yang, Xiaoyang Song, Jin Wang, Liangliang Zhao
Currently, deep learning approaches for detecting defects in transmission line insulators from images obtained through unmanned aerial vehicle inspection suffer from insufficient detection accuracy and speed. This study therefore first introduces the bidirectional feature pyramid network (BiFPN) module into YOLOv5 to achieve high detection speed, combine image features at different scales, enhance information representation, and enable accurate detection of insulator defects at different scales. The BiFPN module is then combined with the simple parameter-free attention module (SimAM) to improve feature representation and object detection accuracy; SimAM also enables fusion of features at multiple scales, further improving insulator defect detection. Finally, multiple experimental controls were designed to verify the effectiveness and efficiency of the proposed model. Experimental results on self-made datasets show that the combined BiFPN and SimAM model (the improved BaS-YOLOv5) outperforms the original YOLOv5 model: precision, recall, average precision, and F1 score increase by 6.2%, 5%, 5.9%, and 6%, respectively. BaS-YOLOv5 thus substantially improves detection accuracy while maintaining a high detection speed, meeting the requirements of real-time insulator defect detection.
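SimAM, referenced above, is parameter-free and has a simple closed form; below is a minimal sketch of that published formulation (the lambda value is the commonly used default and is an assumption here).

```python
import torch

def simam(x, e_lambda=1e-4):
    """SimAM: parameter-free attention that weights each activation by an energy term
    derived from how much it deviates from its channel mean.  x: (B, C, H, W)."""
    b, c, h, w = x.shape
    n = h * w - 1
    d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)   # squared deviation from channel mean
    v = d.sum(dim=(2, 3), keepdim=True) / n             # per-channel variance estimate
    e_inv = d / (4 * (v + e_lambda)) + 0.5              # inverse energy per activation
    return x * torch.sigmoid(e_inv)                     # re-weight activations, no parameters
```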