LCFormer: linear complexity transformer for efficient image super-resolution
Pub Date: 2024-08-01 | DOI: 10.1007/s00530-024-01435-4
Xiang Gao, Sining Wu, Ying Zhou, Fan Wang, Xiaopeng Hu
Recently, Transformer-based methods have made significant breakthroughs in single image super-resolution (SISR), but at considerable computational cost. In this paper, we propose a novel Linear Complexity Transformer (LCFormer) for efficient image super-resolution. Specifically, since vanilla self-attention (SA) has quadratic complexity and often ignores potential correlations among different data samples, External Attention (EA) is introduced into the Transformer to reduce the quadratic complexity to linear while implicitly capturing correlations across the whole dataset. To improve training speed and performance, Root Mean Square Layer Normalization (RMSNorm) is adopted in the Transformer layers. Moreover, an Efficient Gated Depth-wise-conv Feed-forward Network (EGDFN), built from a gating mechanism and depth-wise convolutions, is designed for efficient feature representation in the Transformer. The proposed LCFormer achieves comparable or superior performance to existing Transformer-based methods while dramatically reducing computational complexity and GPU memory consumption. Extensive experiments demonstrate that LCFormer achieves competitive accuracy and visual improvements over other state-of-the-art methods and strikes a favorable trade-off between model performance and computational cost.
{"title":"LCFormer: linear complexity transformer for efficient image super-resolution","authors":"Xiang Gao, Sining Wu, Ying Zhou, Fan Wang, Xiaopeng Hu","doi":"10.1007/s00530-024-01435-4","DOIUrl":"https://doi.org/10.1007/s00530-024-01435-4","url":null,"abstract":"<p>Recently, Transformer-based methods have made significant breakthroughs for single image super-resolution (SISR) but with considerable computation overheads. In this paper, we propose a novel Linear Complexity Transformer (LCFormer) for efficient image super-resolution. Specifically, since the vanilla SA has quadratic complexity and often ignores potential correlations among different data samples, External Attention (EA) is introduced into Transformer to reduce the quadratic complexity to linear and implicitly considers the correlations across the whole dataset. To improve training speed and performance, Root Mean Square Layer Normalization (RMSNorm) is adopted in the Transformer layer. Moreover, an Efficient Gated Depth-wise-conv Feed-forward Network (EGDFN) is designed by the gate mechanism and depth-wise convolutions in Transformer for feature representation with an efficient implementation. The proposed LCFormer achieves comparable or superior performance to existing Transformer-based methods. However, the computation complexity and GPU memory consumption have been dramatically reduced. Extensive experiments demonstrate that LCFormer achieves competitive accuracy and visual improvements against other state-of-the-art methods and reaches a trade-off between model performance and computation costs.</p>","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":"76 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141865986","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
View sequence prediction GAN: unsupervised representation learning for 3D shapes by decomposing view content and viewpoint variance
Pub Date: 2024-08-01 | DOI: 10.1007/s00530-024-01431-8
Heyu Zhou, Jiayu Li, Xianzhu Liu, Yingda Lyu, Haipeng Chen, An-An Liu
Unsupervised representation learning for 3D shapes has become a critical problem for large-scale 3D shape management. Recent model-based methods for this task require additional information for training, while popular view-based methods often overlook viewpoint variance in view prediction, leading to uninformative 3D features that limit their practical applications. To address these issues, we propose an unsupervised 3D shape representation learning method called View Sequence Prediction GAN (VSP-GAN), which decomposes view content and viewpoint variance. VSP-GAN takes several adjacent views of a 3D shape as input and outputs the subsequent views. The key idea is to split the multi-view sequence into two perceptible parts, view content and viewpoint variance, and encode them independently with separate encoders. Using this information, a decoder, implemented as a mirrored version of the content encoder, predicts the view sequence in multiple steps. In addition, to improve the quality of the reconstructed views, we propose a novel hierarchical view prediction loss that enhances view realism, semantic consistency, and detail retention. We evaluate the proposed VSP-GAN on two popular 3D CAD datasets, ModelNet10 and ModelNet40, for 3D shape classification and retrieval. The experimental results demonstrate that VSP-GAN learns more discriminative features than state-of-the-art methods.
{"title":"View sequence prediction GAN: unsupervised representation learning for 3D shapes by decomposing view content and viewpoint variance","authors":"Heyu Zhou, Jiayu Li, Xianzhu Liu, Yingda Lyu, Haipeng Chen, An-An Liu","doi":"10.1007/s00530-024-01431-8","DOIUrl":"https://doi.org/10.1007/s00530-024-01431-8","url":null,"abstract":"<p>Unsupervised representation learning for 3D shapes has become a critical problem for large-scale 3D shape management. Recent model-based methods for this task require additional information for training, while popular view-based methods often overlook viewpoint variance in view prediction, leading to uninformative 3D features that limit their practical applications. To address these issues, we propose an unsupervised 3D shape representation learning method called View Sequence Prediction GAN (VSP-GAN), which decomposes view content and viewpoint variance. VSP-GAN takes several adjacent views of a 3D shape as input and outputs the subsequent views. The key idea is to split the multi-view sequence into two available perceptible parts, view content and viewpoint variance, and independently encode them with separate encoders. With the information, we design a decoder implemented by the mirrored architecture of the content encoder to predict the view sequence by multi-steps. Besides, to improve the quality of the reconstructed views, we propose a novel hierarchical view prediction loss to enhance view realism, semantic consistency, and details retainment. We evaluate the proposed VSP-GAN on two popular 3D CAD datasets, ModelNet10 and ModelNet40, for 3D shape classification and retrieval. The experimental results demonstrate that our VSP-GAN can learn more discriminative features than the state-of-the-art methods.</p>","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":"45 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141865990","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tex-Net: texture-based parallel branch cross-attention generalized robust Deepfake detector
Pub Date: 2024-08-01 | DOI: 10.1007/s00530-024-01424-7
Deepak Dagar, Dinesh Kumar Vishwakarma
In recent years, artificial faces generated using Generative Adversarial Networks (GANs) and Variational Auto-encoders (VAEs) have become more lifelike and difficult for humans to distinguish. Deepfake refers to highly realistic media generated using deep learning technology. Convolutional Neural Networks (CNNs) have demonstrated significant potential in computer vision applications, particularly in identifying fraudulent faces. However, when trained on insufficient data, these networks cannot effectively generalize to unfamiliar datasets, as they are constrained by the inductive biases of their learning process, such as translation equivariance and localization. The attention mechanism of vision transformers has effectively addressed these limitations, leading to their growing popularity in recent years. This work introduces a novel module for extracting global texture information and a model that combines features from a CNN (ResNet-18) and a cross-attention vision transformer. The model computes a global texture from the input using Gram matrices and local binary patterns at each downsampling step of the ResNet-18 architecture. The ResNet-18 main branch and the global texture module operate in parallel before feeding into the cross-attention mechanism of the vision transformer's dual branch. The empirical investigation first demonstrates that counterfeit images typically display more uniform textures that are inconsistent across long distances. The model's cross-forgery performance is demonstrated by experiments conducted on various types of GAN images and FaceForensics++ categories. The results show that the model outperforms many state-of-the-art techniques, achieving an accuracy of up to 85%. Furthermore, multiple tests are performed on different data samples (FF++, DFDCPreview, Celeb-DF) subjected to post-processing, including compression, noise addition, and blurring. These studies validate that the model acquires shared distinguishing characteristics (global texture) that persist across different fake-image distributions, and the outcomes demonstrate that the model is resilient and applicable in many scenarios.
{"title":"Tex-Net: texture-based parallel branch cross-attention generalized robust Deepfake detector","authors":"Deepak Dagar, Dinesh Kumar Vishwakarma","doi":"10.1007/s00530-024-01424-7","DOIUrl":"https://doi.org/10.1007/s00530-024-01424-7","url":null,"abstract":"<p>In recent years, artificial faces generated using Generative Adversarial Networks (GANs) and Variational Auto-encoders (VAEs) have become more lifelike and difficult for humans to distinguish. Deepfake refers to highly realistic and impressive media generated using deep learning technology. Convolutional Neural Networks (CNNs) have demonstrated significant potential in computer vision applications, particularly identifying fraudulent faces. However, if these networks are trained on insufficient data, they cannot effectively apply their knowledge to unfamiliar datasets, as they are susceptible to inherent biases in their learning process, such as translation, equivariance, and localization. The attention mechanism of vision transformers has effectively resolved these limits, leading to their growing popularity in recent years. This work introduces a novel module for extracting global texture information and a model that combines data from CNN (ResNet-18) and cross-attention vision transformers. The model takes in input and generates the global texture by utilizing Gram matrices and local binary patterns at each down sampling step of the ResNet-18 architecture. The ResNet-18 main branch and global texture module operate simultaneously before inputting into the visual transformer’s dual branch’s cross-attention mechanism. Initially, the empirical investigation demonstrates that counterfeit images typically display more uniform textures that are inconsistent across long distances. The model’s performance on the cross-forgery dataset is demonstrated by experiments conducted on various types of GAN images and Faceforensics + + categories. The results show that the model outperforms the scores of many state-of-the-art techniques, achieving an accuracy score of up to 85%. Furthermore, multiple tests are performed on different data samples (FF + +, DFDCPreview, Celeb-Df) that undergo post-processing techniques, including compression, noise addition, and blurring. These studies validate that the model acquires the shared distinguishing characteristics (global texture) that persist across different types of fake picture distributions, and the outcomes of these trials demonstrate that the model is resilient and can be used in many scenarios.</p>","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":"34 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141865987","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DCANet: CNN model with dual-path network and improved coordinate attention for JPEG steganalysis
Pub Date: 2024-08-01 | DOI: 10.1007/s00530-024-01433-6
Tong Fu, Liquan Chen, Yuan Gao, Huiyu Fang
Nowadays, convolutional neural networks (CNNs) are applied to JPEG steganalysis and perform better than traditional methods. However, almost all JPEG steganalysis methods use single-path structures, making it challenging to fully exploit the extracted noise residuals. On the other hand, most existing steganalysis detectors lack a focus on the areas where secret information may be hidden. In this research, we present a steganalysis model with a dual-path network and improved coordinate attention to detect adaptive JPEG steganography, consisting mainly of noise extraction, noise aggregation, and classification modules. In particular, a dual-path network architecture that combines the advantages of residual and dense connections is used in the noise extraction module to explore hidden features in depth while preserving the stego signal. Then, an improved coordinate attention mechanism is introduced into the noise aggregation module, which helps the network identify complex texture areas more quickly and extract more valuable features. We verify the validity of the individual components through extensive ablation experiments with the necessary descriptions. Furthermore, we conducted comparative experiments on BOSSBase and BOWS2, and the results demonstrate that the proposed model achieves the best detection performance compared with other state-of-the-art methods.
{"title":"DCANet: CNN model with dual-path network and improved coordinate attention for JPEG steganalysis","authors":"Tong Fu, Liquan Chen, Yuan Gao, Huiyu Fang","doi":"10.1007/s00530-024-01433-6","DOIUrl":"https://doi.org/10.1007/s00530-024-01433-6","url":null,"abstract":"<p>Nowadays, convolutional neural network (CNN) is applied to JPEG steganalysis and performs better than traditional methods. However, almost all JPEG steganalysis methods utilize single-path structures, making it challenging to use the extracted noise residuals fully. On the other hand, most existing steganalysis detectors lack a focus on areas where secret information may be hidden. In this research, we present a steganalysis model with a dual-path network and improved coordinate attention to detect adaptive JPEG steganography, mainly including noise extraction, noise aggregation, and classification module. Especially, a dual-path network architecture simultaneously combining the advantages of both residual and dense connection is utilized to explore the hidden features in-depth while preserving the stego signal in the noise extraction module. Then, an improved coordinate attention mechanism is introduced into the noise aggregation module, which helps the network identify the complex texture area more quickly and extract more valuable features. We have verified the validity of some components through extensive ablation experiments with the necessary descriptions. Furthermore, we conducted comparative experiments on BOSSBase and BOWS2, and the experimental results demonstrate that the proposed model achieves the best detection performance compared with other start-of-the-art methods.</p>","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":"44 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141865985","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Electric vehicle routing optimization under 3D electric energy modeling
Pub Date: 2024-07-31 | DOI: 10.1007/s00530-024-01409-6
Yanfei Zhu, Yonghua Wang, Chunhui Li, Kwang Y. Lee
In logistics transportation, the electric vehicle routing problem (EVRP) has been widely studied in order to reduce vehicle power expenditure, lower transportation costs, and improve service quality. The power expenditure model and the routing algorithm are essential for solving the EVRP. To make the routing schedule more reasonable and closer to reality, this paper employs a three-dimensional power expenditure model to calculate the power expenditure of EVs. In this model, the power expenditure of EVs while driving uphill and downhill is considered, so that routing schedules for logistics transportation in mountainous areas can be solved. This study combines Q-learning with a Re-insertion Genetic Algorithm (Q-RIGA) to design EV routes with low electricity expenditure and reduced transportation costs. The Q-learning algorithm is used to improve route initialization and obtain high-quality initial routes, which are then further optimized by RIGA. Tests on a collection of randomly dispersed customer groups confirm the advantages of the proposed method in terms of convergence speed and power expenditure. The three-dimensional, elevation-aware power expenditure model is also used in simulation experiments on a distribution example of Sanlian Dairy in Guizhou, verifying that the improved model has broader applicability and higher practical value.
{"title":"Electric vehicle routing optimization under 3D electric energy modeling","authors":"Yanfei Zhu, Yonghua Wang, Chunhui Li, Kwang Y. Lee","doi":"10.1007/s00530-024-01409-6","DOIUrl":"https://doi.org/10.1007/s00530-024-01409-6","url":null,"abstract":"<p>In logistics transportation, the electric vehicle routing problem (EVRP) is researched widely in order to save vehicle power expenditure, reduce transportation costs, and improve service quality. The power expenditure model and routing algorithm are essential for resolving EVRP. To align the routing schedule more reasonable and closer to reality, this paper employs a three-dimensional power expenditure model to calculate the power expenditure of EVs. In this model, the power expenditure of the EVs during the process of going up and downhill is considered to solve the routing schedule of logistics transportation in mountainous areas. This study combines Q-learning and the Re-insertion Genetic Algorithm (Q-RIGA) to design EV routes with low electricity expenditure and reduced transportation costs. The Q-learning algorithm is used to improve route initialization and obtain high-quality initial routes, which are further optimized by RIGA. Tested in a collection of randomly dispersed customer groups, the advantages of the proposed method in terms of convergence speed and power expenditure are confirmed. The three-dimensional power expenditure model with consideration of elevation is used to conduct simulation experiments on the distribution example of Sanlian Dairy in Guizhou to verify that the improved model features broader application and higher practical value.</p>","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":"731 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141865982","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
GloFP-MSF: monocular scene flow estimation with global feature perception
Pub Date: 2024-07-30 | DOI: 10.1007/s00530-024-01418-5
Xuezhi Xiang, Yu Cui, Xi Wang, Mingliang Zhai, Abdulmotaleb El Saddik
Monocular scene flow estimation is a task that allows us to obtain 3D structure and 3D motion from consecutive monocular images. Previous monocular scene flow methods usually focused on directly enhancing image and motion features while neglecting their utilization in the decoder, which is equally crucial for accurate scene flow estimation. Based on cross-covariance attention, we propose a global feature perception module (GFPM) and apply it to the decoder, which enables the decoder to effectively utilize the motion and image features of the current layer as well as the coarse scene flow estimate from the previous layer, thus enhancing the decoder's recovery of 3D motion information. In addition, we propose a parallel architecture of self-attention and convolution (PCSA) for feature extraction, which enhances the global representation ability of the extracted image features. Our proposed method demonstrates remarkable performance on the KITTI 2015 dataset, achieving a relative improvement of 17.6% over the baseline approach. Compared with other recent methods, the proposed model achieves competitive results.
{"title":"GloFP-MSF: monocular scene flow estimation with global feature perception","authors":"Xuezhi Xiang, Yu Cui, Xi Wang, Mingliang Zhai, Abdulmotaleb El Saddik","doi":"10.1007/s00530-024-01418-5","DOIUrl":"https://doi.org/10.1007/s00530-024-01418-5","url":null,"abstract":"<p>Monocular scene flow estimation is a task that allows us to obtain 3D structure and 3D motion from consecutive monocular images. Previous monocular scene flow usually focused on the enhancement of image features and motion features directly while neglecting the utilization of motion features and image features in the decoder, which are equally crucial for accurate scene flow estimation. Based on the cross-covariance attention, we propose a global feature perception module (GFPM) and applie it to the decoder, which enables the decoder to utilize the motion features and image features of the current layer as well as the coarse estimation result of the scene flow of the previous layer effectively, thus enhancing the decoder’s recovery of 3D motion information. In addition, we also propose a parallel architecture of self-attention and convolution (PCSA) for feature extraction, which can enhance the global expression ability of extracted image features. Our proposed method demonstrates remarkable performance on the KITTI 2015 dataset, achieving a relative improvement of 17.6% compared to the baseline approach. Compared to other recent methods, the proposed model achieves competitive results.</p>","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":"50 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141865989","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Text-centered cross-sample fusion network for multimodal sentiment analysis
Pub Date: 2024-07-30 | DOI: 10.1007/s00530-024-01421-w
Qionghao Huang, Jili Chen, Changqin Huang, Xiaodi Huang, Yi Wang
Significant advancements in multimodal sentiment analysis have been achieved through cross-modal attention mechanisms (CMA). However, the importance of modality-specific information for distinguishing similar samples is often overlooked due to the inherent limitations of CMA. To address this issue, we propose a Text-centered Cross-sample Fusion Network (TeCaFN), which employs cross-sample fusion to perceive modality-specific information during modal fusion. Specifically, we develop a cross-sample fusion method that merges modalities from distinct samples. This method preserves detailed modality-specific information through adversarial training combined with a pairwise prediction task. Furthermore, a robust two-stage text-centric contrastive learning mechanism is developed to enhance the stability of cross-sample fusion learning. TeCaFN achieves state-of-the-art results on the CMU-MOSI, CMU-MOSEI, and UR-FUNNY datasets. Moreover, our ablation studies further demonstrate the effectiveness of contrastive learning and adversarial training as components of TeCaFN in improving model performance. The code implementation of this paper is available at https://github.com/TheShy-Dream/MSA-TeCaFN.
{"title":"Text-centered cross-sample fusion network for multimodal sentiment analysis","authors":"Qionghao Huang, Jili Chen, Changqin Huang, Xiaodi Huang, Yi Wang","doi":"10.1007/s00530-024-01421-w","DOIUrl":"https://doi.org/10.1007/s00530-024-01421-w","url":null,"abstract":"<p>Significant advancements in multimodal sentiment analysis tasks have been achieved through cross-modal attention mechanisms (CMA). However, the importance of modality-specific information for distinguishing similar samples is often overlooked due to the inherent limitations of CMA. To address this issue, we propose a <b>T</b>ext-c<b>e</b>ntered <b>C</b>ross-s<b>a</b>mple <b>F</b>usion <b>N</b>etwork (TeCaFN), which employs cross-sample fusion to perceive modality-specific information during modal fusion. Specifically, we develop a cross-sample fusion method that merges modalities from distinct samples. This method maintains detailed modality-specific information through the use of adversarial training combined with a task of pairwise prediction. Furthermore, a robust mechanism using a two-stage text-centric contrastive learning approach is developed to enhance the stability of cross-sample fusion learning. TeCaFN achieves state-of-the-art results on the CMU-MOSI, CMU-MOSEI, and UR-FUNNY datasets. Moreover, our ablation studies further demonstrate the effectiveness of contrastive learning and adversarial training as the components of TeCaFN in improving model performance. The code implementation of this paper is available at https://github.com/TheShy-Dream/MSA-TeCaFN.</p>","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":"22 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141865991","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
DMFTNet: dense multimodal fusion transfer network for free-space detection
Pub Date: 2024-07-29 | DOI: 10.1007/s00530-024-01417-6
Jiabao Ma, Wujie Zhou, Meixin Fang, Ting Luo
Free-space detection is an essential task in autonomous driving; it can be formulated as the semantic segmentation of driving scenes. An important line of research in free-space detection is the use of convolutional neural networks to achieve high-accuracy semantic segmentation. In this study, we introduce two fusion modules: the dense exploration module (DEM) and the dual-attention exploration module (DAEM). They efficiently capture diverse fusion information by fully exploring deep and representative information at each network stage. Furthermore, we propose a dense multimodal fusion transfer network (DMFTNet). With the help of the DEM and DAEM, this architecture uses elaborate multimodal deep-fusion exploration modules to extract fused features from RGB and depth features at every stage and then densely transfers them to predict free space. Extensive experiments compared DMFTNet with 11 state-of-the-art approaches on two datasets, and the proposed fusion modules ensured that DMFTNet's free-space detection performance was superior.
{"title":"DMFTNet: dense multimodal fusion transfer network for free-space detection","authors":"Jiabao Ma, Wujie Zhou, Meixin Fang, Ting Luo","doi":"10.1007/s00530-024-01417-6","DOIUrl":"https://doi.org/10.1007/s00530-024-01417-6","url":null,"abstract":"<p>Free-space detection is an essential task in autonomous driving; it can be formulated as the semantic segmentation of driving scenes. An important line of research in free-space detection is the use of convolutional neural networks to achieve high-accuracy semantic segmentation. In this study, we introduce two fusion modules: the dense exploration module (DEM) and the dual-attention exploration module (DAEM). They efficiently capture diverse fusion information by fully exploring deep and representative information at each network stage. Furthermore, we propose a dense multimodal fusion transfer network (DMFTNet). This architecture uses elaborate multimodal deep fusion exploration modules to extract fused features from red–green–blue and depth features at every stage with the help of DEM and DAEM and then densely transfer them to predict the free space. Extensive experiments were conducted comparing DMFTNet and 11 state-of-the-art approaches on two datasets. The proposed fusion module ensured that DMFTNet’s free-space-detection performance was superior.</p>","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":"1 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141866095","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
SA-MDRAD: sample-adaptive multi-teacher dynamic rectification adversarial distillation
Pub Date: 2024-07-29 | DOI: 10.1007/s00530-024-01416-7
Shuyi Li, Xiaohan Yang, Guozhen Cheng, Wenyan Liu, Hongchao Hu
Adversarial training of lightweight models suffers from poor effectiveness due to the limited model size and the difficulty of optimizing the loss with hard labels. Adversarial distillation is a potential solution, in which knowledge from large adversarially pre-trained teachers is used to guide the lightweight models' learning. However, adversarially pre-training teachers is computationally expensive because of the iterative gradient steps required on the inputs. Additionally, the reliability of the teachers' guidance diminishes as the lightweight models become more robust. In this paper, we propose an adversarial distillation method called Sample-Adaptive Multi-teacher Dynamic Rectification Adversarial Distillation (SA-MDRAD). First, an adversarial distillation framework that distills logits and features from heterogeneous standard pre-trained teachers is developed to reduce pre-training expenses and improve knowledge diversity. Second, the teachers' knowledge is distilled into the lightweight model after sample-aware dynamic rectification and adaptive fusion based on the teachers' predictions, improving the reliability of the knowledge. Experiments are conducted to evaluate the performance of the proposed method on the CIFAR-10, CIFAR-100, and Tiny-ImageNet datasets. The results demonstrate that SA-MDRAD is more effective than existing adversarial distillation methods in enhancing the robustness of lightweight image classification models against various adversarial attacks.
{"title":"SA-MDRAD: sample-adaptive multi-teacher dynamic rectification adversarial distillation","authors":"Shuyi Li, Xiaohan Yang, Guozhen Cheng, Wenyan Liu, Hongchao Hu","doi":"10.1007/s00530-024-01416-7","DOIUrl":"https://doi.org/10.1007/s00530-024-01416-7","url":null,"abstract":"<p>Adversarial training of lightweight models faces poor effectiveness problem due to the limited model size and the difficult optimization of loss with hard labels. Adversarial distillation is a potential solution to the problem, in which the knowledge from large adversarially pre-trained teachers is used to guide the lightweight models’ learning. However, adversarially pre-training teachers is computationally expensive due to the need for iterative gradient steps concerning the inputs. Additionally, the reliability of guidance from teachers diminishes as lightweight models become more robust. In this paper, we propose an adversarial distillation method called Sample-Adaptive Multi-teacher Dynamic Rectification Adversarial Distillation (SA-MDRAD). First, an adversarial distillation framework of distilling logits and features from the heterogeneous standard pre-trained teachers is developed to reduce pre-training expenses and improve knowledge diversity. Second, the knowledge of teachers is distilled into the lightweight model after sample-aware dynamic rectification and adaptive fusion based on teachers’ predictions to improve the reliability of knowledge. Experiments are conducted to evaluate the performance of the proposed method on CIFAR-10, CIFAR-100, and Tiny-ImageNet datasets. The results demonstrate that our SA-MDRAD is more effective than existing adversarial distillation methods in enhancing the robustness of lightweight image classification models against various adversarial attacks.</p>","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":"65 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141866092","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Effective ensemble based intrusion detection and energy efficient load balancing using sunflower optimization in distributed wireless sensor network
Pub Date: 2024-07-29 | DOI: 10.1007/s00530-024-01388-8
V. S. Prasanth, A. Mary Posonia, A. Parveen Akhther
Wireless sensor networks (WSNs) play a very important role in providing real-time data access for big data and Internet of Things applications. However, the open deployment of WSNs makes them highly susceptible to various malicious attacks, energy constraints, and decentralized governance. For mission-critical applications in WSNs, it is crucial to identify rogue sensor devices and discard the data they sense. The resource-constrained nature of sensor devices prevents the direct application of standard cryptography and authentication techniques in WSNs, so low-latency and energy-efficient methods are needed. An efficient and safe routing system is created in this study. First, outliers are detected among the deployed nodes using a stacking-based ensemble learning approach: a deep neural network (DNN) and a long short-term memory (LSTM) network serve as base classifiers, and a multilayer perceptron (MLP) is used as the meta-classifier. Only normal nodes are considered in the subsequent steps. Then, cluster-head selection and cluster formation are performed based on distance, density, and residual energy. The sunflower optimization algorithm (SOA) is employed for routing to improve energy efficiency and load balancing; superior transmission routing can potentially be obtained by taking the shortest path. The proposed method achieves 95% accuracy in the intrusion detection phase and a packet delivery ratio of 92% for energy-efficient routing. Consequently, the proposed method is the most effective option for load balancing with intrusion detection.
{"title":"Effective ensemble based intrusion detection and energy efficient load balancing using sunflower optimization in distributed wireless sensor network","authors":"V. S. Prasanth, A. Mary Posonia, A. Parveen Akhther","doi":"10.1007/s00530-024-01388-8","DOIUrl":"https://doi.org/10.1007/s00530-024-01388-8","url":null,"abstract":"<p>Wireless sensor networks (WSNs) play a very important role in providing real-time data access for big data and internet of things applications. Despite this, WSNs’ open deployment makes them highly susceptible to various malicious attacks, energy constraints, and decentralized governance. For mission-critical applications in WSNs, it is crucial to identify rogue sensor devices and remove the sensed data they contain. The resource-constrained nature of sensor devices prevents the direct application of standard cryptography and authentication techniques in WSNs. Low latency and energy-efficient methods are therefore needed. An efficient and safe routing system is created in this study. Initially the outliers are detected from deployed nodes using stacking based ensemble learning approach. Deep neural network (DNN) and long short term memory (LSTM) are two different basic classifiers and multilayer perceptron (MLP) is utilized as a Meta classifier in the ensemble method. The normal nodes are considered for further process. Then, distance, density and residual energy based cluster head selection and cluster formations are done. Sunflower optimization algorithm (SOA) is employed in this approach for routing purpose to improve energy efficiency and load balancing. Superior transmission routing can potentially obtained by taking the shortest way. This proposed method achieves 95% accuracy for the intrusion detection phase and 92% is the packet delivery ratio for energy efficient routing. Consequently, the proposed method is the most effective option for load balancing with intrusion detection.</p>","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":"8 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141866097","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}