A multi-scale channel attention network with federated learning for magnetic resonance image super-resolution
Pub Date: 2024-07-23  DOI: 10.1007/s00530-024-01415-8
Feiqiang Liu, Aiwen Jiang, Lihui Chen
Magnetic resonance (MR) images are widely used for clinical diagnosis, but practical imaging conditions often limit the resolution, so under-sampled data are usually generated during imaging. Since high-resolution (HR) MR images aid clinical diagnosis, reconstructing HR MR images from such under-sampled data is important. Recently, deep learning (DL) methods for HR reconstruction of MR images have achieved impressive performance. However, it is difficult to collect enough data for training DL models in practice due to medical data privacy regulations. Federated learning (FL) addresses this issue through local/distributed training and encryption. In this paper, we propose a multi-scale channel attention network (MSCAN) for MR image super-resolution (SR) and integrate it into an FL framework named FedAve to make use of data from multiple institutions while avoiding privacy risk. Specifically, to exploit multi-scale information in MR images, we introduce a multi-scale feature block (MSFB), in which multi-scale features are extracted and attention among features at different scales is captured to re-weight them. Then, a spatial gradient profile loss is integrated into MSCAN to facilitate the recovery of textures in MR images. Last, we incorporate MSCAN into FedAve to simulate the scenario of collaborative training among multiple institutions. Ablation studies show the effectiveness of the multi-scale features, the multi-scale channel attention, and the texture loss. Comparative experiments with state-of-the-art (SOTA) methods indicate that the proposed MSCAN is superior to the compared methods and that the FL-trained model achieves results close to those of the model trained on centralized data.
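As a rough, hedged illustration of the MSFB idea, here is a PyTorch sketch in which parallel convolutions extract features at several scales and a squeeze-and-excitation-style channel attention re-weights the concatenated branches; the kernel sizes, reduction ratio, and module names are assumptions for illustration, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code): multi-scale feature extraction
# followed by channel attention that re-weights the multi-scale features.
import torch
import torch.nn as nn

class MultiScaleFeatureBlock(nn.Module):
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        # Parallel branches with different receptive fields (3x3, 5x5, 7x7) - assumed sizes.
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, k, padding=k // 2) for k in (3, 5, 7)
        ])
        # Squeeze-and-excitation-style attention over the concatenated branches.
        self.attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(3 * channels, 3 * channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(3 * channels // reduction, 3 * channels, 1),
            nn.Sigmoid(),
        )
        self.fuse = nn.Conv2d(3 * channels, channels, 1)

    def forward(self, x):
        feats = torch.cat([b(x) for b in self.branches], dim=1)  # multi-scale features
        feats = feats * self.attn(feats)                         # re-weight per channel
        return x + self.fuse(feats)                              # residual fusion

x = torch.randn(1, 64, 32, 32)
print(MultiScaleFeatureBlock(64)(x).shape)  # torch.Size([1, 64, 32, 32])
```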
{"title":"A multi-scale channel attention network with federated learning for magnetic resonance image super-resolution","authors":"Feiqiang Liu, Aiwen Jiang, Lihui Chen","doi":"10.1007/s00530-024-01415-8","DOIUrl":"https://doi.org/10.1007/s00530-024-01415-8","url":null,"abstract":"<p>Magnetic resonance (MR) images are widely used for clinical diagnosis, whereas some surrounding factors always limit the resolution, so under-sampled data is usually generated during imaging. Since high-resolution (HR) MR images contribute to the clinic diagnosis, reconstructing HR MR images from these under-sampled data is pretty important. Recently, deep learning (DL) methods for HR reconstruction of MR images have achieved impressive performance. However, it is difficult to collect enough data for training DL models in practice due to medical data privacy regulations. Fortunately, federated learning (FL) is proposed to eliminate this issue by local/distributed training and encryption. In this paper, we propose a multi-scale channel attention network (MSCAN) for MR image super-resolution (SR) and integrate it into an FL framework named FedAve to make use of data from multiple institutions and avoid privacy risk. Specifically, to utilize multi-scale information in MR images, we introduce a multi-scale feature block (MSFB), in which multi-scale features are extracted and attention among features at different scales is captured to re-weight these multi-scale features. Then, a spatial gradient profile loss is integrated into MSCAN to facilitate the recovery of textures in MR images. Last, we incorporate MSCAN into FedAve to simulate the scenery of collaborated training among multiple institutions. Ablation studies show the effectiveness of the multi-scale features, the multi-scale channel attention, and the texture loss. Comparative experiments with some state-of-the-art (SOTA) methods indicate that the proposed MSCAN is superior to the compared methods and the model with FL has close results to the one trained by centralized data.</p>","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":"43 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141784566","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Gated feature aggregate and alignment network for real-time semantic segmentation of street scenes
Pub Date: 2024-07-23  DOI: 10.1007/s00530-024-01429-2
Qian Liu, Zhensheng Li, Youwei Qi, Cunbao Wang
Semantic segmentation of street scenes is important for vision-based autonomous driving applications. Recently, high-accuracy networks based on deep learning have been widely applied to semantic segmentation, but their inference speeds are slow. To achieve faster speed, most popular real-time network architectures adopt stepwise downsampling operations in the backbone to obtain features of different sizes. However, they ignore the misalignment between feature maps from different levels, and their simple feature aggregation using element-wise addition or channel-wise concatenation may submerge useful information in a large amount of useless information. To deal with these problems, we propose a gated feature aggregation and alignment network (GFAANet) for real-time semantic segmentation of street scenes. In GFAANet, a feature alignment aggregation module is developed to effectively align and aggregate the feature maps from different levels, and a gated feature aggregation module selectively aggregates and refines effective information from the multi-stage features of the backbone network using gates. Furthermore, a depthwise separable pyramid pooling module based on low-resolution feature maps is designed as a context extractor to expand the effective receptive fields and fuse multi-scale contexts. Experimental results on two challenging street scene benchmark datasets show that GFAANet achieves the highest accuracy among the compared state-of-the-art real-time semantic segmentation methods. We conclude that GFAANet can quickly and effectively segment street scene images, which may provide technical support for autonomous driving.
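To make the gating idea concrete, the following minimal PyTorch sketch aligns a high-level feature map to a low-level one and lets a learned gate decide, per position, how much of each branch to keep. The layer sizes and the simple bilinear alignment are assumptions; the paper's actual modules are more elaborate.

```python
# Minimal sketch of gated aggregation of a low-level and a high-level feature map.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedAggregation(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, low, high):
        # Align the high-level map to the low-level map's resolution.
        high = F.interpolate(high, size=low.shape[-2:], mode="bilinear", align_corners=False)
        g = self.gate(torch.cat([low, high], dim=1))
        return g * low + (1.0 - g) * high   # gated, selective aggregation

low, high = torch.randn(1, 64, 64, 64), torch.randn(1, 64, 32, 32)
print(GatedAggregation(64)(low, high).shape)  # torch.Size([1, 64, 64, 64])
```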
{"title":"Gated feature aggregate and alignment network for real-time semantic segmentation of street scenes","authors":"Qian Liu, Zhensheng Li, Youwei Qi, Cunbao Wang","doi":"10.1007/s00530-024-01429-2","DOIUrl":"https://doi.org/10.1007/s00530-024-01429-2","url":null,"abstract":"<p>Semantic segmentation of street scenes is important for the vision-based application of autonomous driving. Recently, high-accuracy networks based on deep learning have been widely applied to semantic segmentation, but their inference speeds are slow. In order to achieve faster speed, most popular real-time network architectures adopt stepwise downsampling operation in the backbone to obtain features with different sizes. However, they ignore the misalignment between feature maps from different levels, and their simple feature aggregation using element-wise addition or channel-wise concatenation may submerge the useful information in a large number of useless information. To deal with these problems, we propose a gated feature aggregation and alignment network (GFAANet) for real-time semantic segmentation of street scenes. In GFAANet, a feature alignment aggregation module is developed to effectively align and aggregate the feature maps from different levels. And we present a gated feature aggregation module to selectively aggregate and refine effective information from multi-stage features of the backbone network using gates. Furthermore, a depthwise separable pyramid pooling module based on low-resolution feature maps is designed as a context extractor to expand the effective receptive fields and fuse multi-scale contexts. Experimental results on two challenging street scene benchmark datasets show that GFAANet achieves highest accuracy in real-time semantic segmentation of street scenes, as compared with the state-of-the-art. We conclude that our GFAANet can quickly and effectively segment street scene images, which may provide technical support for autonomous driving.</p>","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":"73 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141784525","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Adaptive client selection and model aggregation for heterogeneous federated learning
Pub Date: 2024-07-21  DOI: 10.1007/s00530-024-01386-w
Rui Zhai, Haozhe Jin, Wei Gong, Ke Lu, Yanhong Liu, Yalin Song, Junyang Yu
Federated Learning (FL) is a distributed machine learning method that allows multiple clients to collaborate on model training without sharing raw data. However, FL faces challenges with data heterogeneity, leading to reduced model accuracy and slower convergence. Although existing client selection methods can alleviate these problems, there is still room to improve FL performance. To tackle these problems, we first propose a novel client selection method based on the Multi-Armed Bandit (MAB) framework. The method uses the historical training information uploaded by each client to calculate its correlation and contribution, and the calculated values are then used to select the set of clients that brings the most benefit, i.e., maximizing both model accuracy and convergence speed. Second, we propose an adaptive global model aggregation method that utilizes the local training information of the selected clients to dynamically assign weights to local model parameters. Extensive experiments on various datasets with different heterogeneous settings demonstrate that our proposed method effectively improves FL performance compared to several benchmarks.
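A hedged sketch of the two steps described above: a UCB-style multi-armed bandit score ranks clients from their historical contribution values, and a weighted average combines the selected local parameters. The scoring rule and weighting scheme are placeholders, not the paper's exact formulas.

```python
# Placeholder sketch: bandit-style client selection and weighted model aggregation.
import math
import random

def select_clients(history, num_select, round_idx):
    """history: dict client_id -> list of past contribution scores (uploaded info)."""
    def ucb(cid):
        plays = len(history[cid])
        if plays == 0:
            return float("inf")                 # explore clients never selected before
        mean = sum(history[cid]) / plays
        return mean + math.sqrt(2 * math.log(round_idx + 1) / plays)
    return sorted(history, key=ucb, reverse=True)[:num_select]

def aggregate(local_params, weights):
    """Weighted average of local parameter vectors (lists of floats)."""
    total = sum(weights)
    return [sum(w * p[i] for w, p in zip(weights, local_params)) / total
            for i in range(len(local_params[0]))]

history = {cid: [random.random()] for cid in range(10)}
chosen = select_clients(history, num_select=3, round_idx=1)
print(chosen, aggregate([[1.0, 2.0], [3.0, 4.0]], weights=[0.7, 0.3]))
```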
{"title":"Adaptive client selection and model aggregation for heterogeneous federated learning","authors":"Rui Zhai, Haozhe Jin, Wei Gong, Ke Lu, Yanhong Liu, Yalin Song, Junyang Yu","doi":"10.1007/s00530-024-01386-w","DOIUrl":"https://doi.org/10.1007/s00530-024-01386-w","url":null,"abstract":"<p>Federated Learning (FL) is a distributed machine learning method that allows multiple clients to collaborate on model training without sharing raw data. However, FL faces challenges with data heterogeneity, leading to reduced model accuracy and slower convergence. Although existing client selection methods can alleviate the above problems, there is still room to improve FL performance. To tackle these problems, we first propose a novel client selection method based on Multi-Armed Bandit (MAB). The method uses the historical training information uploaded by each client to calculate its correlation and contribution. The calculated values are then used to select a set of clients that can bring the most benefit, i.e., maximizing both model accuracy and convergence speed. Second, we propose an adaptive global model aggregation method that utilizes the local training information of selected clients to dynamically assign weights to local model parameters. Extensive experiments on various datasets with different heterogeneous settings demonstrate that our proposed method is effectively improving FL performance compared to several benchmarks.</p>","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":"12 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-07-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141742013","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A method of hybrid dilated and global convolution networks for pavement crack detection
Pub Date: 2024-07-19  DOI: 10.1007/s00530-024-01408-7
Zhong Qu, Ming Li, Bin Yuan, Guoqing Mu
Automatic crack detection is important for efficient and economical pavement maintenance. With the development of Convolutional Neural Networks (CNNs), crack detection methods have mostly been based on CNNs. In this paper, we propose a novel automatic crack detection network architecture, named hybrid dilated and global convolutional networks. Firstly, we integrate the hybrid dilated convolution module into the ResNet-152 network, which can effectively aggregate global features. Then, we use the global convolution module to enhance the classification and localization ability of the extracted features. Finally, a feature fusion module is introduced to fuse multi-scale and multi-level feature maps. The proposed network can capture crack features from a global perspective and generate the corresponding feature maps. To demonstrate the effectiveness of the proposed method, we evaluate it on four public crack datasets, DeepCrack, CFD, Cracktree200, and CRACK500, achieving ODS values of 87.12%, 83.96%, 82.66%, and 81.35% and OIS values of 87.55%, 84.82%, 83.56%, and 82.98%, respectively. Compared with HED, RCF, DeepCrackT, FPHBN, ResNet-152, and DeepCrack, our method improves the ODS value by 1.21%, 3.35%, 3.07%, 3.36%, 4.79%, and 1%, respectively, on the DeepCrack dataset. Extensive experimental results confirm that the proposed method outperforms other state-of-the-art crack detection, edge detection, and image segmentation methods.
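As an illustration only, the sketch below pairs a hybrid dilated convolution block (dilation rates chosen to avoid gridding) with a global convolution module built from factorized 1xk and kx1 convolutions; the dilation rates, kernel size, and channel counts are assumptions rather than the paper's configuration.

```python
# Illustrative sketch of a hybrid-dilated-convolution block and a global convolution module.
import torch
import torch.nn as nn

class HybridDilatedBlock(nn.Module):
    def __init__(self, channels, rates=(1, 2, 5)):
        super().__init__()
        self.convs = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=r, dilation=r) for r in rates
        ])
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        for conv in self.convs:          # stacked dilated convs enlarge the receptive field
            x = self.relu(conv(x))
        return x

class GlobalConvModule(nn.Module):
    def __init__(self, in_ch, out_ch, k=7):
        super().__init__()
        p = k // 2
        # Two factorized large-kernel paths approximate a dense k x k convolution.
        self.path_a = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, (k, 1), padding=(p, 0)),
            nn.Conv2d(out_ch, out_ch, (1, k), padding=(0, p)),
        )
        self.path_b = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, (1, k), padding=(0, p)),
            nn.Conv2d(out_ch, out_ch, (k, 1), padding=(p, 0)),
        )

    def forward(self, x):
        return self.path_a(x) + self.path_b(x)

x = torch.randn(1, 64, 48, 48)
print(GlobalConvModule(64, 32)(HybridDilatedBlock(64)(x)).shape)  # [1, 32, 48, 48]
```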
{"title":"A method of hybrid dilated and global convolution networks for pavement crack detection","authors":"Zhong Qu, Ming Li, Bin Yuan, Guoqing Mu","doi":"10.1007/s00530-024-01408-7","DOIUrl":"https://doi.org/10.1007/s00530-024-01408-7","url":null,"abstract":"<p>Automatic crack detection is important for efficient and economical pavement maintenance. With the development of Convolutional Neural Networks (CNNs), crack detection methods have been mostly based on CNNs. In this paper, we propose a novel automatic crack detection network architecture, named hybrid dilated and global convolutional networks. Firstly, we integrate the hybrid dilated convolution module into ResNet-152 network, which can effectively aggregate global features. Then, we use the global convolution module to enhance the classification and localization ability of the extracted features. Finally, the feature fusion module is introduced to fuse multi-scale and multi-level feature maps. The proposed network can capture crack features from a global perspective and generate the corresponding feature maps. In order to demonstrate the effectiveness of our proposed method, we evaluate it on the four public crack datasets, DeepCrack, CFD, Cracktree200 and CRACK500, which achieves <i>ODS</i> values as 87.12%, 83.96%, 82.66%, 81.35% and <i>OIS</i> values as 87.55%, 84.82%, 83.56% and 82.98%. Compared with HED, RCF, DeepCrackT, FPHBN, ResNet-152 and DeepCrack, the <i>ODS</i> value performance improvement made in our method is 1.21%, 3.35%, 3.07%, 3.36%, 4.79% and 1% on DeepCrack dataset. Sufficient experimental statistics certificate that our proposed method outperforms other state-of-the-art crack detection, edge detection and image segmentation methods.</p>","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":"24 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141742014","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Unsupervised masked face inpainting based on contrastive learning and attention mechanism
Pub Date: 2024-07-18  DOI: 10.1007/s00530-024-01411-y
Weiguo Wan, Shunming Chen, Li Yao, Yingmei Zhang
Masked face inpainting, which aims to restore realistic facial details and complete textures, remains a challenging task. In this paper, an unsupervised masked face inpainting method based on contrastive learning and an attention mechanism is proposed. First, to overcome the constraint of requiring a paired training dataset, a contrastive learning network framework is constructed by comparing features extracted from inpainted face image patches with those from the input masked face image patches. Subsequently, to extract more effective facial features, a feature attention module is designed that can focus on significant feature information and establish long-range dependency relationships. In addition, a PatchGAN-based discriminator is refined with spectral normalization to enhance the stability of training the proposed network and to guide the generator in producing more realistic face images. Extensive experimental results indicate that our approach obtains better masked face inpainting results than the compared approaches in terms of both subjective and objective evaluations, as well as face recognition accuracy.
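The spectral-normalized PatchGAN discriminator mentioned above is a standard construction; a minimal PyTorch version might look like the following, with layer widths chosen arbitrarily for illustration.

```python
# Minimal PatchGAN-style discriminator with spectral normalization for training stability.
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

def sn_conv(in_ch, out_ch, stride):
    return spectral_norm(nn.Conv2d(in_ch, out_ch, 4, stride=stride, padding=1))

class PatchDiscriminator(nn.Module):
    def __init__(self, in_ch=3, base=64):
        super().__init__()
        self.net = nn.Sequential(
            sn_conv(in_ch, base, 2), nn.LeakyReLU(0.2, inplace=True),
            sn_conv(base, base * 2, 2), nn.LeakyReLU(0.2, inplace=True),
            sn_conv(base * 2, base * 4, 2), nn.LeakyReLU(0.2, inplace=True),
            sn_conv(base * 4, 1, 1),        # one real/fake score per image patch
        )

    def forward(self, x):
        return self.net(x)

print(PatchDiscriminator()(torch.randn(1, 3, 128, 128)).shape)  # patch score map
```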
{"title":"Unsupervised masked face inpainting based on contrastive learning and attention mechanism","authors":"Weiguo Wan, Shunming Chen, Li Yao, Yingmei Zhang","doi":"10.1007/s00530-024-01411-y","DOIUrl":"https://doi.org/10.1007/s00530-024-01411-y","url":null,"abstract":"<p>Masked face inpainting, aiming to restore realistic facial details and complete textures, remains a challenging task. In this paper, an unsupervised masked face inpainting method based on contrastive learning and attention mechanism is proposed. First, to overcome the constraint of a paired training dataset, a contrastive learning network framework is constructed by comparing features extracted from inpainted face image patches with those from input masked face image patches. Subsequently, to extract more effective facial features, a feature attention module is designed, which can focus on the significant feature information and establish long-range dependency relationships. In addition, a PatchGAN-based discriminator is refined with spectral normalization to enhance the stability of training the proposed network and guide the generator in producing more realistic face images. Numerous experiment results indicate that our approach can obtain better masked face inpainting results than the comparison approaches overall in terms of both subjective and objective evaluations, as well as face recognition accuracy.</p>","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":"38 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141742015","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Underwater small and occlusion object detection with feature fusion and global context decoupling head-based YOLO
Pub Date: 2024-07-17  DOI: 10.1007/s00530-024-01410-z
Lei Deng, Shaojuan Luo, Chunhua He, Huapan Xiao, Heng Wu
Underwater light scattering and absorption, together with camera or target motion, often cause blurring, distortion, and color deviation in underwater imaging, which poses significant challenges to underwater target detection. Numerous detectors have been proposed to address these challenges, such as YOLO series models, RCNN-based variants, and Transformer-based variants. However, previous detectors often perform poorly on small targets and occluded targets. To tackle these issues, we propose a feature fusion and global semantic decoupling head-based YOLO detection method. Specifically, we propose an efficient feature fusion module to address the loss of small-target feature information that makes small targets difficult to detect accurately. We also use self-supervision to recalibrate the feature information between levels, achieving full integration of semantic information across levels. We design a decoupling head that focuses on global context information, which can better filter out complex background information and thereby detect targets effectively against occluded backgrounds. Finally, we replace simple upsampling with a content-aware reassembly module in the YOLO backbone, alleviating to some extent the imprecise localization and identification of small targets caused by feature loss. The experimental results indicate that the proposed method achieves superior performance compared with other state-of-the-art single-stage and two-stage detection networks. Specifically, on the UTDAC2020 dataset, the proposed method attains mAP50-95 and mAP50 scores of 54.4% and 87.7%, respectively.
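For illustration, the sketch below shows a decoupling head in which a global-context gate re-weights shared features before separate classification and regression branches; it follows the spirit of the abstract, but the exact structure, channel sizes, and activation choices are assumptions.

```python
# Hedged sketch of a decoupled (classification / regression) head with a global-context gate.
import torch
import torch.nn as nn

class GlobalContextDecoupledHead(nn.Module):
    def __init__(self, channels, num_classes, num_anchors=1):
        super().__init__()
        self.context = nn.Sequential(          # global semantic context gate
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, 1),
            nn.Sigmoid(),
        )
        self.cls_branch = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.SiLU(),
            nn.Conv2d(channels, num_anchors * num_classes, 1),
        )
        self.reg_branch = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.SiLU(),
            nn.Conv2d(channels, num_anchors * 4, 1),     # box offsets
        )

    def forward(self, x):
        x = x * self.context(x)
        return self.cls_branch(x), self.reg_branch(x)

cls, reg = GlobalContextDecoupledHead(256, num_classes=4)(torch.randn(1, 256, 20, 20))
print(cls.shape, reg.shape)
```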
{"title":"Underwater small and occlusion object detection with feature fusion and global context decoupling head-based YOLO","authors":"Lei Deng, Shaojuan Luo, Chunhua He, Huapan Xiao, Heng Wu","doi":"10.1007/s00530-024-01410-z","DOIUrl":"https://doi.org/10.1007/s00530-024-01410-z","url":null,"abstract":"<p>The underwater light scattering, absorption, and camera or target moving often bring issues such as blurring, distortion, and color deviation in underwater imaging, which poses significant challenges to underwater target detection. Numerous detectors have been proposed to address these challenges, such as YOLO series models, RCNN-based variants, and Transformer-based variants. However, the previous detectors often have poor detection results when encountering small targets and target occlusion problems. To tackle these issues, We propose a feature fusion and global semantic decoupling head-based YOLO detection method. Specifically, we propose an efficient feature fusion module to solve the problem of small target feature information being lost and difficult to detect accurately. We also use self-supervision to recalibrate the feature information between each level, which achieves full integration of semantic information between different levels. We design a decoupling head that focuses on global context information, which can better filter out complex background information, thereby achieving effective detection of targets under occluded backgrounds. Finally, we replace simple upsampling with a content-aware reassembly module in the YOLO backbone, alleviating the problem of imprecise localization and identification of small targets caused by feature loss to some extent. The experimental results indicate that the proposed method achieves superior performance compared to other state-of-the-art single-stage and two-stage detection networks. Specifically, on the UTDAC2020 dataset, the proposed method attains mAP50-95 and mAP50 scores of 54.4% and 87.7%, respectively.</p>","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":"1 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-07-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141719136","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M2AST: MLP-mixer-based adaptive spatial-temporal graph learning for human motion prediction
Pub Date: 2024-07-12  DOI: 10.1007/s00530-024-01351-7
Junyi Tang, Simin An, Yuanwei Liu, Yong Su, Jin Chen
Human motion prediction is a challenging task in human-centric computer vision, involving forecasting future poses from historical sequences. Despite recent progress in modeling the spatial-temporal relationships of motion sequences using complex structured graphs, few approaches have provided an adaptive and lightweight representation for the varying graph structures of human motion. Taking inspiration from the advantages of MLP-Mixer, a lightweight architecture designed for learning complex interactions in multi-dimensional data, we explore its potential as a backbone for motion prediction. To this end, we propose a novel MLP-Mixer-based adaptive spatial-temporal pattern learning framework (M²AST). Our framework includes an adaptive spatial mixer to model the spatial relationships between joints, an adaptive temporal mixer to learn temporal smoothness, and a local dynamic mixer to capture fine-grained cross-dependencies between joints of adjacent poses. The resulting method achieves a compact representation of human motion dynamics by adaptively considering spatial-temporal dependencies from coarse to fine. Unlike a trivial spatial-temporal MLP-Mixer, our approach can more effectively capture local and global spatial-temporal relationships simultaneously. We extensively evaluated the proposed framework on three commonly used benchmarks (Human3.6M, AMASS, and 3DPW MoCap), demonstrating comparable or better performance than existing state-of-the-art methods in both short- and long-term prediction, despite having significantly fewer parameters. Overall, the proposed framework provides a novel and efficient solution for human motion prediction with adaptive graph learning.
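To show what MLP-Mixer-style spatial and temporal mixing over a pose sequence can look like, here is a small PyTorch sketch operating on a (batch, frames, joints, channels) tensor; the token-mixing MLP sizes and block layout are assumptions, and it omits the adaptive and local dynamic mixers of the paper.

```python
# Illustrative spatial (over joints) and temporal (over frames) token mixing for pose sequences.
import torch
import torch.nn as nn

class TokenMix(nn.Module):
    """Applies a residual MLP along a chosen axis (joints or frames) of a (B, T, J, C) tensor."""
    def __init__(self, num_tokens, dim_axis):
        super().__init__()
        self.dim_axis = dim_axis
        self.mlp = nn.Sequential(
            nn.Linear(num_tokens, num_tokens * 2), nn.GELU(),
            nn.Linear(num_tokens * 2, num_tokens),
        )

    def forward(self, x):
        x = x.transpose(self.dim_axis, -1)   # move the mixed axis to the last dim
        x = x + self.mlp(x)                  # residual token mixing
        return x.transpose(self.dim_axis, -1)

class SpatialTemporalMixer(nn.Module):
    def __init__(self, frames, joints, channels):
        super().__init__()
        self.spatial = TokenMix(joints, dim_axis=2)    # mix across joints
        self.temporal = TokenMix(frames, dim_axis=1)   # mix across frames
        self.channel = nn.Sequential(nn.Linear(channels, channels * 2), nn.GELU(),
                                     nn.Linear(channels * 2, channels))

    def forward(self, x):                    # x: (batch, frames, joints, channels)
        x = self.temporal(self.spatial(x))
        return x + self.channel(x)

x = torch.randn(2, 10, 22, 3)                # 10 past frames, 22 joints, xyz coordinates
print(SpatialTemporalMixer(10, 22, 3)(x).shape)
```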
{"title":"M2AST:MLP-mixer-based adaptive spatial-temporal graph learning for human motion prediction","authors":"Junyi Tang, Simin An, Yuanwei Liu, Yong Su, Jin Chen","doi":"10.1007/s00530-024-01351-7","DOIUrl":"https://doi.org/10.1007/s00530-024-01351-7","url":null,"abstract":"<p>Human motion prediction is a challenging task in human-centric computer vision, involving forecasting future poses based on historical sequences. Despite recent progress in modeling spatial-temporal relationships of motion sequences using complex structured graphs, few approaches have provided an adaptive and lightweight representation for varying graph structures of human motion. Taking inspiration from the advantages of MLP-Mixer, a lightweight architecture designed for learning complex interactions in multi-dimensional data, we explore its potential as a backbone for motion prediction. To this end, we propose a novel MLP-Mixer-based adaptive spatial-temporal pattern learning framework (M<span>(^2)</span>AST). Our framework includes an adaptive spatial mixer to model the spatial relationships between joints, an adaptive temporal mixer to learn temporal smoothness, and a local dynamic mixer to capture fine-grained cross-dependencies between joints of adjacent poses. The final method achieves a compact representation of human motion dynamics by adaptively considering spatial-temporal dependencies from coarse to fine. Unlike the trivial spatial-temporal MLP-Mixer, our proposed approach can more effectively capture both local and global spatial-temporal relationships simultaneously. We extensively evaluated our proposed framework on three commonly used benchmarks (Human3.6M, AMASS, 3DPW MoCap), demonstrating comparable or better performance than existing state-of-the-art methods in both short and long-term predictions, despite having significantly fewer parameters. Overall, our proposed framework provides a novel and efficient solution for human motion prediction with adaptive graph learning.</p>","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":"29 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-07-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141611177","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Global adaptive histogram feature network for automatic segmentation of infection regions in CT images
Pub Date: 2024-07-11  DOI: 10.1007/s00530-024-01392-y
Xinren Min, Yang Liu, Shengjing Zhou, Huihua Huang, Li Zhang, Xiaojun Gong, Dongshan Yang, Menghao Wang, Rui Yang, Mingyang Zhong
Accurate and timely diagnosis of COVID-like viral infections is of paramount importance for saving lives. In this work, deep learning techniques are applied to lung CT image segmentation for accurate disease diagnosis. We discuss the limitations of current diagnostic methods, such as RT-PCR, and highlight the advantages of deep learning, including its ability to automatically learn features and handle complex lesion morphology and texture. We therefore propose a novel deep learning framework, GAHFNet, specifically designed for automatic segmentation of COVID-19 lung CT images. The proposed method addresses the challenges of lung CT image segmentation, such as the complex image structure and the difficulty of distinguishing COVID-19 pneumonia lesions from other pathologies. We provide a detailed description of the proposed GAHFNet. Finally, comprehensive experiments are carried out to evaluate the performance of GAHFNet, and the proposed method outperforms traditional and state-of-the-art methods on various evaluation metrics, demonstrating its effectiveness and efficiency in this task. GAHFNet facilitates the application of artificial intelligence in COVID-19 diagnosis and achieves accurate automatic segmentation of infected areas in COVID-19 lung CT images.
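The abstract does not detail the evaluation protocol, so the snippet below only illustrates the kind of overlap metrics (Dice and IoU) commonly reported for infection-region segmentation; it is a generic helper, not part of GAHFNet.

```python
# Generic overlap metrics for binary segmentation masks (not from the paper).
import numpy as np

def dice_and_iou(pred, target, eps=1e-7):
    """pred, target: binary numpy arrays of the same shape (1 = infected region)."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    dice = (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)
    iou = (inter + eps) / (np.logical_or(pred, target).sum() + eps)
    return dice, iou

pred = np.random.rand(256, 256) > 0.5
target = np.random.rand(256, 256) > 0.5
print(dice_and_iou(pred, target))
```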
{"title":"Global adaptive histogram feature network for automatic segmentation of infection regions in CT images","authors":"Xinren Min, Yang Liu, Shengjing Zhou, Huihua Huang, Li Zhang, Xiaojun Gong, Dongshan Yang, Menghao Wang, Rui Yang, Mingyang Zhong","doi":"10.1007/s00530-024-01392-y","DOIUrl":"https://doi.org/10.1007/s00530-024-01392-y","url":null,"abstract":"<p>Accurate and timely diagnosis of COVID-like virus is of paramount importance for lifesaving. In this work, deep learning techniques are applied to lung CT image segmentation for accurate disease diagnosis. We discuss the limitations of current diagnostic methods, such as RT-PCR, and highlights the advantages of deep learning, including its ability to automatically learn features and handle complex lesion morphology and texture. We, therefore, propose a novel deep learning framework, GAHFNet, specifically designed for automatic segmentation of COVID-19 lung CT images. The proposed method addresses the challenges in lung CT image segmentation, such as the complex image structure and difficulties of distinguishing COVID-19 pneumonia lesions from other pathologies. We provide the detailed description of the proposed GAHFNet. Finally, comprehensive experiments are carried out to evaluate the performance of GAHFNet, and the proposed method outperforms other traditional and the state-of-the-art methods in various evaluation metrics, demonstrating the effectiveness and the efficiency of the proposed method in this task. GAHFNet is able to facilitate the application of artificial intelligence in COVID-19 diagnosis and achieve accurate automatic segmentation of infected areas in COVID-19 lung CT images.</p>","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":"46 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141611179","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Smart contract vulnerabilities detection with bidirectional encoder representations from transformers and control flow graph
Pub Date: 2024-07-10  DOI: 10.1007/s00530-024-01406-9
Peng Su, Jingyuan Hu
To date, smart contract vulnerability detection methods based on sequence-modal data and sequence models have been the most commonly used. However, existing state-of-the-art methods disregard the fact that sequence-modal data lose structural information and control flow information. Additionally, it is hard for sequence models to extract global features of smart contracts. Moreover, these methods rarely consider the impact of noisy data on vulnerability detection. To tackle these issues, we propose a smart contract vulnerability detection model based on bidirectional encoder representations from transformers (BERT) and the control flow graph (CFG). On the one hand, we design a denoising method suitable for control flow graphs to reduce the impact of noisy data on vulnerability detection. On the other hand, we design a novel method to parse the control flow graph into a BERT input form that retains control flow information and structural information, and BERT learns the potential vulnerability characteristics of smart contracts to fine-tune itself. We conduct an empirical evaluation on a large-scale real-world dataset and compare our method with 5 state-of-the-art baseline methods. Our method achieves (1) the best performance among all compared methods; (2) 0.6–17.1% higher F1-score than the baseline methods; (3) 0.7–16.7% higher accuracy than the baseline methods; (4) 0.6–17% higher precision than the baseline methods; and (5) 0.2–19.5% higher recall than the baseline methods.
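A hedged sketch of the overall pipeline: serialize a control flow graph into a token sequence that keeps edge (control flow) information, then fine-tune BERT as a binary vulnerability classifier with the Hugging Face API. The serialization scheme, node contents, and label convention are assumptions, not the paper's exact encoding.

```python
# Sketch: CFG serialization plus BERT fine-tuning for binary vulnerability classification.
import torch
from transformers import BertTokenizerFast, BertForSequenceClassification

def cfg_to_text(nodes, edges):
    # nodes: {id: opcode string}; edges: list of (src, dst) basic-block transitions
    parts = [f"[NODE] {nodes[i]}" for i in sorted(nodes)]
    parts += [f"[EDGE] {s} -> {d}" for s, d in edges]
    return " ".join(parts)

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

text = cfg_to_text({0: "PUSH1 CALLVALUE", 1: "CALL SSTORE"}, [(0, 1)])
batch = tokenizer(text, truncation=True, padding=True, return_tensors="pt")
out = model(**batch, labels=torch.tensor([1]))   # label 1 = vulnerable (assumed convention)
out.loss.backward()                              # an optimizer step would follow in training
print(out.logits.shape)                          # (1, 2) class logits
```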
{"title":"Smart contract vulnerabilities detection with bidirectional encoder representations from transformers and control flow graph","authors":"Peng Su, Jingyuan Hu","doi":"10.1007/s00530-024-01406-9","DOIUrl":"https://doi.org/10.1007/s00530-024-01406-9","url":null,"abstract":"<p>Up to now, the smart contract vulnerabilities detection methods based on sequence modal data and sequence models have been the most commonly used. However, existing state-of-the-art methods disregard the issue of sequence modal data loses structural information and control flow information. Additionally, it is hard for sequence models to extract global features of smart contracts. Moreover, these methods rarely consider the impact of noise data on vulnerabilities detection. To tackle these issues, we propose a smart contract vulnerabilities detection model based on bidirectional encoder representation from transformers (BERT) and control flow graph (CFG). On the one hand, we design a denoising method suitable for control flow graphs to reduce the impact of noisy data on vulnerabilities detection. On the other hand, we design a novel method to parse the control flow graph into a BERT input form that retains control flow information and structural information. The BERT learns the potential vulnerability characteristics of smart contracts to fine-tune itself. Through an empirical evaluation of a large-scale real-world dataset and compare 5 state-of-the-art baseline methods. Our method achieves (1) optimal performance over all baseline methods; (2) 0.6–17.1% higher F1-score than baseline methods; (3) 0.7–16.7% higher accuracy than baseline methods; (4) 0.6–17% higher precision than baseline methods; (5) 0.2–19.5% higher recall than baseline methods.</p>","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":"25 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141570969","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A three-stage quality evaluation method for experience products: taking animation as an example
Pub Date: 2024-07-08  DOI: 10.1007/s00530-024-01401-0
Qianqian Chen, Zhongjun Tang, Duokui He, Dongyuan Zhao, Jing Wang
The diversity and dynamics of quality index information bring challenges to the quality assessment of experience products. This paper proposes a three-stage quality assessment method based on grounded theory, association analysis, combined weighting, sentiment analysis, and a cloud model. First, based on online reviews, the true quality indicators of animation are identified through grounded theory, and the relationships among quality indicators are identified by association analysis. Second, by introducing game-theoretic principles, a combined weighting scheme based on network opinion leaders and network opinion followers is proposed. Finally, a comprehensive animation quality evaluation model based on the cloud model and sentiment analysis is proposed. The feasibility and practicability of the method are verified on multiple animation datasets, and the quality levels of products with different sales are obtained, providing a direction for animation quality improvement. The paper further demonstrates the method's superiority by comparing it with other methods.
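The cloud-model stage can be illustrated with the classic forward normal cloud generator, which draws cloud drops and membership degrees from the digital characteristics (Ex, En, He); the numeric inputs below are made up for illustration and are not taken from the paper.

```python
# Forward normal cloud generator: cloud drops and membership degrees from (Ex, En, He).
import numpy as np

def normal_cloud(ex, en, he, n_drops=1000, seed=0):
    rng = np.random.default_rng(seed)
    en_prime = rng.normal(en, he, n_drops)             # per-drop entropy sample
    x = rng.normal(ex, np.abs(en_prime))               # cloud drops
    mu = np.exp(-((x - ex) ** 2) / (2 * en_prime ** 2 + 1e-12))  # membership degree
    return x, mu

# Hypothetical quality score characteristics for one animation quality indicator.
drops, membership = normal_cloud(ex=0.72, en=0.08, he=0.02)
print(drops[:3], membership[:3])
```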
{"title":"A three-stage quality evaluation method for experience products: taking animation as an example","authors":"Qianqian Chen, Zhongjun Tang, Duokui He, Dongyuan Zhao, Jing Wang","doi":"10.1007/s00530-024-01401-0","DOIUrl":"https://doi.org/10.1007/s00530-024-01401-0","url":null,"abstract":"<p>The diversity and dynamics of quality index information bring challenges to quality assessment of experience products. This paper proposes a three-stage quality assessment method based on grounded theory, association analysis, combined weighting, sentiment analysis and cloud model. Firstly, based on online reviews, the true quality indicators of animation are recognized by grounded theory, and the relationships among quality indicators are identified by association analysis. Secondly, by introducing the principle of gaming, the combined weighting based on network opinion leader and network opinion follower is proposed. Finally, an animation comprehensive quality evaluation model based on cloud model and sentiment analysis is proposed. The feasibility and practicability of the method are verified on multiple animation datasets, and the quality levels of different sales products are obtained, providing a direction for animation quality improvement. Meanwhile, this paper further demonstrates the method's superiority by comparing it with other methods.</p>","PeriodicalId":51138,"journal":{"name":"Multimedia Systems","volume":"8 1","pages":""},"PeriodicalIF":3.9,"publicationDate":"2024-07-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141571162","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}