Motion-blurred images are usually generated when capturing video with a handheld or wearable camera, owing to rapid movement of the camera or of the foreground (i.e., a moving object in the scene). Most traditional algorithm-based approaches cannot effectively restore nonlinearly motion-blurred images. Deep learning network-based approaches with intensive computations have recently been developed for blind motion deblurring. However, they still achieve only a limited effect in restoring image details, especially for blurred nighttime images. To effectively deblur both daytime and nighttime images, the proposed video deblurring method consists of three major parts: an image storage module (storing the previous deblurred frame), an adjacent-frame alignment module (performing optimal feature point selection and computing the perspective transformation matrix), and a video-deblurring neural network module (containing two sub-networks for single-image deblurring and adjacent-frame fusion deblurring). The main strategy of the proposed approach is to design a blurred attention block that extracts more effective features (especially for nighttime images) to restore the edges and details of objects. Additionally, skip connections are introduced into the two sub-networks to improve the model's ability to fuse contextual features across different layers and further enhance the deblurring effect. Quantitative evaluations demonstrate that our method achieves an average PSNR of 32.401 dB and SSIM of 0.9107, surpassing the next-best method by 1.635 dB in PSNR and 0.0381 in SSIM. These improvements confirm the effectiveness of the proposed approach in addressing deblurring challenges in both daytime and nighttime scenarios, especially in making the alphanumeric characters in severely blurred nighttime images legible.
{"title":"Effective video deblurring based on feature-enhanced deep learning network for daytime and nighttime images","authors":"Deng-Yuan Huang, Chao-Ho Chen, Tsong-Yi Chen, Jia-En Li, Hsueh-Liang Hsiao, Da-Jinn Wang, Cheng-Kang Wen","doi":"10.1007/s11042-024-20222-x","DOIUrl":"https://doi.org/10.1007/s11042-024-20222-x","url":null,"abstract":"<p>Motion-blurred images are usually generated when captured with a handheld or wearable video camera, owing to rapid movement of the camera or foreground (i.e., moving object captured). Most traditional algorithm-based approaches cannot effectively restore the nonlinear motion-blurred images. Deep learning network-based approaches with intensive computations have recently been developed for deblurring blind motion-blurred images. However, they still achieve limited effect in restoring the details of the images, especially for blurred nighttime images. To effectively deblur the blurred daytime and nighttime images, the proposed video deblurring method consists of three major parts: an image storage module (storing the previous deblurred frame), adjacent frames alignment module (performing optimal feature point selection and perspective transformation matrix), and video-deblurring neural network module (containing two sub-networks of single image deblurring and adjacent frames fusion deblurring). The proposed approach’s main strategy is to design a blurred attention block to extract more effective features (especially for nighttime images) to restore the edges or details of objects. Additionally, the skip connection is introduced into such two sub-networks to improve the model’s ability to fuse contextual features across different layers to enhance the deblurring effect further. Quantitative evaluations demonstrate that our method achieves an average PSNR of 32.401 dB and SSIM of 0.9107, surpassing the next-best method by 1.635 dB in PSNR and 0.0381 in SSIM. Such improvements reveal the effectiveness of the proposed approach in addressing deblurring challenges across both daytime and nighttime scenarios, especially for making the alphanumeric characters in the really blurred nighttime images legible.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":"50 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266108","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-09-16. DOI: 10.1007/s11042-024-20206-x
Huan Ouyang, Zheng Chang, Binghao Tang, Si Li
Radiology report generation aims to accurately generate pathological assessments from given radiographic images. Prior methods largely rely on autoregressive models, where the sequential token-by-token generation process results in longer inference times and suffers from sequential error accumulation. To enhance the efficiency of report generation without compromising diagnostic accuracy, we present a novel radiology report generation approach based on diffusion models. By integrating a graph-guided image feature extractor informed by a radiology knowledge graph, our model adeptly identifies critical abnormalities within images. We also introduce an auxiliary lesion classification loss using pseudo labels as supervision to accurately align image features and textual disease keyword representations. By adopting the accelerated sampling strategy inherent to diffusion models, our approach significantly reduces inference time. Through comprehensive evaluation on the IU-Xray and MIMIC-CXR benchmarks, our approach outperforms autoregressive models in inference speed while maintaining high quality, offering a significant advancement in automating the radiology report generation task.
{"title":"DMR $$^2$$ G: diffusion model for radiology report generation","authors":"Huan Ouyang, Zheng Chang, Binghao Tang, Si Li","doi":"10.1007/s11042-024-20206-x","DOIUrl":"https://doi.org/10.1007/s11042-024-20206-x","url":null,"abstract":"<p>Radiology report generation aims to generate pathological assessments from given radiographic images accurately. Prior methods largely rely on autoregressive models, where the sequential token-by-token generation process always results in longer inference time and suffers from the sequential error accumulation. In order to enhance the efficiency of report generation without compromising diagnostic accuracy, we present a novel radiology report generation approach based on diffusion models. By integrating a graph-guided image feature extractor informed by a radiology knowledge graph, our model adeptly identifies critical abnormalities within images. We also introduce an auxiliary lesion classification loss mechanism using pseudo labels as supervision to align image features and textual disease keyword representations accurately. By adopting the accelerated sampling strategy inherent to diffusion models, our approach significantly reduces the inference time. Through comprehensive evaluation on the IU-Xray and MIMIC-CXR benchmarks, our approach outperforms autoregressive models in inference speed while maintaining high quality, offering a significant advancement in automating radiology report generation task.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":"1 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266316","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In the realm of extracting inter- and intra-modal interactions, contemporary models often face challenges such as reduced computational efficiency, particularly when dealing with lengthy visual sequences. To address these issues, this study introduces an innovative model, the Cross-Sketch with Dual Gated Attention Network (CSDNet), designed to handle second-order intra- and inter-modal interactions by integrating a pair of attention modules. Leveraging bilinear pooling to capture these second-order interactions typically requires substantial computational resources, owing to the processing of large-dimensional tensors. To address these resource demands, the first module, Cross-Sketch Attention (CSA), employs Cross-Tensor Sketch Pooling on attention features to reduce dimensionality while preserving crucial information without sacrificing caption quality. Furthermore, captions are enhanced by integrating another novel attention module, Dual Gated Attention (DGA), which contributes additional spatial and channel-wise attention distributions to improve caption generation performance. Our method demonstrates significant gains in computational efficiency, reducing computation time per epoch by an average of 13.54% compared to the base model, which leads to faster convergence and improved performance metrics. Additionally, we observe a 0.07% improvement in the METEOR score over the base model. Through reinforcement learning optimization, our model achieves a remarkable CIDEr-D score of 132.2% on the MS-COCO dataset, consistently outperforming the baseline across a comprehensive range of evaluation metrics.
{"title":"CSDNet: cross-sketch with dual gated attention for fine-grained image captioning network","authors":"Md. Shamim Hossain, Shamima Aktar, Md. Bipul Hossen, Mohammad Alamgir Hossain, Naijie Gu, Zhangjin Huang","doi":"10.1007/s11042-024-20220-z","DOIUrl":"https://doi.org/10.1007/s11042-024-20220-z","url":null,"abstract":"<p>In the realm of extracting inter and intra-modal interactions, contemporary models often face challenges such as reduced computational efficiency, particularly when dealing with lengthy visual sequences. To address these issues, this study introduces an innovative model, the Cross-Sketch with Dual Gated Attention Network (CSDNet), designed to handle second-order intra- and inter-modal interactions by integrating a couple of attention modules. Leveraging bilinear pooling to effectively capture these second-order interactions typically requires substantial computational resources due to the processing of large-dimensional tensors. Due to these resource demands, the first module Cross-Sketch Attention (CSA) is proposed, which employs Cross-Tensor Sketch Pooling on attention features to reduce dimensionality while preserving crucial information without sacrificing caption quality. Furthermore, to enhance caption by integrating another novel attention module, Dual Gated Attention (DGA), which contributes additional spatial and channel-wise attention distributions to improve caption generation performance. Our method demonstrates significant computational efficiency improvements, reducing computation time per epoch by an average of 13.54% compared to the base model, which leads to expedited convergence and improved performance metrics. Additionally, we observe a 0.07% enhancement in the METEOR score compared to the base model. Through the application of reinforcement learning optimization, our model achieves a remarkable CIDEr-D score of 132.2% on the MS-COCO dataset. This consistently outperforms baseline performance across a comprehensive range of evaluation metrics.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":"5 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266106","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-09-16. DOI: 10.1007/s11042-024-20219-6
Noor Ul Ain Tahir, Zuping Zhang, Muhammad Asim, Sundas Iftikhar, Ahmed A. Abd El-Latif
Ensuring the safe navigation of autonomous vehicles in intelligent transportation systems depends on their ability to detect pedestrians and vehicles. While transformer-based models for object detection have shown remarkable advancements, accurately identifying pedestrians and vehicles in adverse weather conditions remains a challenging task. Adverse weather degrades image quality, leading to issues such as low contrast, reduced visibility, blurred edges, false detections, misdetection of tiny objects, and other impediments that further complicate detection accuracy. This paper introduces a novel pedestrian and vehicle detection model for adverse weather conditions, denoted PVDM-YOLOv8l. In our proposed model, we first incorporate the Swin Transformer, designed to extract global features of small objects so they can be identified in poor visibility, into the YOLOv8l backbone. To enhance detection accuracy and address the impact of inaccurate features on recognition performance, CBAM is integrated between the neck and head networks of YOLOv8l, gathering crucial information and distilling essential features. Finally, we adopted the Wise-IoU v3 loss function to mitigate the adverse effects of low-quality instances by minimizing negative gradients. Additionally, we enhanced and augmented the DAWN dataset to create a custom dataset, named DAWN2024, tailored to the specific requirements of our study. To verify the superiority of PVDM-YOLOv8l, its performance was compared against several commonly used object detectors, including YOLOv3, YOLOv3-tiny, YOLOv3-spp, YOLOv5, YOLOv6, all versions of YOLOv8 (n, m, s, l, and x), and some traditional models. The experimental results demonstrate that our proposed model achieved improvements of 6.6%, 5.4%, 6%, and 5.1% in precision, recall, F1-score, and mean Average Precision (mAP) on the custom DAWN2024 dataset. This substantial improvement in accuracy indicates a significant leap in our model's capability to detect pedestrians and vehicles under adverse weather conditions, which is crucial for the safe navigation of autonomous vehicles.
{"title":"PVDM-YOLOv8l: a solution for reliable pedestrian and vehicle detection in autonomous vehicles under adverse weather conditions","authors":"Noor Ul Ain Tahir, Zuping Zhang, Muhammad Asim, Sundas Iftikhar, Ahmed A. Abd El-Latif","doi":"10.1007/s11042-024-20219-6","DOIUrl":"https://doi.org/10.1007/s11042-024-20219-6","url":null,"abstract":"<p>Ensuring the safe navigation of autonomous vehicles in intelligent transportation system depends on their ability to detect pedestrians and vehicles. While transformer-based models for object detection have shown remarkable advancements, accurately identifying pedestrians and vehicles in adverse weather conditions remains a challenging task. Adverse weather introduces image quality degradation, leading to issues such as low contrast, reduced visibility, blurred edges, false detection, misdetection of tiny objects, and other impediments that further complicate the accuracy of detection. This paper introduces a novel Pedestrian and Vehicle Detection Model under adverse weather conditions, denoted as PVDM-YOLOv8l. In our proposed model, we first incorporate the Swin-Transformer method, which is designed for global extraction of feature of small objects to identify in poor visibility, into the YOLOv8l backbone structure. To enhance detection accuracy and address the impact of inaccurate features on recognition performance, CBAM is integrated between the neck and head networks of YOLOv8l, aiming to gather crucial information and obtain essential data. Finally, we adopted the loss function Wise-IOU v3. This function was implemented to mitigate the adverse effects of low-quality instances by minimizing negative gradients. Additionally, we enhanced and augmented the DAWN dataset and created a custom dataset, named DAWN2024, to cater to the specific requirements of our study. To verify the superiority of PVDM-YOLOV8l, its performance was compared against several commonly used object detectors, including YOLOv3, YOLOv3-tiny, YOLOv3-spp, YOLOv5, YOLOv6, and all the versions of YOLOv8 (n, m, s, l, and x) and some traditional models. The experimental results demonstrate that our proposed model achieved a 6.6%, 5.4%, 6%, and 5.1% improvement in precision, recall, F1-score and mean Average Precision (mAP) on the custom DAWN2024 dataset. This substantial improvement in accuracy indicates a significant leap in the capability of our model to detect pedestrians and vehicles under adverse weather conditions, which is crucial for the safe navigation of autonomous vehicles.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":"52 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266111","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-09-16. DOI: 10.1007/s11042-024-19973-4
Dongli Wang, Xiaolin Zhu, Jinfu Liu, Zixin Zhang, Yan Zhou
Group activity recognition, which aims to understand the activity performed by a group of people, has attracted growing attention in the realm of computer vision over the past decade. In this paper, we propose a novel multi-dimensional convolution Transformer network for group activity recognition, which not only models spatial-temporal feature representations but also combines channel information to analyze the spatial-temporal dependencies of individual actors. Specifically, we first construct a multi-scale feature extraction module in the feature extraction stage, which can exploit discriminative high-level and low-level feature representations. The multi-branching strategy combined with dilated convolution can further capture multi-scale feature information in complex group scenarios. Then, to model the inter-dependence among the involved actors across different dimensions, we design a multi-dimensional convolution Transformer in the relational reasoning stage, which consists of three parts: a channel attention module, a spatial-temporal convolutional Transformer, and a spatial-temporal attention module. Finally, the activity recognition result is obtained using a softmax classifier. Extensive experiments on two public GAR datasets demonstrate that recognition accuracy on the Volleyball Dataset and the Collective Activity Dataset reaches 92.8% and 96.1%, respectively, a significant improvement over mainstream methods from recent years.
{"title":"Multi-dimensional convolution transformer for group activity recognition","authors":"Dongli Wang, Xiaolin Zhu, Jinfu Liu, Zixin Zhang, Yan Zhou","doi":"10.1007/s11042-024-19973-4","DOIUrl":"https://doi.org/10.1007/s11042-024-19973-4","url":null,"abstract":"<p>Group activity recognition, which aims to understand the activity performed by a group of people, has attracted growing attention in the realm of computer vision over the past decade. In this paper, we propose a novel multi-dimensional convolution Transformer network for group activity recognition, which not only models spatial-temporal feature representations, but also combines channel information to analyze the spatial-temporal dependencies of individual actors. Specifically, we first construct a multi-scale feature extraction module in the feature extraction stage, which can exploit discriminative high-level and low-level feature representations. The multi-branching strategy combined with the dilated convolution can further capture multi-scale feature information in complex group scenarios. Then, to construct the inter-dependence among involved actors from different dimensions, we design a multi-dimensional convolution Transformer in the relational reasoning stage, which consists of the following three parts: a channel attention module, a spatial-temporal convolutional Transformer, and a spatial-temporal attention module. Finally, the final activity recognition result is obtained by using a softmax classifier. Extensive experiments on two public GAR datasets demonstrate that the recognition accuracy on the Volleyball Dataset and Collective Activity Dataset can reach 92.8% and 96.1%, respectively, which is a significant improvement compared with the mainstream methods in recent years.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":"32 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266112","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-09-16. DOI: 10.1007/s11042-024-20196-w
Sorel Bagio Nono Fotso, William Nodem Atchoffo, Armand C. Nzeukou, Jimmi Hervé Talla Mbé
This paper presents a novel lossless audio encryption algorithm based on a modified zigzag scrambling technique, SHA-256, DNA coding, cipher block chaining (CBC) mode, and the delayed Hopfield neural network (HNN). The algorithm mainly comprises scrambling and diffusion stages. In the scrambling stage, the audio signal is converted into a square matrix to which the modified zigzag scrambling technique is applied. Then follows the diffusion stage, in which bit-level permutation, DNA coding, and CBC mode are applied successively. In addition, the delayed HNN used in the encryption process is controlled by the plain audio signal through the SHA-256 hash function to resist differential attacks. The proposed algorithm has been assessed on ten audio signals using more than fourteen performance measures. Compared to the state of the art, the obtained results show better performance. Indeed, higher resistance to differential attacks is obtained, as seen through higher values of the number of sample change rate (NSCR) and unified average changing intensity (UACI). Also, more disorder is detected in the encrypted audio signal through higher values of information entropy. Furthermore, the proposed algorithm possesses a larger key space, arising from the large number of parameters of the delayed HNN, which results in higher resistance to brute-force attacks. A real-life implementation of the proposed encryption technique is achieved with a visible light communication (VLC) system, highlighting its feasibility and effectiveness in securing optical wireless communication systems.
{"title":"Enhanced security in lossless audio encryption using zigzag scrambling, DNA coding, SHA-256, and hopfield networks: a practical vlc system implementation","authors":"Sorel Bagio Nono Fotso, William Nodem Atchoffo, Armand C. Nzeukou, Jimmi Hervé Talla Mbé","doi":"10.1007/s11042-024-20196-w","DOIUrl":"https://doi.org/10.1007/s11042-024-20196-w","url":null,"abstract":"<p>This paper presents a novel lossless audio encryption algorithm based on a modified zigzag scrambling technique, SHA-256, DNA coding, cipher block chaining (CBC) mode, and the delayed Hopfield neural network (HNN). The algorithm mainly includes the scrambling and diffusion stages. In the scrambling stage, the audio signal is converted into a square matrix on which the modified zigzag scrambling technique is applied. Then follows the confusion stage in which bit-level permutation, DNA coding, and CBC mode are applied successively. Besides, the delayed HNN serving in the encryption process is controlled by the plain audio signal through the hash function SHA-256 to resist differential attack. The proposed algorithm has been assessed on ten audio signals using more than fourteen performance measures. Compare to the state-of-the-art, the obtained results show better performances. Indeed, higher resistance to differential attack is obtained; this is seen through higher values of number of sample change rate (NSCR) and unified average changing intensity (UACI). Also, more disorder is detected in the encrypted audio signal through higher values of the information entropy. Furthermore, the proposed algorithm possesses a larger key space arising from the high number of parameters of the delayed HNN, which results in a higher resistance to brute force attacks. A real-life implementation of the proposed encryption technique is achieved with a visible light communication (VLC) system; this highlights its feasibility and effectiveness in securing optical wireless communication systems.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":"96 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266110","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-09-16. DOI: 10.1007/s11042-024-20121-1
Feyza Erdoğan, Murat Karakoyun, Şaban Gülcü
Metaheuristic algorithms are widely recommended and frequently used methods for solving optimization problems; they have been adapted to many challenging problems with demonstrated success. The grey wolf optimizer (GWO) is one of the most advanced metaheuristics, and because of the advantages it provides, GWO has been applied to many different problems. In this study, a new variant of GWO, the Binary Dynamic Grey Wolf Optimizer (BDGWO), is proposed for solving binary optimization problems. The main contributions of BDGWO compared to other binary GWO variants are that it binarizes using the XOR bitwise operation and that it is based on a dynamic coefficient method developed to determine the influence of the three dominant wolves (alpha, beta, and delta) in the algorithm. BDGWO is a simple, feasible, and successful method that strikes a balance between local and global search in solving binary optimization problems. To determine the success and accuracy of the proposed BDGWO, it was tested on the 0-1 knapsack problem (0-1 KP), which is classified as NP-hard. BDGWO was compared with 17 different binary methods across a total of 55 data sets drawn from three studies published in the last four years. The Friedman test was applied to interpret the experimental results more easily and to evaluate them statistically. The experiments prove that BDGWO is an effective and successful method for its intended purpose.
{"title":"An effective binary dynamic grey wolf optimization algorithm for the 0-1 knapsack problem","authors":"Feyza Erdoğan, Murat Karakoyun, Şaban Gülcü","doi":"10.1007/s11042-024-20121-1","DOIUrl":"https://doi.org/10.1007/s11042-024-20121-1","url":null,"abstract":"<p>Metaheuristic algorithms are recommended and frequently used methods for solving optimization problems. Today, it has been adapted to many challenging problems and its successes have been identified. The grey wolf optimizer (GWO) is one of the most advanced metaheuristics. Because of the advantages it provides, GWO has been applied to solve many different problems. In this study, a new variant of GWO, the Binary Dynamic Grey Wolf Optimizer (BDGWO), is proposed for the solution of binary optimization problems. The main contributions of BDGWO compared to other binary GWO variants are that it uses the XOR bitwise operation to binarize and is based on the dynamic coefficient method developed to determine the effect of the three dominant wolves (alpha, beta, and delta) in the algorithm. BDGWO is a simple, feasible, and successful method that strikes a balance between local search and global search in solving binary optimization problems. To determine the success and accuracy of the proposed BDGWO, it was tested on the 0-1 knapsack problem (0-1 KP), which is classified as an NP-Hard problem. The BDGWO was compared with 17 different binary methods across a total of 55 data sets from three different studies published in the last four years. The Friedman test was applied to interpret the experimental results more easily and to evaluate the algorithm results statistically. As a result of the experiments, it has been proven that the BDGWO is an effective and successful method in accordance with its purpose.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":"4 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266107","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-09-16. DOI: 10.1007/s11042-024-19975-2
Yan Zhou, Jingwei Liu, Jianxun Li, Haibin Zhou
The point cloud data collected by LiDAR are large in scale and contain rich spatial structural detail; through the collection and labeling of LiDAR data, an automatic driving system can obtain detailed information about the environment around the vehicle. Owing to the lack of sufficient laser points, some methods transform the point cloud into dense representations such as multi-view projections or voxelized grids for processing, ignoring the information loss caused by LiDAR imaging characteristics and by the point cloud transformations, which degrades segmentation performance. In this work, we investigate a 3D semantic segmentation scheme with only LiDAR inputs, called the voxel completion and 3D asymmetric convolution network. We propose a voxel completion sub-network that improves the feature extraction capability of the network by enlarging the receptive field and using multi-scale feature extraction to reduce empty units in the voxels and obtain more complete voxel features. In addition, because many cubic objects are present in autonomous driving scenarios, we propose a 3D asymmetric convolution network comprising three components: a 3D residual block, an asymmetric convolution block, and a context module. These components are combined to explore 3D geometric patterns while maintaining their intrinsic properties, improving network performance. Extensive experiments on the SemanticKITTI and nuScenes benchmark datasets demonstrate the superiority of the approach. For example, on the nuScenes validation set, our method outperforms the state-of-the-art method by 0.3% in mIoU.
{"title":"Voxel completion and 3D asymmetrical convolution networks for Lidar semantic segmentation","authors":"Yan Zhou, Jingwei Liu, Jianxun Li, Haibin Zhou","doi":"10.1007/s11042-024-19975-2","DOIUrl":"https://doi.org/10.1007/s11042-024-19975-2","url":null,"abstract":"<p>The point cloud data collected by LiDAR is large in scale and contains rich spatial structure detail information, through the collection and labeling of LiDAR data, the automatic driving system can obtain detailed information about the environment around the vehicle. Due to lack of sufficient laser points, some methods transform the point cloud to dense representations such as multi-view or voxelized grids for processing, ignoring the information loss problem caused by the LiDAR imaging characteristics as well as the point cloud transformations, which leads to a degradation of the segmentation performance. In this work, We investigate a 3D semantic segmentation scheme with only LiDAR inputs, called voxel completion and 3D asymmetric convolution network. We propose a voxel completion sub-network to improve the feature extraction capability of the network by enlarging the receptive field and using multi-scale feature extraction to reduce the empty units in the voxels and obtain more complete voxel features. In addition, due to the presence of a large number of cubic objects in the autopilot scenario, to better match the autopilot scenario, we propose a 3D asymmetric convolution network that includes three components: a 3D residual block, an asymmetric convolution block, and a context module. These components are combined together to explore 3D geometric patterns, which can maintain their intrinsic properties and improve the performance of the network. Extensive experiments on the SemanticKITTI and nuScenes benchmark datasets demonstrate the superiority of the approach. For example, on the nuScenes validation set, our method outperforms the state-of-the-art method by 0.3% in mIoU.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":"33 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266315","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2024-09-14. DOI: 10.1007/s11042-024-20193-z
Yanni Liu, Ayong Ye, Qiulin Chen, Yuexin Zhang, Jianwei Chen
Data-Free Knowledge Distillation (DFKD) can be used to train student networks on synthetic data when the original dataset of the teacher network is not accessible. However, existing studies mainly focus on how to use the prior knowledge of the teacher network to synthesize data, ignoring the lack of diversity in the synthesized data, which prevents the student network from learning the real data distribution and results in low robustness. In this paper, we propose a Diversity-Enhanced Data-Free Knowledge Distillation (DE-DFKD) method based on the idea of generative image modelling, which introduces conditional generative networks and metric learning to solve the problems of class imbalance and single intra-class data distributions in synthetic datasets. The experimental results show that DE-DFKD synthesizes better-quality data on the MNIST, CIFAR-10, and CIFAR-100 datasets, with Frechet Inception Distance (FID) values of 51.79, 60.25, and 50.1, respectively, and yields higher student network accuracy than existing schemes.
{"title":"DE-DFKD: diversity enhancing data-free knowledge distillation","authors":"Yanni Liu, Ayong Ye, Qiulin Chen, Yuexin Zhang, Jianwei Chen","doi":"10.1007/s11042-024-20193-z","DOIUrl":"https://doi.org/10.1007/s11042-024-20193-z","url":null,"abstract":"<p>Data-Free Knowledge Distillation (DFKD) can be used to train students using synthetic data, when the original dataset of the teacher network is not accessible. However, existing studies mainly focus on how to use the prior knowledge of the teacher network to synthesize data, ignoring the lack of diversity of synthesized data, which leads to the inability of the student network to learn the real data distribution and low robustness. In this paper, we propose a Diversity-Enhanced Data-Free Knowledge Distillation (DE-DFKD) method based on the idea of generative image modelling, which introduces conditional generative networks and metric learning to solve the problem of class imbalance and single intra-class data distribution in synthetic datasets. The experimental results show that DE-DFKD synthesizes better quality data on MNIST, CIFAR-10, and CIFAR-100 datasets with Frechet Inception Distance (FID) values of 51.79, 60.25, and 50.1, respectively, and higher accuracy of student networks compared with existing schemes.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":"77 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266118","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
One of the most difficult issues in cloud computing is scheduling tasks on appropriate cloud resources. This is significant because multiple tasks may need to be efficiently scheduled across different virtual machines to maximize resource utilization and minimize makespan. As a result, various efforts have been made to apply metaheuristic algorithms to the task scheduling problem. However, these techniques may occasionally suffer early convergence and become trapped in local search. This research proposes multi-objective task scheduling in cloud computing for big data applications to address these issues. To accomplish this goal, the adaptive Tasmanian Devil Optimization (ATDO) method is developed in this study, with a focus on resolving challenging optimization issues. The opposition-based learning technique (OBL) is combined with TDO to maintain population diversity and improve convergence to the optimal solution. In addition, cost, makespan, and resource utilization are taken into account when designing the multi-objective function (MOF). The proposed strategy includes efficient solution representation, efficient fitness function derivation, and the TDO and OBL operators. The effectiveness of the strategy is examined using several evaluation metrics, and its efficacy is compared with that of other approaches. The proposed method takes a minimum time of 2134 ms to schedule 1000 tasks, with a degree of imbalance of 20.97.
{"title":"Adaptive Tasmanian Devil Optimization algorithm based efficient task scheduling for big data application in a cloud computing environment","authors":"Ashis Kumar Mishra, Subasis Mohapatra, Pradip Kumar Sahu","doi":"10.1007/s11042-024-19887-1","DOIUrl":"https://doi.org/10.1007/s11042-024-19887-1","url":null,"abstract":"<p>One of the most difficult issues in cloud computing is scheduling tasks on appropriate resources on the cloud.This is significant because multiple tasks may need to be efficiently scheduled across different virtual machines to maximize resource utilization and minimize makespan. As a result, various efforts have been made to use metaheuristic algorithms to tackle the task scheduling problem. However, these techniques may occasionally experience early convergence and be trapped in local search. This research proposes a multi-objective-based task scheduling in cloud computing for big data applications to address these issues. To accomplish this goal, the adaptive Tasmanian Devil Optimization (ATDO) method is created in this study, with a focus on resolving challenging optimization issues. Following that, the opposition-based learning technique (OBL) is combined with TDO to maintain the population diversity and improve convergence on the ideal answer. In addition, cost, makespan,and resource utilization are taken into account when designing the multi-objective function (MOF). The proposed strategy included efficient solution representation, efficient fitness function derivation, TDO, and OBL operators. The effectiveness of the strategy is examined using several evaluation metrics, and its efficacy is compared with those of other approaches.The proposed method takes a minimum time of 2134 ms for scheduling 1000 tasks and 20.97 degree of imbalance.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":"19 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266114","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}