
Multimedia Tools and Applications: Latest Articles

CSDNet: cross-sketch with dual gated attention for fine-grained image captioning network
IF 3.6 | Zone 4, Computer Science | Q2, Computer Science, Information Systems | Pub Date: 2024-09-16 | DOI: 10.1007/s11042-024-20220-z
Md. Shamim Hossain, Shamima Aktar, Md. Bipul Hossen, Mohammad Alamgir Hossain, Naijie Gu, Zhangjin Huang

When extracting inter- and intra-modal interactions, contemporary models often face challenges such as reduced computational efficiency, particularly when dealing with lengthy visual sequences. To address these issues, this study introduces the Cross-Sketch with Dual Gated Attention Network (CSDNet), designed to handle second-order intra- and inter-modal interactions by integrating two attention modules. Using bilinear pooling to capture these second-order interactions typically requires substantial computational resources because large-dimensional tensors must be processed. Given these resource demands, the first module, Cross-Sketch Attention (CSA), employs Cross-Tensor Sketch Pooling on attention features to reduce dimensionality while preserving crucial information and without sacrificing caption quality. The second module, Dual Gated Attention (DGA), contributes additional spatial and channel-wise attention distributions to improve caption generation performance. Our method reduces computation time per epoch by an average of 13.54% compared to the base model, which leads to faster convergence and improved performance metrics. Additionally, we observe a 0.07% improvement in the METEOR score compared to the base model. With reinforcement learning optimization, our model achieves a CIDEr-D score of 132.2% on the MS-COCO dataset and consistently outperforms the baseline across a comprehensive range of evaluation metrics.
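
The dimensionality reduction described in this abstract can be illustrated with the generic Tensor Sketch construction. The following is a minimal sketch under stated assumptions, not the authors' Cross-Tensor Sketch Pooling: the function names, the feature size `d`, and the sketch size `D` are hypothetical, and NumPy is used only for illustration. It approximates the outer product of two feature vectors (d * d values) with a length-D vector by circularly convolving two count sketches via FFT.

```python
import numpy as np

def make_hash(d, D, rng):
    """Draw a random bucket index and a random sign for each of d input dims."""
    return rng.integers(0, D, size=d), rng.choice([-1.0, 1.0], size=d)

def count_sketch(x, h, s, D):
    """Project x (length d) to length D: out[h[i]] += s[i] * x[i]."""
    out = np.zeros(D)
    np.add.at(out, h, s * x)  # unbuffered add, so colliding buckets accumulate
    return out

def tensor_sketch(x, y, hx, sx, hy, sy, D):
    """Sketch of the outer product of x and y without forming it.

    Circular convolution of the two count sketches, computed via FFT.
    """
    fx = np.fft.rfft(count_sketch(x, hx, sx, D))
    fy = np.fft.rfft(count_sketch(y, hy, sy, D))
    return np.fft.irfft(fx * fy, n=D)

# Hypothetical sizes chosen only for the demonstration.
rng = np.random.default_rng(0)
d, D = 64, 256
hx, sx = make_hash(d, D, rng)
hy, sy = make_hash(d, D, rng)
x, y = rng.standard_normal(d), rng.standard_normal(d)
z = tensor_sketch(x, y, hx, sx, hy, sy, D)  # length D instead of d * d
```

The result equals a count sketch of the full outer product with combined hash `(hx[i] + hy[j]) % D` and sign `sx[i] * sy[j]`, which is why the bilinear interaction never has to be materialized.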

Citations: 0
PVDM-YOLOv8l: a solution for reliable pedestrian and vehicle detection in autonomous vehicles under adverse weather conditions
IF 3.6 | Zone 4, Computer Science | Q2, Computer Science, Information Systems | Pub Date: 2024-09-16 | DOI: 10.1007/s11042-024-20219-6
Noor Ul Ain Tahir, Zuping Zhang, Muhammad Asim, Sundas Iftikhar, Ahmed A. Abd El-Latif

The safe navigation of autonomous vehicles in intelligent transportation systems depends on their ability to detect pedestrians and vehicles. While transformer-based models for object detection have shown remarkable advancements, accurately identifying pedestrians and vehicles in adverse weather conditions remains a challenging task. Adverse weather degrades image quality, leading to issues such as low contrast, reduced visibility, blurred edges, false detection, misdetection of tiny objects, and other impediments that further reduce detection accuracy. This paper introduces a novel Pedestrian and Vehicle Detection Model for adverse weather conditions, denoted PVDM-YOLOv8l. In our proposed model, we first incorporate the Swin-Transformer method, which is designed for global feature extraction so that small objects can be identified in poor visibility, into the YOLOv8l backbone structure. To enhance detection accuracy and address the impact of inaccurate features on recognition performance, CBAM is integrated between the neck and head networks of YOLOv8l, aiming to gather crucial information. Finally, we adopt the Wise-IOU v3 loss function to mitigate the adverse effects of low-quality instances by minimizing negative gradients. Additionally, we enhanced and augmented the DAWN dataset and created a custom dataset, named DAWN2024, to cater to the specific requirements of our study. To verify the superiority of PVDM-YOLOv8l, its performance was compared against several commonly used object detectors, including YOLOv3, YOLOv3-tiny, YOLOv3-spp, YOLOv5, YOLOv6, all versions of YOLOv8 (n, m, s, l, and x), and some traditional models. The experimental results demonstrate that our proposed model achieved improvements of 6.6%, 5.4%, 6%, and 5.1% in precision, recall, F1-score, and mean Average Precision (mAP), respectively, on the custom DAWN2024 dataset. This substantial improvement in accuracy indicates a significant leap in the capability of our model to detect pedestrians and vehicles under adverse weather conditions, which is crucial for the safe navigation of autonomous vehicles.
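
For readers unfamiliar with the metrics quoted in this abstract, the following is a minimal sketch of how box overlap (IoU) and precision, recall, and F1 are conventionally computed for a detector. It shows the plain IoU only, not the Wise-IOU v3 loss used in the paper, and the function names and box format `(x1, y1, x2, y2)` are assumptions made for illustration.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 from true-positive, false-positive, false-negative counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# Two 2x2 boxes overlapping in a 1x1 square: IoU = 1 / (4 + 4 - 1).
iou = box_iou((0, 0, 2, 2), (1, 1, 3, 3))
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
```

A detection is usually counted as a true positive when its IoU with a ground-truth box exceeds a chosen threshold; that threshold is not stated in the abstract.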

Citations: 0
Multi-dimensional convolution transformer for group activity recognition
IF 3.6 | Zone 4, Computer Science | Q2, Computer Science, Information Systems | Pub Date: 2024-09-16 | DOI: 10.1007/s11042-024-19973-4
Dongli Wang, Xiaolin Zhu, Jinfu Liu, Zixin Zhang, Yan Zhou

Group activity recognition (GAR), which aims to understand the activity performed by a group of people, has attracted growing attention in computer vision over the past decade. In this paper, we propose a novel multi-dimensional convolution Transformer network for group activity recognition, which not only models spatial-temporal feature representations but also combines channel information to analyze the spatial-temporal dependencies of individual actors. Specifically, we first construct a multi-scale feature extraction module in the feature extraction stage, which can exploit discriminative high-level and low-level feature representations. The multi-branching strategy combined with dilated convolution can further capture multi-scale feature information in complex group scenarios. Then, to model the inter-dependence among the involved actors along different dimensions, we design a multi-dimensional convolution Transformer in the relational reasoning stage, which consists of three parts: a channel attention module, a spatial-temporal convolutional Transformer, and a spatial-temporal attention module. Finally, a softmax classifier produces the activity recognition result. Extensive experiments on two public GAR datasets demonstrate that the recognition accuracy on the Volleyball Dataset and the Collective Activity Dataset reaches 92.8% and 96.1%, respectively, a significant improvement over the mainstream methods of recent years.
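
To make the multi-branch dilated convolution idea concrete, here is a minimal 1-D sketch. It illustrates the general operation only and is not the authors' feature extraction module; the function name, the kernel, and the dilation rates 1, 2, and 3 are assumptions. Each branch uses the same three-tap kernel, but a larger dilation spreads the taps further apart and so covers a wider receptive field at the same cost.

```python
import numpy as np

def dilated_conv1d(x, w, dilation):
    """'Valid' 1-D correlation of x with kernel w, taps spaced `dilation` apart."""
    k = len(w)
    span = (k - 1) * dilation + 1          # receptive field of one output sample
    n_out = len(x) - span + 1
    return np.array([sum(w[j] * x[i + j * dilation] for j in range(k))
                     for i in range(n_out)])

x = np.arange(10.0)                        # toy input signal
w = np.array([1.0, 0.0, -1.0])             # hypothetical difference kernel

# Three branches with the same kernel and receptive fields of 3, 5, and 7 samples.
branches = {d: dilated_conv1d(x, w, d) for d in (1, 2, 3)}
```

In a real network the branch outputs would be padded to a common length and concatenated or summed; that fusion step is left out here because the abstract does not describe it.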

Citations: 0
Enhanced security in lossless audio encryption using zigzag scrambling, DNA coding, SHA-256, and Hopfield networks: a practical VLC system implementation
IF 3.6 | Zone 4, Computer Science | Q2, Computer Science, Information Systems | Pub Date: 2024-09-16 | DOI: 10.1007/s11042-024-20196-w
Sorel Bagio Nono Fotso, William Nodem Atchoffo, Armand C. Nzeukou, Jimmi Hervé Talla Mbé

This paper presents a novel lossless audio encryption algorithm based on a modified zigzag scrambling technique, SHA-256, DNA coding, cipher block chaining (CBC) mode, and the delayed Hopfield neural network (HNN). The algorithm mainly includes the scrambling and diffusion stages. In the scrambling stage, the audio signal is converted into a square matrix to which the modified zigzag scrambling technique is applied. The confusion stage follows, in which bit-level permutation, DNA coding, and CBC mode are applied successively. In addition, the delayed HNN used in the encryption process is controlled by the plain audio signal through the SHA-256 hash function to resist differential attacks. The proposed algorithm has been assessed on ten audio signals using more than fourteen performance measures. Compared to the state of the art, the results show better performance. Higher resistance to differential attacks is obtained, as seen in higher values of the number of sample change rate (NSCR) and the unified average changing intensity (UACI). More disorder is also detected in the encrypted audio signal, as seen in higher values of the information entropy. Furthermore, the proposed algorithm has a larger key space, arising from the high number of parameters of the delayed HNN, which results in higher resistance to brute-force attacks. A real-life implementation of the proposed encryption technique is achieved with a visible light communication (VLC) system; this highlights its feasibility and effectiveness in securing optical wireless communication systems.
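
Two of the building blocks named in this abstract can be sketched briefly. The code below is a minimal illustration, assuming the classic anti-diagonal zigzag scan rather than the paper's modified variant, and assuming that the SHA-256 digest of the plain samples is what seeds the key-dependent parameters; all function names are hypothetical. It shows that the scrambling is exactly invertible (hence lossless) and that the digest changes when a single sample changes.

```python
import hashlib
import numpy as np

def zigzag_order(n):
    """(row, col) pairs of an n x n matrix in classic anti-diagonal zigzag order."""
    order = []
    for s in range(2 * n - 1):
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        # Odd diagonals run top-right to bottom-left, even ones the other way.
        order.extend(diag if s % 2 else diag[::-1])
    return order

def scramble(m):
    """Read m in zigzag order and write the samples back row by row."""
    n = m.shape[0]
    rows, cols = zip(*zigzag_order(n))
    return m[list(rows), list(cols)].reshape(n, n)

def unscramble(m):
    """Inverse of scramble: put row-major samples back on the zigzag path."""
    n = m.shape[0]
    rows, cols = zip(*zigzag_order(n))
    out = np.empty_like(m)
    out[list(rows), list(cols)] = m.ravel()
    return out

def plaintext_digest(samples):
    """SHA-256 digest (32 bytes) of the plain samples, usable as key material."""
    return hashlib.sha256(samples.tobytes()).digest()

audio = np.arange(16, dtype=np.int16).reshape(4, 4)   # toy 'audio' block
restored = unscramble(scramble(audio))
digest = plaintext_digest(audio)
```

Because the digest depends on every plain sample, two inputs that differ in one sample yield unrelated key material, which is the property the abstract relies on for resistance to differential attacks.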

Citations: 0
Voxel completion and 3D asymmetrical convolution networks for Lidar semantic segmentation
IF 3.6 | Zone 4, Computer Science | Q2, Computer Science, Information Systems | Pub Date: 2024-09-16 | DOI: 10.1007/s11042-024-19975-2
Yan Zhou, Jingwei Liu, Jianxun Li, Haibin Zhou

The point cloud data collected by LiDAR is large in scale and contains rich spatial structure detail. Through the collection and labeling of LiDAR data, an autonomous driving system can obtain detailed information about the environment around the vehicle. Due to the lack of sufficient laser points, some methods transform the point cloud into dense representations such as multi-view or voxelized grids for processing, ignoring the information loss caused by the LiDAR imaging characteristics and by the point cloud transformations, which degrades segmentation performance. In this work, we investigate a 3D semantic segmentation scheme with only LiDAR inputs, called voxel completion and 3D asymmetric convolution network. We propose a voxel completion sub-network that improves the feature extraction capability of the network by enlarging the receptive field and using multi-scale feature extraction to reduce the empty units in the voxels and obtain more complete voxel features. In addition, because a large number of cubic objects are present in autonomous driving scenarios, we propose a 3D asymmetric convolution network that better matches these scenarios and includes three components: a 3D residual block, an asymmetric convolution block, and a context module. These components are combined to explore 3D geometric patterns, which can maintain their intrinsic properties and improve the performance of the network. Extensive experiments on the SemanticKITTI and nuScenes benchmark datasets demonstrate the superiority of the approach. For example, on the nuScenes validation set, our method outperforms the state-of-the-art method by 0.3% in mIoU.
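
The empty-voxel problem that motivates the voxel completion sub-network is easy to see in a toy example. The following is a minimal sketch, assuming a regular Cartesian grid; the function name, voxel size, origin, and grid shape are hypothetical and are not taken from the paper. It maps points to voxel indices, counts points per voxel, and reports the fraction of voxels that receive no point at all.

```python
import numpy as np

def voxelize(points, voxel_size, origin, grid_shape):
    """Map (N, 3) points to integer voxel indices and count points per voxel."""
    idx = np.floor((points - origin) / voxel_size).astype(int)
    # Discard points that fall outside the grid.
    inside = np.all((idx >= 0) & (idx < np.array(grid_shape)), axis=1)
    counts = np.zeros(grid_shape, dtype=int)
    np.add.at(counts, tuple(idx[inside].T), 1)  # accumulate repeated indices
    return counts

points = np.array([[0.1, 0.1, 0.1],
                   [0.2, 0.2, 0.2],
                   [1.5, 0.5, 0.5],
                   [9.0, 9.0, 9.0]])          # last point lies outside the grid
counts = voxelize(points, voxel_size=1.0, origin=np.zeros(3), grid_shape=(2, 2, 2))
empty_fraction = float(np.mean(counts == 0))
```

Even in this tiny grid most voxels are empty, and real LiDAR sweeps are far sparser at long range, which is the gap the completion sub-network is meant to reduce.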

Citations: 0
An effective binary dynamic grey wolf optimization algorithm for the 0-1 knapsack problem
IF 3.6 | Zone 4, Computer Science | Q2, Computer Science, Information Systems | Pub Date: 2024-09-16 | DOI: 10.1007/s11042-024-20121-1
Feyza Erdoğan, Murat Karakoyun, Şaban Gülcü

Metaheuristic algorithms are recommended and frequently used methods for solving optimization problems. They have been adapted to many challenging problems, and their successes have been documented. The grey wolf optimizer (GWO) is one of the most advanced metaheuristics. Because of the advantages it provides, GWO has been applied to many different problems. In this study, a new variant of GWO, the Binary Dynamic Grey Wolf Optimizer (BDGWO), is proposed for solving binary optimization problems. Compared to other binary GWO variants, the main contributions of BDGWO are that it uses the bitwise XOR operation for binarization and that it is based on a dynamic coefficient method developed to determine the effect of the three dominant wolves (alpha, beta, and delta) in the algorithm. BDGWO is a simple, feasible, and successful method that strikes a balance between local search and global search in solving binary optimization problems. To determine the success and accuracy of the proposed BDGWO, it was tested on the 0-1 knapsack problem (0-1 KP), which is classified as an NP-hard problem. BDGWO was compared with 17 different binary methods across a total of 55 data sets from three different studies published in the last four years. The Friedman test was applied to make the experimental results easier to interpret and to evaluate the algorithm results statistically. The experiments show that BDGWO is an effective and successful method for its intended purpose.
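
The benchmark problem in this abstract can be stated in a few lines of code. The sketch below is not BDGWO: it shows only how a candidate 0/1 solution is typically scored and how an exact optimum can be obtained by dynamic programming for small integer instances, which is a common way to check a metaheuristic's answers. The function names, the zero score for infeasible selections, and the instance data are assumptions.

```python
def knapsack_value(bits, values, weights, capacity):
    """Fitness of a 0/1 selection vector; infeasible selections score 0."""
    total_weight = sum(w for b, w in zip(bits, weights) if b)
    if total_weight > capacity:
        return 0
    return sum(v for b, v in zip(bits, values) if b)

def knapsack_optimum(values, weights, capacity):
    """Exact optimum by dynamic programming over integer capacities."""
    best = [0] * (capacity + 1)
    for v, w in zip(values, weights):
        # Iterate capacities downward so each item is used at most once.
        for c in range(capacity, w - 1, -1):
            best[c] = max(best[c], best[c - w] + v)
    return best[capacity]

# Hypothetical three-item instance.
values, weights, capacity = [60, 100, 120], [10, 20, 30], 50
candidate = [0, 1, 1]                      # a binary position, e.g. one wolf
score = knapsack_value(candidate, values, weights, capacity)
optimum = knapsack_optimum(values, weights, capacity)
```

A binary metaheuristic would search over vectors like `candidate` using `knapsack_value` as its fitness, and the gap between `score` and `optimum` measures solution quality on instances small enough for the exact method.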

Citations: 0
DE-DFKD: diversity enhancing data-free knowledge distillation
IF 3.6 | Zone 4, Computer Science | Q2, Computer Science, Information Systems | Pub Date: 2024-09-14 | DOI: 10.1007/s11042-024-20193-z
Yanni Liu, Ayong Ye, Qiulin Chen, Yuexin Zhang, Jianwei Chen

Data-Free Knowledge Distillation (DFKD) can be used to train student networks on synthetic data when the original dataset of the teacher network is not accessible. However, existing studies mainly focus on how to use the prior knowledge of the teacher network to synthesize data and ignore the lack of diversity in the synthesized data, which leaves the student network unable to learn the real data distribution and reduces its robustness. In this paper, we propose a Diversity-Enhanced Data-Free Knowledge Distillation (DE-DFKD) method based on the idea of generative image modelling, which introduces conditional generative networks and metric learning to solve the problems of class imbalance and a single intra-class data distribution in synthetic datasets. The experimental results show that DE-DFKD synthesizes better-quality data on the MNIST, CIFAR-10, and CIFAR-100 datasets, with Fréchet Inception Distance (FID) values of 51.79, 60.25, and 50.1, respectively, and yields higher student network accuracy than existing schemes.
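
The teacher-to-student transfer underlying this abstract is usually trained with a temperature-softened distillation loss. The following is a minimal sketch of that standard loss, assuming the common KL-divergence formulation scaled by the squared temperature; it is not the DE-DFKD objective, which additionally involves a conditional generator and metric learning, and the function names and temperature value are hypothetical.

```python
import numpy as np

def softmax(z, T=1.0):
    """Row-wise softmax of logits z at temperature T (numerically stabilised)."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """Mean KL(teacher || student) on softened outputs, scaled by T squared."""
    p = softmax(teacher_logits, T)         # soft targets from the teacher
    q = softmax(student_logits, T)         # student predictions
    kl = np.sum(p * (np.log(p) - np.log(q)), axis=-1)
    return float(np.mean(kl) * T * T)

teacher = np.array([[2.0, 1.0, 0.1]])      # toy logits for one synthetic sample
student = np.array([[0.5, 1.5, 0.2]])
loss = distillation_loss(student, teacher)
```

In the data-free setting the inputs that produce these logits are synthetic, so the diversity of the generated samples determines how much of the teacher's behaviour the loss can actually transfer.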

Citations: 0
Adaptive Tasmanian Devil Optimization algorithm based efficient task scheduling for big data application in a cloud computing environment
IF 3.6 Zone 4 Computer Science Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2024-09-14 DOI: 10.1007/s11042-024-19887-1
Ashis Kumar Mishra, Subasis Mohapatra, Pradip Kumar Sahu

One of the most difficult issues in cloud computing is scheduling tasks on appropriate cloud resources. This is significant because multiple tasks may need to be efficiently scheduled across different virtual machines to maximize resource utilization and minimize makespan. As a result, various efforts have been made to use metaheuristic algorithms to tackle the task scheduling problem. However, these techniques may converge prematurely and become trapped in local optima. To address these issues, this research proposes multi-objective task scheduling in cloud computing for big data applications. To accomplish this goal, the adaptive Tasmanian Devil Optimization (ATDO) method is developed in this study, with a focus on resolving challenging optimization issues. The opposition-based learning (OBL) technique is then combined with TDO to maintain population diversity and improve convergence toward the optimal solution. In addition, cost, makespan, and resource utilization are taken into account when designing the multi-objective function (MOF). The proposed strategy includes efficient solution representation, efficient fitness function derivation, and the TDO and OBL operators. The effectiveness of the strategy is examined using several evaluation metrics and compared with that of other approaches. The proposed method takes a minimum of 2134 ms to schedule 1000 tasks, with a degree of imbalance of 20.97.
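
The quantities named above can be made concrete with a small sketch. It assumes a schedule is encoded as a list mapping each task to a virtual machine index, uses the usual opposition rule `lower + upper - x`, and takes the degree of imbalance to be `(T_max - T_min) / T_avg`, a definition common in the cloud-scheduling literature but not stated in the abstract. All task sizes, VM speeds, and function names are hypothetical.

```python
def opposite(position, lower, upper):
    """Opposition-based learning: mirror each coordinate within [lower, upper]."""
    return [lower + upper - x for x in position]

def vm_finish_times(assignment, task_lengths, vm_speeds):
    """Total busy time of each VM when task i runs on VM assignment[i]."""
    times = [0.0] * len(vm_speeds)
    for task, vm in enumerate(assignment):
        times[vm] += task_lengths[task] / vm_speeds[vm]
    return times

def makespan(times):
    """Completion time of the busiest VM."""
    return max(times)

def degree_of_imbalance(times):
    """(T_max - T_min) / T_avg; an assumed, commonly used definition."""
    avg = sum(times) / len(times)
    return (max(times) - min(times)) / avg

task_lengths = [400, 200, 600, 100]   # hypothetical task sizes
vm_speeds = [100, 200]                # hypothetical VM speeds
candidate = [0, 1, 1, 0]              # task i -> VM candidate[i]
mirrored = opposite(candidate, 0, len(vm_speeds) - 1)

# OBL evaluates a candidate and its opposite, then keeps the fitter one.
best = min(
    (candidate, mirrored),
    key=lambda a: makespan(vm_finish_times(a, task_lengths, vm_speeds)),
)
times = vm_finish_times(best, task_lengths, vm_speeds)
print(best, makespan(times), degree_of_imbalance(times))
```

A multi-objective fitness would combine makespan with cost and resource utilization; only makespan is used as the selection key here to keep the sketch short.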

Citations: 0
Parkinson's disease diagnosis by voice data using particle swarm optimization-extreme learning machine approach
IF 3.6 Zone 4 Computer Science Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2024-09-14 DOI: 10.1007/s11042-024-20108-y
Musatafa Abbas Abbood Albadr, Masri Ayob, Sabrina Tiun, Raad Z. Homod, Fahad Taha AL-Dhief, Mohammed Hasan Mutar

Various speech processing approaches (e.g., acoustic feature extraction techniques) and Machine Learning (ML) algorithms have been applied to diagnosing Parkinson's disease (PD). However, most of these studies have used conventional techniques that achieve a low accuracy rate in diagnosing PD and still need further improvement. Particle Swarm Optimization-Extreme Learning Machine (PSO-ELM), one of the most recent and effective ML techniques, could be considered an accurate classification strategy but has not been applied to the problem of PD diagnosis. Thus, to enhance the precision of PD diagnosis, this study employs the PSO-ELM classifier and examines how well it performs with seven feature extraction techniques (basic features, WT (Wavelet Transform), MFCC (Mel Frequency Cepstral Coefficients), bandwidth + formant, intensity parameters, TQWT (Tunable Q-factor Wavelet Transform), and vocal fold features). The PSO-ELM approach can a) prevent overfitting, b) solve binary and multi-class classification problems, and c) perform like a kernel-based support vector machine with a neural network structure. Therefore, if combining the PSO-ELM classifier with an appropriate feature extraction technique improves learning performance, the combination can yield an effective method for identifying PD. In this study, the PD voice samples were taken from the Parkinson’s Disease Classification Benchmark Dataset. To find a useful feature extraction technique to couple with the PSO-ELM classifier, we applied PSO-ELM to each extracted feature set using both unbalanced and balanced datasets. According to the experimental results, the MFCC features help the PSO-ELM classifier attain its greatest accuracy, up to 97.35% on the unbalanced dataset and 100.00% on the balanced dataset. This shows that combining PSO-ELM with MFCC can improve learning performance, ultimately creating an effective method for identifying PD.

Citations: 0
Principal component fusion based unexposed biological feature enhancement of fundus images
IF 3.6 Zone 4 Computer Science Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date: 2024-09-14 DOI: 10.1007/s11042-024-20110-4
Neha Singh, Ashish Kumar Bhandari

In the field of ophthalmology, digital images play an important role in the automatic detection of various kinds of eye diseases. Image enhancement is the first stage in assisting ophthalmologists with diagnosis. As a result, various algorithms and methods for the enhancement of retinal images have been developed, but they may face obstacles that are common in enhancement processes, such as false edges and weak illumination that obscure image details. To eliminate such issues, this paper proposes a novel framework for unexposed retinal images. The proposed method uses a multiscale Gaussian function to estimate the illumination layer of an unexposed color retinal image, which is then corrected by the gamma method. Principal component analysis (PCA) is then used to generate a fused enhanced result for unexposed retinal images. Finally, a contrast-limited technique is employed to further improve edge and contextual details. Experimental results show that, compared with several state-of-the-art enhancement procedures, the suggested method produces results with good contrast and brightness. The significance of the proposed method is that it may help ophthalmologists screen for unexposed retinal illnesses more efficiently and build better automated image analysis for healthcare diagnosis.
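
The gamma-correction step can be sketched as follows. The example assumes the Retinex-style model `intensity = reflectance * illumination` with values normalized to [0, 1], and it takes the illumination estimate as given rather than computing it with a multiscale Gaussian. The function name, the gamma value, and the sample pixels are hypothetical and do not come from the paper.

```python
def gamma_correct_illumination(intensity, illumination, gamma=0.5, eps=1e-6):
    """Brighten pixels by gamma-correcting their illumination component.

    Assumes intensity = reflectance * illumination, all values in [0, 1].
    """
    out = []
    for i, l in zip(intensity, illumination):
        l = max(l, eps)                  # avoid division by zero
        reflectance = i / l              # illumination-independent detail
        corrected = l ** gamma           # gamma < 1 lifts dark illumination
        out.append(min(1.0, reflectance * corrected))
    return out

# Hypothetical pixel intensities and their estimated illumination.
pixels = [0.04, 0.25, 0.5]
illum = [0.16, 0.25, 1.0]
print(gamma_correct_illumination(pixels, illum))
```

Pixels under weak illumination are lifted the most, while a pixel whose illumination is already 1.0 is left unchanged, which is the behaviour wanted for unexposed regions.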

Citations: 0