
Latest publications in Multimedia Tools and Applications

A hybrid diabetes risk prediction model XGB-ILSO-1DCNN
IF 3.6 · CAS Tier 4 (Computer Science) · Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS · Pub Date: 2024-09-12 · DOI: 10.1007/s11042-024-20155-5
Huifang Feng, Yanan Hui

Accurately predicting the risk of diabetes is of paramount importance for early intervention and prevention. To achieve precise diabetes risk prediction, we propose a hybrid model, XGB-ILSO-1DCNN, which combines the Extreme Gradient Boosting (XGBoost) algorithm, an Improved Lion Swarm Optimization (ILSO) algorithm, and the deep learning model 1DCNN. First, an XGBoost model is trained on the raw data, and its prediction is treated as a new feature that is concatenated with the original features to form a new feature set. Then, we introduce ILSO-1DCNN, a hybrid approach for diabetes risk prediction that combines ILSO with a one-dimensional convolutional neural network (1DCNN); ILSO's optimization capabilities are used to automatically determine the hyperparameters of the 1DCNN network. Finally, we conducted comprehensive experiments on the PIMA dataset and compared our model with baseline models. The experimental results not only demonstrate the model's strong predictive performance across various evaluation criteria but also highlight its efficiency and low complexity. This study introduces a novel and effective diabetes risk prediction approach, making it a valuable tool for clinical analysis in the care of diabetic patients.
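The feature-stacking step described in this abstract, treating a boosted model's prediction as one extra feature concatenated with the originals, can be sketched as follows. This is an illustrative reconstruction only: scikit-learn's GradientBoostingClassifier stands in for XGBoost, and random data stands in for the PIMA dataset (the ILSO-tuned 1DCNN stage is not reproduced).

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))             # 8 raw features (PIMA has 8 attributes)
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # synthetic binary "diabetes" label

# Step 1: train a boosted-tree model on the raw features.
gbm = GradientBoostingClassifier(random_state=0).fit(X, y)

# Step 2: treat its predicted probability as one new feature and
# concatenate it with the original features to form the new feature set.
new_feature = gbm.predict_proba(X)[:, 1].reshape(-1, 1)
X_augmented = np.hstack([X, new_feature])

print(X.shape, X_augmented.shape)  # (200, 8) (200, 9)
```

The augmented matrix would then be passed to the downstream network; in the paper that network is a 1DCNN whose hyperparameters are chosen by ILSO.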

Citations: 0
Underwater images enhancement using contrast limited adaptive parameter settings histogram equalization
IF 3.6 · CAS Tier 4 (Computer Science) · Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS · Pub Date: 2024-09-12 · DOI: 10.1007/s11042-024-20210-1
Yahui Chen, Yitao Liang

CLAHE is widely used in underwater image processing because of its excellent contrast-enhancement performance. The choice of clip-point formula is the core problem of CLAHE methods, and selecting a suitable clipping value has become the focus of several extended methods. In this paper, an automatic CLAHE underwater image enhancement algorithm is proposed. The method determines the clipping value from the high-order moment dynamic features of each block of the underwater image: by quantifying each block's dynamic features more precisely and incorporating them into the clipping-value formula, the contrast and details of the underwater image can be effectively enhanced. To effectively improve the saturation and brightness of underwater images, this paper adopts the more accurate and intuitive HSV model. Experimental results show that our method enhances contrast subjectively while suppressing noise amplification well, and also increases the saturation of underwater images. In objective metrics, our method obtains the best values for the underwater image quality measure (UIQM), SSIM, and PSNR.
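The clipping value's role can be illustrated with a minimal clip-limited histogram equalization in plain NumPy. This sketch applies a single global histogram rather than per-block tiles, and the fixed clip_limit is a hypothetical parameter, not the paper's moment-based formula.

```python
import numpy as np

def clip_limited_equalize(img, clip_limit=40):
    """Histogram-equalize an 8-bit image, clipping histogram bins at
    clip_limit and redistributing the excess uniformly (CLAHE's core
    idea, applied globally instead of per tile)."""
    hist = np.bincount(img.ravel(), minlength=256).astype(float)
    excess = np.clip(hist - clip_limit, 0, None).sum()
    hist = np.minimum(hist, clip_limit) + excess / 256.0  # redistribute excess
    cdf = hist.cumsum()
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min())     # normalize to [0, 1]
    lut = np.round(cdf * 255).astype(np.uint8)            # intensity mapping
    return lut[img]

img = np.tile(np.arange(0, 256, 4, dtype=np.uint8), (64, 1))  # 64x64 gradient
out = clip_limited_equalize(img, clip_limit=40)
```

OpenCV's `createCLAHE(clipLimit=..., tileGridSize=...)` implements the full tiled version of this idea, with bilinear blending between tiles.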

Citations: 0
A systematic review of multilabel chest X-ray classification using deep learning
IF 3.6 · CAS Tier 4 (Computer Science) · Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS · Pub Date: 2024-09-12 · DOI: 10.1007/s11042-024-20172-4
Uswatun Hasanah, Jenq-Shiou Leu, Cries Avian, Ihsanul Azmi, Setya Widyawan Prakosa

Chest X-ray scans are among the most frequently used diagnostic tools for identifying chest diseases. However, identifying diseases in X-ray images requires experienced technicians and is frequently noted as a time-consuming process with varying levels of interpretation; in particular circumstances, disease identification from images is a challenge for human observers. Recent advances in deep learning have opened up new possibilities for diagnosing diseases with this technique. However, further implementation requires prior knowledge of strategy and appropriate architecture design. Revealing this information will enable faster implementation and help anticipate potential issues produced by specific designs, especially in multilabel classification, which is challenging compared to single-label tasks. This systematic review of the approaches published in the literature will assist researchers in developing improved methods of whole-chest disease detection. The study focuses on the deep learning methods, publicly accessible datasets, hyperparameters, and performance metrics employed by various researchers in classifying multilabel chest X-ray images. The findings of this study provide a complete overview of the current state of the art, highlighting significant practical aspects of the approaches studied. Distinctive results highlighting the potential enhancements and beneficial uses of deep learning in multilabel chest disease identification are presented.
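Since the review stresses that multilabel classification is evaluated differently from single-label tasks, here is a minimal sketch of per-label and macro-averaged F1 scoring with scikit-learn; the label matrix and the four finding names are invented for illustration.

```python
import numpy as np
from sklearn.metrics import f1_score

# Hypothetical ground truth and predictions for 4 findings
# (e.g. atelectasis, cardiomegaly, effusion, pneumonia) on 6 images:
# each image can carry several labels at once, unlike single-label tasks.
y_true = np.array([[1,0,1,0],[0,1,0,0],[1,1,0,1],[0,0,0,0],[1,0,1,1],[0,1,1,0]])
y_pred = np.array([[1,0,1,0],[0,1,0,1],[1,0,0,1],[0,0,0,0],[1,0,1,0],[0,1,1,0]])

per_label_f1 = f1_score(y_true, y_pred, average=None)   # one score per finding
macro_f1 = f1_score(y_true, y_pred, average="macro")    # unweighted mean
print(per_label_f1, macro_f1)
```

Reporting both views matters: a high macro score can still hide a single finding with poor per-label F1.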

Citations: 0
Advancements in brain tumor analysis: a comprehensive review of machine learning, hybrid deep learning, and transfer learning approaches for MRI-based classification and segmentation
IF 3.6 · CAS Tier 4 (Computer Science) · Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS · Pub Date: 2024-09-12 · DOI: 10.1007/s11042-024-20203-0
Surajit Das, Rajat Subhra Goswami

Brain tumors, whether cancerous or noncancerous, can be life-threatening due to abnormal cell growth, potentially causing organ dysfunction and mortality in adults. Brain tumor segmentation (BTS) and brain tumor classification (BTC) technologies are crucial in diagnosing and treating brain tumors: they assist doctors in locating and measuring tumors and in developing treatment and rehabilitation strategies. Despite their importance in the medical field, BTC and BTS remain challenging. This comprehensive review analyses machine and deep learning methodologies, including convolutional neural networks (CNN), transfer learning (TL), and hybrid models for BTS and BTC. We discuss CNN architectures like U-Net++, which is known for its high segmentation accuracy in 2D and 3D medical images. Additionally, transfer learning utilises pre-trained ImageNet models such as ResNet and Inception, fine-tuned on brain tumor-specific datasets to enhance classification performance and sensitivity despite limited medical data. Hybrid models combine deep learning techniques with machine learning, using CNNs for initial segmentation and traditional methods for classification, improving accuracy. We discuss commonly used benchmark datasets in brain tumor research, including the BraTS dataset and the TCIA database, and evaluate performance metrics such as the F1-score, accuracy, sensitivity, specificity, and the Dice coefficient, emphasising their significance and standard thresholds in brain tumor analysis. The review addresses current machine learning (ML) and deep learning (DL) based BTS and BTC challenges and proposes solutions such as explainable deep learning models and multi-task learning frameworks. These insights aim to guide the development of accurate and efficient tools for improved patient care in brain tumor analysis.
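Of the metrics listed, the Dice coefficient is the one specific to segmentation; a minimal NumPy sketch with toy masks (the masks are invented for illustration):

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """Dice coefficient between two binary segmentation masks:
    2*|A∩B| / (|A| + |B|). Ranges from 0 (no overlap) to 1 (identical)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return 2.0 * intersection / (pred.sum() + target.sum() + eps)

# Toy 4x4 "tumor" masks: the prediction covers 3 of the 4 true pixels.
target = np.zeros((4, 4), dtype=np.uint8)
target[1:3, 1:3] = 1                      # 4 true pixels
pred = np.zeros((4, 4), dtype=np.uint8)
pred[1:3, 1:2] = 1
pred[1, 2] = 1                            # 3 predicted pixels, all correct
print(dice_coefficient(pred, target))     # about 0.857 (2*3 / (3+4))
```

Unlike pixel accuracy, Dice is insensitive to the large background region, which is why it is the standard threshold metric in tumor segmentation benchmarks such as BraTS.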

Citations: 0
Investigating the impact of sensor axis combinations on activity recognition and fall detection: an empirical study
IF 3.6 · CAS Tier 4 (Computer Science) · Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS · Pub Date: 2024-09-12 · DOI: 10.1007/s11042-024-20136-8
Erhan Kavuncuoğlu, Ahmet Turan Özdemir, Esma Uzunhisarcıklı

Activity recognition is a fundamental concept widely embraced within healthcare. Leveraging sensor fusion techniques, particularly involving accelerometers (A), gyroscopes (G), and magnetometers (M), this technology has undergone extensive development to effectively distinguish between activity types, improve tracking systems, and attain high classification accuracy. This research aims to improve the effectiveness of activity recognition by investigating diverse sensor axis combinations while underscoring the advantages of this approach. To this end, we gathered data from two distinct sources. First, 20 instances of falls and 16 daily life activities were recorded using the Motion Tracker Wireless (MTw), a commercial product, yielding a comprehensive dataset of 2520 tests from 14 volunteers (7 females and 7 males). Additionally, data on 7 instances of falls and 8 daily life activities were captured using a cost-effective, environment-independent Activity Tracking Device (ATD); this alternative dataset encompassed 1350 tests from 30 volunteers, divided equally between 15 females and 15 males. Within this framework, we conducted comparative analyses on the complete dataset of 3870 tests. The findings convincingly establish the efficacy of recognizing both fall incidents and routine daily activities. This investigation underscores the potential of leveraging affordable IoT technologies to enhance the quality of everyday life and their practical utility in real-world scenarios.
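The axis-combination study described here amounts to scoring a classifier on every subset of sensor axes. A minimal sketch under stated assumptions: synthetic data and logistic regression stand in for the paper's actual recordings and classifiers.

```python
import numpy as np
from itertools import combinations
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical 9-axis feature matrix: accelerometer (A), gyroscope (G),
# and magnetometer (M), three axes each, standing in for MTw/ATD recordings.
sensors = {"A": [0, 1, 2], "G": [3, 4, 5], "M": [6, 7, 8]}
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 9))
y = (X[:, 0] - X[:, 3] > 0).astype(int)   # synthetic fall / non-fall label

# Score every sensor combination (A, G, M and their unions) by
# cross-validated accuracy on the axis columns it contributes.
scores = {}
for r in range(1, 4):
    for combo in combinations(sensors, r):
        cols = sorted(c for s in combo for c in sensors[s])
        clf = LogisticRegression(max_iter=1000)
        scores["+".join(combo)] = cross_val_score(clf, X[:, cols], y, cv=5).mean()

best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```

Ranking the seven combinations this way shows which sensors carry the discriminative signal and which axes can be dropped to cut device cost.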

Citations: 0
Crowd dynamics analysis and behavior recognition in surveillance videos based on deep learning
IF 3.6 · CAS Tier 4 (Computer Science) · Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS · Pub Date: 2024-09-12 · DOI: 10.1007/s11042-024-20161-7
Anum Ilyas, Narmeen Bawany

Video surveillance is widely adopted across sectors for purposes such as law enforcement, COVID-19 isolation monitoring, and analyzing crowds for potential threats like flash mobs or violence. The vast amount of data generated daily by surveillance devices holds significant potential but requires effective analysis to extract value. Detecting anomalous crowd behavior, which can lead to chaos and casualties, is particularly challenging in video surveillance due to its labor-intensive nature and susceptibility to errors. To address these challenges, this research contributes in two key areas: first, by creating a diverse and representative video dataset that accurately reflects real-world crowd dynamics across eight categories; second, by developing a reliable framework, ‘CRAB-NET,’ for automated behavior recognition. Extensive experimentation and evaluation using Convolutional Long Short-Term Memory networks (ConvLSTM) and Long-Term Recurrent Convolutional Networks (LRCN) validated the effectiveness of the proposed approach in accurately categorizing behaviors observed in surveillance videos. The models achieved accuracy of 99.46% for celebratory crowds, 99.98% for formal crowds, and 96.69% for violent crowds. The 97.20% accuracy achieved by the LRCN on the comprehensive dataset underscores its potential to revolutionize crowd behavior analysis, ensuring safer mass gatherings and more effective security interventions. Incorporating AI-powered crowd behavior recognition like ‘CRAB-NET’ into security measures not only safeguards public gatherings but also paves the way for proactive event management and predictive safety strategies.

Citations: 0
Enhancing eyeglasses removal in facial images: a novel approach using translation models for eyeglasses mask completion
IF 3.6 · CAS Tier 4 (Computer Science) · Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS · Pub Date: 2024-09-11 · DOI: 10.1007/s11042-024-20101-5
Zahra Esmaily, Hossein Ebrahimpour-Komleh

Accurately removing eyeglasses from facial images is crucial for improving the performance of various face-related tasks such as verification, identification, and reconstruction. This paper presents a novel approach to enhancing eyeglasses removal by integrating a mask completion technique into the existing framework. Our method focuses on improving the accuracy of eyeglasses masks, which is essential for subsequent eyeglasses and shadow removal steps. We introduce a unique dataset specifically designed for eyeglasses mask image completion. This dataset is generated by applying Top-Hat morphological operations to existing eyeglasses mask datasets, creating a collection of images containing eyeglasses masks in two states: damaged (incomplete) and complete (ground truth). A Pix2Pix image-to-image translation model is trained on this newly created dataset for the purpose of restoring incomplete eyeglass mask predictions. This restoration step significantly improves the accuracy of eyeglass frame extraction and leads to more realistic results in subsequent eyeglasses and shadow removal. Our method incorporates a post-processing step to refine the completed mask, preventing the formation of artifacts in the background or outside of the eyeglasses frame box, further enhancing the overall quality of the processed image. Experimental results on CelebA, FFHQ, and MeGlass datasets showcase the effectiveness of our method, outperforming state-of-the-art approaches in quantitative metrics (FID, KID, MOS) and qualitative evaluations.
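The dataset-construction step, deriving incomplete masks from complete ground-truth ones via Top-Hat morphology, can be sketched with SciPy's binary morphology. The toy mask and the 5x5 structuring element are assumptions; the paper's exact parameters are not given here.

```python
import numpy as np
from scipy import ndimage

# Toy binary "eyeglasses mask": a thin horizontal bar (bridge/temples)
# plus a thick block (lens region) on a blank 32x32 image.
mask = np.zeros((32, 32), dtype=np.uint8)
mask[15:17, 4:28] = 1    # thin 2-pixel-high bar
mask[10:20, 6:12] = 1    # thick 10x6 block

# Top-Hat = input minus its morphological opening: structures thinner
# than the structuring element survive while thick regions are removed,
# yielding a damaged (incomplete) variant of the complete mask.
structure = np.ones((5, 5), dtype=bool)
opened = ndimage.binary_opening(mask, structure=structure)
tophat = mask - opened.astype(np.uint8)

print(int(mask.sum()), int(tophat.sum()))  # the top-hat mask keeps fewer pixels
```

Pairs of (tophat, mask) images of this kind are what an image-to-image translation model such as Pix2Pix can be trained on to learn the completion mapping.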

Citations: 0
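The entry above builds its damaged/complete training pairs by applying Top-Hat morphological operations to complete eyeglasses masks. The paper's exact parameters are not given in this listing; as a rough illustration of what a white top-hat does to a binary mask (it keeps only structures thinner than the structuring element, e.g. thin frame lines), here is a self-contained NumPy sketch with an illustrative 3×3 square element:

```python
import numpy as np

def erode(img, k=3):
    # Binary erosion with a k x k square structuring element:
    # a pixel stays True only if its whole k x k neighbourhood is True.
    pad = k // 2
    padded = np.pad(img, pad, constant_values=0)
    out = np.ones_like(img)
    for dy in range(k):
        for dx in range(k):
            out &= padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out

def dilate(img, k=3):
    # Binary dilation: a pixel becomes True if any neighbour is True.
    pad = k // 2
    padded = np.pad(img, pad, constant_values=0)
    out = np.zeros_like(img)
    for dy in range(k):
        for dx in range(k):
            out |= padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out

def white_tophat(img, k=3):
    # Top-hat = image minus its morphological opening (erosion then
    # dilation); only structures thinner than the element survive.
    return img & ~dilate(erode(img, k), k)
```

In practice one would normally call `cv2.morphologyEx(mask, cv2.MORPH_TOPHAT, kernel)` from OpenCV rather than hand-rolled loops; the sketch above only makes the mechanics explicit.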
Cyber-XAI-Block: an end-to-end cyber threat detection & fl-based risk assessment framework for iot enabled smart organization using xai and blockchain technologies
IF 3.6 4区 计算机科学 Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2024-09-11 DOI: 10.1007/s11042-024-20059-4
Omar Abboosh Hussein Gwassi, Osman Nuri Uçan, Enrique A. Navarro

The growing integration of the Internet of Things (IoT) in smart organizations is increasing the vulnerability to cyber threats, necessitating advanced frameworks for effective threat detection and risk assessment. Existing works achieve reasonable results but lack effective solutions for problems such as detecting Social Engineering Attacks (SEA): Deep Learning (DL) and Machine Learning (ML) methods are limited to validating user behaviors and suffer from high false positive rates, attack reoccurrence, and growth in the number of attacks. To overcome these problems, we use explainable DL techniques to increase cyber security in an IoT-enabled smart organization environment. This paper first implements a Capsule Network (CapsNet) to process employee fingerprints and blink patterns. Secondly, the Quantum Key Secure Communication Protocol (QKSCP) is used to reduce communication channel vulnerabilities such as Man-In-The-Middle (MITM) and replay attacks. A Dual Q Network-based Asynchronous Advantage Actor-Critic (DQN-A3C) algorithm then detects and prevents attacks. Thirdly, the explainable DQN-A3C model is combined with the Siamese Inter Lingual Transformer (SILT) for natural language explanations, boosting social engineering security by ensuring both Artificial Intelligence (AI) model and human trustworthiness. Afterwards, we build a Hopping Intrusion Detection & Prevention System (IDS/IPS) using an explainable Harmonized Google Net (HGN) model with SHAP and SILT explanations to appropriately categorize dangerous external traffic flows. Finally, to improve global cyberattack comprehension, we create a Federated Learning (FL)-based knowledge-sharing mechanism between the Cyber Threat Repository (CTR) and cloud servers, known as global risk assessment.
To evaluate the suggested approach, the new method is compared to existing ones in terms of malicious traffic (65 bytes/sec), detection rate (97%), false positive rate (45%), prevention accuracy (98%), end-to-end response time (97 s), recall (96%), false negative rate (42%), and resource consumption (41). The strategy's performance is examined using numerical analysis, and the results demonstrate that it outperforms other methods in all metrics.

Citations: 0
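The global risk assessment above shares knowledge between the CTR and cloud servers via Federated Learning. The paper's aggregation scheme is not detailed in this listing; a minimal sketch of the standard FedAvg rule (sample-size-weighted averaging of client model parameters; function and variable names are illustrative, not from the paper) would be:

```python
import numpy as np

def fed_avg(client_weights, client_sizes):
    """Weighted average of client model parameters (FedAvg-style).

    client_weights: one list of np.ndarray per client (same layer layout)
    client_sizes:   number of local training samples per client
    """
    total = sum(client_sizes)
    averaged = []
    # zip(*...) walks layer-by-layer across all clients
    for layer in zip(*client_weights):
        weighted = [w * (n / total) for w, n in zip(layer, client_sizes)]
        averaged.append(np.stack(weighted).sum(axis=0))
    return averaged
```

Clients with more local data pull the global model proportionally harder, which is the usual FedAvg behaviour.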
SMOTE-Based deep network with adaptive boosted sooty for the detection and classification of type 2 diabetes mellitus
IF 3.6 4区 计算机科学 Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2024-09-11 DOI: 10.1007/s11042-024-19770-z
Phani Kumar Immadisetty, C. Rajabhushanam

Type 2 diabetes (T2D) is a prolonged disease caused by an abnormal rise in glucose levels due to poor insulin production in the pancreas. The detection and classification of this disease are very challenging, however, and require effective techniques for learning T2D features. Therefore, this study proposes a novel hybridized deep-learning-based technique to automatically detect and categorize T2D by effectively learning disease attributes. First, missing-value imputation and a normalization-based pre-processing phase are introduced to improve the quality of the data. The Adaptive Boosted Sooty Tern Optimization (Adap-BSTO) approach is then used to select the best features while minimizing complexity. After that, the Synthetic Minority Oversampling Technique (SMOTE) is used to ensure that the database classes are evenly distributed. Finally, the Deep Convolutional Attention-based Bidirectional Recurrent Neural Network (DCA-BiRNN) technique is proposed to accurately detect and classify the presence and absence of T2D. The proposed study is implemented on the Python platform, and two publicly available databases, PIMA Indian and HFD, are utilized. Accuracy, NPV, kappa score, Matthews correlation coefficient (MCC), false discovery rate (FDR), and time complexity are among the assessment metrics examined and compared to prior research. For the PIMA Indian dataset, the proposed method obtains an overall accuracy of 99.6%, an FDR of 0.0038, a kappa of 99.24%, and an NPV of 99.6%. For the HFD dataset, it achieves an overall accuracy of 99.5%, an FDR of 0.0052, a kappa of 99%, and an NPV of 99.4%.

Citations: 0
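SMOTE, used above to balance the two classes, synthesizes new minority samples by interpolating between a minority point and one of its k nearest minority-class neighbours. A compact NumPy sketch of that idea (illustrative only, not the paper's implementation; production code would typically use `imblearn.over_sampling.SMOTE`):

```python
import numpy as np

def smote(X_min, n_new, k=5, rng=None):
    """Generate n_new synthetic minority samples by linear interpolation
    between each chosen sample and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    k = min(k, n - 1)
    # Pairwise distances within the minority class only
    d = np.linalg.norm(X_min[:, None] - X_min[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # a point is not its own neighbour
    nbrs = np.argsort(d, axis=1)[:, :k]  # k nearest neighbour indices per row
    out = []
    for _ in range(n_new):
        i = rng.integers(n)              # random minority sample
        j = nbrs[i, rng.integers(k)]     # one of its k nearest neighbours
        lam = rng.random()               # interpolation factor in [0, 1)
        out.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(out)
```

Because each synthetic point lies on a segment between two real minority points, the oversampled class stays inside the original minority region instead of duplicating rows verbatim.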
Exploiting multi-transformer encoder with multiple-hypothesis aggregation via diffusion model for 3D human pose estimation
IF 3.6 4区 计算机科学 Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS Pub Date : 2024-09-10 DOI: 10.1007/s11042-024-20179-x
Sathiyamoorthi Arthanari, Jae Hoon Jeong, Young Hoon Joo

The transformer architecture has consistently achieved cutting-edge performance in the task of lifting 2D human poses to 3D. Despite these advances, transformer-based methods still suffer from issues related to sequential data processing, depth ambiguity, and the effective handling of noisy data, so transformer encoders have difficulty estimating human positions precisely. To solve this problem, a novel multi-transformer encoder with a multiple-hypothesis aggregation module (MHAFormer) is proposed in this study. To do this, a diffusion module is first introduced that generates multiple 3D pose hypotheses and gradually adds Gaussian noise to the ground-truth 3D poses. A denoiser is then employed within the diffusion module to restore feasible 3D poses by leveraging information from the 2D keypoints. Moreover, we propose multiple-hypothesis aggregation with joint-level reprojection (MHAJR), which reprojects the 3D hypotheses into 2D positions and selects the optimal hypothesis by considering reprojection errors. In particular, the multiple-hypothesis aggregation approach tackles depth ambiguity and sequential data processing by considering various possible poses and combining their strengths for a more accurate final estimate. Next, we present an improved spatial-temporal transformer encoder that improves accuracy and reduces the ambiguity of 3D pose estimation by explicitly modeling the spatial and temporal relationships between different body joints. Specifically, the temporal-transformer encoder introduces the temporal constriction & proliferation (TCP) attention mechanism and the feature aggregation refinement module (FAR) into the refined temporal constriction & proliferation (RTCP) transformer, which enhances intra-block temporal modeling and further refines inter-block feature interaction.
Finally, the superiority of the proposed approach is demonstrated through comparisons with existing methods on the Human3.6M and MPI-INF-3DHP benchmark datasets.

Citations: 0
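The reprojection-based selection above scores each 3D hypothesis by how well it maps back onto the detected 2D keypoints. The paper's camera model is not given in this listing; under a simple pinhole assumption (unit focal length by default, joints as (J, 3) camera-space arrays, all names illustrative), the selection step can be sketched as:

```python
import numpy as np

def select_hypothesis(hyps_3d, kp_2d, f=1.0):
    """Return the index of the 3D hypothesis whose pinhole reprojection
    has the smallest mean L2 error against the detected 2D keypoints."""
    errs = []
    for h in hyps_3d:                    # h: (J, 3) joints in camera space
        proj = f * h[:, :2] / h[:, 2:3]  # perspective divide by depth z
        errs.append(np.linalg.norm(proj - kp_2d, axis=1).mean())
    return int(np.argmin(errs))
```

Because the perspective divide collapses depth, several 3D poses can reproject near the same 2D keypoints; scoring a diverse set of hypotheses and keeping the best one is what mitigates that depth ambiguity.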