Pub Date: 2024-09-12 | DOI: 10.1007/s11042-024-20155-5
Huifang Feng, Yanan Hui
Accurately predicting the risk of diabetes is of paramount importance for early intervention and prevention. To this end, we propose a hybrid diabetes risk prediction model, XGB-ILSO-1DCNN, which combines the Extreme Gradient Boosting (XGBoost) algorithm, an Improved Lion Swarm Optimization (ILSO) algorithm, and a one-dimensional convolutional neural network (1DCNN). First, an XGBoost model is trained on the raw data, and its prediction is treated as a new feature that is concatenated with the original features to form an enriched feature set. Then, we introduce a hybrid approach for diabetes risk prediction, ILSO-1DCNN, which uses the optimization capability of ILSO to automatically determine the hyperparameters of the 1DCNN network. Finally, we conduct comprehensive experiments on the PIMA dataset and compare our model with baseline models. The experimental results not only demonstrate the model's strong predictive performance across various evaluation criteria but also highlight its efficiency and low complexity. This study introduces a novel and effective diabetes risk prediction approach, making it a valuable tool for clinical analysis in the care of diabetic patients.
Title: "A hybrid diabetes risk prediction model XGB-ILSO-1DCNN" (Multimedia Tools and Applications, IF 3.6)
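The first stage described in the abstract, feeding a boosted model's prediction back in as an extra input feature, can be sketched as follows. This is a minimal illustration only: the placeholder `first_stage` linear scorer and the toy data stand in for the authors' trained XGBoost model, which is not reproduced here.

```python
# Augment each sample's feature vector with a first-stage model's
# prediction, mimicking the XGBoost -> 1DCNN feature-stacking step.

def first_stage(features):
    # Placeholder for a trained XGBoost model: a fixed linear score.
    weights = [0.5, -0.2, 0.1]
    return sum(w * x for w, x in zip(weights, features))

def augment_with_prediction(dataset):
    # Concatenate the original features with the first-stage prediction,
    # producing the enriched feature set the downstream 1DCNN consumes.
    return [row + [first_stage(row)] for row in dataset]

data = [[1.0, 2.0, 3.0],
        [0.5, 0.0, 1.5]]
augmented = augment_with_prediction(data)
# Each row gains exactly one new column holding the prediction.
```

In the paper's pipeline the appended column would be the XGBoost output probability rather than a hand-set linear score; the shape of the transformation is the same.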
Pub Date: 2024-09-12 | DOI: 10.1007/s11042-024-20210-1
Yahui Chen, Yitao Liang
CLAHE is widely used in underwater image processing because of its excellent contrast-enhancement performance. The choice of the clip-point formula is the core problem of CLAHE methods, and selecting a suitable clipping value has become the focus of several extended methods. In this paper, an automatic CLAHE underwater image enhancement algorithm is proposed. The method determines the clipping value from the high-order moment dynamic features of each block of the underwater image. By quantifying the dynamic features of each block more precisely and incorporating them into the clipping-value formula, the contrast and details of the underwater image can be effectively enhanced. To effectively improve the saturation and brightness of underwater images, this paper adopts the more accurate and intuitive HSV model. Experimental results show that our method enhances contrast subjectively while suppressing the amplification of noise very well, and also increases the saturation of underwater images. In objective metrics, our method obtains the best values for the underwater image quality measure (UIQM), SSIM, and PSNR.
Title: "Underwater images enhancement using contrast limited adaptive parameter settings histogram equalization"
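The core CLAHE mechanism the abstract builds on, clipping a block's histogram at a limit and redistributing the excess, can be sketched as below. The moment-based limit shown here is an illustrative stand-in: the paper derives its clipping value from high-order moment features of each block, but its exact formula is not reproduced.

```python
# Clip a block's histogram at a limit derived from a high-order moment,
# then redistribute the excess uniformly (the core CLAHE mechanism).

def moment(values, order):
    # Central moment of the block's intensity values.
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** order for v in values) / n

def clip_histogram(hist, values, base_limit=40.0):
    # Illustrative rule: raise the clip limit for blocks whose intensity
    # distribution has heavier tails (larger normalized fourth moment).
    var = moment(values, 2)
    kurt = moment(values, 4) / (var ** 2) if var > 0 else 0.0
    limit = base_limit * (1.0 + 0.1 * kurt)
    excess = sum(max(0.0, h - limit) for h in hist)
    clipped = [min(float(h), limit) for h in hist]
    bonus = excess / len(hist)          # uniform redistribution
    return [h + bonus for h in clipped]

hist = [10, 120, 30, 5]                 # toy 4-bin block histogram
values = [0] * 10 + [1] * 120 + [2] * 30 + [3] * 5
new_hist = clip_histogram(hist, values)
# Total mass is preserved by the redistribution step.
```

Because the excess above the limit is spread back over all bins, the histogram's total count is unchanged, which is what keeps the subsequent equalization mapping valid.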
Pub Date: 2024-09-12 | DOI: 10.1007/s11042-024-20203-0
Surajit Das, Rajat Subhra Goswami
Brain tumors, whether cancerous or noncancerous, can be life-threatening due to abnormal cell growth, potentially causing organ dysfunction and mortality in adults. Brain tumor segmentation (BTS) and brain tumor classification (BTC) technologies are crucial in diagnosing and treating brain tumors. They assist doctors in locating and measuring tumors and in developing treatment and rehabilitation strategies. Despite their importance in the medical field, BTC and BTS remain challenging. This comprehensive review specifically analyses machine and deep learning methodologies, including convolutional neural networks (CNN), transfer learning (TL), and hybrid models for BTS and BTC. We discuss CNN architectures such as U-Net++, which is known for its high segmentation accuracy in 2D and 3D medical images. Additionally, transfer learning utilises models pre-trained on ImageNet, such as ResNet and Inception, fine-tuned on brain tumor-specific datasets to enhance classification performance and sensitivity despite limited medical data. Hybrid models combine deep learning techniques with machine learning, using a CNN for initial segmentation and traditional methods for classification, improving accuracy. We discuss commonly used benchmark datasets in brain tumor research, including the BraTS dataset and the TCIA database, and evaluate performance metrics such as the F1-score, accuracy, sensitivity, specificity, and the Dice coefficient, emphasising their significance and standard thresholds in brain tumor analysis. The review addresses current machine learning (ML) and deep learning (DL) based BTS and BTC challenges and proposes solutions such as explainable deep learning models and multi-task learning frameworks. These insights aim to guide future advancements and foster the development of accurate and efficient tools for improved patient care in brain tumor analysis.
Title: "Advancements in brain tumor analysis: a comprehensive review of machine learning, hybrid deep learning, and transfer learning approaches for MRI-based classification and segmentation"
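The evaluation metrics the review surveys are standard and easy to state concretely. The sketch below computes the Dice coefficient, sensitivity, and specificity from binary ground-truth and prediction masks; the flattened toy masks are an assumption for illustration, not data from any surveyed paper.

```python
# Standard segmentation/classification metrics from binary masks,
# here flattened to 1D lists for brevity.

def dice_coefficient(truth, pred):
    # Dice = 2|T ∩ P| / (|T| + |P|); defined as 1.0 for two empty masks.
    intersection = sum(t and p for t, p in zip(truth, pred))
    total = sum(truth) + sum(pred)
    return 2.0 * intersection / total if total else 1.0

def sensitivity_specificity(truth, pred):
    # Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    tp = sum(t == 1 and p == 1 for t, p in zip(truth, pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(truth, pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(truth, pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(truth, pred))
    return tp / (tp + fn), tn / (tn + fp)

truth = [1, 1, 0, 0, 1, 0]   # toy ground-truth mask
pred  = [1, 0, 0, 1, 1, 0]   # toy predicted mask
dice = dice_coefficient(truth, pred)
sens, spec = sensitivity_specificity(truth, pred)
```

Dice rewards overlap symmetrically, which is why it dominates BTS reporting, while the sensitivity/specificity pair separates missed tumors from false alarms in BTC.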
Chest X-ray scans are among the most commonly used diagnostic tools for identifying chest diseases. However, identifying diseases in X-ray images requires experienced technicians and is frequently noted as a time-consuming process with varying levels of interpretation. In particular circumstances, disease identification through images is a challenge even for human observers. Recent advances in deep learning have opened up new possibilities for using this technique to diagnose diseases. However, further implementation requires prior knowledge of strategy and appropriate architecture design. Revealing this information will enable faster implementation and help anticipate potential issues produced by specific designs, especially in multilabel classification, which is challenging compared to single-label tasks. This systematic review of the approaches published in the literature will assist researchers in developing improved methods for whole-chest disease detection. The study focuses on the deep learning methods, publicly accessible datasets, hyperparameters, and performance metrics employed by various researchers in classifying multilabel chest X-ray images. The findings of this study provide a complete overview of the current state of the art, highlighting significant practical aspects of the approaches studied. Distinctive results highlighting the potential enhancements and beneficial uses of deep learning in multilabel chest disease identification are presented.
Title: "A systematic review of multilabel chest X-ray classification using deep learning" | Authors: Uswatun Hasanah, Jenq-Shiou Leu, Cries Avian, Ihsanul Azmi, Setya Widyawan Prakosa | Pub Date: 2024-09-12 | DOI: 10.1007/s11042-024-20172-4
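The multilabel-vs-single-label distinction the review stresses comes down to the output head: each disease gets an independent sigmoid score and threshold, so one image can carry several positive labels at once. A minimal sketch, assuming a hypothetical three-disease label set and made-up network logits:

```python
import math

# Multilabel prediction: independent per-class sigmoids, each compared
# against its own threshold (unlike softmax, which picks one class).

LABELS = ["atelectasis", "cardiomegaly", "effusion"]  # hypothetical subset

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict_labels(logits, threshold=0.5):
    # Every score above the threshold contributes a positive label.
    scores = [sigmoid(z) for z in logits]
    return [name for name, s in zip(LABELS, scores) if s >= threshold]

# Logits from a hypothetical network head: two diseases present at once.
present = predict_labels([2.0, -1.5, 0.3])
```

Real systems typically tune a separate threshold per class against a validation set rather than using a global 0.5, which is one of the hyperparameter choices the review catalogues.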
Pub Date: 2024-09-12 | DOI: 10.1007/s11042-024-20136-8
Erhan Kavuncuoğlu, Ahmet Turan Özdemir, Esma Uzunhisarcıklı
Activity recognition is a fundamental concept widely embraced within the realm of healthcare. Leveraging sensor fusion techniques, particularly involving accelerometers (A), gyroscopes (G), and magnetometers (M), this technology has undergone extensive development to effectively distinguish between activity types, improve tracking systems, and attain high classification accuracy. This research aims to augment the effectiveness of activity recognition by investigating diverse sensor axis combinations while underscoring the advantages of this approach. To this end, we gathered data from two sources. First, 20 types of falls and 16 daily life activities were recorded using the commercial Motion Tracker Wireless (MTw) device, yielding a comprehensive dataset of 2520 tests from 14 volunteers (7 females and 7 males). Second, 7 types of falls and 8 daily life activities were captured using a cost-effective, environment-independent Activity Tracking Device (ATD); this alternative dataset comprises 1350 tests from 30 volunteers, equally divided between 15 females and 15 males. Within this framework, we conducted comparative analyses on the complete dataset of 3870 tests. The findings convincingly establish the efficacy of recognizing both fall incidents and routine daily activities. This investigation underscores the potential of leveraging affordable IoT technologies to enhance the quality of everyday life and their practical utility in real-world scenarios.
Title: "Investigating the impact of sensor axis combinations on activity recognition and fall detection: an empirical study"
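The axis-combination study described above implies an enumeration step: with three triaxial sensors there are nine candidate axes, and each experiment trains on some subset of them. A small sketch of how such subsets can be generated (the axis naming is an assumption for illustration; the paper's actual experiment design may differ):

```python
from itertools import combinations

# Nine axes: accelerometer (A), gyroscope (G), magnetometer (M),
# each contributing x, y and z channels.
AXES = [sensor + axis for sensor in "AGM" for axis in ("x", "y", "z")]

def axis_combinations(k):
    # All k-axis subsets to evaluate in an axis-combination experiment.
    return list(combinations(AXES, k))

pairs = axis_combinations(2)   # e.g. ("Ax", "Gy"), ("Gy", "Mz"), ...
```

Enumerating subsets this way makes the comparison exhaustive for small k; for larger k the experiment count grows combinatorially, which is why such studies usually restrict attention to a few sensor groupings.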
Pub Date: 2024-09-12 | DOI: 10.1007/s11042-024-20161-7
Anum Ilyas, Narmeen Bawany
Video surveillance is widely adopted across various sectors for purposes such as law enforcement, COVID-19 isolation monitoring, and analyzing crowds for potential threats like flash mobs or violence. The vast amount of data generated daily by surveillance devices holds significant potential but requires effective analysis to extract value. Detecting anomalous crowd behavior, which can lead to chaos and casualties, is particularly challenging in video surveillance due to its labor-intensive nature and susceptibility to errors. To address these challenges, this research contributes in two key areas: first, by creating a diverse and representative video dataset that accurately reflects real-world crowd dynamics across eight different categories; second, by developing a reliable framework, ‘CRAB-NET,’ for automated behavior recognition. Extensive experimentation and evaluation using Convolutional Long Short-Term Memory networks (ConvLSTM) and Long-Term Recurrent Convolutional Networks (LRCN) validated the effectiveness of the proposed approach in accurately categorizing behaviors observed in surveillance videos. The models achieved accuracy scores of 99.46% for celebratory crowds, 99.98% for formal crowds, and 96.69% for violent crowds. The LRCN's accuracy of 97.20% on the comprehensive dataset underscores its potential to revolutionize crowd behavior analysis, ensuring safer mass gatherings and more effective security interventions. Incorporating AI-powered crowd behavior recognition like ‘CRAB-NET’ into security measures not only safeguards public gatherings but also paves the way for proactive event management and predictive safety strategies.
Title: "Crowd dynamics analysis and behavior recognition in surveillance videos based on deep learning"
Pub Date: 2024-09-11 | DOI: 10.1007/s11042-024-20101-5
Zahra Esmaily, Hossein Ebrahimpour-Komleh
Accurately removing eyeglasses from facial images is crucial for improving the performance of various face-related tasks such as verification, identification, and reconstruction. This paper presents a novel approach to enhancing eyeglasses removal by integrating a mask completion technique into the existing framework. Our method focuses on improving the accuracy of eyeglasses masks, which is essential for subsequent eyeglasses and shadow removal steps. We introduce a unique dataset specifically designed for eyeglasses mask image completion. This dataset is generated by applying Top-Hat morphological operations to existing eyeglasses mask datasets, creating a collection of images containing eyeglasses masks in two states: damaged (incomplete) and complete (ground truth). A Pix2Pix image-to-image translation model is trained on this newly created dataset for the purpose of restoring incomplete eyeglass mask predictions. This restoration step significantly improves the accuracy of eyeglass frame extraction and leads to more realistic results in subsequent eyeglasses and shadow removal. Our method incorporates a post-processing step to refine the completed mask, preventing the formation of artifacts in the background or outside of the eyeglasses frame box, further enhancing the overall quality of the processed image. Experimental results on CelebA, FFHQ, and MeGlass datasets showcase the effectiveness of our method, outperforming state-of-the-art approaches in quantitative metrics (FID, KID, MOS) and qualitative evaluations.
Title: "Enhancing eyeglasses removal in facial images: a novel approach using translation models for eyeglasses mask completion"
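The dataset-generation step above damages complete eyeglasses masks with Top-Hat morphological operations. A white top-hat is the signal minus its morphological opening, so it keeps only structures thinner than the structuring element. The sketch below shows the operation on a 1D binary profile for brevity; it is an illustrative reconstruction, not the paper's code, which operates on 2D mask images.

```python
# White top-hat on a 1D binary profile: signal minus its opening
# (erosion followed by dilation). Thin structures survive; wide ones don't.

def erode(signal, size):
    half = size // 2
    return [min(signal[max(0, i - half):i + half + 1])
            for i in range(len(signal))]

def dilate(signal, size):
    half = size // 2
    return [max(signal[max(0, i - half):i + half + 1])
            for i in range(len(signal))]

def top_hat(signal, size=3):
    opening = dilate(erode(signal, size), size)
    return [s - o for s, o in zip(signal, opening)]

# A 1-pixel spike (thin frame edge) survives; the wide plateau is removed.
profile = [0, 1, 0, 0, 1, 1, 1, 1, 0]
residue = top_hat(profile)
```

On eyeglasses masks this extracts the thin frame structures, which is what makes the operation useful for synthesizing damaged/complete training pairs.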
Pub Date: 2024-09-11 | DOI: 10.1007/s11042-024-20059-4
Omar Abboosh Hussein Gwassi, Osman Nuri Uçan, Enrique A. Navarro
The growing integration of the Internet of Things (IoT) in smart organizations is increasing vulnerability to cyber threats, necessitating advanced frameworks for effective threat detection and risk assessment. Existing works achieve reasonable results but lack effective solutions for detecting Social Engineering Attacks (SEA): Deep Learning (DL) and Machine Learning (ML) methods are limited to validating user behaviors and suffer from high false positive rates, attack recurrence, and growth in the number of attacks. To overcome these problems, we use explainable DL techniques to strengthen cyber security in an IoT-enabled smart organization environment. First, this paper implements a Capsule Network (CapsNet) to process employee fingerprints and blink patterns. Second, the Quantum Key Secure Communication Protocol (QKSCP) is used to reduce communication channel vulnerabilities such as Man-In-The-Middle (MITM) and replay attacks, after which a Dual Q Network-based Asynchronous Advantage Actor-Critic (DQN-A3C) algorithm detects and prevents attacks. Third, we employ the explainable DQN-A3C model and the Siamese Inter-Lingual Transformer (SILT) for natural language explanations, boosting social engineering security by ensuring trust between the Artificial Intelligence (AI) model and humans. Next, we build a Hopping Intrusion Detection & Prevention System (IDS/IPS) using an explainable Harmonized Google Net (HGN) model with SHAP and SILT explanations to appropriately categorize dangerous external traffic flows. Finally, to improve global cyberattack comprehension, we create a Federated Learning (FL)-based knowledge-sharing mechanism between the Cyber Threat Repository (CTR) and cloud servers, known as global risk assessment.
To evaluate the suggested approach, the new method is compared with existing ones in terms of malicious traffic (65 bytes/sec), detection rate (97%), false positive rate (45%), prevention accuracy (98%), end-to-end response time (97 s), recall (96%), false negative rate (42%), and resource consumption (41). Numerical analysis of our strategy's performance demonstrates that it outperforms other methods on all metrics.
Title: "Cyber-XAI-Block: an end-to-end cyber threat detection & fl-based risk assessment framework for iot enabled smart organization using xai and blockchain technologies"
Pub Date : 2024-09-11DOI: 10.1007/s11042-024-19770-z
Phani Kumar Immadisetty, C. Rajabhushanam
Type 2 diabetes (T2D) is a chronic disease caused by an abnormal rise in glucose levels due to poor insulin production in the pancreas. Detecting and classifying this disease is very challenging and requires effective techniques for learning T2D features. Therefore, this study proposes a novel hybridized deep learning technique to automatically detect and categorize T2D by effectively learning disease attributes. First, missing-value imputation and a normalization-based pre-processing phase are introduced to improve data quality. The Adaptive Boosted Sooty Tern Optimization (Adap-BSTO) approach is then used to select the best features while minimizing complexity. After that, the Synthetic Minority Oversampling Technique (SMOTE) is applied to ensure that the dataset classes are evenly distributed. Finally, the Deep Convolutional Attention-based Bidirectional Recurrent Neural Network (DCA-BiRNN) technique is proposed to accurately detect and classify the presence or absence of T2D. The proposed study is implemented on the Python platform, and two publicly available databases, PIMA Indian and HFD, are utilized. Accuracy, NPV, kappa score, Matthews correlation coefficient (MCC), false discovery rate (FDR), and time complexity are among the assessment metrics examined and compared with prior research. For the PIMA Indian dataset, the proposed method obtains an overall accuracy of 99.6%, an FDR of 0.0038, a kappa of 99.24%, and an NPV of 99.6%. For the HFD dataset, it achieves an overall accuracy of 99.5%, an FDR of 0.0052, a kappa of 99%, and an NPV of 99.4%.
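The class-balancing step in the pipeline above can be sketched as follows. This is a minimal, generic interpolation-based oversampler in the spirit of SMOTE, written in NumPy; the function name, the neighbour count `k`, and the toy data are illustrative assumptions, not the authors' Adap-BSTO/DCA-BiRNN pipeline or a production SMOTE implementation.

```python
import numpy as np

def smote_oversample(X_min, n_synthetic, k=5, rng=None):
    """Generate synthetic minority samples by interpolating between
    each sample and one of its k nearest minority-class neighbours."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    # pairwise distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)            # a sample is not its own neighbour
    neighbours = np.argsort(d, axis=1)[:, :k]
    synth = []
    for _ in range(n_synthetic):
        i = rng.integers(n)                # random minority sample
        j = neighbours[i, rng.integers(min(k, n - 1))]  # random neighbour of it
        gap = rng.random()                 # interpolation factor in [0, 1)
        synth.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synth)

# toy imbalanced dataset: 90 majority vs 10 minority samples, 4 features
rng = np.random.default_rng(0)
X_maj = rng.normal(0.0, 1.0, size=(90, 4))
X_min = rng.normal(3.0, 1.0, size=(10, 4))
X_new = smote_oversample(X_min, n_synthetic=80, k=3, rng=1)
X_bal = np.vstack([X_maj, X_min, X_new])
print(X_bal.shape)  # (180, 4): both classes now have 90 samples
```

Because each synthetic point is a convex combination of two real minority samples, the oversampled class stays inside the original minority region rather than duplicating rows verbatim.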
{"title":"SMOTE-Based deep network with adaptive boosted sooty for the detection and classification of type 2 diabetes mellitus","authors":"Phani Kumar Immadisetty, C. Rajabhushanam","doi":"10.1007/s11042-024-19770-z","DOIUrl":"https://doi.org/10.1007/s11042-024-19770-z","url":null,"abstract":"<p>Type 2 diabetes (T2D) is a prolonged disease caused by abnormal rise in glucose levels due to poor insulin production in the pancreas. However, the detection and classification of this type of disease is very challenging and requires effective techniques for learning the T2D features. Therefore, this study proposes the use of a novel hybridized deep learning-based technique to automatically detect and categorize T2D by effectively learning disease attributes. First, missing value imputation and a normalization-based pre-processing phase are introduced to improve the quality of the data. The Adaptive Boosted Sooty Tern Optimization (Adap-BSTO) approach is then used to select the best features while minimizing complexity. After that, the Synthetic Minority Oversampling Technique (SMOTE) is used to verify that the database classes are evenly distributed. Finally, the Deep Convolutional Attention-based Bidirectional Recurrent Neural Network (DCA-BiRNN) technique is proposed to detect and classify the presence and absence of T2D disease accurately. The proposed study is instigated via the Python platform, and two publicly available PIMA Indian and HFD databases are utilized in this study. Accuracy, NPV, kappa score, Mathew's correlation coefficient (MCC), false discovery rate (FDR), and time complexity are among the assessment metrics examined and compared to prior research. For the PIMA Indian dataset, the proposed method obtains an overall accuracy of 99.6%, FDR of 0.0038, kappa of 99.24%, and NPV of 99.6%. 
For the HFD dataset, the proposed method acquires an overall accuracy of 99.5%, FDR of 0.0052, kappa of 99%, and NPV of 99.4%, respectively.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":"3 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142203682","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2024-09-10DOI: 10.1007/s11042-024-20179-x
Sathiyamoorthi Arthanari, Jae Hoon Jeong, Young Hoon Joo
The transformer architecture has consistently achieved cutting-edge performance in the task of 2D-to-3D lifting for human pose estimation. Despite these advances, transformer-based methods still suffer from issues related to sequential data processing, depth ambiguity, and the effective handling of noisy data. As a result, transformer encoders encounter difficulties in precisely estimating human positions. To solve this problem, a novel multi-transformer encoder with a multiple-hypothesis aggregation (MHAFormer) module is proposed in this study. To this end, a diffusion module is first introduced that generates multiple 3D pose hypotheses by gradually adding Gaussian noise to ground-truth 3D poses. Subsequently, a denoiser is employed within the diffusion module to recover feasible 3D poses by leveraging information from the 2D keypoints. Moreover, we propose multiple-hypothesis aggregation with joint-level reprojection (MHAJR), which reprojects the 3D hypotheses into 2D positions and selects the optimal hypothesis by considering reprojection errors. In particular, the multiple-hypothesis aggregation approach tackles depth ambiguity and sequential data processing by considering various possible poses and combining their strengths for a more accurate final estimation. Next, we present an improved spatial-temporal transformer encoder that improves the accuracy and reduces the ambiguity of 3D pose estimation by explicitly modeling the spatial and temporal relationships between different body joints. Specifically, the temporal-transformer encoder introduces the temporal constriction & proliferation (TCP) attention mechanism and the feature aggregation refinement (FAR) module into the refined temporal constriction & proliferation (RTCP) transformer, which enhances intra-block temporal modeling and further refines inter-block feature interaction.
Finally, the superiority of the proposed approach is demonstrated through comparison with existing methods using the Human3.6M and MPI-INF-3DHP benchmark datasets.
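The hypothesis-selection idea behind reprojection-based aggregation — project each candidate 3D pose back to 2D and keep the one closest to the detected 2D keypoints — can be sketched as follows. The simple pinhole projection, focal length, and toy 17-joint skeleton are illustrative assumptions, not the paper's camera model or its learned aggregation.

```python
import numpy as np

def project_to_2d(pose_3d, f=1000.0, cx=0.0, cy=0.0):
    """Pinhole projection of (J, 3) joints (z > 0) to (J, 2) pixel coords."""
    z = pose_3d[:, 2:3]
    return f * pose_3d[:, :2] / z + np.array([cx, cy])

def select_hypothesis(hypotheses_3d, keypoints_2d):
    """Pick the 3D hypothesis whose reprojection best matches the
    detected 2D keypoints (mean per-joint 2D error)."""
    errors = [np.mean(np.linalg.norm(project_to_2d(h) - keypoints_2d, axis=1))
              for h in hypotheses_3d]
    best = int(np.argmin(errors))
    return best, errors[best]

# toy example: three hypotheses for a 17-joint skeleton in front of the camera
rng = np.random.default_rng(0)
true_pose = rng.uniform([-1, -1, 3], [1, 1, 5], size=(17, 3))
kp_2d = project_to_2d(true_pose)                      # "detected" 2D keypoints
hyps = np.stack([true_pose + rng.normal(0, s, true_pose.shape)
                 for s in (0.20, 0.01, 0.10)])        # varying perturbations
idx, err = select_hypothesis(hyps, kp_2d)
print(idx)  # the least-perturbed hypothesis (index 1) is selected
```

This captures only the selection criterion; the paper's MHAJR additionally aggregates information across hypotheses rather than discarding all but one.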
{"title":"Exploiting multi-transformer encoder with multiple-hypothesis aggregation via diffusion model for 3D human pose estimation","authors":"Sathiyamoorthi Arthanari, Jae Hoon Jeong, Young Hoon Joo","doi":"10.1007/s11042-024-20179-x","DOIUrl":"https://doi.org/10.1007/s11042-024-20179-x","url":null,"abstract":"<p>The transformer architecture has consistently achieved cutting-edge performance in the task of 2D to 3D lifting human pose estimation. Despite advances in transformer-based methods they still suffer from issues related to sequential data processing, addressing depth ambiguity, and effective handling of sensitive noisy data. As a result, transformer encoders encounter difficulties in precisely estimating human positions. To solve this problem, a novel multi-transformer encoder with a multiple-hypothesis aggregation (MHAFormer) module is proposed in this study. To do this, a diffusion module is first introduced that generates multiple 3D pose hypotheses and gradually distributes Gaussian noise to ground truth 3D poses. Subsequently, the denoiser is employed within the diffusion module to restore the feasible 3D poses by leveraging the information from the 2D keypoints. Moreover, we propose the multiple-hypothesis aggregation with a join-level reprojection (MHAJR) approach that redesigns the 3D hypotheses into the 2D position and selects the optimal hypothesis by considering reprojection errors. In particular, the multiple-hypothesis aggregation approach tackles depth ambiguity and sequential data processing by considering various possible poses and combining their strengths for a more accurate final estimation. Next, we present the improved spatial-temporal transformers encoder that can help to improve the accuracy and reduce the ambiguity of 3D pose estimation by explicitly modeling the spatial and temporal relationships between different body joints. 
Specifically, the temporal-transformer encoder introduces the temporal constriction & proliferation (TCP) attention mechanism and the feature aggregation refinement module (FAR) into the refined temporal constriction & proliferation (RTCP) transformer, which enhances intra-block temporal modeling and further refines inter-block feature interaction. Finally, the superiority of the proposed approach is demonstrated through comparison with existing methods using the Human3.6M and MPI-INF-3DHP benchmark datasets.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":"47 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142203686","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}