HSALC: hard sample aware label correction for medical image classification
Pub Date: 2024-09-02 | DOI: 10.1007/s11042-024-20114-0
Yangtao Wang, Yicheng Ye, Yanzhao Xie, Maobin Tang, Lisheng Fan
Automatic medical image classification has long been a research hotspot, but existing methods suffer from the label noise problem: they either discard samples with noisy labels or produce wrong label corrections, which seriously limits classification performance. To address these problems, this paper proposes a hard sample aware label correction (HSALC) method for medical image classification. HSALC consists of a sample division module, a clean·hard·noisy (CHN) detection module, and a label noise correction module. First, the sample division module applies a division criterion based on training difficulty and training losses to split all samples into three preliminary subsets: clean samples, hard samples, and noisy samples. Second, the CHN detection module injects noise into the clean samples and repeatedly applies the division criterion to all data, which yields highly reliable clean, hard, and noisy subsets. Finally, the label noise correction module trains a correction model that purifies and corrects the wrong labels of noisy samples as far as possible, so that every available sample can be used and a highly purified dataset is obtained. Extensive experiments on five image datasets, three medical and two natural, demonstrate that HSALC greatly improves classification performance on noisily labeled datasets, especially at high noise ratios. The source code is publicly available at GitHub: https://github.com/YYC117/HSALC.
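As an illustration of the loss-based sample division idea (not the authors' actual criterion, which also incorporates training difficulty), the following minimal Python sketch partitions samples into clean, hard, and noisy subsets from their per-sample training losses; the quantile thresholds and synthetic losses are assumptions chosen for illustration only.

```python
# Hypothetical sketch: partition samples into clean / hard / noisy subsets
# from their per-sample training losses. Thresholds are illustrative only;
# HSALC's real criterion also uses training difficulty.
import numpy as np

def split_by_loss(losses, clean_q=0.3, noisy_q=0.8):
    """Return index arrays (clean, hard, noisy) from per-sample losses."""
    losses = np.asarray(losses)
    lo, hi = np.quantile(losses, [clean_q, noisy_q])   # assumed quantile cut-offs
    clean = np.where(losses <= lo)[0]                  # low loss -> likely clean label
    noisy = np.where(losses >= hi)[0]                  # high loss -> likely noisy label
    hard = np.where((losses > lo) & (losses < hi))[0]  # in between -> hard sample
    return clean, hard, noisy

# Usage with synthetic losses standing in for real per-sample training losses.
rng = np.random.default_rng(0)
clean, hard, noisy = split_by_loss(rng.gamma(2.0, 1.0, size=1000))
print(len(clean), len(hard), len(noisy))
```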
{"title":"HSALC: hard sample aware label correction for medical image classification","authors":"Yangtao Wang, Yicheng Ye, Yanzhao Xie, Maobin Tang, Lisheng Fan","doi":"10.1007/s11042-024-20114-0","DOIUrl":"https://doi.org/10.1007/s11042-024-20114-0","url":null,"abstract":"<p>Medical image automatic classification has always been a research hotspot, but the existing methods suffer from the label noise problem, which either discards those samples with noisy labels or produces wrong label correction, seriously preventing the medical image classification performance improvement. To address the above problems, in this paper, we propose a hard sample aware label correction (termed as HSALC) method for medical image classification. Our HSALC mainly consists of a sample division module, a clean<span>(cdot )</span>hard<span>(cdot )</span>noisy (termed as CHN) detection module and a label noise correction module. First, in the sample division module, we design a sample division criterion based on the training difficulty and training losses to divide all samples into three preliminary subsets: clean samples, hard samples and noisy samples. Second, in the CHN detection module, we add noise to the above clean samples and repeatedly adopt the sample division criterion to effectively detect all data, which helps obtain highly reliable clean samples, hard samples and noisy samples. Finally, in the label noise correction module, in order to make full use of each available sample, we train a correction model to purify and correct the wrong labels of noisy samples as much as possible, which brings a highly purified dataset. We conduct extensive experiments on five image datasets including three medical image datasets and two natural image datasets. Experimental results demonstrate that HSALC can greatly promote classification performance on noisily labeled datasets, especially with high noise ratios. The source code of this paper is publicly available at GitHub: https://github.com/YYC117/HSALC.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":"60 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142203578","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Effective and efficient automatic detection, prediction and prescription of potential disease in berry family
Pub Date: 2024-09-02 | DOI: 10.1007/s11042-024-19896-0
Roopa R. Kulkarni, Abhishek D. Sharma, Bhuvan K. Koundinya, Chokkanahalli Anirudh, Yashas N
The grape cultivation industry in India faces significant challenges from fungal pests and diseases, leading to substantial economic losses. Detecting leaf diseases in grape plants at an early stage is crucial to prevent infections from spreading, minimize crop damage, and apply timely, precise treatments; this proactive approach is vital for maintaining the productivity and quality of grape cultivation. Integrated technology is key to improving grape production while minimizing the use of harmful pesticides: smart robots and computer-vision-enabled systems can detect and predict diseases efficiently, reducing human labor and optimizing production. Utilizing Convolutional Neural Networks (CNNs) for grape plant leaf detection enables precise, automated differentiation between healthy and diseased leaves based on their visual features; this method not only allows early disease detection but also calculates the total leaf area affected by the disease. On the real-time dataset, the CNN achieved an accuracy of 98%, making it a highly effective method for image training and classification, while VGG16 and Improved VGG16 achieved 95% and 96%, respectively, and MobileNet and Improved MobileNet achieved 86% and 97%. Such an approach presents a promising solution to enhance productivity in grape cultivation.
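For readers unfamiliar with the transfer-learning setup behind the VGG16 and MobileNet baselines mentioned above, the following is a minimal Keras sketch of a frozen-backbone classifier for grape leaf images. The image size, number of classes, and classification head are assumptions, not the authors' configuration.

```python
# Minimal transfer-learning sketch with a frozen VGG16 backbone.
# NUM_CLASSES, input size and the classification head are assumed values.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 4  # e.g. healthy + three grape leaf diseases (assumption)

base = tf.keras.applications.VGG16(include_top=False, weights="imagenet",
                                   input_shape=(224, 224, 3))
base.trainable = False  # keep ImageNet features fixed

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```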
{"title":"Effective and efficient automatic detection, prediction and prescription of potential disease in berry family","authors":"Roopa R. Kulkarni, Abhishek D. Sharma, Bhuvan K. Koundinya, Chokkanahalli Anirudh, Yashas N","doi":"10.1007/s11042-024-19896-0","DOIUrl":"https://doi.org/10.1007/s11042-024-19896-0","url":null,"abstract":"<p>The grape cultivation industry in India faces significant challenges from fungal pests and diseases, leading to substantial economic losses. Detecting leaf diseases in grape plants at an early stage is crucial to prevent infections from spreading, minimize crop damage, and apply timely and precise treatments. This proactive approach is vital for maintaining the productivity and quality of grape cultivation. Integrated technology is crucial for improving grape production and minimizing the use of harmful pesticides. Developing smart robots and computer vision-enabled systems can efficiently detect and predict diseases, reducing human labor and optimizing grape production. The CNN algorithm achieved an accuracy of 98% using the real-time dataset, making it a highly effective method for image training and classification. VGG16 and Improved VGG16 achieved accuracies of 95% and 96%, respectively, indicating their strong performance. MobileNet and Improved MobileNet achieved accuracies of 86% and 97%, respectively. Utilizing Convolutional Neural Networks (CNN) for grape plant leaf detection facilitates precise and automated differentiation between healthy and diseased leaves by analyzing their visual features. This method not only enables early disease detection but also calculates the total area of the leaf affected by the disease. Such an approach presents a promising solution to enhance productivity in grape cultivation.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":"24 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142203587","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Parkinsonian gait modelling from an anomaly deep representation
Pub Date: 2024-09-02 | DOI: 10.1007/s11042-024-19961-8
Edgar Rangel, Fabio Martínez
Parkinson’s Disease (PD) is associated with gait movement disorders such as bradykinesia, stiffness, tremors, and postural instability. Hence, kinematic gait analysis for PD characterization is key to supporting diagnosis and carrying out effective treatment planning. Current automatic classification and characterization strategies are based on deep learning representations that follow supervised rules and assume large, stratified datasets; such requirements are far from real clinical scenarios, and supervised rules may introduce bias from experts' annotations into the architectures. This work introduces a self-supervised generative representation that learns gait-motion-related patterns under the pretext task of video reconstruction. Following an anomaly detection framework, the proposed architecture avoids inter-class variance and learns hidden, complex kinematic locomotion relationships. The model was trained and validated on an in-house dataset (14 Parkinson and 23 control subjects), while an external public dataset (16 Parkinson, 30 control, and 50 knee-arthritis subjects) was used only for testing, to measure the generalization capability of the method. During training, the method learns from control subjects, while Parkinson subjects are detected as anomalous samples. On the in-house dataset, the proposed approach achieves a ROC-AUC of 95% in the classification task. On the external dataset, the architecture shows generalization capability, achieving a ROC-AUC of 75% (shapeness and homoscedasticity of 66.7%) without any additional training. The proposed model performs remarkably well in detecting Parkinsonian gait patterns recorded in markerless videos, with competitive results even for classes not observed during training.
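The anomaly-detection idea can be pictured with a small reconstruction model trained only on control subjects, where a high reconstruction error flags a possibly Parkinsonian gait. The toy per-frame autoencoder below is a sketch under that assumption; it is not the paper's video architecture, and the frame size, training data, and scoring are placeholders.

```python
# Toy per-frame autoencoder illustrating reconstruction-based anomaly scoring.
# Trained on control-subject frames only; higher error suggests an anomaly.
import numpy as np
from tensorflow.keras import layers, models

def build_frame_autoencoder():
    inp = layers.Input(shape=(64, 64, 1))  # assumed frame size
    x = layers.Conv2D(16, 3, strides=2, padding="same", activation="relu")(inp)
    x = layers.Conv2D(32, 3, strides=2, padding="same", activation="relu")(x)
    x = layers.Conv2DTranspose(16, 3, strides=2, padding="same", activation="relu")(x)
    out = layers.Conv2DTranspose(1, 3, strides=2, padding="same", activation="sigmoid")(x)
    return models.Model(inp, out)

model = build_frame_autoencoder()
model.compile(optimizer="adam", loss="mse")

# Placeholder data standing in for control-subject gait frames.
control_frames = np.random.rand(128, 64, 64, 1).astype("float32")
model.fit(control_frames, control_frames, epochs=1, verbose=0)

def anomaly_score(model, clip):
    """Mean squared reconstruction error over a clip (higher = more anomalous)."""
    recon = model.predict(clip, verbose=0)
    return float(np.mean((clip - recon) ** 2))

test_clip = np.random.rand(8, 64, 64, 1).astype("float32")
print("anomaly score:", anomaly_score(model, test_clip))
```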
{"title":"Parkinsonian gait modelling from an anomaly deep representation","authors":"Edgar Rangel, Fabio Martínez","doi":"10.1007/s11042-024-19961-8","DOIUrl":"https://doi.org/10.1007/s11042-024-19961-8","url":null,"abstract":"<p>Parkinson’s Disease (PD) is associated with gait movement disorders, such as bradykinesia, stiffness, tremors and postural instability. Hence, a kinematic gait analysis for PD characterization is key to support diagnosis and to carry out an effective treatment planning. Nowadays, automatic classification and characterization strategies are based on deep learning representations, following supervised rules, and assuming large and stratified data. Nonetheless, such requirements are far from real clinical scenarios. Additionally, supervised rules may introduce bias into architectures from expert’s annotations. This work introduces a self-supervised generative representation to learn gait-motion-related patterns, under the pretext task of video reconstruction. Following an anomaly detection framework, the proposed architecture can avoid inter-class variance, learning hidden and complex kinematics locomotion relationships. In this study, the proposed model was trained and validated with an owner dataset (14 Parkinson and 23 control). Also, an external public dataset (16 Parkinson, 30 control, and 50 Knee-arthritis) was used only for testing, measuring the generalization capability of the method. During training, the method learns from control subjects, while Parkinson subjects are detected as anomaly samples. From owner dataset, the proposed approach achieves a ROC-AUC of 95% in classification task. Regarding the external dataset, the architecture evidence generalization capabilities, achieving a 75% of ROC-AUC (shapeness and homoscedasticity of 66.7%), without any additional training. The proposed model has remarkable performance in detecting gait parkinsonian patterns, recorded in markerless videos, even competitive results with classes non-observed during training.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":"2 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142203588","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Investigation of attention mechanism for speech command recognition
Pub Date: 2024-09-02 | DOI: 10.1007/s11042-024-20129-7
Jie Xie, Mingying Zhu, Kai Hu, Jinglan Zhang, Ya Guo
As an application area of speech command recognition, the smart home provides people with a convenient way to communicate with various digital devices. Deep learning has demonstrated its effectiveness in speech command recognition, yet few studies have extensively investigated how attention mechanisms can enhance its performance. In this study, we investigate deep learning architectures for improved speaker-independent speech command recognition. Specifically, we first compare the log-Mel spectrogram and the log-Gammatone spectrogram using VGG-style and VGG-skip-style networks. Next, the best-performing model is selected and examined with different attention mechanisms, including channel-time attention, channel-frequency attention, and channel-time-frequency attention. Finally, a dual CNN with cross-attention is used for speech command classification. A self-made dataset of 12 classes recorded by 40 participants, all in Mandarin Chinese and captured with a variety of smartphone devices in diverse settings, is used for the experiments. Experimental results indicate that the log-Gammatone spectrogram with a VGG-skip-style network and cross-attention achieves the best performance, with accuracy, precision, recall, and F1-score of 94.59%, 95.84%, 94.64%, and 94.57%, respectively.
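As a concrete reference point for the front end and the attention variants compared above, the sketch below computes a log-Mel spectrogram with librosa and applies a squeeze-and-excitation style channel-attention block in Keras. The hop length, number of Mel bands, reduction ratio, and feature-map size are assumptions, and the block stands in generically for the channel-attention family rather than reproducing the authors' exact modules.

```python
# Hypothetical sketch: log-Mel spectrogram front end plus a
# squeeze-and-excitation style channel-attention block.
import numpy as np
import librosa
import tensorflow as tf
from tensorflow.keras import layers

def log_mel(y, sr, n_mels=64):
    """Log-scaled Mel spectrogram of a 1-D waveform (parameters are assumptions)."""
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=1024,
                                         hop_length=256, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)

def channel_attention(x, reduction=8):
    """Reweight feature-map channels (squeeze-and-excitation)."""
    ch = x.shape[-1]
    w = layers.GlobalAveragePooling2D()(x)            # squeeze: one value per channel
    w = layers.Dense(ch // reduction, activation="relu")(w)
    w = layers.Dense(ch, activation="sigmoid")(w)     # excitation: channel weights
    w = layers.Reshape((1, 1, ch))(w)
    return layers.Multiply()([x, w])                  # rescale channels

# One second of a 440 Hz tone as placeholder audio for a spoken command.
y = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000).astype(np.float32)
spec = log_mel(y, sr=16000)                           # shape: (n_mels, frames)

inp = layers.Input(shape=spec.shape + (1,))
feat = layers.Conv2D(32, 3, padding="same", activation="relu")(inp)
feat = channel_attention(feat)
model = tf.keras.Model(inp, feat)
model.summary()
```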
{"title":"Investigation of attention mechanism for speech command recognition","authors":"Jie Xie, Mingying Zhu, Kai Hu, Jinglan Zhang, Ya Guo","doi":"10.1007/s11042-024-20129-7","DOIUrl":"https://doi.org/10.1007/s11042-024-20129-7","url":null,"abstract":"<p>As an application area of speech command recognition, the smart home has provided people with a convenient way to communicate with various digital devices. Deep learning has demonstrated its effectiveness in speech command recognition. However, few studies have conducted extensive research on leveraging attention mechanisms to enhance its performance. In this study, we aim to investigate the deep learning architectures for improved speaker-independent speech command recognition. Specifically, we first compare the log-Mel-spectrogram and log-Gammatone spectrogram using VGG style and VGG-skip style networks. Next, the best-performing model is selected and investigated using different attention mechanisms including channel-time attention, channel-frequency attention, and channel-time-frequency attention. Finally, a dual CNN with cross-attention is used for speech command classification. A self-made dataset including 40 participants with 12 classes is used for the experiment which are all recorded in Mandarin Chinese, utilizing a variety of smartphone devices across diverse settings. Experimental results indicate that using log-Gammatone spectrogram and VGG-skip style networks with cross attention can achieve the best performance, where the accuracy, precision, recall and F1-score are 94.59%, 95.84%, 94.64%, and 94.57%, respectively.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":"7 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-09-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142203650","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep learning-based algorithm for automated detection of glaucoma on eye fundus images
Pub Date: 2024-08-08 | DOI: 10.1007/s11042-024-19989-w
Hervé Tampa, Martial Mekongo, Alain Tiedeu
Projections predict that about one hundred and twelve million people will be affected by glaucoma by 2040. Glaucoma ranks as a serious public health problem and a significant cause of blindness; however, if it is detected early, total blindness can be delayed. Computerized analysis of eye fundus images can therefore serve as a tool for early diagnosis of glaucoma. In this paper, we develop a deep-learning-based algorithm for the automated detection of this condition using images from the Origa-light and Origa databases, with a total of 1300 images used in the study. The algorithm consists of two steps: preprocessing and classification. The images were processed sequentially by blue component extraction, conversion to greyscale, ellipse fitting, median filtering, Sobel filtering, and finally binarization with a simple global thresholding method. Classification was carried out using a modified VGGNet19 (Visual Geometry Group Net 19) powered by transfer learning. The algorithm was tested on 260 images, yielding a sensitivity of 100%, a specificity of 97.69%, an accuracy of 98.84%, an F1-score of 98.85%, and an area under the ROC curve (AUC) of 0.989. These values are encouraging and better than those yielded by many state-of-the-art methods.
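A minimal OpenCV sketch of a preprocessing chain in the spirit of the one described (blue-component extraction, greyscale conversion, median filtering, Sobel edges, global thresholding) is given below. The ellipse-fitting step is omitted, and the kernel sizes and threshold value are assumptions rather than the authors' settings.

```python
# Simplified fundus preprocessing sketch; parameters are illustrative only.
import cv2

def preprocess_fundus(path):
    bgr = cv2.imread(path)                        # OpenCV loads images as BGR
    blue = bgr[:, :, 0]                           # blue component extraction
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)  # greyscale conversion
    gray = cv2.medianBlur(gray, 5)                # median filtering
    sx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
    sy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
    edges = cv2.convertScaleAbs(cv2.magnitude(sx, sy))
    _, binary = cv2.threshold(edges, 60, 255, cv2.THRESH_BINARY)  # global threshold
    return blue, binary

# blue_channel, mask = preprocess_fundus("fundus_example.jpg")  # hypothetical file
```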
{"title":"Deep learning-based algorithm for automated detection of glaucoma on eye fundus images","authors":"Hervé Tampa, Martial Mekongo, Alain Tiedeu","doi":"10.1007/s11042-024-19989-w","DOIUrl":"https://doi.org/10.1007/s11042-024-19989-w","url":null,"abstract":"<p>Projections predict that about one hundred and twelve million people will be affected by glaucoma by 2040. It can be ranked as a serious public health problem, being a significant cause of blindness. However, if detected early, total blindness can be delayed. A computerized analysis of images of the eye fundus can be a tool for early diagnosis of glaucoma. In this paper, we have developed a deep-learning-based algorithm for the automated detection of this condition using images from Origa-light and Origa databases. A total of 1300 images were used in the study. The algorithm consists of two steps, namely processing and classification. The images were processed respectively by blue component extraction, conversion into greyscale images, ellipse fitting, median filtering, sobel filter application and finally binarizing by a simple global thresholding method. The classification was carried out using a modified VGGNet19 (Visual Geometric Group Net 19) powered by transfer learning. The algorithm was tested on 260 images. A sensitivity of 100%, a specificity of 97.69%, an accuracy of 98.84%, an F1 score of 98.85%, and finally an area under the ROC-curve (AUC) of 0.989 were obtained. These values are encouraging and better than those yielded by many state-of-the-art methods.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":"12 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141944233","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Histopathology image analysis for gastric cancer detection: a hybrid deep learning and catboost approach
Pub Date: 2024-08-07 | DOI: 10.1007/s11042-024-19816-2
Danial Khayatian, Alireza Maleki, Hamid Nasiri, Morteza Dorrigiv
Since gastric cancer is growing fast, accurate and prompt diagnosis is essential, and computer-aided diagnosis (CAD) systems are an efficient way to achieve this goal. Computer-vision methods enable more accurate predictions and faster diagnosis, leading to timely treatment, and CAD systems can categorize images effectively using deep learning techniques based on image analysis and classification. Accurate and timely classification of histopathology images is critical for enabling immediate treatment strategies, but remains challenging. We propose a hybrid deep learning and gradient-boosting approach that achieves high accuracy in classifying gastric histopathology images. The approach evaluates two classifiers on features extracted by six pre-trained networks, with the extracted features fed to each classifier separately. The inputs are gastric histopathology images from the GasHisSDB dataset, which provides them in three cropping sizes: 80 px, 120 px, and 160 px. Based on these experiments, the final method combines the EfficientNetV2B0 model for feature extraction with the CatBoost classifier. The resulting accuracies are 89.7%, 93.1%, and 93.9% for the 80 px, 120 px, and 160 px cropping sizes, respectively, and additional metrics including precision, recall, and F1-score were above 0.9, demonstrating strong performance across various evaluation criteria. To further verify the model's behavior, the Grad-CAM algorithm was applied; its visualizations highlight the discriminative regions identified by the model and confirm that learning focuses on histologically relevant features. Two types of outputs (the heat map and the Grad-CAM output) are provided for this purpose. Additionally, t-SNE visualization showed clear clustering of normal and abnormal cases after EfficientNetV2B0 feature extraction. The consistent accuracy and reliable detections across diverse evaluation metrics, together with the cross-validation and visualizations, substantiate the robustness and generalizability of the proposed deep learning and gradient-boosting approach for gastric cancer screening from histopathology images.
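The hybrid feature-extractor-plus-booster idea can be sketched as follows: EfficientNetV2B0 with average pooling produces a feature vector per image, and a CatBoost classifier is trained on those vectors. The image size, labels, and CatBoost hyper-parameters below are placeholders, not the paper's settings.

```python
# Hypothetical sketch: frozen EfficientNetV2B0 features + CatBoost classifier.
import numpy as np
import tensorflow as tf
from catboost import CatBoostClassifier

extractor = tf.keras.applications.EfficientNetV2B0(
    include_top=False, weights="imagenet", pooling="avg",
    input_shape=(160, 160, 3))                      # assumed patch size

def extract_features(images):
    """images: float array (N, 160, 160, 3) with pixel values in [0, 255]."""
    x = tf.keras.applications.efficientnet_v2.preprocess_input(images)
    return extractor.predict(x, verbose=0)

# Random placeholders standing in for GasHisSDB patches and their labels.
X_imgs = (np.random.rand(32, 160, 160, 3) * 255.0).astype("float32")
y = np.random.randint(0, 2, size=32)                # 0 = normal, 1 = abnormal

features = extract_features(X_imgs)
clf = CatBoostClassifier(iterations=200, depth=6, verbose=False)
clf.fit(features, y)
print("training accuracy:", clf.score(features, y))
```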
{"title":"Histopathology image analysis for gastric cancer detection: a hybrid deep learning and catboost approach","authors":"Danial Khayatian, Alireza Maleki, Hamid Nasiri, Morteza Dorrigiv","doi":"10.1007/s11042-024-19816-2","DOIUrl":"https://doi.org/10.1007/s11042-024-19816-2","url":null,"abstract":"<p>Since gastric cancer is growing fast, accurate and prompt diagnosis is essential, utilizing computer-aided diagnosis (CAD) systems is an efficient way to achieve this goal. Using methods related to computer vision enables more accurate predictions and faster diagnosis, leading to timely treatment. CAD systems can categorize photos effectively using deep learning techniques based on image analysis and classification. Accurate and timely classification of histopathology images is critical for enabling immediate treatment strategies, but remains challenging. We propose a hybrid deep learning and gradient-boosting approach that achieves high accuracy in classifying gastric histopathology images. This approach examines two classifiers for six networks known as pre-trained models to extract features. Extracted features will be fed to the classifiers separately. The inputs are gastric histopathological images. The GasHisSDB dataset provides these inputs containing histopathology gastric images in three 80px, 120px, and 160px cropping sizes. According to these achievements and experiments, we proposed the final method, which combines the EfficientNetV2B0 model to extract features from the images and then classify them using the CatBoost classifier. The results based on the accuracy score are 89.7%, 93.1%, and 93.9% in 80px, 120px, and 160px cropping sizes, respectively. Additional metrics including precision, recall, and F1-scores were above 0.9, demonstrating strong performance across various evaluation criteria. In another way, to approve and see the model efficiency, the GradCAM algorithm was implemented. Visualization via Grad-CAM illustrated discriminative regions identified by the model, confirming focused learning on histologically relevant features. The consistent accuracy and reliable detections across diverse evaluation metrics substantiate the robustness of the proposed deep learning and gradient-boosting approach for gastric cancer screening from histopathology images. For this purpose, two types of outputs (The heat map and the GradCAM output) are provided. Additionally, t-SNE visualization showed a clear clustering of normal and abnormal cases after EfficientNetV2B0 feature extraction. The cross-validation and visualizations provide further evidence of generalizability and focused learning of meaningful pathology features for gastric cancer screening from histopathology images.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":"24 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141944238","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Classification of electroencephalograms before or after applying transcutaneous electrical nerve stimulation therapy using fractional empirical mode decomposition
Pub Date: 2024-08-07 | DOI: 10.1007/s11042-024-19992-1
Jiaqi Liu, Bingo Wing-Kuen Ling, Zhaoheng Zhou, Weirong Wu, Ruilin Li, Qing Liu
It is worth noting that applying transcutaneous electrical nerve stimulation (TENS) therapy at superficial nerve locations can modulate brain activities. This paper investigates whether applying TENS therapy at superficial nerve locations improves the attention of subjects while they play a mathematical game or read a technical paper. First, electroencephalograms (EEGs) are acquired before and after TENS therapy is applied at the superficial nerve locations; the EEGs from both conditions are then mixed together and preprocessed. Second, fractional empirical mode decomposition (FEMD) is employed to extract features, and a genetic algorithm (GA) performs feature selection to obtain the optimal features. Finally, a support vector machine (SVM) and a random forest (RF) classify whether the EEGs were acquired before or after TENS therapy. Since higher classification accuracy reflects a larger difference between the EEGs acquired before and after TENS therapy, the classification accuracy indicates the effectiveness of TENS therapy for improving the subjects' attention. With the proposed method, the classification accuracies are 78.90% to 98.31% (SVM) and 78.44% to 100% (RF) for the one-channel device during the online mathematical game; 80.84% to 93.63% (SVM) and 86.83% to 99.09% (RF) for the eight-channel device during the online mathematical game; 77.67% to 83.67% (SVM) and 79.61% to 84.69% (RF) for the one-channel device during reading a technical paper; and 82.30% to 90.02% (SVM) and 91.72% to 95.91% (RF) for the sixteen-channel device during reading a technical paper. As the proposed method yields higher classification accuracy than state-of-the-art methods, it demonstrates potential as a tool for medical officers to perform precise clinical diagnoses and make therapeutic decisions based on TENS.
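The final classification stage can be illustrated with scikit-learn as below. The FEMD feature extraction and GA feature selection stages are not reproduced here, so random vectors stand in for the selected features, and the classifier hyper-parameters are assumptions.

```python
# Sketch of the pre- vs post-TENS classification stage only (SVM and RF).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 20))      # placeholder for GA-selected FEMD features
y = rng.integers(0, 2, size=200)    # 0 = before TENS, 1 = after TENS

for name, clf in [("SVM", SVC(kernel="rbf", C=1.0)),
                  ("RF", RandomForestClassifier(n_estimators=200, random_state=0))]:
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```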
{"title":"Classification of electroencephalograms before or after applying transcutaneous electrical nerve stimulation therapy using fractional empirical mode decomposition","authors":"Jiaqi Liu, Bingo Wing-Kuen Ling, Zhaoheng Zhou, Weirong Wu, Ruilin Li, Qing Liu","doi":"10.1007/s11042-024-19992-1","DOIUrl":"https://doi.org/10.1007/s11042-024-19992-1","url":null,"abstract":"<p>It is worth noting that applying the transcutaneous electrical nerve stimulation (TENS) therapy at the superficial nerve locations can modulate the brain activities. This paper aims to further investigate whether applying the TENS therapy at the superficial nerve locations can improve the attention of the subjects or not when the subjects are playing the mathematical game or reading a technical paper. First, the electroencephalograms (EEGs) are acquired before and after the TENS therapy is applied at the superficial nerve locations. Then, both the EEGs acquired before and after applying the TENS therapy are mixed together. Next, the preprocessing is applied to these acquired EEGs. Second, the fractional empirical mode decomposition (FEMD) is employed for extracting the features. Subsequently, the genetic algorithm (GA) is employed for performing the feature selection to obtain the optimal features. Finally, the support vector machine (SVM) and the random forest (RF) are used to classify whether the EEGs are acquired before or after the TENS therapy is applied. Since the higher classification accuracy refers to the larger difference of the EEGs acquired before and after the TENS therapy is applied, the classification accuracy reflects the effectiveness of applying the TENS therapy for improving the attention of the subjects. It is found that the percentages of the classification accuracies based on the EEGs acquired via the one channel device during playing the online mathematical game via the SVM and the RF by our proposed method are between 78.90% and 98.31% as well as between 78.44% and 100%, respectively. The percentages of the classification accuracies based on the EEGs acquired via the eight channel device during playing the online mathematical game via the SVM and the RF by our proposed method are between 80.84% and 93.63% as well as between 86.83% and 99.09%, respectively. the percentages of the classification accuracies based on the EEGs acquired via the one channel device during reading a technical paper via the SVM and the RF by our proposed method are between 77.67% and 83.67% as well as between 79.61% and 84.69%, respectively. the percentages of the classification accuracies based on the EEGs acquired via the sixteen channel device during reading a technical paper via the SVM and the RF by our proposed method are between 82.30% and 90.02% as well as between 91.72% and 95.91%, respectively. 
As our proposed method yields a higher classification accuracy than the states of the arts methods, this demonstrates the potential of using our proposed method as a tool for the medical officers to perform the precise clinical diagnoses and make the therapeutic decisions based on TENS.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":"40 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141944261","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A secure and privacy-preserving technique based on coupled chaotic system and plaintext encryption for multimodal medical images
Pub Date: 2024-08-07 | DOI: 10.1007/s11042-024-19956-5
Hongwei Xie, Yuzhou Zhang, Jing Bian, Hao Zhang
In medical diagnosis, color and grayscale medical images contain different pathological features, and fusing the two can help doctors make a more intuitive diagnosis. Fused medical images contain a large amount of private information, so ensuring their security during transmission is critical. This paper proposes a multimodal medical image security protection scheme based on coupled chaotic mapping. First, a sequentially coupled chaotic map is proposed using the Logistic map and the Cubic map as seed chaotic maps, and its chaotic performance is verified by Lyapunov exponent analysis, phase-diagram attractor distribution analysis, and the NIST randomness test. Second, by combining the image encryption process with the image fusion process, a plaintext-associated multimodal medical image hierarchical encryption algorithm is proposed. Finally, a blind watermarking algorithm based on the forward Meyer wavelet transform and singular value decomposition is proposed to embed the EMR report into the encrypted channel, realizing mutual authentication between the EMR report and the medical image. Experimental results show that, compared with related algorithms, the proposed algorithm achieves better encryption and authentication performance: the histogram and scatter plot are nearly uniformly distributed, the NPCR and UACI for plaintext and key sensitivity are close to 99.6094% and 33.4635%, respectively, and the scheme shows strong robustness against noise attacks and clipping attacks.
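To make the seed-map idea concrete, the sketch below generates a keystream from a Logistic map and a Cubic map coupled in sequence. The coupling form, parameter values, and byte quantization are assumptions for illustration; they are not the paper's actual map and are not claimed to pass the NIST tests.

```python
# Illustrative (assumed) sequential coupling of Logistic and Cubic seed maps.
import numpy as np

def logistic(x, r=3.99):
    return r * x * (1.0 - x)

def cubic(x, a=2.59):
    return a * x * (1.0 - x * x)

def coupled_keystream(n, x0=0.37, y0=0.62, eps=0.3):
    """Generate n pseudo-random bytes by feeding each map's output into the other."""
    x, y = x0, y0
    out = np.empty(n, dtype=np.uint8)
    for i in range(n):
        x = (1 - eps) * logistic(x) + eps * cubic(y)  # assumed coupling form
        y = (1 - eps) * cubic(y) + eps * logistic(x)
        out[i] = int((x + y) * 1e6) % 256             # quantize state to a byte
    return out

print(coupled_keystream(16))  # e.g. a stream that could be XORed with image bytes
```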
{"title":"A secure and privacy-preserving technique based on coupled chaotic system and plaintext encryption for multimodal medical images","authors":"Hongwei Xie, Yuzhou Zhang, Jing Bian, Hao Zhang","doi":"10.1007/s11042-024-19956-5","DOIUrl":"https://doi.org/10.1007/s11042-024-19956-5","url":null,"abstract":"<p>In medical diagnosis, colored and gray medical images contain different pathological features, and the fusion of the two images can help doctors make a more intuitive diagnosis. Fusion medical images contain a large amount of private information, and ensuring their security during transmission is critical. This paper proposes a multi-modal medical image security protection scheme based on coupled chaotic mapping. Firstly, a sequentially coupled chaotic map is proposed using Logistic mapping and Cubic mapping as seed chaotic maps, and its chaotic performance is verified by Lyapunov index analysis, phase diagram attractor distribution analysis, and NIST randomness test. Secondly, combining the process of image encryption with the process of image fusion, a plaintext-associated multimodal medical image hierarchical encryption algorithm is proposed. Finally, a blind watermarking algorithm based on forward Meyer wavelet transform and singular value decomposition is proposed to embed the EMR report into the encrypted channel to realize the mutual authentication of the EMR report and medical image. The experimental results show that compared with the related algorithms, the proposed algorithm has better encryption authentication performance, histogram, and scatter plot are nearly uniform distribution, and the NPCR and UACI of plaintext sensitivity and key sensitivity are close to 99.6094% and 33.4635%, respectively, and has strong robustness to noise attacks and clipping attacks.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":"85 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141944236","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A mobile application and system architecture for online speech training in Portuguese: design, development, and evaluation of SofiaFala
Pub Date: 2024-08-07 | DOI: 10.1007/s11042-024-19980-5
Alessandra Alaniz Macedo, Vinícius de S. Gonçalves, Patrícia P. Mandrá, Vivian Motti, Renato F. Bulcão-Neto, Kamila Rios da Hora Rodrigues
Online Speech Therapy (OST) systems are assistive tools that provide online support for training, planning, and executing specific speech sounds. Most traditional OST systems are mobile applications, mainly supporting English and Spanish. This paper describes the development of SofiaFala, a freely available mobile assistive application for speech training in Portuguese. The app builds upon computational modules that support speech therapy for people with neurodevelopmental disorders, including Down syndrome. The development of SofiaFala was iterative and actively involved target users: speech-language pathologists, parents, caregivers, and children with speech disorders, mostly with Down syndrome, contributed to the app's development. This paper describes the design of SofiaFala and its functionalities, and discusses usage workload and findings from an experimental study. In addition to analyzing related work, we explain how we (i) elicited SofiaFala's features; (ii) developed the app's software architecture, connecting speech-language pathologists' and patients' activities (including therapy session planning and home training exercises); (iii) evaluated SofiaFala through a field study involving target users; and (iv) addressed key challenges along the implementation process. SofiaFala provides integrated features that aim to maximize communication effectiveness, enhance users' language skills, and ultimately improve the quality of life of people with speech impairments.
{"title":"A mobile application and system architecture for online speech training in Portuguese: design, development, and evaluation of SofiaFala","authors":"Alessandra Alaniz Macedo, Vinícius de S. Gonçalves, Patrícia P. Mandrá, Vivian Motti, Renato F. Bulcão-Neto, Kamila Rios da Hora Rodrigues","doi":"10.1007/s11042-024-19980-5","DOIUrl":"https://doi.org/10.1007/s11042-024-19980-5","url":null,"abstract":"<p>Online Speech Therapy (OST) systems are assistive tools that provide online support for training, planning, and executing specific speech sounds. The most traditional OST systems are mobile applications, mainly supporting English and Spanish languages. This paper describes the development of a mobile assistive application for speech training –SofiaFala is freely available and provides support for Portuguese. The app builds upon computational modules that support speech therapy for people with neurodevelopmental disorders, including Down Syndrome. Specifically, the development of SofiaFala was iterative, involving target users actively. Speech-language pathologists as well as parents, caregivers, and children with speech disorders, mostly with Down Syndrome, contributed to the app development. This paper also describes the design of SofiaFala and the app functionalities. Also, we discuss usage workload and findings from an experimental study. In addition to analyzing the related work, we explain how we (i) elicited SofiaFala features; (ii) developed the software architecture of the app, seeking to connect speech-language pathologists’ and patients’ activities (including therapy session planning and home training exercises); (iii) evaluated SofiaFala through a field study involving target users; and (iv) addressed key challenges along the implementation process. SofiaFala provides integrated features that aim to maximize communication effectiveness, enhance language skills among users, and ultimately improve the quality of life of people with speech impairments.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":"18 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141944260","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Real-time masked face recognition and authentication with convolutional neural networks on the web application
Pub Date: 2024-08-07 | DOI: 10.1007/s11042-024-19953-8
Sansiri Tarnpradab, Pavat Poonpinij, Nattawut Na Lumpoon, Naruemon Wattanapongsakorn
The COVID-19 outbreak highlighted the importance of wearing a face mask to prevent virus transmission. During the peak of the pandemic, everyone was required to wear a face mask both indoors and outdoors, and even now that the pandemic has passed, masks are still necessary in some situations and areas. A face mask, however, becomes a major barrier in places where full-face authentication is required: most facial recognition systems cannot recognize masked faces accurately, resulting in incorrect predictions. To address this challenge, this study proposes a web-based application system that accomplishes three main tasks: (1) recognizing in real time whether an individual entering a location is wearing a face mask; (2) correctly identifying an individual for biometric authentication despite facial features being obscured by face masks of varying types, shapes, and colors; and (3) easily updating the recognition model with the most recent user list through a user-friendly interface in the real-time web application. The underlying detection and recognition models are convolutional neural networks; in this study, we experimented with VGG16, VGGFace, and InceptionResNetV2. Model performance was determined in two experimental settings: using only masked-face images, and using both full-face and masked-face images together. We evaluate the models with accuracy, recall, precision, F1-score, and training time, and the results show superior performance compared with related works. Our best model reached an accuracy of 93.3%, a recall of 93.8%, and approximately 93-94% precision and F1-score when recognizing 50 individuals.
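The two-stage flow (mask detection followed by identity authentication) can be sketched as below. The two tiny Keras models are untrained placeholders standing in for the trained VGG16/VGGFace/InceptionResNetV2 networks, and the user list, image size, and acceptance threshold are assumptions.

```python
# Hypothetical two-stage masked-face authentication flow with placeholder models.
import numpy as np
from tensorflow.keras import layers, models

NUM_USERS = 50
USER_NAMES = [f"user_{i:02d}" for i in range(NUM_USERS)]  # hypothetical enrolled users

def tiny_cnn(num_outputs, activation):
    """Placeholder CNN standing in for a trained recognition backbone."""
    return models.Sequential([
        layers.Input(shape=(224, 224, 3)),
        layers.Conv2D(8, 3, strides=2, activation="relu"),
        layers.GlobalAveragePooling2D(),
        layers.Dense(num_outputs, activation=activation),
    ])

mask_detector = tiny_cnn(1, "sigmoid")             # masked vs. unmasked
face_recognizer = tiny_cnn(NUM_USERS, "softmax")   # identity over enrolled users

def authenticate(face_crop, accept_threshold=0.8):
    """face_crop: float array (224, 224, 3) scaled to [0, 1]."""
    x = face_crop[np.newaxis]
    masked_prob = float(mask_detector.predict(x, verbose=0)[0, 0])
    identity_probs = face_recognizer.predict(x, verbose=0)[0]
    best = int(np.argmax(identity_probs))
    accepted = identity_probs[best] >= accept_threshold
    return {"masked": masked_prob >= 0.5,
            "identity": USER_NAMES[best] if accepted else None,
            "confidence": float(identity_probs[best])}

print(authenticate(np.random.rand(224, 224, 3).astype("float32")))
```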
{"title":"Real-time masked face recognition and authentication with convolutional neural networks on the web application","authors":"Sansiri Tarnpradab, Pavat Poonpinij, Nattawut Na Lumpoon, Naruemon Wattanapongsakorn","doi":"10.1007/s11042-024-19953-8","DOIUrl":"https://doi.org/10.1007/s11042-024-19953-8","url":null,"abstract":"<p>The COVID-19 outbreak has highlighted the importance of wearing a face mask to prevent virus transmission. During the peak of the pandemic, everyone was required to wear a face mask both inside and outside the building. Nowadays, even though the pandemic has passed, it is still necessary to wear a face mask in some situations/areas. Nevertheless, a face mask becomes a major barrier, especially in places where full-face authentication is required; most facial recognition systems are unable to recognize masked faces accurately, thereby resulting in incorrect predictions. To address this challenge, this study proposes a web-based application system to accomplish three main tasks: (1) recognizing, in real-time, whether an individual entering the location is wearing a face mask; and (2) correctly identifying an individual as a biometric authentication despite facial features obscured by a face mask with varying types, shapes and colors. (3) easily updating the recognition model with the most recent user list, with a user-friendly interface from the real-time web application. The underlying model to perform detection and recognition is convolutional neural networks. In this study, we experimented with VGG16, VGGFace, and InceptionResNetV2. Experimental cases to determine model performance are; using only masked-face images, and using both full-face and masked-face images together. We evaluate the models using performance metrics including accuracy, recall, precision, F1-score, and training time. The results have shown superior performance compared with those from related works. Our best model could reach an accuracy of 93.3%, a recall of 93.8%, and approximately 93-94% for precision and F1-score, when recognizing 50 individuals.</p>","PeriodicalId":18770,"journal":{"name":"Multimedia Tools and Applications","volume":"24 1","pages":""},"PeriodicalIF":3.6,"publicationDate":"2024-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141944237","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}