Neural Computing and Applications: latest articles

Amina: an Arabic multi-purpose integral news articles dataset
Pub Date : 2024-09-18 DOI: 10.1007/s00521-024-10277-0
Mohamed Zaytoon, Muhannad Bashar, Mohamed A. Khamis, Walid Gomaa

Electronic newspapers are one of the most common sources of Modern Standard Arabic. Existing datasets of Arabic news articles typically provide only a title, a body, and a single label. Ignoring important features, such as the article author, image, tags, and publication date, can degrade the efficacy of classification models. In this paper, we propose the Arabic multi-purpose integral news articles (AMINA) dataset. AMINA is a large-scale Arabic news corpus with over 1,850,000 articles collected from 9 Arabic newspapers from different countries. It includes all the article features: title, tags, publication date and time, location, author, article image and its caption, and the number of visits. To test the efficacy of the proposed dataset, three tasks were developed and validated: article content classification, article content generation, and article image classification. For content classification, we evaluated the performance of several state-of-the-art Arabic NLP models, including AraBERT and CAMeL-BERT. For content generation, the Reformer architecture is adopted as a character-level text generation model. For image classification, applied to the Al-Sharq and Youm7 news portals, we compared the performance of 10 pre-trained models, including ConvNeXt, MaxViT, and ResNet18. The overall study verifies the significance and contribution of our newly introduced Arabic articles dataset. The AMINA dataset has been released at https://huggingface.co/datasets/MohamedZayton/AMINA.

Citations: 0
Fine-tuning adaptive stochastic optimizers: determining the optimal hyperparameter $\epsilon$ via gradient magnitude histogram analysis
Pub Date : 2024-09-18 DOI: 10.1007/s00521-024-10302-2
Gustavo Silva, Paul Rodriguez

Stochastic optimizers play a crucial role in the successful training of deep neural network models. To achieve optimal model performance, designers must carefully select both model and optimizer hyperparameters. However, this process is frequently demanding in terms of computational resources and processing time. While it is a well-established practice to tune the entire set of optimizer hyperparameters for peak performance, there is still a lack of clarity regarding the individual influence of hyperparameters mislabeled as “low priority”, including the safeguard factor $\epsilon$ and the decay rate $\beta$, in leading adaptive stochastic optimizers such as Adam. In this manuscript, we introduce a new framework based on the empirical probability density function of the magnitude of the loss gradient, termed the “gradient magnitude histogram”, for a thorough analysis of adaptive stochastic optimizers and the safeguard hyperparameter $\epsilon$. This framework reveals and justifies valuable relationships and dependencies among hyperparameters in connection to optimal performance across diverse tasks, such as classification, language modeling, and machine translation. Furthermore, we propose a novel algorithm that uses gradient magnitude histograms to automatically estimate a refined and accurate search space for the optimal safeguard hyperparameter $\epsilon$, surpassing the conventional trial-and-error methodology by establishing a worst-case search space that is two times narrower.
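To make the role of the safeguard factor concrete, the following is a minimal sketch of the standard scalar Adam update together with a per-decade histogram of gradient magnitudes. It is not the authors' algorithm: the function names `adam_step` and `log10_magnitude_histogram`, the default values, and the decade-wide binning are assumptions chosen for illustration.

```python
import math
from collections import Counter


def adam_step(grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One scalar Adam update (standard form); returns (step, m, v).

    The parameter would then be updated as theta -= step.
    """
    m = beta1 * m + (1 - beta1) * grad           # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad    # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                 # bias correction
    v_hat = v / (1 - beta2 ** t)
    # eps guards the division; it dominates when sqrt(v_hat) << eps.
    step = lr * m_hat / (math.sqrt(v_hat) + eps)
    return step, m, v


def log10_magnitude_histogram(grads):
    """Count gradient magnitudes per decade, floor(log10|g|); zeros are skipped.

    Assumption: decade-wide bins are an illustrative choice, not the paper's.
    """
    return Counter(math.floor(math.log10(abs(g))) for g in grads if g != 0.0)
```

When the gradient magnitude is far above `eps`, the step is close to `lr` regardless of scale; when it is far below `eps`, the step shrinks to roughly `lr * grad / eps`. Comparing a histogram of observed magnitudes with candidate values of `eps` shows which regime most coordinates fall into.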

Citations: 0
Augmented electric eel foraging optimization algorithm for feature selection with high-dimensional biological and medical diagnosis
Pub Date : 2024-09-18 DOI: 10.1007/s00521-024-10288-x
Mohammed Azmi Al-Betar, Malik Sh. Braik, Elfadil A. Mohamed, Mohammed A. Awadallah, Mohamed Nasor

This paper explores the importance of the electric eel foraging optimization (EEFO) algorithm in addressing feature selection (FS) problems, with the aim of improving the practical benefit of FS in real-world applications. Using EEFO to solve FS problems supports our goal of providing clean and useful datasets that remain effective in classification and clustering tasks. High-dimensional feature selection problems (HFSPs) are increasingly common, yet they are intricate because they contain a large number of features. Hence, these features should be carefully selected in order to determine the optimal subset. As the basic EEFO algorithm suffers from premature convergence, its global and local search capabilities need to be enhanced when it is applied to FS. To tackle these issues, a binary augmented EEFO (BAEEFO) algorithm was developed and proposed for HFSPs. The following strategies were integrated into the mathematical model of the original EEFO algorithm to create BAEEFO: (1) resting behavior with a nonlinear coefficient; (2) a weight coefficient and confidence effect in the hunting process; (3) a spiral search strategy; and (4) Gaussian mutation and random perturbations when the algorithm update is stagnant. Experimental findings confirm the effectiveness of the proposed BAEEFO method on 23 HFSPs gathered from the UCI repository, recording up to a 10% accuracy increment over the basic BEEFO algorithm. In most test cases, BAEEFO outperformed its competitors in classification accuracy and outperformed BEEFO on 90% of the datasets used. BAEEFO has thereby demonstrated strong competitiveness in terms of fitness scores and classification accuracy. Compared with its competitors, BAEEFO produced superior reduction rates with the fewest features selected. The findings of this research underscore the critical need for FS to combat the curse of dimensionality and to find highly useful features in data mining applications such as classification. The use of a new meta-heuristic algorithm combined with efficient search strategies to solve HFSPs represents a step toward using this algorithm to solve other practical real-world problems in a variety of domains.
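The abstract does not give the formulas behind BAEEFO, so the following is only a generic sketch of three building blocks that binary metaheuristics for wrapper feature selection commonly use: an S-shaped transfer function that turns a continuous position into a feature mask, a weighted fitness that trades classification error against the fraction of features kept, and a Gaussian mutation of the kind named in strategy (4). The function names, the weight `alpha`, and the noise scale `sigma` are all assumptions, not values from the paper.

```python
import math
import random


def sigmoid_binarize(position, rng):
    """S-shaped transfer: keep feature i with probability sigmoid(position[i]).

    Note: math.exp overflows for coordinates below about -709.
    """
    return [1 if rng.random() < 1.0 / (1.0 + math.exp(-x)) else 0 for x in position]


def fitness(mask, error_rate, alpha=0.99):
    """Weighted sum of error and fraction of features kept (lower is better).

    error_rate is assumed to come from a classifier trained on the masked features.
    """
    return alpha * error_rate + (1.0 - alpha) * sum(mask) / len(mask)


def gaussian_mutation(position, rng, sigma=0.1):
    """Add zero-mean Gaussian noise to every coordinate, e.g. when search stagnates."""
    return [x + rng.gauss(0.0, sigma) for x in position]
```

In a wrapper setting, each candidate position would be binarized, scored with `fitness`, and mutated only when the best score has not improved for some number of iterations; that stagnation criterion is left out here because the abstract does not specify it.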

Citations: 0
PLD-Det: plant leaf disease detection in real time using an end-to-end neural network approach based on improved YOLOv7
Pub Date : 2024-09-17 DOI: 10.1007/s00521-024-10409-6
Md Humaion Kabir Mehedi, Nafisa Nawer, Shafi Ahmed, Md Shakiful Islam Khan, Khan Md Hasib, M. F. Mridha, Md. Golam Rabiul Alam, Thanh Thi Nguyen

To maintain sustainable agriculture, it is vital to monitor plant health. Since all plant species are prone to characteristic diseases, regular surveillance for symptoms is necessary, which is challenging and time-consuming. Besides, farmers may struggle to identify the type of plant disease and its potential symptoms. Hence, interest in research such as image-based, computer-aided automated plant leaf disease detection through analysis of early symptoms has increased enormously. However, limitations in plant leaf image databases, such as unsuitable backgrounds and blurry images, sometimes cause poor feature extraction, misclassification, and overfitting in existing models. We have therefore proposed a real-time plant leaf disease detection architecture incorporating the proposed PLD-Det model, which is based on an improved YOLOv7, with the intention of assisting farmers while reducing the issues in existing models. The architecture was trained on the widely used PlantVillage dataset and achieved an accuracy of 98.53%. Furthermore, SHapley Additive exPlanations (SHAP) values were analyzed as a unified measure of feature significance. According to the experimental findings, the proposed PLD-Det model outperformed the original YOLOv7 model in test accuracy by approximately 4%.

Citations: 0
Deep convolutional neural networks for age and gender estimation using an imbalanced dataset of human face images
Pub Date : 2024-09-17 DOI: 10.1007/s00521-024-10390-0
İsmail Akgül

Automatic age and gender estimation provides important information for real-world applications such as human–machine interaction, system access, activity recognition, and consumer profile detection. While it is easy to estimate a person’s gender from facial images, estimating their age is difficult. In previous studies of this challenging problem, traditional convolutional neural network (CNN) methods have been used for age and gender estimation. With the development of deep convolutional neural network (DCNN) architectures, more successful results have been obtained than with traditional CNN methods. In this study, two state-of-the-art DCNN models were developed to estimate age and gender on an imbalanced dataset of human face images. First, a new model called fast description network (FINet) was developed, which has a parametrically changeable structure. Second, the number of parameters was reduced by applying a layer reduction approach to the InceptionV3 and NASNetLarge DCNN model structures, and a second model named inception Nasnet fast identify network (INFINet) was developed by concatenating these two models and the FINet model as a triple. The FINet and INFINet models were compared with many other state-of-the-art DCNN models. The best accuracy for both age and gender was obtained with the INFINet model (FG-NET dataset: age 61.22%, gender 80.95%; UTKFace dataset: age 72.00%, gender 90.50%). The age and gender estimation results obtained with the INFINet model are much more effective than those of other recent state-of-the-art works. In addition, the FINet model, which has far fewer parameters than the compared models, showed a classification performance that can compete with state-of-the-art methods for age and gender estimation.

Citations: 0
A systematic review of vision transformers and convolutional neural networks for Alzheimer’s disease classification using 3D MRI images
Pub Date : 2024-09-17 DOI: 10.1007/s00521-024-10420-x
Mario Alejandro Bravo-Ortiz, Sergio Alejandro Holguin-Garcia, Sebastián Quiñones-Arredondo, Alejandro Mora-Rubio, Ernesto Guevara-Navarro, Harold Brayan Arteaga-Arteaga, Gonzalo A. Ruz, Reinel Tabares-Soto

Alzheimer’s disease (AD) is a progressive neurodegenerative disorder that mainly affects memory and other cognitive functions, such as thinking, reasoning, and the ability to carry out daily activities. It is considered the most common form of dementia in older adults, but it can appear as early as the age of 25. Although the disease has no cure, treatment can be more effective if it is diagnosed early. In diagnosing AD, changes in the brain’s morphology are identified macroscopically, which is why deep learning models, such as convolutional neural networks (CNN) or vision transformers (ViT), excel in this task. We followed the systematic literature review process, applying the stages of its review protocol, which aims to establish the need for a review. Search equations were then formulated and executed in several literature databases. Relevant publications were scanned and used to extract evidence to answer the research questions. Several CNN and ViT approaches have already been tested on problems related to brain image analysis for disease detection. A total of 722 articles were found in the selected databases. A series of filters then reduced this number to 44 articles, focusing specifically on brain image analysis with CNN and ViT methods. Deep learning methods are effective for disease diagnosis, and the surge in research activity underscores their importance. However, the lack of access to repositories may introduce bias into the information. Full access demonstrates transparency and facilitates collaborative work in research.

Citations: 0
Leveraging graph-based learning for credit card fraud detection: a comparative study of classical, deep learning and graph-based approaches
Pub Date : 2024-09-17 DOI: 10.1007/s00521-024-10397-7
Sunisha Harish, Chirag Lakhanpal, Amir Hossein Jafari

Credit card fraud results in staggering financial losses amounting to billions of dollars annually, impacting both merchants and consumers. In light of the escalating prevalence of digital crime and online fraud, it is important for organizations to implement robust and advanced technology to detect fraud efficiently and mitigate the issue. Contemporary solutions rely heavily on classical machine learning (ML) and deep learning (DL) methods to handle such tasks. While these methods have been effective in many aspects of fraud detection, they may not always be sufficient for credit card fraud detection, as they are not well suited to detecting complex relationships among transactions. Fraudsters, for example, might set up many coordinated accounts to avoid triggering limits on individual accounts. In the context of fraud detection, the ability of Graph Neural Networks (GNNs) to aggregate information contained within the local neighbourhood of a transaction enables them to identify larger patterns that may be missed by looking at a single transaction alone. In this research, we conduct a thorough analysis to evaluate the effectiveness of GNNs in improving fraud detection over classical ML and DL methods. We first build a heterogeneous graph architecture with the source, transaction, and destination as our nodes. Next, we leverage a Relational Graph Convolutional Network (RGCN) to learn the representations of nodes in our graph and perform node classification on the transaction nodes. Our experimental results demonstrate that GNNs outperform classical ML and DL methods.
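The graph construction and neighbourhood aggregation described above can be sketched in plain Python. This is a deliberately simplified illustration, not the authors' implementation: the toy transactions, the relation names, and the use of scalar features with one scalar weight per relation (instead of the weight matrices a real R-GCN learns) are all assumptions.

```python
from collections import defaultdict

# Hypothetical toy data: (source account, transaction id, destination account).
TRANSACTIONS = [
    ("acct_A", "tx1", "shop_X"),
    ("acct_A", "tx2", "shop_Y"),
    ("acct_B", "tx3", "shop_X"),
]


def build_hetero_graph(rows):
    """Return typed edges: relation name -> list of (from_node, to_node)."""
    edges = defaultdict(list)
    for source, tx, dest in rows:
        edges["source_to_tx"].append((source, tx))
        edges["dest_to_tx"].append((dest, tx))
        edges["tx_to_source"].append((tx, source))  # reverse edges let
        edges["tx_to_dest"].append((tx, dest))      # information flow both ways
    return edges


def rgcn_layer(features, edges, rel_weights, self_weight):
    """One simplified relational step with scalar features:

    h_i' = relu(w_self * h_i + sum over relations r of w_r * mean of h_j, j in N_r(i))
    """
    incoming = defaultdict(lambda: defaultdict(list))
    for rel, pairs in edges.items():
        for src, dst in pairs:
            incoming[dst][rel].append(features[src])
    updated = {}
    for node, h in features.items():
        total = self_weight * h
        for rel, msgs in incoming[node].items():
            total += rel_weights[rel] * sum(msgs) / len(msgs)
        updated[node] = max(0.0, total)  # ReLU
    return updated
```

After one such step, each transaction node carries information from its source and destination accounts; stacking a second step lets a transaction see the other transactions that share those accounts, which is the kind of larger pattern a per-transaction classifier cannot observe.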

Citations: 0
A knowledge-enhanced interest segment division attention network for click-through rate prediction
Pub Date : 2024-09-17 DOI: 10.1007/s00521-024-10330-y
Zhanghui Liu, Shijie Chen, Yuzhong Chen, Jieyang Su, Jiayuan Zhong, Chen Dong

Click-through rate (CTR) prediction aims to estimate the probability of a user clicking on a particular item, making it one of the core tasks in various recommendation platforms. In such systems, user behavior data are crucial for capturing user interests. This has garnered significant attention from both academia and industry and has led to the development of various user behavior modeling methods. However, existing models still face unresolved issues: they fail to capture the complex diversity of user interests at the semantic level, to refine user interests effectively, and to uncover users’ potential interests. To address these challenges, we propose a novel model called the knowledge-enhanced interest segment division attention network (KISDAN), which can effectively and comprehensively model user interests. Specifically, to leverage the semantic information within user behavior sequences, we employ the structure of a knowledge graph to divide the user behavior sequence into multiple interest segments. To provide a comprehensive representation of user interests, we further categorize user interests into strong and weak interests. By leveraging both the knowledge graph and the item co-occurrence graph, we explore users’ potential interests from two perspectives. This methodology allows KISDAN to better understand the diversity of user interests. Finally, we extensively evaluate KISDAN on three benchmark datasets, and the experimental results consistently demonstrate that KISDAN outperforms state-of-the-art models across various evaluation metrics, which validates its effectiveness.
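The abstract does not say how KISDAN divides a behavior sequence, so the following is only one plausible reading, sketched under stated assumptions: each item maps to a set of knowledge-graph entities, and a new interest segment starts whenever two adjacent items share no entity. The function name `split_into_segments`, the item names, and the entity sets are all hypothetical.

```python
def split_into_segments(behaviors, item_entities):
    """Split a click sequence wherever adjacent items share no knowledge-graph entity."""
    segments = []
    for item in behaviors:
        entities = item_entities.get(item, set())
        previous = item_entities.get(segments[-1][-1], set()) if segments else set()
        if segments and entities & previous:
            segments[-1].append(item)   # same interest: extend the current segment
        else:
            segments.append([item])     # no shared entity: open a new segment
    return segments

# Hypothetical item -> entity mapping standing in for a knowledge graph.
item_entities = {
    "phone": {"electronics", "brand_a"},
    "charger": {"electronics"},
    "novel": {"books"},
    "cookbook": {"books", "food"},
    "pan": {"food"},
}
clicks = ["phone", "charger", "novel", "cookbook", "pan"]
segments = split_into_segments(clicks, item_entities)
```

In this toy run the sequence splits into an electronics segment and a books/food segment; an attention module could then weight each segment separately.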

Citations: 0
A review of machine learning techniques for diagnosing Alzheimer’s disease using imaging modalities
Pub Date : 2024-09-17 DOI: 10.1007/s00521-024-10399-5
Nand Kishore, Neelam Goel

Alzheimer’s disease is a progressive form of dementia. Dementia is a broad term for conditions that impair memory, thinking, and behaviour, and it can be caused by brain trauma or disorders. It is estimated that 60–80% of dementia cases around the world are caused by Alzheimer’s disease, an incurable neurodegenerative disorder. Although Alzheimer’s disease research has increased in recent years, early diagnosis remains challenging because of the complicated brain structure and functions associated with the disease. It is difficult for doctors to identify Alzheimer’s disease in its early stages, as there are still no biomarkers precise enough for early detection. In medical imaging, deep learning is becoming increasingly popular and successful. There is no single best approach for detecting Alzheimer’s disease, but in comparison with conventional machine learning methods, deep learning models detect it more precisely and effectively. This review paper discusses various machine learning-based techniques used to classify Alzheimer’s disease through different imaging modalities. In addition, it compiles and presents a detailed analysis of the various image processing procedures, along with the corresponding feature extraction techniques and classification performance. Computer-aided image analysis has demonstrated significant potential for the early detection of cognitive changes in individuals with mild cognitive impairment. Machine learning can provide valuable insights into the cognitive status of patients, enabling healthcare professionals to intervene and provide timely treatment. This review may lead to a reliable method for recognizing and predicting Alzheimer’s disease.

Citations: 0
Local part attention for image stylization with text prompt
Pub Date : 2024-09-17 DOI: 10.1007/s00521-024-10394-w
Quoc-Truong Truong, Vinh-Tiep Nguyen, Lan-Phuong Nguyen, Hung-Phu Cao, Duc-Tuan Luu

Prompt-based portrait image style transfer aims to translate an input content image into a desired style described by text, without a style image. In many practical situations, users may attend not only to the entire portrait image but also to local parts (e.g., eyes, lips, and hair). To address such applications, we propose a new framework that enables style transfer on specific regions, guided by a text description of the desired style. Specifically, we incorporate semantic segmentation to identify the intended area without requiring edit masks from the user, while using a pre-trained CLIP-based model for stylization. In addition, we propose a text-to-patch matching loss, computed by randomly dividing the stylized image into smaller patches, to ensure consistent quality of the result. To comprehensively evaluate the proposed method, we use several metrics, such as FID, SSIM, and PSNR, on a dataset consisting of portraits from the CelebAMask-HQ dataset and style descriptions from other related works. Extensive experimental results demonstrate that our framework outperforms other state-of-the-art methods in terms of both stylization quality and inference time.
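Of the metrics named in this abstract, PSNR has a simple closed form, PSNR = 10 · log10(MAX² / MSE), and can be sketched directly. The helper below is a minimal illustration rather than the evaluation code used in the paper. It assumes 8-bit grayscale images stored as nested lists, and the two tiny images are made-up examples.

```python
import math

def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized grayscale images."""
    if len(reference) != len(test) or any(
        len(r) != len(t) for r, t in zip(reference, test)
    ):
        raise ValueError("images must have the same shape")
    pixel_count = sum(len(row) for row in reference)
    # Mean squared error over all pixel pairs.
    mse = sum(
        (a - b) ** 2 for r, t in zip(reference, test) for a, b in zip(r, t)
    ) / pixel_count
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

# Hypothetical 2x2 images: a content image and a slightly altered stylized result.
content = [[0, 0], [255, 255]]
stylized = [[0, 5], [250, 255]]
score = psnr(content, stylized)
```

Higher values mean the stylized image stays closer to the content image at the pixel level. FID and SSIM need feature extractors or windowed statistics and are better taken from an established library.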

Citations: 0