The survey on the dual nature of xAI challenges in intrusion detection and their potential for AI innovation
Pub Date: 2024-10-16 | DOI: 10.1007/s10462-024-10972-3
Marek Pawlicki, Aleksandra Pawlicka, Rafał Kozik, Michał Choraś
In the rapidly evolving domain of cybersecurity, the imperative for intrusion detection systems is undeniable; yet, it is increasingly clear that to meet the ever-growing challenges posed by sophisticated threats, intrusion detection itself stands in need of the transformative capabilities offered by explainable artificial intelligence (xAI). As this concept is still developing, it poses an array of challenges that need addressing. This paper discusses 25 such challenges of varying research interest encountered in the domain of xAI, identified in the course of a targeted study. While these challenges may appear as obstacles, they concurrently represent significant research opportunities. The analysed challenges encompass a wide spectrum of concerns spanning the intersection of xAI and cybersecurity. The paper underscores the critical role of xAI in addressing opacity issues within machine learning algorithms and sets the stage for further research and innovation in the quest for transparent and interpretable artificial intelligence that humans are able to trust. In addition, by reframing these challenges as opportunities, this study seeks to inspire and guide researchers towards realizing the full potential of xAI in cybersecurity.
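To make the opacity point concrete, here is a minimal, hypothetical sketch of applying a post-hoc xAI method (SHAP) to an intrusion detection classifier; the flow features, synthetic labels, and model choice are illustrative assumptions, not taken from the paper.

```python
import numpy as np
import shap
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Toy "network flow" features: duration, bytes out, bytes in, packet count.
X = rng.random((500, 4))
y = (X[:, 1] + X[:, 3] > 1.0).astype(int)  # synthetic "attack" label

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# TreeExplainer turns the opaque ensemble into per-flow, per-feature
# attributions, which is the kind of transparency xAI aims to supply.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:5])
print(np.asarray(shap_values).shape)
```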
{"title":"The survey on the dual nature of xAI challenges in intrusion detection and their potential for AI innovation","authors":"Marek Pawlicki, Aleksandra Pawlicka, Rafał Kozik, Michał Choraś","doi":"10.1007/s10462-024-10972-3","DOIUrl":"10.1007/s10462-024-10972-3","url":null,"abstract":"<div><p>In the rapidly evolving domain of cybersecurity, the imperative for intrusion detection systems is undeniable; yet, it is increasingly clear that to meet the ever-growing challenges posed by sophisticated threats, intrusion detection itself stands in need of the transformative capabilities offered by the explainable artificial intelligence (xAI). As this concept is still developing, it poses an array of challenges that need addressing. This paper discusses 25 of such challenges of varying research interest, encountered in the domain of xAI, identified in the course of a targeted study. While these challenges may appear as obstacles, they concurrently present as significant research opportunities. These analysed challenges encompass a wide spectrum of concerns spanning the intersection of xAI and cybersecurity. The paper underscores the critical role of xAI in addressing opacity issues within machine learning algorithms and sets the stage for further research and innovation in the quest for transparent and interpretable artificial intelligence that humans are able to trust. In addition to this, by reframing these challenges as opportunities, this study seeks to inspire and guide researchers towards realizing the full potential of xAI in cybersecurity.</p></div>","PeriodicalId":8449,"journal":{"name":"Artificial Intelligence Review","volume":"57 12","pages":""},"PeriodicalIF":10.7,"publicationDate":"2024-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10462-024-10972-3.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142438878","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
ERTH scheduler: enhanced red-tailed hawk algorithm for multi-cost optimization in cloud task scheduling
Pub Date: 2024-10-12 | DOI: 10.1007/s10462-024-10945-6
Xinqi Qin, Shaobo Li, Jian Tong, Cankun Xie, Xingxing Zhang, Fengbin Wu, Qun Xie, Yihong Ling, Guangzheng Lin
Effective task scheduling has become the key to optimizing resource allocation, reducing operation costs, and enhancing the user experience. The complexity and dynamics of cloud computing environments require task scheduling algorithms that can flexibly respond to multiple computing demands and changing resource states. Therefore, we propose an enhanced Red-tailed Hawk algorithm (ERTH) based on multiple elite policies and chaotic mapping, and apply it in conjunction with the proposed scheduling model to optimize the efficiency of task scheduling in cloud computing environments. We apply the ERTH algorithm to a real cloud computing environment and compare it with the original RTH and other conventional algorithms. The proposed ERTH algorithm achieves better convergence speed and stability in most small- and large-scale task scenarios and performs better at minimizing task completion time and system load cost. Specifically, our experiments show that the ERTH algorithm reduces the total system cost by 34.8% and 36.4% relative to the traditional algorithm for tasks of different sizes. Further, evaluations on the IEEE Congress on Evolutionary Computation (CEC) benchmark test sets show that the ERTH algorithm outperforms traditional and emerging algorithms on several performance metrics, such as mean and standard deviation. The proposal and validation of the ERTH algorithm are of great significance in promoting the application of intelligent optimization algorithms in cloud computing.
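As a hedged sketch of one ingredient the abstract names, chaotic mapping, the snippet below uses the logistic map to initialize a metaheuristic population; the specific map, parameters, and elite policies used by ERTH are not given here, so these choices are assumptions.

```python
import numpy as np

def logistic_map_population(pop_size, dim, lower, upper, r=4.0):
    """Chaotic (logistic-map) population initialization for a metaheuristic.

    One standard realization of the "chaotic mapping" idea: iterate
    x_{n+1} = r * x_n * (1 - x_n) per dimension and rescale to the bounds,
    spreading initial candidates more evenly than uniform sampling.
    """
    x = np.random.default_rng(0).uniform(0.01, 0.99, dim)  # per-dimension seeds
    pop = np.empty((pop_size, dim))
    for i in range(pop_size):
        x = r * x * (1.0 - x)                # logistic map step
        pop[i] = lower + x * (upper - lower)  # rescale to the search bounds
    return pop

# Example: 30 candidate schedules over a 10-dimensional decision space.
population = logistic_map_population(pop_size=30, dim=10, lower=0.0, upper=1.0)
```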
{"title":"ERTH scheduler: enhanced red-tailed hawk algorithm for multi-cost optimization in cloud task scheduling","authors":"Xinqi Qin, Shaobo Li, Jian Tong, Cankun Xie, Xingxing Zhang, Fengbin Wu, Qun Xie, Yihong Ling, Guangzheng Lin","doi":"10.1007/s10462-024-10945-6","DOIUrl":"10.1007/s10462-024-10945-6","url":null,"abstract":"<div><p>Effective task scheduling has become the key to optimizing resource allocation, reducing operation costs, and enhancing the user experience. The complexity and dynamics of cloud computing environments require task scheduling algorithms that can flexibly respond to multiple computing demands and changing resource states. Therefore, we propose an enhanced Red-tailed Hawk algorithm (named ERTH) based on multiple elite policies and chaotic mapping, while applying this approach in conjunction with the proposed scheduling model to optimize the efficiency of task scheduling in cloud computing environments. We apply the ERTH algorithm to a real cloud computing environment and conduct a comparison with the original RTH and other conventional algorithms. The proposed ERTH algorithm has better convergence speed and stability in most cases of small and large-scale tasks and performs better in minimizing the task completion time and system load cost. Specifically, our experiments show that the ERTH algorithm reduces the total system cost by 34.8% and 36.4% relative to the traditional algorithm for tasks of different sizes. Further, evaluations in the IEEE Congress on Evolutionary Computation (CEC) benchmark test sets show that the ERTH algorithm outperforms the traditional or emerging algorithms in several performance metrics such as mean, standard deviation, etc. The proposal and validation of the ERTH algorithm are of great significance in promoting the application of intelligent optimization algorithms in cloud computing.</p></div>","PeriodicalId":8449,"journal":{"name":"Artificial Intelligence Review","volume":"57 12","pages":""},"PeriodicalIF":10.7,"publicationDate":"2024-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10462-024-10945-6.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142411499","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A concise review towards a novel target specific multi-source unsupervised transfer learning technique for GDP estimation using CO2 emission data
Pub Date: 2024-10-12 | DOI: 10.1007/s10462-024-10858-4
Sandeep Kumar, Pranab K. Muhuri
Though the economies of most nations have grown exponentially due to industrialization, this growth has also caused a proportional increase in their carbon emissions. This paper exploits this proportional relationship between carbon emissions and GDP to predict the per-capita GDP of nations whose GDP values are missing from the World Bank database, typically because those countries are either war-torn or politically isolated/unstable. To predict the missing GDP values from carbon emissions, the paper exploits the non-linear relationship among the emissions from solid, liquid, and gaseous fuels, since the differential utilization of these fuels impacts an economy differently: reliance on traditional solid fuel for cooking points toward energy poverty, whereas access to clean cooking gas indicates a higher living standard. However, the data available from war-torn or isolated countries are very limited, and hence insufficient for building a robust predictive machine learning model. This paper therefore employs multi-source unsupervised transfer learning to precisely estimate the missing per-capita GDP of those nations, suitably enlarging the training domains so that the prediction models become more robust. We empirically evaluate the proposed methodology with different regression techniques to estimate the missing GDP values of eleven nations belonging to diverse strata of economies, viz. developed, developing, and/or least developed economies. The proposed methodology profoundly improves the prediction precision of these regression techniques in estimating the missing per-capita GDP of the considered nations.
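As a rough illustration of the enlarged-training-domain idea (not the paper's actual method, which involves targeted source selection and adaptation), the sketch below pools synthetic emissions data from several source countries to train a regressor for a target country with missing GDP; all data and the model choice are placeholders.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)

def synthetic_country(n):
    """Toy per-year records: [solid, liquid, gas] fuel CO2 emissions per capita,
    with a synthetic per-capita GDP that depends non-linearly mixed on fuel use."""
    X = rng.random((n, 3))
    y = 2.0 * X[:, 1] + 1.5 * X[:, 2] - 0.5 * X[:, 0] + rng.normal(0, 0.05, n)
    return X, y

# Pool five source domains into one enlarged training set.
sources = [synthetic_country(80) for _ in range(5)]
X_train = np.vstack([X for X, _ in sources])
y_train = np.concatenate([y for _, y in sources])

model = GradientBoostingRegressor().fit(X_train, y_train)
X_target, _ = synthetic_country(10)       # target country, GDP treated as unknown
print(model.predict(X_target))            # estimated per-capita GDP values
```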
{"title":"A concise review towards a novel target specific multi-source unsupervised transfer learning technique for GDP estimation using CO2 emission data","authors":"Sandeep Kumar, Pranab K. Muhuri","doi":"10.1007/s10462-024-10858-4","DOIUrl":"10.1007/s10462-024-10858-4","url":null,"abstract":"<div><p>Though economic growths of most of the nations have seen exponential rise due to industrialization, it has also caused proportional increase in their carbon emissions. This paper exploits this proportionate relationship of carbon emission with GDP to predict the per-capita GDP of those nations whose GDP values are missing in the world bank database. The reason behind the same was, those countries were either war-torn or politically isolated/unstable. To achieve the objective of predicting the missing GDP values of those countries from their carbon emissions, this paper exploits the non-linear relationship among the carbon emissions from solid fuels, liquid fuels, and gaseous fuels. It is so because even the differential utilization of these fuels impact economy differently. Use of traditional solid fuel for cooking points toward energy poverty, and access to clean cooking gas indicates higher living standard. However, the available data from the war-torn or isolated countries are very little, and hence insufficient for building a robust predictive machine learning model. So, this paper employs multi-source unsupervised transfer learning to precisely estimate the missing per-capita GDP of those nations. It suitably enlarges the training domains for the prediction models to be more robust. We empirically evaluate the proposed methodology for different regression techniques to estimate the missing GDP values of eleven different nations belonging to diverse strata of economies viz. developed economies, developing, and/or least developing economies. Proposed methodology profoundly improves the prediction preciseness of these regression techniques in estimating the missing per-capita GDP of the considered nations.</p></div>","PeriodicalId":8449,"journal":{"name":"Artificial Intelligence Review","volume":"57 12","pages":""},"PeriodicalIF":10.7,"publicationDate":"2024-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10462-024-10858-4.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142411521","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Deep models for multi-view 3D object recognition: a review
Pub Date: 2024-10-12 | DOI: 10.1007/s10462-024-10941-w
Mona Alzahrani, Muhammad Usman, Salma Kammoun Jarraya, Saeed Anwar, Tarek Helmy
This review paper focuses on the progress of deep learning-based methods for multi-view 3D object recognition. It covers the state-of-the-art techniques in this field, specifically those that utilize 3D multi-view data as the input representation. The paper provides a comprehensive analysis of the pipeline for deep learning-based multi-view 3D object recognition, including the various techniques employed at each stage. It also presents the latest developments in CNN-based and transformer-based models for multi-view 3D object recognition. The review discusses existing models in detail, including the datasets, camera configurations, view selection strategies, pre-trained CNN architectures, fusion strategies, and recognition performance. Additionally, it examines various computer vision applications that use multi-view classification. Finally, it highlights future directions, factors impacting recognition performance, and trends for the development of multi-view 3D object recognition methods.
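To illustrate one fusion strategy such pipelines commonly use, below is a minimal MVCNN-style view-pooling sketch in PyTorch; the backbone, view count, and class count are illustrative assumptions, not prescriptions from the review.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class MultiViewCNN(nn.Module):
    """MVCNN-style sketch: a shared CNN encodes each rendered view, and
    element-wise max pooling fuses the per-view features before the
    classifier head. Backbone and head are illustrative choices."""
    def __init__(self, num_classes=40):
        super().__init__()
        backbone = models.resnet18(weights=None)
        self.encoder = nn.Sequential(*list(backbone.children())[:-1])  # drop fc
        self.classifier = nn.Linear(512, num_classes)

    def forward(self, views):                          # views: (B, V, 3, H, W)
        b, v = views.shape[:2]
        feats = self.encoder(views.flatten(0, 1))      # (B*V, 512, 1, 1)
        feats = feats.flatten(1).view(b, v, -1)        # (B, V, 512)
        fused = feats.max(dim=1).values                # view max-pooling
        return self.classifier(fused)

# Example: a batch of 2 objects, each rendered from 12 virtual cameras.
logits = MultiViewCNN()(torch.randn(2, 12, 3, 224, 224))
```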
{"title":"Deep models for multi-view 3D object recognition: a review","authors":"Mona Alzahrani, Muhammad Usman, Salma Kammoun Jarraya, Saeed Anwar, Tarek Helmy","doi":"10.1007/s10462-024-10941-w","DOIUrl":"10.1007/s10462-024-10941-w","url":null,"abstract":"<div><p>This review paper focuses on the progress of deep learning-based methods for multi-view 3D object recognition. It covers the state-of-the-art techniques in this field, specifically those that utilize 3D multi-view data as input representation. The paper provides a comprehensive analysis of the pipeline for deep learning-based multi-view 3D object recognition, including the various techniques employed at each stage. It also presents the latest developments in CNN-based and transformer-based models for multi-view 3D object recognition. The review discusses existing models in detail, including the datasets, camera configurations, view selection strategies, pre-trained CNN architectures, fusion strategies, and recognition performance. Additionally, it examines various computer vision applications that use multi-view classification. Finally, it highlights future directions, factors impacting recognition performance, and trends for the development of multi-view 3D object recognition method.</p></div>","PeriodicalId":8449,"journal":{"name":"Artificial Intelligence Review","volume":"57 12","pages":""},"PeriodicalIF":10.7,"publicationDate":"2024-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10462-024-10941-w.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142411522","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Speech based detection of Alzheimer’s disease: a survey of AI techniques, datasets and challenges
Pub Date: 2024-10-12 | DOI: 10.1007/s10462-024-10961-6
Kewen Ding, Madhu Chetty, Azadeh Noori Hoshyar, Tanusri Bhattacharya, Britt Klein
Alzheimer’s disease (AD) is a growing global concern, exacerbated by an aging population and the high costs associated with traditional detection methods. Recent research has identified speech data as valuable clinical information for AD detection, given its association with the progressive degeneration of brain cells and the subsequent impacts on memory, cognition, and language abilities. The ongoing demographic shift toward an aging global population underscores the critical need for affordable and easily available methods for early AD detection and intervention. To address this major challenge, substantial research has recently focused on speech data, aiming to develop efficient and affordable diagnostic tools that align with the demands of our aging society. This paper presents an in-depth review of studies from 2018 to 2023 utilizing speech for AD detection. Following the PRISMA protocol and a two-stage selection process, we identified 85 publications for analysis. In contrast to previous literature reviews, this paper places a strong emphasis on a rigorous comparative analysis of various Artificial Intelligence (AI) based techniques, categorizing them meticulously by underlying algorithm. We perform an exhaustive evaluation of research papers leveraging common benchmark datasets, specifically ADReSS and ADReSSo, to assess their performance. This work also makes a significant contribution by overcoming the limitations posed by the absence of standardized tasks and commonly accepted benchmark datasets for comparing different studies. The analysis reveals the dominance of deep learning models, particularly those leveraging pre-trained models like BERT, in AD detection. The integration of acoustic and linguistic features often achieves accuracies above 85%. Despite these advancements, challenges persist in data scarcity, standardization, privacy, and model interpretability. Future directions include improving multilingual recognition, exploring emerging multimodal approaches, and enhancing ASR systems for AD patients. By identifying these key challenges and suggesting future research directions, our review serves as a valuable resource for advancing AD detection techniques and their practical implementation.
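As a rough sketch of the acoustic-plus-linguistic fusion the survey reports as common, the snippet below concatenates MFCC statistics with a BERT [CLS] embedding of the transcript; the audio file name, transcript text, and model choice are placeholders, and no single reviewed paper is being reproduced.

```python
import numpy as np
import librosa
import torch
from transformers import AutoTokenizer, AutoModel

# Acoustic side: MFCC mean/std statistics over the speech sample.
signal, sr = librosa.load("interview.wav", sr=16000)   # placeholder file
mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13)
acoustic = np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])   # (26,)

# Linguistic side: BERT sentence embedding of the (assumed) transcript.
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")
inputs = tok("transcript of the speech sample", return_tensors="pt")
with torch.no_grad():
    linguistic = bert(**inputs).last_hidden_state[:, 0].squeeze(0)  # [CLS], (768,)

# Fused feature vector for a downstream AD/control classifier.
features = np.concatenate([acoustic, linguistic.numpy()])
```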
{"title":"Speech based detection of Alzheimer’s disease: a survey of AI techniques, datasets and challenges","authors":"Kewen Ding, Madhu Chetty, Azadeh Noori Hoshyar, Tanusri Bhattacharya, Britt Klein","doi":"10.1007/s10462-024-10961-6","DOIUrl":"10.1007/s10462-024-10961-6","url":null,"abstract":"<div><p>Alzheimer’s disease (AD) is a growing global concern, exacerbated by an aging population and the high costs associated with traditional detection methods. Recent research has identified speech data as valuable clinical information for AD detection, given its association with the progressive degeneration of brain cells and subsequent impacts on memory, cognition, and language abilities. The ongoing demographic shift toward an aging global population underscores the critical need for affordable and easily available methods for early AD detection and intervention. To address this major challenge, substantial research has recently focused on investigating speech data, aiming to develop efficient and affordable diagnostic tools that align with the demands of our aging society. This paper presents an in-depth review of studies from 2018–2023 utilizing speech for AD detection. Following the PRISMA protocol and a two-stage selection process, we identified 85 publications for analysis. In contrast to previous literature reviews, this paper places a strong emphasis on conducting a rigorous comparative analysis of various Artificial Intelligence (AI) based techniques, categorizing them meticulously based on underlying algorithms. We perform an exhaustive evaluation of research papers leveraging common benchmark datasets, specifically ADReSS and ADReSSo, to assess their performance. In contrast to previous literature reviews, this work makes a significant contribution by overcoming the limitations posed by the absence of standardized tasks and commonly accepted benchmark datasets for comparing different studies. The analysis reveals the dominance of deep learning models, particularly those leveraging pre-trained models like BERT, in AD detection. The integration of acoustic and linguistic features often achieves accuracies above 85%. Despite these advancements, challenges persist in data scarcity, standardization, privacy, and model interpretability. Future directions include improving multilingual recognition, exploring emerging multimodal approaches, and enhancing ASR systems for AD patients. By identifying these key challenges and suggesting future research directions, our review serves as a valuable resource for advancing AD detection techniques and their practical implementation.</p></div>","PeriodicalId":8449,"journal":{"name":"Artificial Intelligence Review","volume":"57 12","pages":""},"PeriodicalIF":10.7,"publicationDate":"2024-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10462-024-10961-6.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142411527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Digital deception: generative artificial intelligence in social engineering and phishing
Pub Date: 2024-10-12 | DOI: 10.1007/s10462-024-10973-2
Marc Schmitt, Ivan Flechais
The advancement of Artificial Intelligence (AI) and Machine Learning (ML) has profound implications for both the utility and security of our digital interactions. This paper investigates the transformative role of Generative AI in Social Engineering (SE) attacks. We conduct a systematic review of social engineering and AI capabilities and use a theory of social engineering to identify three pillars where Generative AI amplifies the impact of SE attacks: Realistic Content Creation, Advanced Targeting and Personalization, and Automated Attack Infrastructure. We integrate these elements into a conceptual model designed to investigate the complex nature of AI-driven SE attacks—the Generative AI Social Engineering Framework. We further explore human implications and potential countermeasures to mitigate these risks. Our study aims to foster a deeper understanding of the risks, human implications, and countermeasures associated with this emerging paradigm, thereby contributing to a more secure and trustworthy human-computer interaction.
{"title":"Digital deception: generative artificial intelligence in social engineering and phishing","authors":"Marc Schmitt, Ivan Flechais","doi":"10.1007/s10462-024-10973-2","DOIUrl":"10.1007/s10462-024-10973-2","url":null,"abstract":"<div><p>The advancement of Artificial Intelligence (AI) and Machine Learning (ML) has profound implications for both the utility and security of our digital interactions. This paper investigates the transformative role of Generative AI in Social Engineering (SE) attacks. We conduct a systematic review of social engineering and AI capabilities and use a theory of social engineering to identify three pillars where Generative AI amplifies the impact of SE attacks: Realistic Content Creation, Advanced Targeting and Personalization, and Automated Attack Infrastructure. We integrate these elements into a conceptual model designed to investigate the complex nature of AI-driven SE attacks—the Generative AI Social Engineering Framework. We further explore human implications and potential countermeasures to mitigate these risks. Our study aims to foster a deeper understanding of the risks, human implications, and countermeasures associated with this emerging paradigm, thereby contributing to a more secure and trustworthy human-computer interaction.</p></div>","PeriodicalId":8449,"journal":{"name":"Artificial Intelligence Review","volume":"57 12","pages":""},"PeriodicalIF":10.7,"publicationDate":"2024-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10462-024-10973-2.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142411491","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mises-Fisher similarity-based boosted additive angular margin loss for breast cancer classification
Pub Date: 2024-10-12 | DOI: 10.1007/s10462-024-10963-4
P. Alirezazadeh, F. Dornaika, J. Charafeddine
To enhance the accuracy of breast cancer diagnosis, current practice relies on biopsies and microscopic examinations. However, this approach is known to be time-consuming, tedious, and costly. While convolutional neural networks (CNNs) have shown promise for their efficiency and high accuracy, training them effectively becomes challenging in real-world learning scenarios such as class imbalance, small-scale datasets, and label noise. Angular margin-based softmax losses, which concentrate on the angle between features and classifier weights embedded in cosine similarity at the classification layer, aim to regulate feature representation learning. Nevertheless, the cosine similarity’s lack of a heavy tail impedes its ability to compactly regulate the intra-class feature distribution, limiting generalization performance. Moreover, these losses apply margin penalties only to target classes, which may not always be optimal. Addressing these hurdles, we introduce an innovative approach termed MF-BAM (Mises-Fisher Similarity-based Boosted Additive Angular Margin Loss), which extends beyond traditional cosine similarity and is anchored in the von Mises-Fisher distribution. MF-BAM not only penalizes the angle between deep features and their corresponding target class weights but also considers the angles between deep features and the weights associated with non-target classes. Through extensive experimentation on the BreaKHis dataset, MF-BAM achieves outstanding accuracies of 99.92%, 99.96%, 100.00%, and 98.05% for magnification levels of ×40, ×100, ×200, and ×400, respectively. Furthermore, additional experiments conducted on the BACH dataset for breast cancer classification, as well as on the LFW and YTF datasets for face recognition, affirm the generalization capability of our proposed loss function.
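For orientation, here is a sketch of the standard additive angular margin (ArcFace-style) loss that this family of methods builds on; MF-BAM's von Mises-Fisher similarity and its non-target-class penalties are not reproduced here, only the shared skeleton of normalized features and weights with a margin added to the target angle.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdditiveAngularMarginLoss(nn.Module):
    """Baseline additive angular margin loss: normalize features and class
    weights, add margin m to the target-class angle, scale logits by s.
    MF-BAM extends this skeleton; those extensions are omitted here."""
    def __init__(self, in_features, num_classes, s=30.0, m=0.5):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, in_features))
        self.s, self.m = s, m

    def forward(self, features, labels):
        # Cosine similarity between L2-normalized features and class weights.
        cosine = F.linear(F.normalize(features), F.normalize(self.weight))
        theta = torch.acos(cosine.clamp(-1 + 1e-7, 1 - 1e-7))
        target = F.one_hot(labels, cosine.size(1)).bool()
        # Penalize only the target-class angle, as the abstract notes.
        logits = torch.where(target, torch.cos(theta + self.m), cosine)
        return F.cross_entropy(self.s * logits, labels)

# Example: 16 embeddings of dimension 512 over 8 tissue classes (assumed sizes).
loss_fn = AdditiveAngularMarginLoss(in_features=512, num_classes=8)
loss = loss_fn(torch.randn(16, 512), torch.randint(0, 8, (16,)))
```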
{"title":"Mises-Fisher similarity-based boosted additive angular margin loss for breast cancer classification","authors":"P. Alirezazadeh, F. Dornaika, J. Charafeddine","doi":"10.1007/s10462-024-10963-4","DOIUrl":"10.1007/s10462-024-10963-4","url":null,"abstract":"<div><p>To enhance the accuracy of breast cancer diagnosis, current practices rely on biopsies and microscopic examinations. However, this approach is known for being time-consuming, tedious, and costly. While convolutional neural networks (CNNs) have shown promise for their efficiency and high accuracy, training them effectively becomes challenging in real-world learning scenarios such as class imbalance, small-scale datasets, and label noises. Angular margin-based softmax losses, which concentrate on the angle between features and classifiers embedded in cosine similarity at the classification layer, aim to regulate feature representation learning. Nevertheless, the cosine similarity’s lack of a heavy tail impedes its ability to compactly regulate intra-class feature distribution, limiting generalization performance. Moreover, these losses are constrained to target classes when margin penalties are applied, which may not always optimize effectiveness. Addressing these hurdles, we introduce an innovative approach termed MF-BAM (Mises-Fisher Similarity-based Boosted Additive Angular Margin Loss), which extends beyond traditional cosine similarity and is anchored in the von Mises-Fisher distribution. MF-BAM not only penalizes the angle between deep features and their corresponding target class weights but also considers angles between deep features and weights associated with non-target classes. Through extensive experimentation on the BreaKHis dataset, MF-BAM achieves outstanding accuracies of 99.92%, 99.96%, 100.00%, and 98.05% for magnification levels of ×40, ×100, ×200, and ×400, respectively. Furthermore, additional experiments conducted on the BACH dataset for breast cancer classification, as well as on the LFW and YTF datasets for face recognition, affirm the generalization capability of our proposed loss function.</p></div>","PeriodicalId":8449,"journal":{"name":"Artificial Intelligence Review","volume":"57 12","pages":""},"PeriodicalIF":10.7,"publicationDate":"2024-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10462-024-10963-4.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142411474","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Advances in text-guided 3D editing: a survey
Pub Date: 2024-10-12 | DOI: 10.1007/s10462-024-10937-6
Lihua Lu, Ruyang Li, Xiaohui Zhang, Hui Wei, Guoguang Du, Binqiang Wang
In 3D Artificial Intelligence Generated Content (AIGC), compared with generating 3D assets from scratch, editing extant 3D assets to satisfy user prompts allows the creation of diverse, high-quality 3D assets in a time- and labor-saving manner. More recently, text-guided 3D editing, which modifies 3D assets according to text prompts, has proven user-friendly and practical, evoking a surge of research within this field. In this survey, we comprehensively investigate recent literature on text-guided 3D editing in an attempt to answer two questions: What methodologies does existing text-guided 3D editing employ? How far has current progress in text-guided 3D editing come? Specifically, we focus on text-guided 3D editing methods published in the past 4 years, delving deeply into their frameworks and principles. We then present a fundamental taxonomy in terms of the editing strategy, optimization scheme, and 3D representation. Based on this taxonomy, we review recent advances in the field, considering factors such as editing scale, type, granularity, and perspective. In addition, we highlight four applications of text-guided 3D editing, including texturing, style transfer, local editing of scenes, and insertion editing, to further explore 3D editing capacities with in-depth comparisons and discussions. Drawing on the insights gained from this survey, we discuss open challenges and future research directions. We hope this survey will help readers gain a deeper understanding of this exciting field and foster further advancements in text-guided 3D editing.
{"title":"Advances in text-guided 3D editing: a survey","authors":"Lihua Lu, Ruyang Li, Xiaohui Zhang, Hui Wei, Guoguang Du, Binqiang Wang","doi":"10.1007/s10462-024-10937-6","DOIUrl":"10.1007/s10462-024-10937-6","url":null,"abstract":"<div><p>In 3D Artificial Intelligence Generated Content (AIGC), compared with generating 3D assets from scratch, editing extant 3D assets satisfies user prompts, allowing the creation of diverse and high-quality 3D assets in a time and labor-saving manner. More recently, text-guided 3D editing that modifies 3D assets guided by text prompts is user-friendly and practical, which evokes a surge in research within this field. In this survey, we comprehensively investigate recent literature on text-guided 3D editing in an attempt to answer two questions: What are the methodologies of existing text-guided 3D editing? How has current progress in text-guided 3D editing gone so far? Specifically, we focus on text-guided 3D editing methods published in the past 4 years, delving deeply into their frameworks and principles. We then present a fundamental taxonomy in terms of the editing strategy, optimization scheme, and 3D representation. Based on the taxonomy, we review recent advances in this field, considering factors such as editing scale, type, granularity, and perspective. In addition, we highlight four applications of text-guided 3D editing, including texturing, style transfer, local editing of scenes, and insertion editing, to exploit further the 3D editing capacities with in-depth comparisons and discussions. Depending on the insights achieved by this survey, we discuss open challenges and future research directions. We hope this survey will help readers gain a deeper understanding of this exciting field and foster further advancements in text-guided 3D editing.</p></div>","PeriodicalId":8449,"journal":{"name":"Artificial Intelligence Review","volume":"57 12","pages":""},"PeriodicalIF":10.7,"publicationDate":"2024-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10462-024-10937-6.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142411528","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Federated learning-based natural language processing: a systematic literature review
Pub Date: 2024-10-12 | DOI: 10.1007/s10462-024-10970-5
Younas Khan, David Sánchez, Josep Domingo-Ferrer
Federated learning (FL) is a decentralized machine learning (ML) framework that allows models to be trained without sharing the participants’ local data. FL thus preserves privacy better than centralized machine learning. Since textual data (such as clinical records, posts in social networks, or search queries) often contain personal information, many natural language processing (NLP) tasks dealing with such data have shifted from the centralized to the FL setting. However, FL is not free from issues, including convergence and security vulnerabilities (due to unreliable or poisoned data introduced into the model), communication and computation bottlenecks, and even privacy attacks orchestrated by honest-but-curious servers. In this paper, we present a systematic literature review (SLR) of NLP applications in FL with a special focus on FL issues and the solutions proposed so far. Our review surveys 36 recent papers published in relevant venues, which are systematically analyzed and compared from multiple perspectives. As a result of the survey, we also identify the most outstanding challenges in the area.
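As a minimal sketch of the FL setting the review covers, the snippet below implements one FedAvg aggregation round, in which client model weights are averaged on the server so raw (possibly sensitive) text never leaves the clients; secure aggregation, client sampling, and defenses against the poisoning issues the review notes are omitted.

```python
import copy
import torch

def fedavg(global_model, client_states, client_sizes):
    """One FedAvg round: average client state dicts, weighted by each
    client's local dataset size, then load the result into the global model."""
    total = float(sum(client_sizes))
    avg = copy.deepcopy(client_states[0])
    for key in avg:
        avg[key] = sum(
            state[key] * (n / total)
            for state, n in zip(client_states, client_sizes)
        )
    global_model.load_state_dict(avg)
    return global_model

# Usage: three clients fine-tune local copies, then the server aggregates.
model = torch.nn.Linear(10, 2)                 # stand-in for an NLP model
states = [copy.deepcopy(model.state_dict()) for _ in range(3)]
model = fedavg(model, states, client_sizes=[100, 250, 150])
```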
{"title":"Federated learning-based natural language processing: a systematic literature review","authors":"Younas Khan, David Sánchez, Josep Domingo-Ferrer","doi":"10.1007/s10462-024-10970-5","DOIUrl":"10.1007/s10462-024-10970-5","url":null,"abstract":"<div><p>Federated learning (FL) is a decentralized machine learning (ML) framework that allows models to be trained without sharing the participants’ local data. FL thus preserves privacy better than centralized machine learning. Since textual data (such as clinical records, posts in social networks, or search queries) often contain personal information, many natural language processing (NLP) tasks dealing with such data have shifted from the centralized to the FL setting. However, FL is not free from issues, including convergence and security vulnerabilities (due to unreliable or poisoned data introduced into the model), communication and computation bottlenecks, and even privacy attacks orchestrated by honest-but-curious servers. In this paper, we present a systematic literature review (SLR) of NLP applications in FL with a special focus on FL issues and the solutions proposed so far. Our review surveys 36 recent papers published in relevant venues, which are systematically analyzed and compared from multiple perspectives. As a result of the survey, we also identify the most outstanding challenges in the area.</p></div>","PeriodicalId":8449,"journal":{"name":"Artificial Intelligence Review","volume":"57 12","pages":""},"PeriodicalIF":10.7,"publicationDate":"2024-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10462-024-10970-5.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142411523","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Trustworthy human computation: a survey
Pub Date: 2024-10-12 | DOI: 10.1007/s10462-024-10974-1
Hisashi Kashima, Satoshi Oyama, Hiromi Arai, Junichiro Mori
Human computation is an approach to solving problems that are difficult for AI alone; it involves the cooperation of many humans. Because human computation requires close engagement with both “human populations as users” and “human populations as driving forces,” establishing mutual trust between AI and humans is an important issue for the further development of human computation. This survey lays the groundwork for the realization of trustworthy human computation. First, the trustworthiness of human computation as a computing system, that is, the trust offered by humans to AI, is examined using the RAS (reliability, availability, and serviceability) analogy, which defines measures of trustworthiness in conventional computer systems. Next, the social trustworthiness provided by human computation systems to users or participants is discussed from the perspective of AI ethics, including fairness, privacy, and transparency. Then, we consider human–AI collaboration based on two-way trust, in which humans and AI build mutual trust and accomplish difficult tasks through reciprocal collaboration. Finally, future challenges and research directions for realizing trustworthy human computation are discussed.
{"title":"Trustworthy human computation: a survey","authors":"Hisashi Kashima, Satoshi Oyama, Hiromi Arai, Junichiro Mori","doi":"10.1007/s10462-024-10974-1","DOIUrl":"10.1007/s10462-024-10974-1","url":null,"abstract":"<div><p>Human computation is an approach to solving problems that prove difficult using AI only, and involves the cooperation of many humans. Because human computation requires close engagement with both “human populations as users” and “human populations as driving forces,” establishing mutual trust between AI and humans is an important issue to further the development of human computation. This survey lays the groundwork for the realization of trustworthy human computation. First, the trustworthiness of human computation as computing systems, that is, trust offered by humans to AI, is examined using the RAS (reliability, availability, and serviceability) analogy, which define measures of trustworthiness in conventional computer systems. Next, the social trustworthiness provided by human computation systems to users or participants is discussed from the perspective of AI ethics, including fairness, privacy, and transparency. Then, we consider human–AI collaboration based on two-way trust, in which humans and AI build mutual trust and accomplish difficult tasks through reciprocal collaboration. Finally, future challenges and research directions for realizing trustworthy human computation are discussed.</p></div>","PeriodicalId":8449,"journal":{"name":"Artificial Intelligence Review","volume":"57 12","pages":""},"PeriodicalIF":10.7,"publicationDate":"2024-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10462-024-10974-1.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142411524","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}