"Understanding political polarization using language models: A dataset and method," by Samiran Gode, Supreeth Bare, Bhiksha Raj, and Hyungon Yoo. AI Magazine 44(3): 248–254, 2023. doi:10.1002/aaai.12104

Our paper aims to analyze political polarization in the US political system using language models, and thereby help voters make an informed decision. The availability of this information will help voters understand their candidates' views on the economy, healthcare, education, and other social issues. Our main contributions are a dataset extracted from Wikipedia that spans the past 120 years and a language-model-based method for analyzing how polarized a candidate is. Our data are divided into two parts, background information and political information about a candidate, since our hypothesis is that the political views of a candidate should be based on reason and be independent of factors such as birthplace, alma mater, and so forth. We further split the data into four chronological phases to help understand whether and how polarization among candidates changes. The data have been cleaned to remove biases. To understand polarization, we begin by showing results from two classical language models, Word2Vec and Doc2Vec. We then use more powerful techniques such as the Longformer, a transformer-based encoder, to assimilate more information and find the nearest neighbors of each candidate based on their political views and their background. The code and data for the project will be available at https://github.com/samirangode/Understanding_Polarization.
"Explainable Image Classification: The Journey So Far and the Road Ahead," by V. Kamakshi and N. C. Krishnan. 2023. doi:10.3390/ai4030033

Explainable Artificial Intelligence (XAI) has emerged as a crucial research area to address the interpretability challenges posed by complex machine learning models. In this survey paper, we provide a comprehensive analysis of existing approaches in the field of XAI, focusing on the tradeoff between model accuracy and interpretability. Motivated by the need to address this tradeoff, we conduct an extensive review of the literature, presenting a multi-view taxonomy that offers a new perspective on XAI methodologies. We analyze various sub-categories of XAI methods, considering their strengths, weaknesses, and practical challenges. Moreover, we explore causal relationships in model explanations and discuss approaches dedicated to explaining cross-domain classifiers. The latter is particularly important in scenarios where training and test data are sampled from different distributions. Drawing insights from our analysis, we propose future research directions, including exploring explainable allied learning paradigms, developing evaluation metrics for both traditionally trained and allied learning-based classifiers, and applying neural architecture search techniques to minimize the accuracy–interpretability tradeoff. This survey paper provides a comprehensive overview of the state of the art in XAI, serving as a valuable resource for researchers and practitioners interested in understanding and advancing the field.
"Evaluating Deep Learning Techniques for Blind Image Super-Resolution within a High-Scale Multi-Domain Perspective," by V. A. de Santiago Júnior. 2023. doi:10.3390/ai4030032

Although several solutions and experiments addressing image super-resolution (SR), boosted by deep learning (DL), have been presented recently, they do not usually design evaluations with high scaling factors. Moreover, the datasets are generally benchmarks that do not truly encompass a significant diversity of domains with which to properly evaluate the techniques. Blind SR is attractive for real-world scenarios since it assumes that the degradation process is unknown, and hence techniques in this context rely basically on low-resolution (LR) images. In this article, we present a high-scale (8×) experiment that evaluates five recent DL techniques tailored for blind image SR: Adaptive Pseudo Augmentation (APA), Blind Image SR with Spatially Variant Degradations (BlindSR), Deep Alternating Network (DAN), FastGAN, and Mixture of Experts Super-Resolution (MoESR). We consider 14 datasets from five broader domains (Aerial, Fauna, Flora, Medical, and Satellite); note that some of the DL approaches were designed for single-image SR while others were not. Based on two no-reference metrics, NIQE and the transformer-based MANIQA score, MoESR can be regarded as the best solution, although the perceptual quality of the high-resolution (HR) images created by all the techniques still needs to improve.
"AI and core electoral processes: Mapping the horizons," by Deepak P, Stanley Simoes, and Muiris MacCarthaigh. AI Magazine 44(3): 218–239, 2023. doi:10.1002/aaai.12105

It is well documented that there has been significant enthusiasm across the globe about using AI for all forms of social activity. However, the electoral process – the time, place, and manner of elections within democratic nations – is one of the few sectors in which there has been limited penetration of AI. Electoral management bodies in many countries have recently started exploring and deliberating over the use of AI in the electoral process. In this paper, we consider five avenues within the core electoral process that have potential for AI usage and map the challenges involved in using AI within them. These five avenues are: voter list maintenance, determining polling booth locations, polling booth protection processes, voter authentication, and video monitoring of elections. Within each avenue, we lay down the context, illustrate current or potential usage of AI, and discuss extant or potential ramifications of AI usage, as well as potential directions for mitigating risks. We believe that the scant current usage of AI within electoral processes provides a rare opportunity to deliberate on the risks and mitigation possibilities prior to actual and widespread AI deployment. This paper is an attempt to map the horizons of risks and opportunities in using AI within electoral processes and to help shape the debate around the topic.
"Applying Few-Shot Learning for In-the-Wild Camera-Trap Species Classification," by Haoyu Chen, S. Lindshield, P. Ndiaye, Yaya Hamady Ndiaye, J. Pruetz, and A. Reibman. 2023. doi:10.3390/ai4030031

Few-shot learning (FSL) describes the challenge of learning a new task using a minimal amount of labeled data, an area in which significant progress has been made. In this paper, we explore the effectiveness of FSL theory on a real-world problem where labels are hard to obtain. To assist a large study on chimpanzee hunting activities, we aim to classify the various animal species that appear in our in-the-wild camera traps located in Senegal. Following the philosophy of FSL, we train an FSL network to separate animal species using large public datasets and then apply the network to our data, with its novel species/classes and unseen environments, needing only a few labeled images per new species. We first discuss the constraints and challenges caused by in-the-wild, uncurated data, which are often not addressed in benchmark FSL datasets. Considering these challenges, we create two experiments and corresponding evaluation metrics to determine a network's usefulness in a real-world implementation scenario. We then compare results from various FSL networks and examine how design factors such as distance metrics and extra pre-training affect a network's potential real-world usefulness. We also consider additional factors, such as support set selection and ease of implementation, which are usually ignored once a benchmark dataset has been established.
"In Memoriam: Roger C. Schank, 1946–2023," by Richard Granger, David Leake, and Christopher K. Riesbeck. AI Magazine 44(3): 343–344, 2023. doi:10.1002/aaai.12106

A summary of Roger Schank's career might initially appear fairly typical for an eminent academic. Following a PhD in linguistics at the University of Texas at Austin in 1969, Roger held faculty positions in linguistics and computer science at Stanford, computer science and psychology at Yale, and computer science and education at Northwestern. He served terms as chair of computer science at both Yale and Northwestern. After Northwestern, he was Chief Educational Officer for Carnegie Mellon's Silicon Valley campus. He authored over 30 books spanning AI, cognitive science, psychology, and education. He advised nearly 50 PhD students. He was a Fellow of AAAI.

But the hundreds of people who worked or interacted with Roger over the years know there was nothing typical about him. Roger was a force of nature. He questioned everything, especially (and gleefully) topics that were supposed to be canon. He came in, broke things apart, and built new things in their place. In linguistics, with his seminal work on semantic primitives, he rejected the Chomskyan approach that divorced the study of language from the study of meaning. In AI, where language processing focused on propositions, he argued for the importance of much larger memory structures, such as scripts and plans, and of memory processes, such as remindings, in modeling understanding. He argued for examples, that is, cases, rather than logical rules, for modeling human reasoning. Much of his work elicited initial pushback, which transitioned to wary toleration and finally arrived at such widespread acceptance that his ideas are now often assumed without attribution.

Roger relished debate and engaged avidly in ongoing discourse on the issues he studied. Where many labs have weekly "discussions" or "chats", Roger fashioned weekly "Friday fights" and an "Indefensible position" seminar. One facet of these was the Socratic investigation of complex topics; another was as a crucible for the courage to make bold claims and the skills to distill, defend, and question them. He questioned loudly. But under the disputative bearing, to those who knew and worked with him he had abundant loyalty and good will.

He was an explorer of the mind and of the world, an astute observer of humans and human nature: an intuitive psychologist. He had a knack for identifying key questions, always noticing customs, behaviors, and anomalies to explain, gathering data and categorizing to generate theories. His travels and knowledge of wine and food were a rich source of examples for his work and camaraderie. He did things in a big way, from academic passions like studying how language and the mind work and how people learn, to personal passions like food and football. Many stories about Roger occur at restaurants because meals were events. Many fans watch weekend football, but Roger created a room with half a dozen separate TVs, most with picture-in-picture, to monitor a dozen games simultaneously.
"Improving Alzheimer's Disease and Brain Tumor Detection Using Deep Learning with Particle Swarm Optimization," by R. Ibrahim, Rawan Ghnemat, and Q. Abu Al-haija. 2023. doi:10.3390/ai4030030

Convolutional Neural Networks (CNNs) have exhibited remarkable potential in effectively tackling the intricate task of classifying MRI images, specifically in Alzheimer's disease detection and brain tumor identification. While CNNs optimize their parameters automatically through training, finding the optimal values for these parameters can still be challenging due to the complexity of the search space and the potential for suboptimal results. Consequently, researchers often have difficulty determining the ideal parameter settings for CNNs. This challenge necessitates trial-and-error methods or expert judgment, as the search for the best combination of parameters involves exploring a vast space of possibilities; despite the automatic optimization during training, the process does not guarantee finding the globally optimal parameter values. Hence, researchers often rely on iterative experimentation and expert knowledge to fine-tune these parameters and maximize CNN performance, which poses a significant obstacle in developing real-world applications that leverage CNNs for MRI image analysis. This paper presents a new hybrid model that combines the Particle Swarm Optimization (PSO) algorithm with CNNs to enhance detection and classification capabilities. Our method utilizes the PSO algorithm to determine the optimal configuration of CNN hyper-parameters; these optimized parameters are then applied to the CNN architectures for classification. As a result, our hybrid model exhibits improved prediction accuracy for brain diseases while reducing the loss function value. To evaluate the performance of our proposed model, we conducted experiments using three benchmark datasets. Two datasets were used for Alzheimer's disease: the Alzheimer's Disease Neuroimaging Initiative (ADNI) and an international dataset from Kaggle. The third dataset focused on brain tumors. The experimental assessment demonstrated the superiority of our proposed model, achieving accuracy rates of 98.50%, 98.83%, and 97.12% on these datasets, respectively.
"High-Performance and Lightweight AI Model for Robot Vacuum Cleaners with Low Bitwidth Strong Non-Uniform Quantization," by Qian Huang and Zhimin Tang. 2023. doi:10.3390/ai4030029

Artificial intelligence (AI) plays a critical role in the operation of robot vacuum cleaners, enabling them to intelligently navigate to clean and avoid indoor obstacles. Due to limited computational resources, manufacturers must balance performance and cost. This necessitates the development of lightweight AI models that can achieve high performance. Traditional uniform weight quantization assigns the same number of levels to all weights, regardless of their distribution or importance. Consequently, this lack of adaptability may lead to sub-optimal quantization results, as the quantization levels do not align with the statistical properties of the weights. To address this challenge, in this work, we propose a new technique called low bitwidth strong non-uniform quantization, which largely reduces the memory footprint of AI models while maintaining high accuracy. Our proposed non-uniform quantization method, as opposed to traditional uniform quantization, aims to align with the actual weight distribution of well-trained neural network models. The proposed quantization scheme builds upon the observation of weight distribution characteristics in AI models and aims to leverage this knowledge to enhance the efficiency of neural network implementations. Additionally, we adjust the input image size to reduce the computational and memory demands of AI models. The goal is to identify an appropriate image size and its corresponding AI models that can be used in resource-constrained robot vacuum cleaners while still achieving acceptable accuracy on the object classification task. Experimental results indicate that when compared to the state-of-the-art AI models in the literature, the proposed AI model achieves a 2-fold decrease in memory usage from 15.51 MB down to 7.68 MB while maintaining the same accuracy of around 93%. In addition, the proposed non-uniform quantization model reduces memory usage by 20 times (from 15.51 MB down to 0.78 MB) with a slight accuracy drop of 3.11% (the classification accuracy is still above 90%). Thus, our proposed high-performance and lightweight AI model strikes an excellent balance between model complexity, classification accuracy, and computational resources for robot vacuum cleaners.
"Federated Learning for IoT Intrusion Detection," by Riccardo Lazzarini, H. Tianfield, and V. Charissis. 2023. doi:10.3390/ai4030028

The number of Internet of Things (IoT) devices has increased considerably in the past few years, resulting in a large growth of cyber attacks on IoT infrastructure. As part of a defense-in-depth approach to cybersecurity, intrusion detection systems (IDSs) have acquired a key role in attempting to detect malicious activities efficiently. Most modern approaches to IDS in IoT are based on machine learning (ML) techniques. The majority of these are centralized, which implies the sharing of data from source devices to a central server for classification. This presents potentially crucial issues related to privacy of user data as well as challenges in data transfers due to their volumes. In this article, we evaluate the use of federated learning (FL) as a method to implement intrusion detection in IoT environments. FL is an alternative, distributed method to centralized ML models, which has seen a surge of interest in IoT intrusion detection recently. In our implementation, we evaluate FL using a shallow artificial neural network (ANN) as the shared model and federated averaging (FedAvg) as the aggregation algorithm. The experiments are completed on the ToN_IoT and CICIDS2017 datasets in binary and multiclass classification. Classification is performed by the distributed devices using their own data. No sharing of data occurs among participants, maintaining data privacy. When compared against a centralized approach, results have shown that a collaborative FL IDS can be an efficient alternative, in terms of accuracy, precision, recall and F1-score, making it a viable option as an IoT IDS. Additionally, with these results as baseline, we have evaluated alternative aggregation algorithms, namely FedAvgM, FedAdam and FedAdagrad, in the same setting by using the Flower FL framework. The results from the evaluation show that, in our scenario, FedAvg and FedAvgM tend to perform better compared to the two adaptive algorithms, FedAdam and FedAdagrad.
"Training Artificial Neural Networks Using a Global Optimization Method That Utilizes Neural Networks," by I. Tsoulos and Alexandros T. Tzallas. 2023. doi:10.3390/ai4030027

Perhaps the best-known machine learning model is the artificial neural network, in which a number of parameters must be adjusted to learn a wide range of practical problems from areas such as physics, chemistry, and medicine. Such problems can be reduced to pattern-recognition problems, whether classification or regression, and then modeled with artificial neural networks. To achieve their goal, neural networks must be trained by appropriately adjusting their parameters using a global optimization method. In this work, the application of a recent global minimization technique is suggested for the adjustment of neural network parameters. In this technique, an approximation of the objective function to be minimized is created using artificial neural networks, and sampling is then performed from the approximation function rather than the original one. In other words, the parameters of artificial neural networks are learned using other neural networks. The new training method was tested on a series of well-known problems and compared against other neural network parameter-tuning techniques, with results that were more than promising: the proposed technique improved performance by roughly 30% on classification datasets and up to 50% on regression problems. However, because the proposed technique relies on global optimization involving artificial neural networks, it may require significantly higher execution time than other techniques.