Pub Date: 2021-12-01. DOI: 10.1109/ICMLA52953.2021.00081
Yongzhao Wang, Arunesh Sinha, Sky CH-Wang, Michael P. Wellman
In many policy-learning applications, the agent may execute a set of actions at each decision stage. Choosing among an exponential number of alternatives poses a computational challenge, and even representing actions naturally expressed as sets can be a tricky design problem. Building upon prior approaches that employ deep neural networks and iterative construction of action sets, we introduce a reward-shaping approach to apportion reward to each atomic action based on its marginal contribution within an action set, thereby providing useful feedback for learning to build these sets. We demonstrate our method in two environments where action spaces are combinatorial. Experiments reveal that our method significantly accelerates and stabilizes policy learning with combinatorial actions.
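The marginal-contribution reward shaping described above can be sketched in a few lines: as an action set is built iteratively, each atomic action is credited with the change in the set-level reward caused by adding it. The toy coverage-style reward and action names below are illustrative assumptions, not the paper's environments.

```python
# Hedged sketch of marginal-contribution reward apportionment during
# iterative action-set construction (toy reward, illustrative actions).

def marginal_contribution_rewards(actions, set_reward):
    """Credit each atomic action with the change in set reward when it is added."""
    credits = {}
    built = []
    prev = set_reward(built)
    for name, targets in actions:
        built.append((name, targets))
        cur = set_reward(built)
        credits[name] = cur - prev  # marginal contribution of this action
        prev = cur
    return credits

# Toy submodular-style set reward: number of distinct targets covered.
def set_reward(actions):
    return len({t for _, targets in actions for t in targets})

acts = [("a1", {"x", "y"}), ("a2", {"y"}), ("a3", {"z"})]
credits = marginal_contribution_rewards(acts, set_reward)
print(credits)  # -> {'a1': 2, 'a2': 0, 'a3': 1}
```

Because the credits telescope, they sum to the final set reward, so the shaping redistributes rather than alters the total signal.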
Title: Building Action Sets in a Deep Reinforcement Learner. 2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA), pp. 484-489.
Pub Date: 2021-12-01. DOI: 10.1109/ICMLA52953.2021.00230
Lingxiao Wang, S. Pang, Jinlong Li
Autonomous odor source localization (OSL) is a challenging task due to the nature of turbulent airflows and the resulting odor plume characteristics. Here we present an olfactory-based navigation algorithm built on deep learning (DL) methods, which navigates a mobile robot to an odor source without explicitly programming a specific search algorithm. Two types of deep neural networks (DNNs), a traditional feedforward network and a convolutional neural network (FNN and CNN), are proposed to generate robot velocity commands in the x and y directions from onboard sensor measurements. Training data are obtained by applying traditional olfactory-based navigation algorithms, including moth-inspired and Bayesian-inference methods, in thousands of simulated OSL trials. After supervised training, the DNN models are validated in OSL tests with varying search conditions. Experimental results show that, given the same training data, the CNN is more effective than the FNN, and that when trained with a fused data set, the proposed CNN achieves search performance comparable to the Bayesian-inference method while requiring less computational time.
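A minimal sketch of the kind of model described: a small feedforward network mapping sensor readings (e.g., odor concentration and local wind components) to x/y velocity commands. The layer sizes, inputs, and weights here are illustrative assumptions; the paper's FNN and CNN are trained on data generated by moth-inspired and Bayesian-inference controllers.

```python
import math

# Hedged sketch: a tiny FNN mapping onboard sensor readings to velocity
# commands (vx, vy). Sizes and weights are illustrative, not trained values.

def fnn_velocity(sensors, w_hidden, b_hidden, w_out, b_out):
    # hidden layer with tanh activation
    hidden = [math.tanh(sum(w * s for w, s in zip(row, sensors)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    # linear output layer -> (vx, vy)
    return tuple(sum(w * h for w, h in zip(row, hidden)) + b
                 for row, b in zip(w_out, b_out))

# 3 sensor inputs (odor, wind_x, wind_y), 4 hidden units, 2 outputs.
w_h = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5], [-0.6, 0.4, 0.7], [0.2, 0.2, 0.2]]
b_h = [0.0, 0.1, -0.1, 0.0]
w_o = [[0.9, -0.3, 0.2, 0.4], [0.1, 0.7, -0.8, 0.5]]
b_o = [0.0, 0.0]

vx, vy = fnn_velocity([0.8, 0.1, -0.3], w_h, b_h, w_o, b_o)
```

In the supervised setup the weights would be fit so that (vx, vy) imitates the traditional controller's commands on the same sensor inputs.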
Title: Learn to Trace Odors: Autonomous Odor Source Localization via Deep Learning Methods. ICMLA 2021, pp. 1429-1436.
Pub Date: 2021-12-01. DOI: 10.1109/ICMLA52953.2021.00146
Geonhak Song, Tien-Dung Nguyen, J. Bum, Hwijong Yi, C. Son, Hyunseung Choo
Generative adversarial network (GAN)-based methods recover perceptually pleasing details in super resolution (SR), but they are prone to structural distortions. A recent study alleviates such structural distortions by attaching a gradient branch to the generator; however, this method compromises perceptual details. In this paper, we propose a sparse gradient-guided attention generative adversarial network (SGAGAN), which incorporates a modified residual-in-residual sparse block (MRRSB) in the gradient branch and gradient-guided self-attention (GSA) to suppress structural distortions. Compared to the block most frequently used in GAN-based SR methods, the residual-in-residual dense block (RRDB), MRRSB reduces computational cost and avoids gradient redundancy. In addition, GSA emphasizes highly correlated features in the generator by guiding them with the sparse gradient. It captures semantic information by connecting the global interdependencies between the sparse gradient features in the gradient branch and the features in the SR branch. Experimental results show that SGAGAN relieves structural distortions and generates more realistic images than state-of-the-art SR methods. Qualitative and quantitative evaluations in the ablation study show that combining GSA and MRRSB yields better perceptual quality than using self-attention alone.
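The gradient-guided attention idea can be illustrated in one dimension: attention weights derived from a sparse gradient map reweight SR-branch features so edge regions are emphasized. This is a hedged toy sketch; the actual GSA operates on 2-D feature maps inside the generator with learned projections.

```python
import math

# Hedged 1-D sketch of gradient-guided attention: positions with strong
# gradients (edges) receive the largest attention boost. The real module
# works on 2-D feature maps with learned parameters.

def gradient_guided_attention(sr_features, gradient_map):
    # softmax over gradient magnitudes -> attention weights
    exps = [math.exp(abs(g)) for g in gradient_map]
    total = sum(exps)
    attn = [e / total for e in exps]
    # emphasize SR-branch features at high-gradient (edge) positions
    return [f * (1.0 + a) for f, a in zip(sr_features, attn)]

feats = [1.0, 1.0, 1.0, 1.0]
grads = [0.0, 0.0, 3.0, 0.0]  # a strong edge at position 2
out = gradient_guided_attention(feats, grads)
```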
Title: Super Resolution with Sparse Gradient-Guided Attention for Suppressing Structural Distortion. ICMLA 2021, pp. 885-890.
Pub Date: 2021-12-01. DOI: 10.1109/ICMLA52953.2021.00013
R. Zuech, John T. Hancock, T. Khoshgoftaar
We introduce the novel concept of feature popularity with three different web attacks and big data from the CSE-CIC-IDS2018 dataset: Brute Force, SQL Injection, and XSS web attacks. Feature popularity is based upon ensemble Feature Selection Techniques (FSTs) and allows us to more easily understand common important features between different cyberattacks, for two main reasons. First, feature popularity lists can be generated to provide an easy comprehension of important features across different attacks. Second, the Jaccard similarity metric can provide a quantitative score for how similar feature subsets are between different attacks. Both of these approaches not only provide more explainable and easier-to-understand models, but they can also reduce the complexity of implementing models in real-world systems. Four supervised learning-based FSTs are used to generate feature subsets for each of our three different web attack datasets, and then our feature popularity frameworks are applied. For these three web attacks, the XSS and SQL Injection feature subsets are the most similar per the Jaccard similarity. The most popular features across all three web attacks are: Flow_Bytes_s, Flow_IAT_Max, and Flow_Packets_s. While this introductory study is only a simple example using only three web attacks, this feature popularity concept can be easily extended, allowing an automated framework to more easily determine the most popular features across a very large number of attacks and features.
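Both ingredients, the popularity tally and the Jaccard similarity between feature subsets, are simple to state in code. The subsets below are illustrative: only the three Flow_* features are named in the abstract, and `Fwd_Seg_Size` is a hypothetical stand-in.

```python
# Hedged sketch of feature popularity across attacks and Jaccard similarity
# between feature subsets. Subsets are illustrative, not the paper's results.

def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def feature_popularity(subsets_by_attack):
    counts = {}
    for features in subsets_by_attack.values():
        for f in set(features):
            counts[f] = counts.get(f, 0) + 1
    # most popular (appearing in the most attacks) first
    return sorted(counts.items(), key=lambda kv: -kv[1])

subsets = {
    "BruteForce":   {"Flow_Bytes_s", "Flow_IAT_Max", "Flow_Packets_s"},
    "SQLInjection": {"Flow_Bytes_s", "Flow_IAT_Max", "Flow_Packets_s", "Fwd_Seg_Size"},
    "XSS":          {"Flow_Bytes_s", "Flow_IAT_Max", "Flow_Packets_s", "Fwd_Seg_Size"},
}
print(jaccard(subsets["SQLInjection"], subsets["XSS"]))  # -> 1.0 (identical subsets)
print(feature_popularity(subsets))
```

A Jaccard score of 1.0 means two attacks share exactly the same selected features; the popularity list surfaces features important across all attacks.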
Title: Feature Popularity Between Different Web Attacks with Supervised Feature Selection Rankers. ICMLA 2021, pp. 30-37.
Pub Date: 2021-12-01. DOI: 10.1109/ICMLA52953.2021.00213
M. L. Tlachac, E. Toto, Joshua Lovering, Rimsha Kayastha, Nina Taurich, E. Rundensteiner
Mental illnesses are often undiagnosed, demonstrating the need for an effective, unbiased alternative to traditional screening surveys. To this end, we propose our Early Mental Health Uncovering (EMU) framework, which supports near-instantaneous mental illness screening with non-intrusive active and passive modalities. We designed, deployed, and evaluated the EMU app to passively collect retrospective digital phenotype data and actively collect short voice recordings. The EMU app also administered depression and anxiety screening surveys to produce screening labels for the data. Notably, more than twice as many participants elected to share scripted audio recordings as any passive modality. We then study the effectiveness of machine learning models trained with the active modalities. Using scripted audio, EMU screens for depression with F1=0.746, anxiety with F1=0.667, and suicidal ideation with F1=0.706. Using unscripted audio, EMU screens for depression with F1=0.691, anxiety with F1=0.636, and suicidal ideation with F1=0.667. Jitter is an important feature for screening with scripted audio, while Mel-frequency cepstral coefficients are important features for screening with unscripted audio. Further, the frequency of help-related words carried a strong signal for suicidal ideation screening with unscripted audio transcripts. This research results in a deeper understanding of the selection of modalities and corresponding features for mobile screening. The EMU dataset will be made available in the public domain, representing a valuable data resource for the community to further advance universal mental illness screening research.
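For reference, the reported scores combine precision and recall via the F1 formula; a toy computation on made-up prediction counts (the real numbers come from the EMU screening models):

```python
# F1 = harmonic mean of precision and recall, computed from confusion
# counts. The counts below are illustrative, not the paper's data.

def f1_score(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# e.g., 53 true positives, 20 false positives, 16 false negatives
print(round(f1_score(53, 20, 16), 3))  # -> 0.746
```

Equivalently, F1 = 2*TP / (2*TP + FP + FN), here 106/142.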
Title: EMU: Early Mental Health Uncovering Framework and Dataset. ICMLA 2021, pp. 1311-1318.
Pub Date: 2021-12-01. DOI: 10.1109/ICMLA52953.2021.00027
Milan Aryal, Nasim Yahyasoltani
A catheter is a thin tube inserted into a patient's body to deliver fluids or medication. The placement of a catheter in the chest is critical, and incorrect placement can be life-threatening. Radiologists use chest X-ray images to verify correct catheter placement. During a global pandemic, when hospitals are crowded with patients, radiologists may not be able to manually review all X-rays; in this situation, an automatic method to identify catheters in X-ray images would be of great help. In this paper, a novel method to automatically detect the presence and position of a catheter in X-ray images is developed. The proposed algorithm deploys a generative adversarial network (GAN) to synthesize catheters in X-ray images. Transfer learning is then used to classify the catheter and its placement. Octave convolution is used instead of vanilla convolution to improve the efficiency of the deep learning classifier. Through data augmentation, different transformations of the images are generated to make the model more robust to noisy images.
Title: Identifying Catheter and Line Position in Chest X-Rays Using GANs. ICMLA 2021, pp. 122-127.
Pub Date: 2021-12-01. DOI: 10.1109/ICMLA52953.2021.00150
Tiago Pinho da Silva, A. R. Parmezan, Gustavo E. A. P. A. Batista
Elections are complex activities fundamental to any democracy. The contextualized analysis of election data allows us to understand electoral behavior and the factors that influence it. Multidisciplinary studies have prioritized the predictive modeling of electoral features from thousands of explanatory features, considering the geographic and spatial aspects inherent to the data. When building a model for such a purpose, it must be rigorously evaluated to understand its prediction error on future test cases. Although cross-validation is a widely used procedure for this task, it leads to optimistic results because spatial independence between test and training data is not ensured in the resampling. On the other hand, alternatives that deal with spatial dependence may fall into a pessimistic scenario by assuming total spatial independence between the test and training sets regardless of the size of the former, increasing the probability of overfitting. This paper addresses these issues by proposing a graph-based spatial cross-validation approach to assess models learned with selected features from spatially contextualized electoral datasets. Our approach takes advantage of the spatial graph structure provided by lattice-type spatial objects to define a local training set for each test fold. We generate the local training sets by removing spatially close data that are highly correlated and irrelevant distant data that may interfere with error estimates. Experiments involving the second round of the 2018 Brazilian presidential election demonstrate that our approach contributes to the fair evaluation of models by enabling more realistic and local modeling.
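The local-training-set construction can be sketched on a spatial adjacency graph: regions within a few hops of the test fold are dropped as spatially dependent, and regions beyond a cutoff are dropped as irrelevant distant data. The graph, hop thresholds, and region names below are illustrative assumptions; the paper's actual criteria rely on correlation and relevance, not raw hop counts.

```python
from collections import deque

# Hedged sketch: per-fold local training set on a spatial adjacency graph.
# Regions too close to the test fold (<= `near` hops) and too far (> `far`
# hops) are both excluded, mimicking the buffer-plus-locality idea.

def hop_distances(graph, sources):
    """BFS hop distance from any source region to every reachable region."""
    dist = {s: 0 for s in sources}
    q = deque(sources)
    while q:
        u = q.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def local_training_set(graph, test_fold, near=1, far=3):
    d = hop_distances(graph, test_fold)
    return {n for n in graph
            if n not in test_fold and near < d.get(n, float("inf")) <= far}

# A chain of regions: r0 - r1 - r2 - r3 - r4 - r5
graph = {"r0": ["r1"], "r1": ["r0", "r2"], "r2": ["r1", "r3"],
         "r3": ["r2", "r4"], "r4": ["r3", "r5"], "r5": ["r4"]}
print(sorted(local_training_set(graph, {"r0"})))  # -> ['r2', 'r3']
```

Here r1 is excluded as spatially dependent on the test region r0, while r4 and r5 are excluded as distant and potentially irrelevant.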
Title: A Graph-Based Spatial Cross-Validation Approach for Assessing Models Learned with Selected Features to Understand Election Results. ICMLA 2021, pp. 909-915.
Pub Date: 2021-12-01. DOI: 10.1109/ICMLA52953.2021.00113
Sheldon Schiffer
Creating photorealistic facial animation for game characters is a labor-intensive process that gives authorial primacy to animators. This research presents an experimental autonomous animation controller based on an emotion model that uses a team of embedded recurrent neural networks (RNNs). The design is a novel alternative method that can elevate an actor’s contribution to game character design. This research presents the first results of combining a facial emotion neural network model with a workflow that incorporates actor preparation methods and the training of auto-regressive bi-directional RNNs with long short-term memory (LSTM) cells. The predicted emotion vectors triggered by player facial stimuli strongly resemble a performing actor for a game character with accuracies over 80% for targeted emotion labels and show accuracy near or above a high baseline standard.
Title: Game Character Facial Animation Using Actor Video Corpus and Recurrent Neural Networks. ICMLA 2021, pp. 674-681.
Pub Date: 2021-12-01. DOI: 10.1109/ICMLA52953.2021.00158
Anand Ravishankar, S. Natarajan, A. B. Malakreddy
Deep Neural Networks (DNNs) are an extremely attractive class of computational models due to their remarkable ability to deliver promising results on a wide variety of problems. However, the performance delivered by DNNs often overshadows the work done before training the network, which includes Network Architecture Search (NAS) and assessing its suitability for the task. This paper presents a modified Genetic-NAS framework designed to prevent network stagnation and reduce training loss. The network hyperparameters are initialized in a "Chaos on Edge" region, preventing premature convergence through reverse biases. The Genetic-NAS and parameter-space exploration processes are co-evolved by applying genetic operators and subjecting candidates to layer-wise competition. The inherent parallelism offered by both the neural network and its genetic extension is exploited by deploying the model on a GPU, which improves throughput. The GPU device provides an acceleration of 8.4x, with 92.9% of the workload placed on the GPU, for the text-based datasets. On average, classifying an image-based dataset takes 3 GPU hours.
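One plausible reading of the "Chaos on Edge" initialization is drawing hyperparameter values from a chaotic map driven near the onset of chaos; the logistic map below (with r near 3.57, commonly cited as the onset of chaos) is an assumption used purely for illustration, not the paper's actual map or region.

```python
# Hedged sketch: sampling hyperparameter seeds from a logistic map run at
# the edge of chaos. The map, r value, and learning-rate range are
# illustrative assumptions, not the paper's scheme.

def chaos_on_edge_samples(n, x0=0.4, r=3.57):
    xs = []
    x = x0
    for _ in range(100):        # burn-in so samples come from the attractor
        x = r * x * (1 - x)
    for _ in range(n):
        x = r * x * (1 - x)
        xs.append(x)
    return xs

# Map the chaotic samples in (0, 1) onto a learning-rate range.
samples = chaos_on_edge_samples(5)
lrs = [1e-4 + s * (1e-1 - 1e-4) for s in samples]
```

Compared with uniform random initialization, such samples are deterministic yet non-repeating, which is one way to seed a population without premature clustering.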
Title: Pruned Genetic-NAS on GPU Accelerator Platforms with Chaos-on-Edge Hyperparameters. ICMLA 2021, pp. 958-963.
Pub Date: 2021-12-01. DOI: 10.1109/ICMLA52953.2021.00046
Aviv Peled, S. Fine
Latent variables pose a challenge for accurate modelling, experimental design, and inference, since they may cause non-adjustable bias in the estimation of effects. While most research on latent variables revolves around accounting for their presence and learning how they interact with other variables in the experiment, their bare existence is assumed to be deduced from domain expertise. In this work we focus on the discovery of such latent variables, utilizing statistical hypothesis testing and Bayesian network structure learning. Specifically, we present a novel method for detecting discrete latent factors that affect continuous observed outcomes in mixed discrete/continuous observed data, and devise a structure learning algorithm that adds the detected latent factors to a fully observed Bayesian network. Finally, we demonstrate the utility of our method with a set of experiments in both controlled and real-life settings, one of which predicts the outcome of COVID-19 test results.
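The detection idea can be caricatured in one dimension: a hidden binary factor that shifts a continuous outcome produces a bimodal sample, so comparing the spread of a one-group fit against the best two-group split flags a candidate latent factor. The variance-ratio threshold below is an illustrative stand-in for the paper's statistical hypothesis tests.

```python
import statistics

# Hedged 1-D caricature: flag a candidate binary latent factor when the
# best two-group split of a continuous sample reduces the sum of squares
# far below the one-group fit. Threshold is an illustrative assumption.

def suggests_binary_latent(values, ratio_threshold=0.25):
    xs = sorted(values)
    total_ss = sum((x - statistics.fmean(xs)) ** 2 for x in xs)
    best_ss = total_ss
    for i in range(1, len(xs)):          # every split point of the sorted sample
        left, right = xs[:i], xs[i:]
        ss = (sum((x - statistics.fmean(left)) ** 2 for x in left)
              + sum((x - statistics.fmean(right)) ** 2 for x in right))
        best_ss = min(best_ss, ss)
    return best_ss < ratio_threshold * total_ss

bimodal = [0.1, 0.2, 0.15, 0.12, 5.0, 5.1, 4.9, 5.2]   # hidden factor shifts the mean
unimodal = [1.0, 1.1, 0.9, 1.05, 0.95, 1.02, 0.98, 1.0]
print(suggests_binary_latent(bimodal), suggests_binary_latent(unimodal))
```

A detected factor would then be added as a new discrete parent of the affected continuous node in the learned network.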
Title: Discrete Latent Variables Discovery and Structure Learning in Mixed Bayesian Networks. ICMLA 2021, pp. 248-255.