Pub Date: 2022-12-01 | DOI: 10.1109/ICMLA55696.2022.00056
Hao Xin, M. Zhu
Image-to-image regression is an important computer vision task. In this paper, we propose a novel image-to-image regression model following the recent trend in generative modeling that employs Stochastic Differential Equations (SDEs) and score matching. We first apply diffusion processes to regression data using designed SDEs, and then perform inference by gradually reversing the processes. In particular, our method uses synchronized diffusion, which simultaneously applies diffusion to both input and response images to stabilize diffusion and subsequent parameter learning. Furthermore, based on the Expectation-Maximization (EM) algorithm, we develop an effective algorithm for prediction. We implement a conditional U-Net architecture with a pre-trained DenseNet encoder for our proposed model and refer to it as DenseScore. Our new model is able to generate diverse outcomes for image colorization, and the proposed prediction algorithm achieves close to state-of-the-art performance on high-resolution monocular depth estimation.
Title: Score-based Image-to-Image Regression with Synchronized Diffusion. Published in: 2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA).
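To make the synchronized-diffusion idea concrete, the sketch below applies the same noise level to an input/response image pair at every step of a discretized variance-preserving schedule, so both images are corrupted in lockstep. This is a hedged illustration, not the paper's implementation; the function name and schedule parameters (`beta_min`, `beta_max`, `n_steps`) are assumptions.

```python
import numpy as np

def synchronized_diffuse(x, y, t, n_steps=1000, beta_min=0.1, beta_max=20.0, rng=None):
    """Jointly diffuse an input image x and response image y to time step t
    using one shared variance-preserving noise schedule (simplified sketch)."""
    rng = np.random.default_rng() if rng is None else rng
    betas = np.clip(np.linspace(beta_min / n_steps, beta_max / n_steps, n_steps),
                    0.0, 0.999)
    alpha_bar = np.cumprod(1.0 - betas)[t]       # cumulative signal retention
    scale, sigma = np.sqrt(alpha_bar), np.sqrt(1.0 - alpha_bar)
    # The same (scale, sigma) pair is applied to both images, keeping their
    # corruption levels synchronized at every step.
    x_t = scale * x + sigma * rng.standard_normal(x.shape)
    y_t = scale * y + sigma * rng.standard_normal(y.shape)
    return x_t, y_t
```

Reverse-time inference would then denoise both images jointly with a learned score model, which this sketch does not cover.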
Pub Date: 2022-12-01 | DOI: 10.1109/ICMLA55696.2022.00127
Sheuli Paul, Michael Sintek, Veton Këpuska, M. Silaghi, Liam Robertson
Understanding intent is an essential step in maintaining effective communication. This capability is used in communications for assembly, patrolling, and surveillance. A fused and interactive multimodal system for human-robot communication, used in assembly applications, is presented in this paper. Communication is multimodal. Having multiple communication modes such as gestures, text, symbols, graphics, images, and speech increases the chance of effective communication. Intent is the main component we aim to model, specifically in human-machine dialogues. For this, we extract intents from spoken dialogues and fuse each intent with any detected matching gesture used in interaction with the robot. The main components of the presented system are: (1) a speech recognition system using Kaldi, (2) a deep-learning-based Dual Intent and Entity Transformer (DIET) classifier for intent and entity extraction, (3) a hand gesture recognition system, and (4) a dynamic fusion model for speech- and gesture-based communication. These are evaluated in a contextual assembly situation using a simulated interactive robot.
Title: Intent based Multimodal Speech and Gesture Fusion for Human-Robot Communication in Assembly Situation.
Pub Date: 2022-12-01 | DOI: 10.1109/ICMLA55696.2022.00076
Ruchita Mehta, V. Palade, S. Sharifzadeh, Bo Tan, Yordanka Karayaneva
Remote Human Activity Recognition (HAR) in private residential areas has a beneficial influence on the lives of the elderly, since this population requires regular monitoring of health conditions. This paper addresses the problem of continuous detection of daily human activities using mm-wave Doppler radar. Unlike most previous research, this work records the data as continuous series of activities rather than individual activities. Such series of activities resemble real-life activity patterns. The Dynamic Time Warping (DTW) algorithm is used to detect human activities in the recorded time series and is compared to other time-series classification methods. DTW requires less labelled data. The input for DTW was provided using three strategies, and the obtained results were compared against each other. The first approach uses the pixel-level data of frames (named UnSup-PLevel). In the other two strategies, a Convolutional Variational Autoencoder (CVAE) is used to extract Unsupervised Encoded features (UnSup-EnLevel) and Supervised Encoded features (Sup-EnLevel) from the series of Doppler frames. The results demonstrate the superiority of the Sup-EnLevel features over the UnSup-EnLevel and UnSup-PLevel strategies. However, the UnSup-PLevel strategy performed surprisingly well without using annotations.
Title: Continuous Human Activity Recognition using Radar Imagery and Dynamic Time Warping.
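For reference, the DTW distance used above can be computed with the standard dynamic-programming recurrence. This minimal sketch for 1-D sequences illustrates the algorithm only; it is not the authors' pipeline, which operates on radar frame features.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic Time Warping distance between two 1-D sequences, allowing
    non-linear alignment of time steps via the classic DP recurrence."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # best of: insertion, deletion, match
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

Because DTW compares a query directly against labelled templates, a nearest-neighbour classifier built on it needs only a handful of labelled series, which is the property the abstract highlights.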
Pub Date: 2022-12-01 | DOI: 10.1109/ICMLA55696.2022.00273
Menore Tekeba Mengistu, Getachew Alemu, P. Chevaillier, P. D. Loor
In this paper, we provide an unsupervised contrastive representation learning method that uses contrastive views in which both spatial and temporal similarity-contrast are balanced. The balanced views are created by taking pixels from the anchor sample and a randomly selected negative sample, balancing the ratio of pixels taken from each. These balanced views are then paired with the anchor to create the positive contrastive views, and all other samples paired with the anchor are taken as negative contrastive views. We evaluate the method on reinforcement learning tasks from Atari games and the DeepMind Control suite (DMControl). Our evaluations on 26 Atari games and six DMControl tasks show that the proposed method is superior at learning spatio-temporally evolving factors of the environment, capturing the relevant task-controlling generative factors from the agents' raw observations.
Title: Balancing Similarity-Contrast in Unsupervised Representation Learning: Evaluation with Reinforcement Learning.
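The pixel-mixing construction described above can be sketched as follows. The random per-pixel mask and the `ratio` parameter are illustrative assumptions about how the anchor/negative pixel balance might be implemented, not the authors' exact procedure.

```python
import numpy as np

def balanced_view(anchor, negative, ratio=0.5, rng=None):
    """Build a contrastive view by taking roughly a `ratio` fraction of
    pixels from the anchor and the remainder from a negative sample."""
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(anchor.shape) < ratio   # True -> keep anchor pixel
    return np.where(mask, anchor, negative)
```

Varying `ratio` moves the view along the similarity-contrast spectrum: near 1.0 it is almost the anchor (strongly positive), near 0.0 it is almost the negative.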
Pub Date: 2022-12-01 | DOI: 10.1109/ICMLA55696.2022.00204
Lin Zhou, Eric Fischer, C. M. Brahms, U. Granacher, B. Arnrich
Neural networks have been successfully applied to a wide range of human motion analysis topics in combination with wearable sensor data. However, their computation process is not readily comprehensible. Moreover, many model interpretation efforts do not provide physiologically relevant insights, which still limits their use in clinical settings. In this work, we take gait modifications under fatigue and cognitive task performance as a use case to show how in-depth investigations of neural networks can be performed using wearable sensor data. We collected walking data from 16 young healthy individuals in unfatigued and fatigued states and under single-task (walking only) and dual-task (walking while concurrently performing a cognitive task) conditions using inertial measurement units. Convolutional neural networks were able to identify both fatigue and dual-task gait patterns with high classification accuracy. To interpret the model, the importance of each time step in the input time series was visualized using Layer-wise Relevance Propagation. The visualization revealed highly individualized gait changes among participants, as well as changes at precise time steps of the input signal, enabling further investigation of potential underlying mechanisms. Our methods enable in-depth analysis of human movement using transparent neural networks with data collected from unobtrusive, mobile wearable sensors.
Title: Using Transparent Neural Networks and Wearable Inertial Sensors to Generate Physiologically-Relevant Insights for Gait.
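Layer-wise Relevance Propagation, used above to score time-step importance, redistributes a layer's output relevance back to its inputs in proportion to their contributions, so that total relevance is conserved. A minimal epsilon-rule sketch for a single bias-free linear layer (an illustration of the rule, not the authors' implementation):

```python
import numpy as np

def lrp_linear(a, W, R_out, eps=1e-6):
    """Epsilon-rule LRP for one linear layer z = a @ W: redistribute the
    output relevance R_out to the inputs in proportion to a_i * W[i, j].
    (eps stabilizes the division; z must not be exactly zero.)"""
    z = a @ W                        # pre-activations, shape (out,)
    s = R_out / (z + eps * np.sign(z))
    return a * (W @ s)               # input relevance, shape (in,)
```

Applied layer by layer from the classifier output down to the input sequence, this yields a per-time-step relevance signal like the one visualized in the paper.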
Pub Date: 2022-12-01 | DOI: 10.1109/ICMLA55696.2022.00136
Dana Oshri Zalman, S. Fine
Variational inference provides a way to approximate probability densities. It does so by optimizing an upper or a lower bound on the likelihood of the observed data (the evidence). The classic variational inference approach maximizes the Evidence Lower BOund (ELBO). Recent proposals optimize the variational Rényi bound (VR) and the χ upper bound. However, these estimates are either biased or difficult to approximate due to high variance. In this paper, we introduce a new upper bound (termed VRLU), which is based on the existing variational Rényi bound. In contrast to the existing VR bound, the Monte Carlo (MC) approximation of the VRLU bound is unbiased. Furthermore, we devise a (sandwiched) upper-lower bound variational inference method (termed VRS) to jointly optimize the upper and lower bounds. We present a set of experiments designed to evaluate the new VRLU bound and to compare the VRS method with the classic VAE and VR methods on a set of digit recognition tasks. The experiments and results demonstrate the advantage of the VRLU bound and the wide applicability of the VRS method.
Title: Variational Inference via Rényi Upper-Lower Bound Optimization.
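To make the bias issue concrete: the VR bound L_α = (1/(1−α)) log E_q[w^(1−α)], with w = p(x,z)/q(z|x), takes the log of a sample mean, so its MC estimate is biased. One way to obtain an unbiased upper estimate in the spirit of VRLU is to replace the log with the linear bound log(u) ≤ u − 1 (valid as an upper bound when α < 1; negative α makes L_α an upper bound on the evidence). The sketch below illustrates both estimators under those assumptions; it is not the paper's code.

```python
import numpy as np

def vr_bound_mc(log_w, alpha):
    """Biased MC estimate of the variational Renyi bound
    L_alpha = 1/(1-alpha) * log E_q[w^(1-alpha)],
    from samples log_w = log p(x,z) - log q(z|x)."""
    s = (1.0 - alpha) * np.asarray(log_w)
    m = s.max()                                   # log-mean-exp for stability
    return (m + np.log(np.mean(np.exp(s - m)))) / (1.0 - alpha)

def vrlu_style_bound_mc(log_w, alpha):
    """Unbiased MC estimate of the linearized upper bound obtained from the
    VR bound via log(u) <= u - 1 (assumes alpha < 1)."""
    assert alpha < 1.0
    w_pow = np.exp((1.0 - alpha) * np.asarray(log_w))
    return (np.mean(w_pow) - 1.0) / (1.0 - alpha)  # mean is inside, no log
```

Because the expectation appears without a surrounding log, the second estimator's sample mean is unbiased for the bound it targets, at the cost of sitting above the VR bound.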
Pub Date: 2022-12-01 | DOI: 10.1109/ICMLA55696.2022.10102767
Samira Khorshidi, Bao Wang, G. Mohler
Temporal point processes have many applications, from crime forecasting to modeling earthquake aftershock sequences. Due to the flexibility and expressiveness of deep learning, neural network-based approaches have recently shown promise for modeling point process intensities. However, there is a lack of research on the robustness of such models with regard to adversarial attacks and natural shocks to systems. Specifically, while neural point processes may outperform simpler parametric models on in-sample tests, how these models perform when encountering adversarial examples or sharp non-stationary trends remains unknown. The current work proposes several white-box and black-box adversarial attacks against temporal point processes modeled by deep neural networks. Extensive experiments confirm that the predictive performance and parametric modeling of neural point processes are vulnerable to adversarial attacks. Additionally, we evaluate the vulnerability and performance of these models in the presence of non-stationary abrupt changes, using a crimes dataset during the Covid-19 pandemic as an example.
Title: Adversarial Attacks on Deep Temporal Point Process.
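For context, the classic parametric baseline that neural point processes are compared against is the Hawkes process, whose conditional intensity λ(t) = μ + α Σ_{t_i < t} exp(−β(t − t_i)) rises after each event and decays exponentially. A minimal univariate sketch (the parameter values are illustrative, not from the paper):

```python
import numpy as np

def hawkes_intensity(t, history, mu=0.1, alpha=0.5, beta=1.0):
    """Conditional intensity of a univariate exponential Hawkes process:
    baseline rate mu plus exponentially decaying excitation from each
    past event time in `history` that occurred strictly before t."""
    history = np.asarray(history, dtype=float)
    past = history[history < t]
    return mu + alpha * np.sum(np.exp(-beta * (t - past)))
```

An adversarial attack in this setting perturbs the event history so that a model's intensity (and hence its forecasts) degrades, which is what the proposed white-box and black-box attacks probe for neural intensities.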
Pub Date: 2022-12-01 | DOI: 10.1109/ICMLA55696.2022.00046
Wen-Hao Chiang, G. Mohler
We propose a novel framework for integrating Hawkes processes with multi-armed bandit algorithms to solve spatio-temporal event forecasting and detection problems when data may be undersampled or spatially biased. In particular, we introduce an upper confidence bound algorithm using Bayesian spatial Hawkes process estimation for balancing the trade-off between exploiting geographic regions where data has been collected and exploring geographic regions where data is unobserved. We first validate our model using simulated data. We then apply it to the problem of disaster search and rescue using calls-for-service data from Hurricane Harvey in 2017, and to the problem of detection and clearance of improvised explosive devices (IEDs) using IED attack records in Iraq. Our model outperforms state-of-the-art baseline spatial MAB algorithms in terms of cumulative reward and several other ranking evaluation metrics.
Title: Hawkes Process Multi-armed Bandits for Search and Rescue.
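The explore/exploit trade-off described above follows the upper-confidence-bound pattern: each geographic cell is an arm scored by its mean observed reward plus a bonus that grows for rarely visited cells. A generic UCB1-style selector over discretized cells (a sketch only; the paper's variant scores arms with Bayesian spatial Hawkes process estimates rather than raw means):

```python
import numpy as np

def ucb_select(counts, rewards, t, c=2.0):
    """Pick the cell maximizing mean reward + exploration bonus.
    `counts` and `rewards` are per-cell visit counts and cumulative rewards;
    unvisited cells (count 0) are always tried first."""
    counts = np.asarray(counts, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    means = np.divide(rewards, counts, out=np.zeros_like(counts), where=counts > 0)
    bonus = np.sqrt(c * np.log(max(t, 1)) / np.maximum(counts, 1.0))
    scores = np.where(counts == 0, np.inf, means + bonus)
    return int(np.argmax(scores))
```

In the search-and-rescue setting, "reward" would be events found in a cell, so the bonus term drives teams toward regions where the event rate is still uncertain.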
Pub Date: 2022-12-01 | DOI: 10.1109/ICMLA55696.2022.00069
Junya Saito, Sachihiro Youoku, Ryosuke Kawamura, A. Uchida, Kentaro Murase, Xiaoyue Mi
Facial action units (AUs) represent muscular activities, and their recognition from facial images can capture various psychological states, such as people’s interests as consumers and mental health states. However, degraded conditions, such as occlusion by a hand, often occur in the real world and affect the accuracy of AU recognition. Most existing studies on degraded conditions have used additional training images and advanced neural network structures to improve the robustness of AU recognition from a degraded facial image. However, such an approach cannot handle cases in which evidence of the AUs is completely or almost invisible. Therefore, we propose a novel method that addresses degraded conditions by predicting the recognition uncertainties they cause. Our method interpolates high-uncertainty data using surrounding data to reduce the influence of the degraded conditions, and visualizes the conditions causing the uncertainties in order to handle cases where conditions are very poor and need to be improved. In the evaluation experiments, the public datasets BP4D+ and DISFA were modified to degrade them for testing. On the modified test data, we demonstrated that the maximum improvement with our method was 12% for BP4D+ and 17% for DISFA, and that our method can prevent the decrease in accuracy caused by degraded conditions. We also present visualization examples demonstrating that our method can reasonably predict the conditions and uncertainties.
Title: Uncertainty Prediction for Facial Action Units Recognition under Degraded Conditions.
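The uncertainty-based interpolation step can be sketched as follows, assuming per-frame AU predictions and uncertainty scores over a video sequence; the threshold and the choice of linear interpolation are illustrative assumptions, not the authors' exact scheme.

```python
import numpy as np

def interpolate_uncertain(preds, uncert, thresh=0.5):
    """Replace predictions whose uncertainty exceeds `thresh` by linear
    interpolation between the nearest confident frames in the sequence.
    Assumes at least one frame is below the threshold."""
    preds = np.asarray(preds, dtype=float)
    good = np.asarray(uncert) <= thresh          # confident frames
    idx = np.arange(len(preds))
    return np.interp(idx, idx[good], preds[good])
```

A spurious prediction caused by, say, a brief hand occlusion is thus smoothed over using the surrounding confident frames instead of being trusted directly.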
Pub Date: 2022-12-01 | DOI: 10.1109/ICMLA55696.2022.00211
Adam Lehavi, S. Kim
In the realm of cybersecurity, intrusion detection systems (IDS) detect and prevent attacks based on collected computer and network data. In recent research, IDS models have been constructed using machine learning (ML) and deep learning (DL) methods such as Random Forest (RF) and deep neural networks (DNN). Feature selection (FS) can be used to construct faster, more interpretable, and more accurate models. We look at three different FS techniques: RF information gain (RF-IG), correlation feature selection using the Bat Algorithm (CFS-BA), and CFS using the Aquila Optimizer (CFS-AO). Our results show CFS-BA to be the most efficient of the FS methods, building in 55% of the time of the best RF-IG model while achieving 99.99% of its accuracy. This reinforces prior contributions attesting to CFS-BA’s accuracy while building upon the relationship between subset size, CFS score, and RF-IG score in the final results.
Title: Feature Reduction Method Comparison Towards Explainability and Efficiency in Cybersecurity Intrusion Detection Systems.
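For reference, the CFS criterion that the Bat Algorithm and Aquila Optimizer search over rewards subsets whose features correlate strongly with the class but weakly with each other. A minimal sketch of the standard merit score using Pearson correlations (the original CFS uses symmetrical uncertainty for discrete features; Pearson is a simplifying assumption here):

```python
import numpy as np

def cfs_merit(X, y, subset):
    """CFS merit of a feature subset:
    merit = k * mean|r_cf| / sqrt(k + k*(k-1) * mean|r_ff|),
    where r_cf are feature-class and r_ff feature-feature correlations."""
    k = len(subset)
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    if k == 1:
        return r_cf
    r_ff = np.mean([abs(np.corrcoef(X[:, i], X[:, j])[0, 1])
                    for a, i in enumerate(subset) for j in subset[a + 1:]])
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)
```

A metaheuristic such as the Bat Algorithm then searches the space of subsets for the one maximizing this merit, which is far cheaper than retraining a classifier per candidate subset.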