Louisa Heidrich, Emanuel Slany, Stephan Scheele, Ute Schmid
The rise of machine-learning applications in domains with critical end-user impact has led to a growing concern about the fairness of learned models, with the goal of avoiding biases that negatively impact specific demographic groups. Most existing bias-mitigation strategies adapt the importance of data instances during pre-processing. Since fairness is a contextual concept, we advocate for an interactive machine-learning approach that enables users to provide iterative feedback for model adaptation. Specifically, we propose to adapt the explanatory interactive machine-learning approach Caipi for fair machine learning. FairCaipi incorporates human feedback in the loop on predictions and explanations to improve the fairness of the model. Experimental results demonstrate that FairCaipi outperforms a state-of-the-art pre-processing bias mitigation strategy in terms of the fairness and the predictive performance of the resulting machine-learning model. We show that FairCaipi can both uncover and reduce bias in machine-learning models and allows us to detect human bias.
FairCaipi: A Combination of Explanatory Interactive and Fair Machine Learning for Human and Machine Bias Reduction. Machine Learning and Knowledge Extraction, published 2023-10-18. DOI: 10.3390/make5040076.
Ruchita Mehta, Sara Sharifzadeh, Vasile Palade, Bo Tan, Alireza Daneshkhah, Yordanka Karayaneva
Human capability to perform routine tasks declines with age and age-related problems. Remote human activity recognition (HAR) is beneficial for regular monitoring of the elderly population. This paper addresses the problem of the continuous detection of daily human activities using a mm-wave Doppler radar. In this study, two strategies have been employed: the first method uses un-equalized series of activities, whereas the second method utilizes a gradient-based strategy for equalization of the series of activities. The dynamic time warping (DTW) algorithm and Long Short-Term Memory (LSTM) techniques have been implemented for the classification of un-equalized and equalized series of activities, respectively. The input for DTW was provided using three strategies. The first approach uses the pixel-level data of frames (UnSup-PLevel). In the other two strategies, a convolutional variational autoencoder (CVAE) is used to extract unsupervised encoded features (UnSup-EnLevel) and supervised encoded features (Sup-EnLevel) from the series of Doppler frames. The second approach, for equalized data series, involves the application of four distinct feature extraction methods: convolutional neural networks (CNN), supervised and unsupervised CVAE, and principal component analysis (PCA). The extracted features were considered as an input to the LSTM. This paper presents a comparative analysis of a novel supervised feature extraction pipeline, employing Sup-EnLevel-DTW and Sup-EnLevel-LSTM, against several state-of-the-art unsupervised methods, including UnSup-EnLevel-DTW, UnSup-EnLevel-LSTM, CNN-LSTM, and PCA-LSTM. The results demonstrate the superiority of the Sup-EnLevel-LSTM strategy. However, the UnSup-PLevel strategy worked surprisingly well without using annotations and frame equalization.
Deep Learning Techniques for Radar-Based Continuous Human Activity Recognition. Machine Learning and Knowledge Extraction, published 2023-10-14. DOI: 10.3390/make5040075.
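The DTW-based classification described in this abstract can be illustrated with a minimal sketch: the classic dynamic-programming DTW distance plus 1-nearest-neighbour matching against labelled template sequences. This is a generic illustration of the technique, not the authors' Doppler-frame pipeline; the `classify_1nn` helper and its template format are hypothetical.

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences."""
    n, m = len(a), len(b)
    # Cost matrix with an extra border row/column of infinities.
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Allow match, insertion, and deletion steps.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def classify_1nn(query, templates):
    """1-nearest-neighbour classification: templates is a list of
    (sequence, label) pairs; return the label of the closest template."""
    return min(templates, key=lambda t: dtw_distance(query, t[0]))[1]
```

In practice the sequences would be feature vectors extracted per Doppler frame (pixel-level or CVAE-encoded) rather than scalars, but the alignment recurrence is the same.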
Joel Arweiler, Cihan Ates, Jesus Cerquides, Rainer Koch, Hans-Jörg Bauer
The inherent dependency of deep learning models on labeled data is a well-known problem and one of the barriers that slows down the integration of such methods into different fields of applied sciences and engineering, in which experimental and numerical methods can easily generate a colossal amount of unlabeled data. This paper proposes an unsupervised domain adaptation methodology that mimics the peer review process to label new observations in a different domain from the training set. The approach evaluates the validity of a hypothesis using domain knowledge acquired from the training set through a similarity analysis, exploring the projected feature space to examine the class centroid shifts. The methodology is tested on a binary classification problem, where synthetic images of cubes and cylinders in different orientations are generated. The methodology improves the accuracy of the object classifier from 60% to around 90% in the case of a domain shift in physical feature space without human labeling.
Similarity-Based Framework for Unsupervised Domain Adaptation: Peer Reviewing Policy for Pseudo-Labeling. Machine Learning and Knowledge Extraction, published 2023-10-12. DOI: 10.3390/make5040074.
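The centroid-based similarity analysis lends itself to a short sketch: pseudo-label target-domain samples by their nearest source-class centroid and abstain when two centroids are nearly equidistant. This is a simplified stand-in for the paper's peer-review policy, not its implementation; the `margin` parameter and the abstain-with-−1 convention are assumptions.

```python
import numpy as np

def pseudo_label(source_feats, source_labels, target_feats, margin=0.1):
    """Pseudo-label target samples via similarity to source-class centroids
    in the projected feature space; ambiguous samples get label -1."""
    classes = np.unique(source_labels)
    centroids = np.stack([source_feats[source_labels == c].mean(axis=0)
                          for c in classes])
    labels = []
    for x in target_feats:
        d = np.linalg.norm(centroids - x, axis=1)
        order = np.argsort(d)
        best, second = d[order[0]], d[order[1]]
        # Abstain when the two nearest centroids are almost equally close.
        if second - best < margin * second:
            labels.append(-1)
        else:
            labels.append(int(classes[order[0]]))
    return np.array(labels)
```

The abstained samples would then be the ones deferred to further "review" rather than trusted for training.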
Miguel S. Soriano-Garcia, Ricardo Sevilla-Escoboza, Angel Garcia-Pedrero
Generative Adversarial Networks are powerful generative models that are used in different areas and with multiple applications. However, this type of model has a training problem called mode collapse. This problem causes the generator to not learn the complete distribution of the data with which it is trained. To force the network to learn the entire data distribution, MSSGAN is introduced. This model has multiple generators and distributes the training data in multiple subspaces, where each generator is enforced to learn only one of the groups with the help of a classifier. We demonstrate that our model performs better on the FID and Sample Distribution metrics compared to previous models to avoid mode collapse. Experimental results show how each of the generators learns different information and, in turn, generates satisfactory quality samples.
Mssgan: Enforcing Multiple Generators to Learn Multiple Subspaces to Avoid the Mode Collapse. Machine Learning and Knowledge Extraction, published 2023-10-10. DOI: 10.3390/make5040073.
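The idea of letting a classifier confine each generator to one subspace can be sketched as a simple data-routing step before training: each sample goes to exactly one generator's training subset. The `classify` callable stands in for the paper's auxiliary classifier and its signature is hypothetical; this shows only the partitioning, not the adversarial training.

```python
import numpy as np

def route_to_generators(data, classify, n_generators):
    """Split a dataset into per-generator subsets using a classifier's
    predicted group index, so each generator sees only one subspace."""
    groups = [[] for _ in range(n_generators)]
    for x in data:
        groups[classify(x) % n_generators].append(x)
    return [np.array(g) for g in groups]
```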
Reinforcement Learning (RL) has shown promise in optimizing complex control and decision-making processes, but Deep Reinforcement Learning (DRL) lacks interpretability, limiting its adoption in regulated sectors like manufacturing, finance, and healthcare. Difficulties arise from DRL’s opaque decision-making, hindering efficiency and resource use; this issue is amplified with every advancement. While many seek to move from Experience Replay to A3C, the latter demands more resources. Despite efforts to improve Experience Replay selection strategies, there is a tendency to keep the capacity high. We investigate training a Deep Convolutional Q-learning agent across 20 Atari games while intentionally reducing Experience Replay capacity from 1×10⁶ to 5×10². We find that a reduction from 1×10⁴ to 5×10³ does not significantly affect rewards, offering a practical path to resource-efficient DRL. To illuminate agent decisions and align them with game mechanics, we employ a novel method: visualizing Experience Replay via the Deep SHAP Explainer. This approach fosters comprehension and transparent, interpretable explanations, though any capacity reduction must be made cautiously to avoid overfitting. Our study demonstrates the feasibility of reducing Experience Replay capacity and advocates for transparent, interpretable decision explanations using the Deep SHAP Explainer to promote resource efficiency in Experience Replay.
Explaining Deep Q-Learning Experience Replay with SHapley Additive exPlanations. Robert S. Sullivan, Luca Longo. Machine Learning and Knowledge Extraction, published 2023-10-09. DOI: 10.3390/make5040072.
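The capacity knob this study sweeps is exactly the `maxlen` of a standard fixed-size replay buffer, sketched below. This is the textbook uniform-sampling buffer, not the authors' code; the transition tuple layout is an assumption.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity experience replay; old transitions are evicted
    first-in-first-out once capacity is reached. Capacity is the
    hyperparameter varied in the study (1e6 down to 5e2)."""
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling, as in vanilla experience replay.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```

Shrinking `capacity` directly bounds memory use; the trade-off the paper measures is how far it can shrink before correlated, recent-only samples hurt the reward.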
Veronika Smejkalová, Radovan Šomplák, Martin Rosecký, Kristína Šramková
Analysis of data is crucial in waste management to improve effective planning from both short- and long-term perspectives. Real-world data often presents anomalies, but in the waste management sector, anomaly detection is seldom performed. The main goal and contribution of this paper is the proposal of a complex machine learning framework for changepoint detection in a large number of short time series from waste management. In such a case, it is not possible to rely only on an expert-based approach, due to the time-consuming nature of this process and its subjectivity. The proposed framework consists of two steps: (1) outlier detection via an outlier test on trend-adjusted data, and (2) changepoint identification via comparison of linear model parameters. In order to use the proposed method, it is necessary to have a sufficient number of experts’ assessments of the presence of anomalies in time series. The proposed framework is demonstrated on waste management data from the Czech Republic. It is observed that certain waste categories in specific regions frequently exhibit changepoints. On the micro-regional level, approximately 31.1% of time series contain at least one outlier and 16.4% exhibit changepoints. Certain groups of waste are more prone to the occurrence of anomalies. The results indicate that even in the case of aggregated data, anomalies are not rare, and their presence should always be checked.
Machine Learning Method for Changepoint Detection in Short Time Series Data. Machine Learning and Knowledge Extraction, published 2023-10-05. DOI: 10.3390/make5040071.
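Step (2) of the framework, comparing linear model parameters on either side of a candidate split, can be sketched in a few lines. This is a deliberately simplified illustration (slope comparison only, single changepoint); the `min_seg` and `slope_tol` parameters are assumptions, not values from the paper.

```python
import numpy as np

def detect_changepoint(y, min_seg=3, slope_tol=0.5):
    """Return the split index where linear fits to the two sides of a
    short series disagree most in slope, or None if no split exceeds
    the tolerance."""
    x = np.arange(len(y), dtype=float)
    best_split, best_gap = None, slope_tol
    for s in range(min_seg, len(y) - min_seg):
        left = np.polyfit(x[:s], y[:s], 1)[0]    # slope of left segment
        right = np.polyfit(x[s:], y[s:], 1)[0]   # slope of right segment
        gap = abs(left - right)
        if gap > best_gap:
            best_split, best_gap = s, gap
    return best_split
```

In the full framework this comparison runs only after the outlier test on trend-adjusted data, so that single aberrant values are not mistaken for structural breaks.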
Mohammed Lansari, Reda Bellafqira, Katarzyna Kapusta, Vincent Thouvenot, Olivier Bettan, Gouenou Coatrieux
Federated learning (FL) is a technique that allows multiple participants to collaboratively train a Deep Neural Network (DNN) without the need to centralize their data. Among other advantages, it comes with privacy-preserving properties, making it attractive for application in sensitive contexts, such as health care or the military. Although the data are not explicitly exchanged, the training procedure requires sharing information about participants’ models. This makes the individual models vulnerable to theft or unauthorized distribution by malicious actors. To address the issue of ownership rights protection in the context of machine learning (ML), DNN watermarking methods have been developed during the last five years. Most existing works have focused on watermarking in a centralized manner, but only a few methods have been designed for FL and its unique constraints. In this paper, we provide an overview of recent advancements in federated learning watermarking, shedding light on the new challenges and opportunities that arise in this field.
When Federated Learning Meets Watermarking: A Comprehensive Overview of Techniques for Intellectual Property Protection. Machine Learning and Knowledge Extraction, published 2023-10-04. DOI: 10.3390/make5040070.
Bahareh Najafi, Saeedeh Parsaeefard, Alberto Leon-Garcia
This paper addresses the problem of learning temporal graph representations, which capture the changing nature of complex evolving networks. Existing approaches mainly focus on adding new nodes and edges to capture dynamic graph structures. However, to achieve more accurate representation of graph evolution, we consider both the addition and deletion of nodes and edges as events. These events occur at irregular time scales and are modeled using temporal point processes. Our goal is to learn the conditional intensity function of the temporal point process to investigate the influence of deletion events on node representation learning for link-level prediction. We incorporate network entropy, a measure of node and edge significance, to capture the effect of node deletion and edge removal in our framework. Additionally, we leveraged the characteristics of a generalized temporal Hawkes process, which considers the inhibitory effects of events where past occurrences can reduce future intensity. This framework enables dynamic representation learning by effectively modeling both addition and deletion events in the temporal graph. To evaluate our approach, we utilize autonomous system graphs, a family of inhomogeneous sparse graphs with instances of node and edge additions and deletions, in a link prediction task. By integrating these enhancements into our framework, we improve the accuracy of dynamic link prediction and enable better understanding of the dynamic evolution of complex networks.
Entropy-Aware Time-Varying Graph Neural Networks with Generalized Temporal Hawkes Process: Dynamic Link Prediction in the Presence of Node Addition and Deletion. Machine Learning and Knowledge Extraction, published 2023-10-04. DOI: 10.3390/make5040069.
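The conditional intensity at the heart of this approach can be written down concretely for the exponential-kernel case: λ(t) = μ + Σ_{t_i < t} α·exp(−β(t − t_i)), where a negative α gives the inhibitory effect the generalized process allows. The sketch below is the standard textbook form with illustrative parameter values, not the paper's learned, entropy-weighted intensity.

```python
import math

def hawkes_intensity(t, history, mu=0.2, alpha=0.8, beta=1.0):
    """Conditional intensity of an exponential Hawkes process:
    lambda(t) = mu + sum over past events t_i < t of
    alpha * exp(-beta * (t - t_i)).
    alpha < 0 models inhibition: past events lower future intensity."""
    return mu + sum(alpha * math.exp(-beta * (t - ti))
                    for ti in history if ti < t)
```

In the paper's setting, each edge-addition or edge-deletion event contributes to such an intensity, and the network learns the function instead of fixing μ, α, β.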
Cristian Ubal, Gustavo Di-Giorgi, Javier E. Contreras-Reyes, Rodrigo Salas
Long-term dependence is an essential feature for the predictability of time series. Estimating the parameter that describes long memory is essential to describing the behavior of time series models. However, most long memory estimation methods assume that this parameter has a constant value throughout the time series and do not consider that the parameter may change over time. In this work, we propose an automated methodology that combines estimation methodologies for the fractional differentiation parameter (and/or Hurst parameter) with Recurrent Neural Networks (RNNs), so that the networks learn and predict long memory dependencies from information obtained in nonlinear time series. The proposal combines three methods that allow for a better approximation in the prediction of the parameter values for each of the windows obtained, using Recurrent Neural Networks as an adaptive method to learn and predict long memory dependencies in time series. For the RNNs, we have evaluated four different architectures: the Simple RNN, LSTM, BiLSTM, and GRU. These models are built from blocks with gates controlling the cell state and memory. We have evaluated the proposed approach using both synthetic and real-world datasets, with Whittle’s estimates of the Hurst parameter, classically obtained in each window, serving as the reference. For the synthetic data, we simulated ARFIMA models to generate several time series by varying the fractional differentiation parameter. The real-world IPSA stock option index and Tree Ring time series datasets were also evaluated. All of the results show that the proposed approach can predict the Hurst exponent with good performance by selecting the optimal window size and overlap change.
{"title":"Predicting the Long-Term Dependencies in Time Series Using Recurrent Artificial Neural Networks","authors":"Cristian Ubal, Gustavo Di-Giorgi, Javier E. Contreras-Reyes, Rodrigo Salas","doi":"10.3390/make5040068","DOIUrl":"https://doi.org/10.3390/make5040068","url":null,"abstract":"Long-term dependence is an essential feature for the predictability of time series. Estimating the parameter that describes long memory is essential to describing the behavior of time series models. However, most long-memory estimation methods assume that this parameter has a constant value throughout the time series and do not consider that it may change over time. In this work, we propose an automated methodology that combines estimation methodologies for the fractional differentiation parameter (and/or Hurst parameter) with Recurrent Neural Networks (RNNs), so that these networks learn and predict long-memory dependencies from information obtained from nonlinear time series. The proposal combines three methods that allow for a better approximation in predicting the parameter values for each of the windows obtained, using RNNs as an adaptive method to learn and predict long-memory dependencies in time series. For the RNNs, we evaluated four different architectures: the simple RNN, the LSTM, the BiLSTM, and the GRU. These models are built from blocks with gates controlling the cell state and memory. We evaluated the proposed approach on both synthetic and real-world datasets, using Whittle’s estimates of the Hurst parameter, classically obtained in each window, as a reference. For the synthetic data, we simulated ARFIMA models to generate several time series by varying the fractional differentiation parameter. For the real-world data, we evaluated the IPSA stock option index and Tree Ring time series datasets. All of the results show that the proposed approach can predict the Hurst exponent with good performance by selecting the optimal window size and overlap.","PeriodicalId":93033,"journal":{"name":"Machine learning and knowledge extraction","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135898923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
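The sliding-window Hurst estimation that feeds the RNNs can be illustrated as follows. Note this sketch uses the classical rescaled-range (R/S) estimator rather than Whittle's spectral estimator, purely to keep the example self-contained; the window and overlap sizes are illustrative, not the paper's tuned values.

```python
import math
import random

def rs_hurst(series):
    """Estimate the Hurst exponent via the rescaled-range (R/S) statistic:
    the slope of log(R/S) against log(n) over several sub-series sizes."""
    n = len(series)
    sizes = [n // k for k in (1, 2, 4, 8) if n // k >= 8]
    log_n, log_rs = [], []
    for size in sizes:
        rs_vals = []
        for start in range(0, n - size + 1, size):
            chunk = series[start:start + size]
            mean = sum(chunk) / size
            dev = [x - mean for x in chunk]
            cum, c = [], 0.0          # cumulative deviation from the mean
            for d in dev:
                c += d
                cum.append(c)
            r = max(cum) - min(cum)   # range of the cumulative deviations
            s = math.sqrt(sum(d * d for d in dev) / size)  # std deviation
            if s > 0:
                rs_vals.append(r / s)
        if rs_vals:
            log_n.append(math.log(size))
            log_rs.append(math.log(sum(rs_vals) / len(rs_vals)))
    # Least-squares slope of log(R/S) on log(n) is the Hurst estimate.
    k = len(log_n)
    mx, my = sum(log_n) / k, sum(log_rs) / k
    num = sum((x - mx) * (y - my) for x, y in zip(log_n, log_rs))
    den = sum((x - mx) ** 2 for x in log_n)
    return num / den

random.seed(0)
white_noise = [random.gauss(0.0, 1.0) for _ in range(4096)]
h = rs_hurst(white_noise)
print(round(h, 2))  # white noise: H should land near 0.5

# Window-wise estimates, mirroring the paper's sliding-window setup
# (window length and overlap here are arbitrary illustrative choices).
window, step = 1024, 512
hursts = [rs_hurst(white_noise[i:i + window])
          for i in range(0, len(white_noise) - window + 1, step)]
```

The per-window sequence `hursts` is the kind of target an RNN can then learn to track as the long-memory parameter drifts over time.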
This study introduces an optimal topology of vision transformers for real-time video action recognition in a cloud-based solution. Although model performance is a key criterion for real-time video analysis use cases, inference latency plays a more crucial role in adopting such technology in real-world scenarios. Our objective is to reduce the inference latency of the solution while admissibly maintaining the vision transformer’s performance. Thus, we employed the optimal cloud components as the foundation of our machine learning pipeline and optimized the topology of vision transformers. We utilized UCF101, including more than one million action recognition video clips. The modeling pipeline consists of a preprocessing module to extract frames from video clips, training two-dimensional (2D) vision transformer models, and deep learning baselines. The pipeline also includes a postprocessing step to aggregate the frame-level predictions to generate the video-level predictions at inference. The results demonstrate that our optimal vision transformer model with an input dimension of 56 × 56 × 3 with eight attention heads produces an F1 score of 91.497% for the testing set. The optimized vision transformer reduces the inference latency by 40.70%, measured through a batch-processing approach, with a 55.63% faster training time than the baseline. Lastly, we developed an enhanced skip-frame approach to improve the inference latency by finding an optimal ratio of frames for prediction at inference, where we could further reduce the inference latency by 57.15%. This study reveals that the vision transformer model is highly optimizable for inference latency while maintaining the model performance.
{"title":"Optimal Topology of Vision Transformer for Real-Time Video Action Recognition in an End-To-End Cloud Solution","authors":"Saman Sarraf, Milton Kabia","doi":"10.3390/make5040067","DOIUrl":"https://doi.org/10.3390/make5040067","url":null,"abstract":"This study introduces an optimal topology of vision transformers for real-time video action recognition in a cloud-based solution. Although model performance is a key criterion for real-time video analysis use cases, inference latency plays a more crucial role in adopting such technology in real-world scenarios. Our objective is to reduce the inference latency of the solution while admissibly maintaining the vision transformer’s performance. Thus, we employed the optimal cloud components as the foundation of our machine learning pipeline and optimized the topology of vision transformers. We utilized UCF101, including more than one million action recognition video clips. The modeling pipeline consists of a preprocessing module to extract frames from video clips, training two-dimensional (2D) vision transformer models, and deep learning baselines. The pipeline also includes a postprocessing step to aggregate the frame-level predictions to generate the video-level predictions at inference. The results demonstrate that our optimal vision transformer model with an input dimension of 56 × 56 × 3 with eight attention heads produces an F1 score of 91.497% for the testing set. The optimized vision transformer reduces the inference latency by 40.70%, measured through a batch-processing approach, with a 55.63% faster training time than the baseline. Lastly, we developed an enhanced skip-frame approach to improve the inference latency by finding an optimal ratio of frames for prediction at inference, where we could further reduce the inference latency by 57.15%. 
This study reveals that the vision transformer model is highly optimizable for inference latency while maintaining the model performance.","PeriodicalId":93033,"journal":{"name":"Machine learning and knowledge extraction","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135246715","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
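The postprocessing and skip-frame ideas above can be sketched compactly. The abstract does not specify the aggregation rule, so the majority vote below is an assumption for illustration; the function name and the `skip` ratio are likewise hypothetical.

```python
from collections import Counter

def video_prediction(frame_predictions, skip=3):
    """Aggregate frame-level class predictions into one video-level label,
    scoring only every `skip`-th frame (the skip-frame inference idea:
    running the model on 1/skip of the frames cuts latency roughly
    proportionally, at some risk to accuracy)."""
    sampled = frame_predictions[::skip]
    votes = Counter(sampled)
    return votes.most_common(1)[0][0]

# Toy frame-level outputs for one clip; frames 0, 3, 6 are actually scored.
frames = ["jump", "jump", "run", "jump", "jump", "run", "jump", "jump", "jump"]
print(video_prediction(frames, skip=3))  # jump
```

Finding the largest `skip` that leaves the video-level label stable is one plausible reading of "an optimal ratio of frames for prediction at inference."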