Pub Date: 2024-03-22  DOI: 10.1109/TAI.2024.3404414
Nonlinear Regression With Hierarchical Recurrent Neural Networks Under Missing Data
S. Onur Sahin; Suleyman S. Kozat
We study regression (or prediction) of sequential data, which may have missing entries and/or different lengths. This problem is heavily investigated in the machine learning literature, since such missingness is common in most real-life applications due to data corruption, measurement errors, and similar causes. To this end, we introduce a novel hierarchical architecture involving a set of long short-term memory (LSTM) networks, which use only the existing inputs in the sequence without any imputation or statistical assumptions on the missing data. To incorporate the missingness information, we partition the input space into different regions in a hierarchical manner based on the “presence-pattern” of the previous inputs and then assign different LSTM networks to these regions. In this sense, we use the LSTM networks as our experts for these regions and adaptively combine their outputs to generate our final output. Our method is generic: the set of partitioned regions (presence-patterns) modeled by the LSTM networks can be customized, and one can readily substitute other sequential architectures such as gated recurrent unit (GRU) networks and recurrent neural networks (RNNs), as shown in the article. We also provide a computational complexity analysis of the proposed architecture, which is of the same order as a conventional LSTM architecture. In our experiments, our algorithm achieves significant performance improvements on well-known financial and real-life datasets with respect to state-of-the-art methods. We also share the source code of our algorithm to facilitate further research and the replicability of our results.
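The presence-pattern routing idea can be sketched in a few lines. This is a hypothetical toy, not the paper's implementation: fixed random linear maps stand in for the LSTM experts, the adaptive combination is reduced to a plain softmax gate over expert outputs, and the names `pattern_index`, `experts`, and `gate_logits` are invented for illustration.

```python
import math
import random

random.seed(0)
N_FEATURES = 3
N_PATTERNS = 2 ** N_FEATURES  # one region per possible presence-pattern

# Stub experts: random linear weights stand in for per-region LSTM networks.
experts = [[random.gauss(0, 1) for _ in range(N_FEATURES)] for _ in range(N_PATTERNS)]
# Combination weights over experts (these would be learned adaptively).
gate_logits = [0.0] * N_PATTERNS

def pattern_index(mask):
    # Encode a presence mask such as [1, 0, 1] as an integer region id.
    return sum(int(b) << i for i, b in enumerate(mask))

def predict(x, mask):
    # Zero out missing entries instead of imputing them, route the input to
    # the expert of its presence-pattern, and softly mix all experts through
    # a softmax gate.
    xz = [xi if m else 0.0 for xi, m in zip(x, mask)]
    idx = pattern_index(mask)
    z = [math.exp(g) for g in gate_logits]
    total = sum(z)
    weights = [zi / total for zi in z]
    outs = [sum(wj * vj for wj, vj in zip(e, xz)) for e in experts]
    # Emphasize the pattern's own expert while softly mixing the rest.
    return 0.5 * outs[idx] + 0.5 * sum(wi * oi for wi, oi in zip(weights, outs))

y = predict([0.2, -1.0, 0.7], [True, False, True])
```

In the paper the combination is hierarchical and trained online; here the gate is frozen purely to show the routing mechanics.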
Pub Date: 2024-03-22  DOI: 10.1109/TAI.2024.3404411
Stable Learning via Triplex Learning
Shuai Yang; Tingting Jiang; Qianlong Dang; Lichuan Gu; Xindong Wu
Stable learning aims to learn a model that generalizes well to an arbitrary unseen target domain by leveraging a single source domain. Recent advances in stable learning have focused on balancing the distribution of confounders for each feature to eliminate spurious correlations. However, previous studies treat all features equally, without considering that the difficulty of confounder balancing differs across features, and regard irrelevant features as confounders, deteriorating generalization performance. To tackle these issues, this article proposes a novel triplex learning (TriL)-based stable learning algorithm, which performs sample reweighting, causal feature selection, and representation learning to remove spurious correlations. Specifically, first, TriL adaptively assigns weights to the confounder-balancing term of each feature in accordance with the difficulty of confounder balancing, and aligns the confounder distribution of each feature by learning a group of sample weights. Second, TriL integrates the sample weights into a weighted cross-entropy model to compute the causal effects of features, excluding irrelevant features from the confounder set. Finally, TriL relearns a set of sample weights and uses them to guide a new supervised dual autoencoder containing two classifiers to learn feature representations. Using a cross-classifier consistency regularization, TriL forces the results of the two classifiers to remain consistent, removing spurious correlations. Extensive experiments on a synthetic dataset and two real-world datasets show the superiority of TriL compared with seven methods.
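The first step, confounder balancing via sample reweighting, can be illustrated with a toy moment-matching sketch. This is an assumption-laden stand-in, not the TriL objective: it learns per-sample weights so that a single confounder's weighted mean matches across the two groups induced by one binarized feature, using finite-difference gradient descent on the squared imbalance.

```python
t = [1, 1, 0, 0, 0]            # binarized "treatment" feature splitting the samples
c = [2.0, 3.0, 1.0, 2.0, 5.0]  # confounder values (imbalanced across the groups)
w = [1.0] * len(t)             # per-sample weights, initialized uniformly

def imbalance(w):
    # Difference of weighted confounder means between the t=1 and t=0 groups.
    s1 = sum(wi for wi, ti in zip(w, t) if ti)
    s0 = sum(wi for wi, ti in zip(w, t) if not ti)
    m1 = sum(wi * ci for wi, ci, ti in zip(w, c, t) if ti) / s1
    m0 = sum(wi * ci for wi, ci, ti in zip(w, c, t) if not ti) / s0
    return m1 - m0

# Plain gradient descent on the squared imbalance via finite differences,
# with a small floor keeping every weight strictly positive.
for _ in range(200):
    base = imbalance(w) ** 2
    grad = []
    for i in range(len(w)):
        w_eps = list(w)
        w_eps[i] += 1e-6
        grad.append((imbalance(w_eps) ** 2 - base) / 1e-6)
    w = [max(0.05, wi - 0.2 * gi) for wi, gi in zip(w, grad)]
```

After training, the weighted confounder means of the two groups coincide, which is the balancing condition the reweighting step aims for; TriL additionally weights this term per feature by balancing difficulty.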
Pub Date: 2024-03-22  DOI: 10.1109/TAI.2024.3404408
Yu Huang;Jiaxing Liu;Zongshi Zhang;Dui Li;Xuxin Li;Guang Wang
Accurate short-term photovoltaic (PV) power prediction is crucial for fault detection in the control system and for reducing faults in the PV output control system. However, PV power is highly volatile, and a combined model cannot adapt to significant power fluctuations during prediction, affecting the stable operation of the PV output control system. In response to this issue, a dynamic combination short-term PV power prediction model is proposed that couples a temporal convolutional network (TCN)-bidirectional gated recurrent unit (BiGRU) network and a TCN-bidirectional long short-term memory (BiLSTM) network, based on complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN). CEEMDAN is employed to decompose the original PV power data to reduce the volatility of the original data. Two combined models, TCN-BiGRU and TCN-BiLSTM, are constructed and trained separately. ElasticNet is then introduced, which utilizes both L1
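The decompose-predict-recombine pipeline described above can be sketched as follows. This is a toy illustration under loud assumptions: a crude moving-average split (trend plus residual) stands in for CEEMDAN's intrinsic mode functions, and persistence forecasts stand in for the TCN-BiGRU and TCN-BiLSTM sub-models.

```python
def moving_average(x, k=3):
    # Centered moving average with shrinking windows at the edges.
    half = k // 2
    out = []
    for i in range(len(x)):
        window = x[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out

def decompose(x):
    # Stand-in for CEEMDAN: split the series into a smooth trend and a
    # residual so that the components sum back to the original signal.
    trend = moving_average(x)
    residual = [xi - ti for xi, ti in zip(x, trend)]
    return trend, residual

def naive_forecast(component):
    # Persistence predictor standing in for each trained sub-model.
    return component[-1]

power = [1.0, 1.4, 0.9, 1.6, 1.2, 1.8]  # hypothetical PV power readings
trend, residual = decompose(power)
# Recombine the per-component forecasts into the final PV power prediction.
prediction = naive_forecast(trend) + naive_forecast(residual)
```

The real pipeline would forecast each CEEMDAN mode with the combined deep models and fuse their outputs dynamically; the sketch only shows why decomposing first and summing component forecasts is consistent with the original series.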