Nonlinear Regression With Hierarchical Recurrent Neural Networks Under Missing Data

Authors: S. Onur Sahin; Suleyman S. Kozat
Journal: IEEE Transactions on Artificial Intelligence
Publication date: 2024-03-22 (Journal Article)
DOI: 10.1109/TAI.2024.3404414
URL: https://ieeexplore.ieee.org/document/10536892/
Citations: 0
Abstract
We study regression (or prediction) of sequential data, which may have missing entries and/or different lengths. This problem is heavily investigated in the machine learning literature, since such missingness is a common occurrence in most real-life applications due to data corruption, measurement errors, and similar factors. To this end, we introduce a novel hierarchical architecture involving a set of long short-term memory (LSTM) networks, which uses only the existing inputs in the sequence, without any imputation or statistical assumptions on the missing data. To incorporate the missingness information, we partition the input space into different regions in a hierarchical manner based on the "presence-pattern" of the previous inputs, and then assign different LSTM networks to these regions. In this sense, we use the LSTM networks as our experts for these regions and adaptively combine their outputs to generate our final output. Our method is generic: the set of partitioned regions (presence-patterns) modeled by the LSTM networks can be customized, and one can readily use other sequential architectures such as gated recurrent unit (GRU) networks and recurrent neural networks (RNNs), as shown in the article. We also provide a computational complexity analysis of the proposed architecture, which is of the same order as that of a conventional LSTM architecture. In our experiments, our algorithm achieves significant performance improvements over state-of-the-art methods on well-known financial and real-life datasets. We also share the source code of our algorithm to facilitate further research and the replicability of our results.
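The core routing idea in the abstract can be illustrated with a toy sketch: encode which of the previous inputs exist as a binary "presence-pattern", and assign each pattern its own expert. This is only a minimal illustration, not the paper's method: the experts here are per-pattern running means standing in for LSTM networks, the window length `ORDER` is an assumed parameter, and the adaptive weighted combination of expert outputs is omitted (each region's expert predicts directly).

```python
# Toy sketch of presence-pattern-based expert routing (illustration only).
# Each "expert" is a running mean standing in for an LSTM network.

ORDER = 2  # number of previous inputs whose presence-pattern we track (assumed)

def presence_pattern(window):
    """Binary tuple marking which of the previous inputs exist (None = missing)."""
    return tuple(0 if x is None else 1 for x in window)

def predict(sequence):
    experts = {}  # pattern -> (running mean of observed values, count)
    preds = []
    for t in range(ORDER, len(sequence)):
        window = sequence[t - ORDER:t]
        pat = presence_pattern(window)         # which region of the input space we are in
        mean, n = experts.get(pat, (0.0, 0))
        preds.append(mean)                     # the expert assigned to this region predicts
        x = sequence[t]
        if x is not None:                      # update only with existing inputs; no imputation
            experts[pat] = ((mean * n + x) / (n + 1), n + 1)
    return preds

print(predict([1.0, None, 2.0, 3.0, None, 4.0]))  # → [0.0, 0.0, 0.0, 2.0]
```

Note how the last prediction (2.0) comes from the expert for pattern `(1, 0)`, which was trained only on time steps sharing that same missingness structure; in the paper, full LSTM experts play this role and their outputs are combined adaptively.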