Joint Representation Learning with Generative Adversarial Imputation Network for Improved Classification of Longitudinal Data
Sharon Torao Pingi, Duoyi Zhang, Md Abul Bashar, Richi Nayak
Data Science and Engineering, published 2023-10-17. DOI: 10.1007/s41019-023-00232-9 (https://doi.org/10.1007/s41019-023-00232-9)
Abstract
Generative adversarial networks (GANs) have proven effective at generating temporal data to fill in missing values, which improves the classification of time series data. Longitudinal datasets consist of multivariate time series together with static features that contribute to sample variability over time. These datasets often contain missing values, for example because of irregular sampling. However, existing GAN-based imputation methods for this kind of missingness often overlook the influence of static features on temporal observations and on classification outcomes. This paper presents a novel method, the fusion-aided imputer-classifier GAN (FaIC-GAN), tailored for longitudinal data classification. FaIC-GAN uses partially observed temporal data and static features together to improve both imputation and classification learning. We present four multimodal fusion strategies that extract correlated information from the static and temporal modalities. Our extensive experiments show that FaIC-GAN successfully exploits partially observed temporal data and static features, yielding higher classification accuracy than unimodal models. Our post-additive and attention-based multimodal fusion approaches within the FaIC-GAN model consistently rank among the top three methods for classification.
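The abstract names two ideas that a small example can make concrete: filling gaps in a partially observed series with generated values, and combining temporal data with static features. The sketch below is only an illustration under stated assumptions, not the FaIC-GAN implementation. The function names `impute_with_mask` and `additive_fusion`, the fixed projection `weights`, and the toy values are all hypothetical, and the simple additive fusion shown here is not claimed to match the paper's post-additive strategy.

```python
from typing import List, Optional

# One sample: time steps x temporal features; None marks a missing value.
Series = List[List[Optional[float]]]


def impute_with_mask(observed: Series, generated: List[List[float]]) -> List[List[float]]:
    """Keep observed values; take generated values only where data is missing."""
    return [
        [g if x is None else x for x, g in zip(obs_row, gen_row)]
        for obs_row, gen_row in zip(observed, generated)
    ]


def additive_fusion(temporal: List[List[float]], static: List[float],
                    weights: List[List[float]]) -> List[List[float]]:
    """Project static features to the temporal width and add them at every time step.

    weights has shape (len(static), number of temporal features).
    """
    n_features = len(temporal[0])
    projected = [
        sum(s * weights[i][j] for i, s in enumerate(static))
        for j in range(n_features)
    ]
    return [[v + p for v, p in zip(row, projected)] for row in temporal]


# Toy data (assumed for illustration only).
observed = [[1.0, None], [None, 4.0]]
generated = [[0.9, 2.0], [3.0, 4.1]]   # stand-in for a generator's output
static = [1.0, 2.0]                     # two static attributes of the sample
weights = [[0.5, 0.0], [0.0, 0.25]]     # hypothetical fixed projection

imputed = impute_with_mask(observed, generated)
fused = additive_fusion(imputed, static, weights)
print(imputed)
print(fused)
```

In a trained model the generated values and the projection would be learned jointly with the classifier; here they are fixed so that the data flow is easy to follow.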
About the journal:
The journal of Data Science and Engineering (DSE) responds to the remarkable change in the focus of information technology development from CPU-intensive computation to data-intensive computation, where the effective application of data, especially big data, becomes vital. The emerging discipline data science and engineering, an interdisciplinary field integrating theories and methods from computer science, statistics, information science, and other fields, focuses on the foundations and engineering of efficient and effective techniques and systems for data collection and management, for data integration and correlation, for information and knowledge extraction from massive data sets, and for data use in different application domains. Focusing on the theoretical background and advanced engineering approaches, DSE aims to offer a prime forum for researchers, professionals, and industrial practitioners to share their knowledge in this rapidly growing area. It provides in-depth coverage of the latest advances in the closely related fields of data science and data engineering. More specifically, DSE covers four areas: (i) the data itself, i.e., the nature and quality of the data, especially big data; (ii) the principles of information extraction from data, especially big data; (iii) the theory behind data-intensive computing; and (iv) the techniques and systems used to analyze and manage big data. DSE welcomes papers that explore the above subjects. 
Specific topics include, but are not limited to: (a) the nature and quality of data, (b) the computational complexity of data-intensive computing, (c) new methods for the design and analysis of algorithms for solving problems with big data input, (d) collection and integration of data collected from the internet and from sensing devices or sensor networks, (e) representation, modeling, and visualization of big data, (f) storage, transmission, and management of big data, (g) methods and algorithms of data-intensive computing, such as mining big data, online analytical processing of big data, big data-based machine learning, big data-based decision-making, statistical computation of big data, graph-theoretic computation of big data, linear algebraic computation of big data, and big data-based optimization, (h) hardware systems and software systems for data-intensive computing, (i) data security, privacy, and trust, and (j) novel applications of big data.