Conditional Generative Adversarial Network for Early Classification of Longitudinal Datasets using an Imputation Approach

Impact Factor: 4 | CAS Zone 3 (Computer Science) | Q1, Computer Science, Information Systems
Journal: ACM Transactions on Knowledge Discovery from Data | Published: 2024-02-07 | DOI: 10.1145/3644821

Sharon Torao Pingi, Richi Nayak, Md Abul Bashar

Citations: 0

Abstract

Early classification of longitudinal data remains an active area of research today. The complexity of these datasets and the high rates of missing data caused by irregular sampling present data-level challenges for the Early Longitudinal Data Classification (ELDC) problem. Coupled with the algorithmic challenge of optimising the opposing objectives of early classification (i.e., earliness and accuracy), ELDC becomes a non-trivial task. Inspired by the generative power and utility of the Generative Adversarial Network (GAN), we propose a novel context-conditional, longitudinal early classifier GAN (LEC-GAN). This model utilises informative missingness, static features, and earlier observations to improve the ELDC objective. It achieves this by incorporating ELDC as an auxiliary task within an imputation optimisation process. Our experiments on several datasets demonstrate that LEC-GAN outperforms all relevant baselines in terms of F1 scores while increasing the earliness of prediction.
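
The abstract names two ingredients without detailing them: informative missingness in irregularly sampled data, and the trade-off between earliness and accuracy. The following is a minimal sketch of one common way to make each concrete; it is not the LEC-GAN implementation. The helper names `missingness_features` and `earliness_accuracy_hm` are hypothetical, the mask-plus-time-gap encoding is an assumed GRU-D-style representation, and the harmonic-mean score is a metric often reported in early time-series classification work rather than one the abstract says LEC-GAN uses.

```python
from typing import List, Optional, Tuple


def missingness_features(
    values: List[Optional[float]], times: List[float]
) -> Tuple[List[float], List[int], List[float]]:
    """Hypothetical helper: encode one irregularly sampled variable.

    Returns (filled values, mask, deltas), where
    mask[i]  = 1 if the variable was observed at times[i], else 0, and
    delta[i] = time elapsed since the previous observation (GRU-D-style).
    Missing values are forward-filled; before the first observation we
    fill with 0.0, which is an illustrative assumption.
    """
    if len(values) != len(times):
        raise ValueError("values and times must have the same length")
    filled: List[float] = []
    mask: List[int] = []
    delta: List[float] = []
    last_value = 0.0
    for i, (value, t) in enumerate(zip(values, times)):
        mask.append(1 if value is not None else 0)
        if i == 0:
            delta.append(0.0)
        elif mask[i - 1] == 1:
            # Previous step was observed: the gap is just the step length.
            delta.append(t - times[i - 1])
        else:
            # Previous step was missing: keep accumulating the gap.
            delta.append(t - times[i - 1] + delta[i - 1])
        if value is not None:
            last_value = value
        filled.append(last_value)
    return filled, mask, delta


def earliness_accuracy_hm(accuracy: float, earliness: float) -> float:
    """Hypothetical helper: harmonic mean of accuracy and (1 - earliness).

    earliness is the average fraction of each series consumed before a
    prediction is issued (0 = predict immediately, 1 = wait for all data),
    so the score is high only when a classifier is both accurate and early.
    """
    if not (0.0 <= accuracy <= 1.0 and 0.0 <= earliness <= 1.0):
        raise ValueError("accuracy and earliness must lie in [0, 1]")
    timeliness = 1.0 - earliness
    if accuracy + timeliness == 0.0:
        return 0.0
    return 2.0 * accuracy * timeliness / (accuracy + timeliness)


if __name__ == "__main__":
    filled, mask, delta = missingness_features(
        [1.0, None, None, 4.0], [0.0, 1.0, 3.0, 4.0]
    )
    print(filled, mask, delta)
    print(earliness_accuracy_hm(accuracy=0.9, earliness=0.5))
```

With the inputs in the `__main__` block, the mask is `[1, 0, 0, 1]` and the deltas are `[0.0, 1.0, 3.0, 4.0]`: the gap keeps growing while the variable stays unobserved, which is the kind of signal a model can exploit when missingness itself carries information. The harmonic mean penalises a classifier that is accurate only because it waits for most of the series, or early only because it guesses.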

Source Journal

ACM Transactions on Knowledge Discovery from Data (Computer Science, Information Systems; Computer Science, Software Engineering)

CiteScore: 6.70
Self-citation rate: 5.60%
Articles published: 172
Review time: 3 months

Journal description: TKDD welcomes papers on a full range of research in the knowledge discovery and analysis of diverse forms of data. Such subjects include, but are not limited to: scalable and effective algorithms for data mining and big data analysis, mining brain networks, mining data streams, mining multi-media data, mining high-dimensional data, mining text, Web, and semi-structured data, mining spatial and temporal data, data mining for community generation, social network analysis, and graph structured data, security and privacy issues in data mining, visual, interactive and online data mining, pre-processing and post-processing for data mining, robust and scalable statistical methods, data mining languages, foundations of data mining, KDD framework and process, and novel applications and infrastructures exploiting data mining technology including massively parallel processing and cloud computing platforms. TKDD encourages papers that explore the above subjects in the context of large distributed networks of computers, parallel or multiprocessing computers, or new data devices. TKDD also encourages papers that describe emerging data mining applications that cannot be satisfied by current data mining technology.