LogPar: Logistic PARAFAC2 Factorization for Temporal Binary Data with Missing Values.

KDD: Proceedings. International Conference on Knowledge Discovery & Data Mining. Pub Date: 2020-08-01. DOI: 10.1145/3394486.3403213

Kejing Yin, Ardavan Afshar, Joyce C Ho, William K Cheung, Chao Zhang, Jimeng Sun


Citations: 16

Abstract

Binary data with one-class missing values are ubiquitous in real-world applications. They can be represented by irregular tensors whose size varies along one dimension, where a value of one indicates the presence of a feature and zero means unknown (i.e., the feature may be either present or absent). Learning accurate low-rank approximations from such binary irregular tensors is a challenging task. However, none of the existing models developed for factorizing irregular tensors takes the missing values into account, and they assume Gaussian distributions, resulting in a distribution mismatch when applied to binary data. In this paper, we propose Logistic PARAFAC2 (LogPar), which models the binary irregular tensor with a Bernoulli distribution parameterized by an underlying real-valued tensor. We then approximate the underlying tensor with a positive-unlabeled learning loss function to account for the missing values. We also incorporate uniqueness and temporal smoothness regularization to enhance interpretability. Extensive experiments using large-scale real-world datasets show that LogPar outperforms all baselines in both irregular tensor completion and downstream predictive tasks. For irregular tensor completion, LogPar achieves up to 26% relative improvement compared to the best baseline. In addition, LogPar obtains an average relative improvement of 13.2% for heart failure prediction and 14% for mortality prediction compared to the state-of-the-art PARAFAC2 models.
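The modeling idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not state the exact loss, so the code below swaps in the non-negative positive-unlabeled risk estimator with a logistic loss as one common choice. The function names, the `prior` parameter (an assumed fraction of true positives), and the toy dimensions are all hypothetical. The PARAFAC2 uniqueness constraint and the temporal smoothness regularizer mentioned in the abstract are omitted.

```python
import numpy as np

def parafac2_slice(U_k, s_k, V):
    """Real-valued slice Z_k = U_k diag(s_k) V^T (PARAFAC2-style form)."""
    return (U_k * s_k) @ V.T

def sigmoid(z):
    """Logistic link: maps a real-valued entry to a Bernoulli probability."""
    return 1.0 / (1.0 + np.exp(-z))

def logistic_loss(z, y):
    """log(1 + exp(-y * z)), computed stably; y is +1 or -1."""
    return np.logaddexp(0.0, -y * z)

def _mean(a):
    """Mean that returns 0.0 for an empty array (empty groups contribute nothing)."""
    return float(a.mean()) if a.size else 0.0

def nn_pu_risk(Z, X, prior):
    """Non-negative PU risk for one slice (an assumed choice of PU loss).

    Z     : real-valued scores, same shape as X
    X     : binary observations; 1 = observed positive, 0 = unknown
    prior : assumed fraction of true positives among all entries
    """
    pos = Z[X == 1]
    unl = Z[X == 0]
    risk_pos = prior * _mean(logistic_loss(pos, +1))
    # Negative-class risk estimated from unlabeled entries, corrected by the
    # positives hidden among them, and clipped at zero.
    risk_neg = _mean(logistic_loss(unl, -1)) - prior * _mean(logistic_loss(pos, -1))
    return risk_pos + max(0.0, risk_neg)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    R, J = 3, 5                      # rank and number of features (toy values)
    V = rng.normal(size=(J, R))      # factor shared by all slices
    for I_k in (4, 7):               # slices with different numbers of time steps
        U_k = rng.normal(size=(I_k, R))
        s_k = rng.normal(size=R)
        Z_k = parafac2_slice(U_k, s_k, V)
        # Sample binary observations from Bernoulli(sigmoid(Z_k)).
        X_k = (rng.random(Z_k.shape) < sigmoid(Z_k)).astype(int)
        print(I_k, nn_pu_risk(Z_k, X_k, prior=0.5))
```

In a full model, the factors `U_k`, `s_k`, and `V` would be fitted by minimizing such a risk summed over all slices, together with the regularizers; the sketch only evaluates the loss for fixed factors.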
