LogPar: Logistic PARAFAC2 Factorization for Temporal Binary Data with Missing Values.

Kejing Yin, Ardavan Afshar, Joyce C Ho, William K Cheung, Chao Zhang, Jimeng Sun
KDD: Proceedings. International Conference on Knowledge Discovery & Data Mining. Published 2020-08-01. DOI: 10.1145/3394486.3403213 (https://doi.org/10.1145/3394486.3403213)
Citations: 16

Abstract

Binary data with one-class missing values are ubiquitous in real-world applications. They can be represented by irregular tensors whose size varies along one dimension, where a value of one indicates the presence of a feature and zero means unknown (i.e., the feature may be either present or absent). Learning accurate low-rank approximations from such binary irregular tensors is challenging. None of the existing models developed for factorizing irregular tensors takes the missing values into account, and all of them assume Gaussian distributions, which causes a distribution mismatch when they are applied to binary data. In this paper, we propose Logistic PARAFAC2 (LogPar), which models the binary irregular tensor with a Bernoulli distribution parameterized by an underlying real-valued tensor. We then approximate the underlying tensor using a positive-unlabeled learning loss function to account for the missing values. We also incorporate uniqueness and temporal smoothness regularization to enhance interpretability. Extensive experiments on large-scale real-world datasets show that LogPar outperforms all baselines in both irregular tensor completion and downstream predictive tasks. For irregular tensor completion, LogPar achieves up to 26% relative improvement over the best baseline. In addition, LogPar obtains average relative improvements of 13.2% for heart failure prediction and 14% for mortality prediction compared to the state-of-the-art PARAFAC2 models.
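
The modeling idea in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function names, the toy sizes, and the `unlabeled_weight` parameter are assumptions made for illustration. The per-subject logits use the basic PARAFAC2 form Z_k = U_k diag(s_k) V^T, the PARAFAC2 uniqueness constraint and the temporal smoothness regularizer are omitted, and the down-weighting of zero entries is a simplified stand-in for the paper's positive-unlabeled loss, whose exact form is not given here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def parafac2_logits(U_list, S_list, V):
    """Real-valued matrix per subject: Z_k = U_k diag(s_k) V^T.

    U_k has shape (I_k, R) with I_k varying per subject (the irregular
    mode); V has shape (J, R) and is shared across subjects.
    """
    return [U @ np.diag(s) @ V.T for U, s in zip(U_list, S_list)]

def bernoulli_nll(X_list, Z_list, unlabeled_weight=1.0):
    """Negative Bernoulli log-likelihood with P(x = 1) = sigmoid(z).

    unlabeled_weight (hypothetical) scales the loss on zero entries,
    which are "unknown" rather than confirmed absences.
    """
    total = 0.0
    for X, Z in zip(X_list, Z_list):
        # -log sigmoid(z) = logaddexp(0, -z); -log(1 - sigmoid(z)) = logaddexp(0, z)
        pos = X * np.logaddexp(0.0, -Z)
        unl = (1.0 - X) * np.logaddexp(0.0, Z)
        total += np.sum(pos + unlabeled_weight * unl)
    return float(total)

# Toy data: three subjects with different numbers of time steps.
rng = np.random.default_rng(0)
R, J = 3, 5
lengths = [4, 7, 2]
V = rng.normal(size=(J, R))
U_list = [rng.normal(size=(I_k, R)) for I_k in lengths]
S_list = [rng.normal(size=R) for _ in lengths]

Z_list = parafac2_logits(U_list, S_list, V)
# Sample binary observations from the Bernoulli model.
X_list = [(rng.random(Z.shape) < sigmoid(Z)).astype(float) for Z in Z_list]

loss_full = bernoulli_nll(X_list, Z_list)
loss_down = bernoulli_nll(X_list, Z_list, unlabeled_weight=0.5)
```

In a fitting procedure, the factors `U_list`, `S_list`, and `V` would be optimized to reduce this loss, for example by gradient descent; the sketch only evaluates it for fixed factors.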
