Aptera: Automatic PARAFAC2 Tensor Analysis

Ekta Gujral, E. Papalexakis
Published in: 2022 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), 2022-11-10
DOI: 10.1109/ASONAM55673.2022.10068699 (https://doi.org/10.1109/ASONAM55673.2022.10068699)
Citations: 1

Abstract

In data mining, PARAFAC2 is a powerful multi-layer tensor decomposition method that is ideally suited for unsupervised modeling of data that forms “irregular” tensors, e.g., patients' diagnostic profiles, where each patient's recovery timeline does not necessarily align with those of other patients. In real-world applications, where no ground truth is available, how can we automatically choose how many components to analyze? Although the question appears trivial, finding the right number of components is very hard. So far, the traditional way to determine a reasonable number of components for PARAFAC2 is to compute decompositions with different numbers of components and then analyze the outcomes manually. This path is inefficient and time-consuming because of the large data volume, and human evaluation biases the selection. In this paper, we introduce Aptera, a novel automatic PARAFAC2 tensor mining method based on locating the L-curve corner. Automating the PARAFAC2 model quality assessment helps both novice and experienced researchers conduct detailed and advanced analysis. We extensively evaluate Aptera's performance on synthetic data, where it outperforms existing state-of-the-art methods on this very hard problem. Finally, we apply Aptera to a variety of real-world datasets and demonstrate its robustness, scalability, and estimation reliability.
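To make the idea of "locating the L-curve corner" concrete, the sketch below shows one common, generic elbow heuristic: rescale the error-versus-rank curve to the unit square and pick the point farthest from the straight line joining its endpoints. This is only an illustration under stated assumptions and is not claimed to be Aptera's actual procedure. The function name `l_curve_corner`, the variable names, and the error values are hypothetical; in practice the errors would come from fitting PARAFAC2 once per candidate number of components.

```python
import numpy as np


def l_curve_corner(ranks, errors):
    """Return the rank at the corner (elbow) of an error-vs-rank curve.

    Generic heuristic (an assumption, not the paper's method): after
    rescaling both axes to [0, 1], choose the point with the largest
    perpendicular distance to the chord joining the first and last points.
    """
    x = np.asarray(ranks, dtype=float)
    y = np.asarray(errors, dtype=float)
    if x.size < 3 or x.size != y.size:
        raise ValueError("need at least three (rank, error) pairs")

    x_span = x.max() - x.min()
    y_span = y.max() - y.min()
    if x_span == 0 or y_span == 0:
        raise ValueError("curve is flat; no corner to locate")

    # Rescale so that neither axis dominates the distance computation.
    xn = (x - x.min()) / x_span
    yn = (y - y.min()) / y_span

    # Perpendicular distance of every point to the first-to-last chord,
    # computed from the 2-D cross product.
    dx, dy = xn[-1] - xn[0], yn[-1] - yn[0]
    dist = np.abs(dx * (yn - yn[0]) - dy * (xn - xn[0])) / np.hypot(dx, dy)

    return int(x[int(np.argmax(dist))])


# Candidate numbers of components and made-up reconstruction errors
# (illustrative values only, not results from the paper).
candidate_ranks = list(range(1, 9))
toy_errors = [0.90, 0.62, 0.35, 0.10, 0.09, 0.085, 0.08, 0.078]

best_rank = l_curve_corner(candidate_ranks, toy_errors)
print(best_rank)
```

The rescaling step matters because ranks and errors live on very different scales; without it, the distance would be dominated by whichever axis has the larger range.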