Scaling of Union of Intersections for Inference of Granger Causal Networks from Observational Data

M. Balasubramanian, Trevor D. Ruiz, B. Cook, Prabhat, Sharmodeep Bhattacharyya, Aviral Shrivastava, K. Bouchard
DOI: 10.1109/IPDPS47924.2020.00036
Published in: 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), May 2020, pp. 264-273
Citations: 1

Abstract

The development of advanced recording and measurement devices in scientific fields is producing high-dimensional time series data. Vector autoregressive (VAR) models are well suited for inferring Granger-causal networks from high-dimensional time series data sets, but accurate inference at scale remains a central challenge. We have recently introduced a flexible and scalable statistical machine learning framework, Union of Intersections (UoI), which enables low false-positive and low false-negative feature selection along with low-bias and low-variance estimation, enhancing interpretation and predictive accuracy. In this paper, we scale the UoI framework for VAR models (algorithm UoI-VAR) to infer network connectivity from large time series data sets (TBs). To achieve this, we optimize distributed convex optimization and introduce novel strategies for improved data read and data distribution times. We study the strong and weak scaling of the algorithm on a Xeon Phi based supercomputer (100,000 cores). These advances enable us to estimate the largest VAR model known to date (1000 nodes, corresponding to 1M parameters) and apply it to large time series data from neurophysiology (192 neurons) and finance (470 companies).
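To make the VAR formulation behind the abstract concrete, the following is a minimal sketch of inferring a Granger-causal network from simulated data. It is not the authors' UoI-VAR algorithm (which combines bootstrapped sparse selection with low-bias estimation at scale); it only illustrates the underlying VAR(1) regression, using ordinary least squares and a hypothetical threshold in place of UoI's selection step.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a sparse VAR(1) process: x_t = A x_{t-1} + noise.
# A nonzero entry A[i, j] means node j Granger-causes node i.
p = 5                                   # number of nodes (the paper scales to 1000)
A_true = np.zeros((p, p))
A_true[0, 1] = 0.8
A_true[2, 3] = -0.6
T = 2000
X = np.zeros((T, p))
for t in range(1, T):
    X[t] = A_true @ X[t - 1] + 0.1 * rng.standard_normal(p)

# Least-squares VAR(1) estimate: regress x_t on x_{t-1}.
Y, Z = X[1:], X[:-1]
A_hat = np.linalg.lstsq(Z, Y, rcond=None)[0].T

# Read off the Granger-causal network by thresholding small coefficients
# (a stand-in for UoI's intersection/union selection; 0.3 is arbitrary).
network = np.abs(A_hat) > 0.3
```

With 1000 nodes the coefficient matrix alone has 1M entries, which is why the paper distributes the convex optimization and optimizes data read and distribution across ~100,000 cores.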