Scalable and Hybrid Ensemble-Based Causality Discovery

Pei Guo, Achuna Ofonedu, Jianwu Wang
Published in: 2020 IEEE International Conference on Smart Data Services (SMDS), 2020-10-01
DOI: 10.1109/SMDS49396.2020.00016 (https://doi.org/10.1109/SMDS49396.2020.00016)
Citations: 5

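Abstract

Causality discovery mines cause-effect relationships among the variables of a system and is widely used in disciplines such as climatology and neuroscience. Many data-driven causality discovery methods have been proposed, e.g., Granger causality, PCMCI, and Dynamic Bayesian Networks. Many of these methods mine time series data and generate a directed causality graph in which each edge denotes a cause-effect relationship between the two nodes it connects. Our benchmarking of different causality discovery approaches on real-world climate data shows that, because their internal learning mechanisms differ, these approaches often produce quite different causality results from the same input dataset. Meanwhile, the volume of available data keeps growing in virtually every discipline, which makes it increasingly difficult for existing causality discovery algorithms to produce results within a reasonable time. To address these two challenges, this paper uses data partitioning and ensemble techniques and proposes a two-phase hybrid causality ensemble framework. The framework first conducts a phase 1 data ensemble over the partitioned data and then conducts a phase 2 algorithm ensemble over the data ensemble results. To achieve scalability, we further parallelize the ensemble approaches via the Spark big data analytics engine. Our experiments show that the proposed approaches achieve good accuracy through ensembling and high scalability through data parallelization in distributed computing environments.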
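The two-phase structure described in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation: the abstract does not say how results are combined, so the sketch assumes a simple vote over directed edges with a fractional threshold, and it stands in for real causality discovery algorithms with two hypothetical lookup functions (`algo_a`, `algo_b`) that return precomputed edge sets per partition. All names here (`vote`, `hybrid_ensemble`, the thresholds, the toy edges) are illustrative assumptions.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, Sequence, Set, Tuple

Edge = Tuple[str, str]  # (cause, effect)
Algorithm = Callable[[int], Set[Edge]]  # maps a partition id to discovered edges


def vote(edge_sets: Iterable[Set[Edge]], threshold: float) -> Set[Edge]:
    """Keep edges found in at least `threshold` (a fraction) of the edge sets."""
    edge_sets = list(edge_sets)
    if not edge_sets:
        return set()
    # Each edge set is a set, so an edge is counted at most once per set.
    counts = Counter(edge for edges in edge_sets for edge in edges)
    return {e for e, n in counts.items() if n / len(edge_sets) >= threshold}


def hybrid_ensemble(
    partitions: Sequence[int],
    algorithms: Sequence[Algorithm],
    data_threshold: float = 0.5,
    algo_threshold: float = 0.5,
) -> Set[Edge]:
    # Phase 1 (data ensemble): for each algorithm, combine its results
    # across all data partitions. Assumed rule: fractional vote.
    per_algorithm = [
        vote((algo(p) for p in partitions), data_threshold)
        for algo in algorithms
    ]
    # Phase 2 (algorithm ensemble): combine the phase 1 results
    # across algorithms with the same assumed voting rule.
    return vote(per_algorithm, algo_threshold)


# Hypothetical precomputed results for three partitions (toy data).
results_a: Dict[int, Set[Edge]] = {
    0: {("X", "Y"), ("Y", "Z")},
    1: {("X", "Y")},
    2: {("X", "Y"), ("Z", "X")},
}
results_b: Dict[int, Set[Edge]] = {
    0: {("X", "Y")},
    1: {("X", "Y"), ("Y", "Z")},
    2: {("Y", "Z")},
}


def algo_a(partition: int) -> Set[Edge]:
    return results_a[partition]


def algo_b(partition: int) -> Set[Edge]:
    return results_b[partition]


partitions = [0, 1, 2]
consensus = hybrid_ensemble(partitions, [algo_a, algo_b])
unanimous = hybrid_ensemble(partitions, [algo_a, algo_b], algo_threshold=1.0)
print(sorted(consensus))
print(sorted(unanimous))
```

In this sketch the per-partition calls inside phase 1 are independent of each other, which is the part a data-parallel engine such as Spark could distribute; the sketch runs them sequentially to stay self-contained.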