{"title":"大数据来源分析与可视化","authors":"Peng Chen, Beth Plale","doi":"10.1109/CCGrid.2015.85","DOIUrl":null,"url":null,"abstract":"Provenance captured from E-Science experimentation is often large and complex, for instance, from agent-based simulations that have tens of thousands of heterogeneous components interacting over extended time periods. The subject of study of my dissertation is the use of E-Science provenance at scale. My initial research studied the visualization of large provenance graphs and proposed an abstract representation of provenance that supports useful data mining. Recent work involves analyzing large provenance data generated from agent-based simulations on a single machine. In continuation, I propose stream processing techniques to support the continuous and real-time analysis of data provenance, which is captured from agent based simulations on HPC and thus has unprecedented volume and complexity.","PeriodicalId":6664,"journal":{"name":"2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing","volume":"20 1","pages":"797-800"},"PeriodicalIF":0.0000,"publicationDate":"2015-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"15","resultStr":"{\"title\":\"Big Data Provenance Analysis and Visualization\",\"authors\":\"Peng Chen, Beth Plale\",\"doi\":\"10.1109/CCGrid.2015.85\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Provenance captured from E-Science experimentation is often large and complex, for instance, from agent-based simulations that have tens of thousands of heterogeneous components interacting over extended time periods. The subject of study of my dissertation is the use of E-Science provenance at scale. My initial research studied the visualization of large provenance graphs and proposed an abstract representation of provenance that supports useful data mining. Recent work involves analyzing large provenance data generated from agent-based simulations on a single machine. In continuation, I propose stream processing techniques to support the continuous and real-time analysis of data provenance, which is captured from agent based simulations on HPC and thus has unprecedented volume and complexity.\",\"PeriodicalId\":6664,\"journal\":{\"name\":\"2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing\",\"volume\":\"20 1\",\"pages\":\"797-800\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-05-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"15\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CCGrid.2015.85\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CCGrid.2015.85","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Provenance captured from E-Science experimentation is often large and complex, for instance, from agent-based simulations that have tens of thousands of heterogeneous components interacting over extended time periods. The subject of study of my dissertation is the use of E-Science provenance at scale. My initial research studied the visualization of large provenance graphs and proposed an abstract representation of provenance that supports useful data mining. Recent work involves analyzing large provenance data generated from agent-based simulations on a single machine. In continuation, I propose stream processing techniques to support the continuous and real-time analysis of data provenance, which is captured from agent based simulations on HPC and thus has unprecedented volume and complexity.