Pub Date: 2013-12-02 | DOI: 10.1109/LDAV.2013.6675174
Jay Takle, D. Silver, E. Kovacs, K. Heitmann
In this poster, we present a new approach to visualizing multivariate dark matter halos, representing the spheroid part of galaxies, the disk part of galaxies, black holes and the halo itself. The data visualized here are the end result of tracking the evolution of cosmic structures called dark matter halos in a cosmological simulation and evaluating the formation and evolution of the galaxies within. Cosmologists have traditionally visualized individual galaxies in the form of two-dimensional density maps, graphs and parallel coordinates. We introduce a new way of mapping multiple parameters of dark matter halos to a halo-icon, which allows scientists to view all of the parameters associated with a dark matter halo in a single visualization.
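The core idea of a multi-parameter glyph can be sketched as follows. This is an illustrative mapping with made-up channel assignments and parameter names, not the authors' actual halo-icon design:

```python
def halo_icon(params):
    """Hypothetical mapping of several halo parameters onto one glyph's
    visual channels, in the spirit of a halo-icon. The channel choices
    (radius, color, ring fraction) are illustrative assumptions."""
    return {
        # Glyph size grows with halo mass; cube root keeps it volume-like.
        "radius": params["halo_mass"] ** (1.0 / 3.0),
        # Color flags the presence of a central black hole.
        "color": "red" if params["black_hole_mass"] > 0 else "gray",
        # A ring segment encodes the disk's share of the stellar mass.
        "ring": params["disk_mass"] / (params["disk_mass"] + params["spheroid_mass"]),
    }
```

Rendering one such glyph per halo lets all parameters of a halo be read off a single mark.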
Title: Visualization of multivariate dark matter halos in cosmology simulations
Pub Date: 2013-12-02 | DOI: 10.1109/LDAV.2013.6675171
Abon Chaudhuri, Teng-Yok Lee, Han-Wei Shen, T. Peterka
Frequent access to raw data is no longer practical, if possible at all, for answering queries on large-scale data. This has led to the use of distribution-based data summaries, which can substitute for raw data to answer statistical queries of different kinds. Our work is concerned with the range distribution query, which returns the distribution of an axis-aligned region of any size. We address the challenge of maintaining the interactivity and accuracy of such query results in the presence of large data. This work presents a novel and efficient framework for pre-computing and storing a set of distributions which can be used to query any arbitrary region during post-processing. We adapt an integral-image-based data structure to answer such queries in constant time, and propose a similarity-based encoding technique to reduce the storage cost of the data structure. Our scheme exploits the similarity present among different regions in the data, and hence among their respective distributions. We demonstrate the use of our technique in various applications that directly or indirectly require distributions.
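The integral-image idea behind constant-time range queries can be sketched in 2D: precompute one summed-area table per histogram bin, then answer any region's histogram by inclusion-exclusion on four corners, independent of region size. This is a simplified sketch with assumed function names, not the paper's implementation (which additionally compresses the tables):

```python
import numpy as np

def build_integral_histogram(data, n_bins=8):
    """Precompute one summed-area table per histogram bin for a 2D field."""
    edges = np.linspace(data.min(), data.max(), n_bins + 1)
    bin_ids = np.clip(np.digitize(data, edges) - 1, 0, n_bins - 1)
    # One indicator image per bin, then a 2D cumulative sum of each.
    tables = np.zeros((n_bins,) + data.shape)
    for b in range(n_bins):
        tables[b] = np.cumsum(np.cumsum(bin_ids == b, axis=0), axis=1)
    return tables

def range_histogram(tables, r0, r1, c0, c1):
    """Histogram of the region [r0:r1, c0:c1] in O(n_bins) time,
    independent of region size, via inclusion-exclusion."""
    def corner(t, r, c):
        return t[r, c] if (r >= 0 and c >= 0) else 0.0
    out = np.empty(len(tables))
    for b, t in enumerate(tables):
        out[b] = (corner(t, r1 - 1, c1 - 1) - corner(t, r0 - 1, c1 - 1)
                  - corner(t, r1 - 1, c0 - 1) + corner(t, r0 - 1, c0 - 1))
    return out
```

The storage cost of keeping a table per bin is exactly what motivates the paper's similarity-based encoding.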
Title: Efficient range distribution query in large-scale scientific data
Pub Date: 2013-12-02 | DOI: 10.1109/LDAV.2013.6675167
Liang Zhou, C. Hansen
We present a volume visualization method that allows interactive rendering and efficient querying of large multivariate seismic volume data on consumer-level PCs. The volume rendering pipeline utilizes a virtual memory structure that supports out-of-core multivariate multi-resolution data, and a GPU-based ray caster that allows interactive multivariate transfer function design. A Gaussian mixture model representation is precomputed, and nearly interactive querying is achieved by testing the Gaussian functions against user-defined transfer functions on the GPU at runtime. Finally, the method has been tested on a multivariate 3D seismic dataset that is larger than the main memory of the test machine.
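Testing a precomputed Gaussian mixture against a transfer function's value range amounts to integrating each component over that range, which the error function gives in closed form. A minimal CPU-side sketch, assuming a simple threshold test and illustrative function names (the paper's GPU test may differ):

```python
import math

def gaussian_mass_in_range(mean, std, lo, hi):
    """Probability mass of N(mean, std^2) inside [lo, hi], via erf."""
    def cdf(x):
        return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))
    return cdf(hi) - cdf(lo)

def block_matches(components, lo, hi, threshold=0.05):
    """components: list of (weight, mean, std) tuples for one block's GMM.
    Returns True if the estimated fraction of the block's values selected
    by the transfer-function range [lo, hi] exceeds `threshold`.
    (Simplified stand-in for a per-block query test.)"""
    mass = sum(w * gaussian_mass_in_range(m, s, lo, hi)
               for w, m, s in components)
    return mass >= threshold
```

Evaluating a handful of Gaussians per block is far cheaper than touching the raw voxels, which is what makes nearly interactive querying possible.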
Title: Interactive rendering and efficient querying for large multivariate seismic volumes on consumer level PCs
Pub Date: 2013-12-02 | DOI: 10.1109/LDAV.2013.6675155
Christopher M. Sewell, Li-Ta Lo, J. Ahrens
Data-parallelism is a programming model that maps well to architectures with a high degree of concurrency. Algorithms written using data-parallel primitives can be easily ported to any architecture for which an implementation of these primitives exists, making efficient use of the available parallelism on each. We have previously published results demonstrating our ability to compile the same data-parallel code for several visualization algorithms onto different on-node parallel architectures (GPUs and multi-core CPUs) using our extension of NVIDIA's Thrust library. In this paper, we discuss our extension of Thrust to support concurrency in distributed memory environments across multiple nodes. This enables the application developer to write data-parallel algorithms while viewing the data as single, long vectors, essentially without needing to explicitly take into consideration whether the values are actually distributed across nodes. Our distributed wrapper for Thrust handles the communication in the backend using MPI, while still using the standard Thrust library to take advantage of available on-node parallelism. We describe the details of our distributed implementations of several key data-parallel primitives, including scan, scatter/gather, sort, reduce, and upper/lower bound. We also present two higher-level distributed algorithms developed using these primitives: isosurface and KD-tree construction. Finally, we provide timing results demonstrating the ability of these algorithms to take advantage of available parallelism on nodes and across multiple nodes, and discuss scaling limitations for communication-intensive algorithms such as KD-tree construction.
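A distributed scan of the kind described can be sketched with the standard two-phase scheme: each node scans its chunk locally, then an exclusive scan of the per-chunk sums supplies the offset each node adds. The sketch below emulates the nodes with a list of arrays; it illustrates the pattern, not the authors' MPI-backed Thrust wrapper:

```python
import numpy as np

def distributed_inclusive_scan(node_chunks):
    """Inclusive scan over a vector distributed as per-node chunks.
    Phase 1: each 'node' scans its chunk locally (on-node parallelism).
    Phase 2: an exclusive scan of the chunk totals gives each node the
    offset to add (the only inter-node communication needed)."""
    local_scans = [np.cumsum(c) for c in node_chunks]
    chunk_sums = np.array([s[-1] if len(s) else 0 for s in local_scans])
    offsets = np.concatenate(([0], np.cumsum(chunk_sums)[:-1]))
    return [s + off for s, off in zip(local_scans, offsets)]
```

The appeal of the primitive approach is that higher-level algorithms composed of such scans inherit the distribution for free.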
Title: Portable data-parallel visualization and analysis in distributed memory environments
Pub Date: 2013-12-02 | DOI: 10.1109/LDAV.2013.6675154
Chun-Ming Chen, Han-Wei Shen
As the size of scientific data sets continues to increase, performing effective data analysis and visualization becomes increasingly difficult. Desktop machines, still scientists' favorite platform for analysis and visualization computation, usually do not have enough memory to load an entire data set at once. For time-varying flow visualization, the Finite-Time Lyapunov Exponent (FTLE) allows one to glean insight into the existence of Lagrangian Coherent Structures (LCS) by quantifying the separation of flows. To obtain high-resolution FTLE fields, the computation requires tracing particles from every grid point and at every time step. Because the size of time-varying flow data can easily exceed the amount of memory available on desktop machines, efficient out-of-core FTLE computation algorithms that minimize the I/O overhead are very much needed. To tackle this problem, one can perform particle tracing in batch mode, where the particles are organized into different groups and at any time only one group of particles is advected through the time-varying field. Since tracing particles requires loading the necessary data blocks on demand along the flow paths, effective scheduling of particles becomes essential to maximize data reuse and minimize I/O cost. The main challenge is to avoid reloading data blocks that were previously processed. In this paper, we solve the problem by modeling the flow as a directed weighted graph and predicting the access dependency among the data blocks, i.e., the paths of particles, using a Markov chain.
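The Markov-chain prediction step can be sketched as follows: estimate block-to-block transition probabilities from observed particle paths, then predict the likely block sequence for a seed. An illustrative sketch with assumed function names, not the paper's scheduler:

```python
import numpy as np

def transition_matrix(block_sequences, n_blocks):
    """Estimate a Markov transition matrix from observed particle paths,
    where each path is the sequence of data-block IDs a particle visited."""
    counts = np.zeros((n_blocks, n_blocks))
    for seq in block_sequences:
        for a, b in zip(seq[:-1], seq[1:]):
            counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Normalize rows to probabilities; rows with no observations stay zero.
    return np.divide(counts, row_sums, out=np.zeros_like(counts),
                     where=row_sums > 0)

def predict_path(P, start, steps):
    """Greedy most-likely block path of at most `steps` hops from `start`."""
    path = [start]
    for _ in range(steps):
        nxt = int(np.argmax(P[path[-1]]))
        if P[path[-1], nxt] == 0:  # no observed outgoing transition
            break
        path.append(nxt)
    return path
```

Seeds whose predicted paths share blocks can then be batched together so each block is loaded once per batch.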
Title: Graph-based seed scheduling for out-of-core FTLE and pathline computation
Pub Date: 2013-12-02 | DOI: 10.1109/LDAV.2013.6675172
Omar Eltayeby, Dwayne John, Pragnesh Patel, S. Simmerman
One of the challenging tasks in visual analytics is targeting clustered time-series data sets, since it is important for data analysts to discover patterns that change over time while keeping their focus on particular subsets. We have developed a web-based application, using D3 and Highcharts, that lets system administrators and operations teams monitor the Lustre file system. This application serves as a use case for comparing the two JavaScript libraries and demonstrating the differences in their capabilities. The goal of the application is to provide time-series visuals of the Remote Procedure Calls (RPCs) and storage patterns of users on Kraken, a University of Tennessee High Performance Computing (HPC) resource at Oak Ridge National Laboratory (ORNL).
Title: Comparative case study between D3 & Highcharts on Lustre metadata visualization
Pub Date: 2013-12-02 | DOI: 10.1109/LDAV.2013.6675152
Cornelius Müller, David Camp, B. Hentschel, C. Garth
Particle advection is an important vector field visualization technique that is difficult to apply to very large data sets in a distributed setting due to scalability limitations in existing algorithms. In this paper, we report on several experiments using work-requesting dynamic scheduling, which achieves balanced work distribution on arbitrary problems with minimal communication overhead. We present a corresponding prototype implementation, provide and analyze benchmark results, and compare our results to an existing algorithm.
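The work-requesting pattern can be illustrated with a toy single-process simulation: each worker keeps a local queue, and an idle worker asks a random peer for half of its remaining work, so load balances without a central scheduler. This sketch only shows the scheme; the paper's implementation is distributed and communicates between real processes:

```python
import random
from collections import deque

def work_requesting_simulation(tasks, n_workers, seed=0):
    """Toy simulation of work requesting. All tasks start on worker 0
    (a deliberately imbalanced initial distribution)."""
    rng = random.Random(seed)
    queues = [deque() for _ in range(n_workers)]
    for t in tasks:
        queues[0].append(t)
    done = []
    while any(queues):
        for w in range(n_workers):
            if queues[w]:
                # Busy worker: process one task (e.g. advect one particle).
                done.append((w, queues[w].popleft()))
            else:
                # Idle worker: request half of a random victim's queue.
                victim = rng.randrange(n_workers)
                for _ in range(len(queues[victim]) // 2):
                    queues[w].append(queues[victim].pop())
    return done
```

Because requests happen only when a worker is idle, communication stays proportional to the imbalance rather than to the total work.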
Title: Distributed parallel particle advection using work requesting
Pub Date: 2013-12-02 | DOI: 10.1109/LDAV.2013.6675158
Benjamin Lorendeau, Y. Fournier, A. Ribés
Numerical simulations using supercomputers are producing an increasingly large volume of data to be visualized. In this context, Catalyst is a prototype in-situ visualization library developed by Kitware to help reduce the data post-processing overhead. Code Saturne, on the other hand, is a Computational Fluid Dynamics code used at Électricité de France (EDF), one of the largest electricity producers in Europe, for its large-scale numerical simulations. In this article we present a case study in which Catalyst is integrated into Code Saturne. We evaluate the feasibility and performance of this integration by running two test cases on one of our corporate supercomputers.
Title: In-Situ visualization in fluid mechanics using Catalyst: A case study for Code Saturne
Pub Date: 2013-12-02 | DOI: 10.1109/LDAV.2013.6675168
Winston Lee, A. Kejariwal, Bryce Yan
`Anywhere, Anytime and Any Device' is often used to characterize the next-generation Internet. Achieving the above in light of the increasing use of the Internet worldwide, especially fueled by mobile Internet usage, and the exponential growth in the number of connected devices is non-trivial. In particular, the three As require the development of infrastructure which is highly available, performant and scalable. Additionally, from a corporate standpoint, high efficiency is of utmost importance. Facilitating high availability requires deep observability of physical, system and application metrics, along with analytics support, say for systematic capacity planning. Although there exist many commercial services to assist observability in the data center and the public/private cloud, they lack analytics support. To this end, we developed a framework at Twitter, called Chiffchaff, to drive capacity planning in light of a growing user base. Specifically, the framework provides support for automatic mining of application metrics and subsequent visualization of trends (for example, Week-over-Week (WoW) and Month-over-Month (MoM)), data distributions, et cetera. Further, the framework enables deep dives into traffic patterns, which can be used to guide load balancing in shared systems. We illustrate the use of Chiffchaff with production traffic.
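A Week-over-Week trend of the kind mentioned is simply each day's metric compared against the same weekday one week earlier. A minimal sketch, with illustrative names and not Chiffchaff's actual API:

```python
from datetime import date, timedelta

def week_over_week(series):
    """Week-over-Week percentage change for a daily metric.
    `series` maps date -> value; days whose prior-week value is
    missing or zero are skipped."""
    wow = {}
    for day, value in series.items():
        prev = series.get(day - timedelta(days=7))
        if prev:
            wow[day] = 100.0 * (value - prev) / prev
    return wow
```

Comparing against the same weekday avoids mistaking ordinary weekly seasonality for growth.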
Title: Chiffchaff: Observability and analytics to achieve high availability
Pub Date: 2013-12-02 | DOI: 10.1109/LDAV.2013.6675164
R. Hafen, Luke J. Gosink, J. Mcdermott, Karin D. Rodland, K. K. Dam, W. Cleveland
Trelliscope emanates from the Trellis Display framework for visualization and the Divide and Recombine (D&R) approach to analyzing large complex data. In Trellis, the data are broken up into subsets, a visualization method is applied to each subset, and the resulting display is an array of panels, one per subset. This is a powerful framework for visualization of data, both small and large. In D&R, the data are broken up into subsets, and any analytic method from statistics and machine learning is applied to each subset independently. Then the outputs are recombined. This provides not only a powerful framework for analysis, but also feasible and practical computations using distributed computational facilities. It enables deep analysis of the data: study of data summaries as well as of the detailed data at their finest granularity. This is critical to a full understanding of the data. It also enables the analyst to program using an interactive high-level language for data analysis such as R, which allows the analyst to focus more on the data and less on code. In this paper we introduce Trelliscope, a system that scales Trellis to large complex data. It provides a way to create displays with a very large number of panels, together with an interactive viewer that allows the analyst to sort, filter, and sample the panels in a meaningful way. We discuss the underlying principles, design, and scalable architecture of Trelliscope, and illustrate its use on three analysis projects in proteomics, high-intensity physics, and power systems engineering.
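The D&R pattern described above can be sketched in a few lines: split the records into subsets by a key, apply an analytic to each subset independently (the embarrassingly parallel step), and recombine the per-subset outputs. Function names here are illustrative, not Trelliscope's API:

```python
from statistics import mean

def divide_and_recombine(records, key, analytic, recombine):
    """Sketch of the D&R pattern: divide by `key`, apply `analytic`
    to each subset independently, then `recombine` the outputs."""
    subsets = {}
    for rec in records:
        subsets.setdefault(key(rec), []).append(rec)
    # Each subset is processed on its own; on a cluster these calls
    # would run in parallel with no communication between them.
    per_subset = {k: analytic(v) for k, v in subsets.items()}
    return recombine(per_subset)

# Example: per-group means over (group, value) pairs.
records = [("a", 1), ("a", 3), ("b", 10)]
result = divide_and_recombine(
    records,
    key=lambda r: r[0],
    analytic=lambda rs: mean(x for _, x in rs),
    recombine=dict,
)
```

In Trellis terms, swapping the analytic for a plotting function turns each subset's output into one panel of the display.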
Title: Trelliscope: A system for detailed visualization in the deep analysis of large complex data