2013 IEEE Symposium on Large-Scale Data Analysis and Visualization (LDAV)最新文献

英文中文

Visualization of multivariate dark matter halos in cosmology simulations 宇宙学模拟中多元暗物质晕的可视化

2013 IEEE Symposium on Large-Scale Data Analysis and Visualization (LDAV)

Pub Date : 2013-12-02 DOI: 10.1109/LDAV.2013.6675174

Jay Takle, D. Silver, E. Kovacs, K. Heitmann

In this poster, we present a new approach to visualize multivariate dark matter halos representing the spheroid part of galaxies, the disk part of galaxies, black holes and the halo itself. The data being visualized here is an end result of tracking the evolution of cosmic structures called dark matter halos in cosmological simulation and evaluating the formation and evolution of galaxies within. Cosmologists have traditionally visualized individual galaxies in the form of two dimensional density maps, graphs and parallel coordinates. We introduce a new way of mapping multiple parameters of dark matter halos to a halo-icon. This allows the scientist to view all of the parameters associated with the dark matter halos in one visualization.

在这张海报中，我们提出了一种新的方法来可视化多元暗物质晕，它代表了星系的球体部分、星系的盘状部分、黑洞和晕本身。这里可视化的数据是在宇宙模拟中跟踪宇宙结构(称为暗物质晕)的演化并评估其中星系的形成和演化的最终结果。宇宙学家传统上以二维密度图、图形和平行坐标的形式将单个星系可视化。我们介绍了一种将暗物质晕的多个参数映射到晕标上的新方法。这使得科学家可以在一个可视化中查看与暗物质晕相关的所有参数。

引用次数: 5

Efficient range distribution query in large-scale scientific data 大规模科学数据中有效的距离分布查询

2013 IEEE Symposium on Large-Scale Data Analysis and Visualization (LDAV)

Pub Date : 2013-12-02 DOI: 10.1109/LDAV.2013.6675171

Abon Chaudhuri, Teng-Yok Lee, Han-Wei Shen, T. Peterka

Frequent access to raw data is no longer practical, if possible at all, for answering queries on large-scale data. This has led to the use of distribution-based data summaries, which can substitute for raw data to answer statistical queries of different kinds. Our work is concerned with range distribution query, which returns the distribution of an axis-aligned region of any size. We address the challenge of maintaining the interactivity and accuracy of such query results in the presence of large data. This work presents a novel and efficient framework for pre-computing and storing a set of distributions which can be used to query any arbitrary region during post-processing. We adapt an integral image based data structure to answer such queries in constant time, and propose a similarity-based encoding technique to reduce the storage cost of the data structure. Our scheme utilizes the similarity present among different regions in the data, and hence, their respective distributions. We demonstrate the use our technique in various applications, which directly or indirectly require distributions.

如果可能的话，频繁访问原始数据对于回答大规模数据的查询已不再实际。这导致使用基于分布的数据摘要，它可以代替原始数据来回答不同类型的统计查询。我们的工作涉及范围分布查询，它返回任何大小的轴对齐区域的分布。我们解决了在大数据存在的情况下保持这些查询结果的交互性和准确性的挑战。本文提出了一种新颖有效的框架，用于预计算和存储一组分布，这些分布可用于在后处理期间查询任意区域。我们采用一种基于积分图像的数据结构来在恒定时间内回答这类查询，并提出了一种基于相似性的编码技术来降低数据结构的存储成本。我们的方案利用了数据中不同区域之间存在的相似性，因此，它们各自的分布。我们将演示在各种应用程序中使用我们的技术，这些应用程序直接或间接地需要发行版。

引用次数: 2

Interactive rendering and efficient querying for large multivariate seismic volumes on consumer level PCs 在消费者级pc上对大型多变量地震卷进行交互式渲染和高效查询

2013 IEEE Symposium on Large-Scale Data Analysis and Visualization (LDAV)

Pub Date : 2013-12-02 DOI: 10.1109/LDAV.2013.6675167

Liang Zhou, C. Hansen

We present a volume visualization method that allows interactive rendering and efficient querying of large multivariate seismic volume data on consumer level PCs. The volume rendering pipeline utilizes a virtual memory structure that supports out-of-core multivariate multi-resolution data and a GPU-based ray caster that allows interactive multivariate transfer function design. A Gaussian mixture model representation is precomputed and nearly interactive querying is achieved by testing the Gaussian functions against user defined transfer functions on the GPU in the runtime. Finally, the method has been tested on a multivariate 3D seismic dataset which is larger than the size of the main memory of the testing machine.

我们提出了一种体可视化方法，该方法允许在消费者级pc上交互式呈现和高效查询大型多元地震体数据。体绘制管道利用了一个支持核外多变量多分辨率数据的虚拟内存结构和一个基于gpu的光线投射器，允许交互式多变量传递函数设计。预先计算了高斯混合模型表示，并在运行时通过在GPU上测试高斯函数与用户定义的传递函数，实现了近乎交互的查询。最后，在大于测试机主存储器大小的多元三维地震数据集上对该方法进行了测试。

引用次数: 3

Portable data-parallel visualization and analysis in distributed memory environments 分布式存储环境下的便携式数据并行可视化和分析

2013 IEEE Symposium on Large-Scale Data Analysis and Visualization (LDAV)

Pub Date : 2013-12-02 DOI: 10.1109/LDAV.2013.6675155

Christopher M. Sewell, Li-Ta Lo, J. Ahrens

Data-parallelism is a programming model that maps well to architectures with a high degree of concurrency. Algorithms written using data-parallel primitives can be easily ported to any architecture for which an implementation of these primitives exists, making efficient use of the available parallelism on each. We have previously published results demonstrating our ability to compile the same data-parallel code for several visualization algorithms onto different on-node parallel architectures (GPUs and multi-core CPUs) using our extension of NVIDIA's Thrust library. In this paper, we discuss our extension of Thrust to support concurrency in distributed memory environments across multiple nodes. This enables the application developer to write data-parallel algorithms while viewing the data as single, long vectors, essentially without needing to explicitly take into consideration whether the values are actually distributed across nodes. Our distributed wrapper for Thrust handles the communication in the backend using MPI, while still using the standard Thrust library to take advantage of available on-node parallelism. We describe the details of our distributed implementations of several key data-parallel primitives, including scan, scatter/gather, sort, reduce, and upper/lower bound. We also present two higher-level distributed algorithms developed using these primitives: isosurface and KD-tree construction. Finally, we provide timing results demonstrating the ability of these algorithms to take advantage of available parallelism on nodes and across multiple nodes, and discuss scaling limitations for communication-intensive algorithms such as KD-tree construction.

数据并行是一种编程模型，可以很好地映射到具有高度并发性的体系结构。使用数据并行原语编写的算法可以很容易地移植到存在这些原语实现的任何架构中，从而有效地利用每个架构上可用的并行性。我们之前发布的结果表明，我们能够使用NVIDIA的Thrust库扩展，为不同的节点并行架构(gpu和多核cpu)上的多个可视化算法编译相同的数据并行代码。在本文中，我们讨论了对Thrust的扩展，以支持跨多个节点的分布式内存环境中的并发性。这使得应用程序开发人员可以编写数据并行算法，同时将数据视为单个长向量，基本上不需要显式地考虑值是否实际上分布在节点之间。我们为Thrust设计的分布式包装器使用MPI处理后端通信，同时仍然使用标准Thrust库来利用可用的节点上并行性。我们描述了几个关键数据并行原语的分布式实现的细节，包括扫描、分散/收集、排序、减少和上限/下限。我们还提出了使用这些原语开发的两种高级分布式算法:等值面和kd树构造。最后，我们提供了时序结果，证明了这些算法能够利用节点上和跨多个节点的可用并行性，并讨论了通信密集型算法(如kd树构造)的缩放限制。

{"title":"Portable data-parallel visualization and analysis in distributed memory environments","authors":"Christopher M. Sewell, Li-Ta Lo, J. Ahrens","doi":"10.1109/LDAV.2013.6675155","DOIUrl":"https://doi.org/10.1109/LDAV.2013.6675155","url":null,"abstract":"Data-parallelism is a programming model that maps well to architectures with a high degree of concurrency. Algorithms written using data-parallel primitives can be easily ported to any architecture for which an implementation of these primitives exists, making efficient use of the available parallelism on each. We have previously published results demonstrating our ability to compile the same data-parallel code for several visualization algorithms onto different on-node parallel architectures (GPUs and multi-core CPUs) using our extension of NVIDIA's Thrust library. In this paper, we discuss our extension of Thrust to support concurrency in distributed memory environments across multiple nodes. This enables the application developer to write data-parallel algorithms while viewing the data as single, long vectors, essentially without needing to explicitly take into consideration whether the values are actually distributed across nodes. Our distributed wrapper for Thrust handles the communication in the backend using MPI, while still using the standard Thrust library to take advantage of available on-node parallelism. We describe the details of our distributed implementations of several key data-parallel primitives, including scan, scatter/gather, sort, reduce, and upper/lower bound. We also present two higher-level distributed algorithms developed using these primitives: isosurface and KD-tree construction. Finally, we provide timing results demonstrating the ability of these algorithms to take advantage of available parallelism on nodes and across multiple nodes, and discuss scaling limitations for communication-intensive algorithms such as KD-tree construction.","PeriodicalId":266607,"journal":{"name":"2013 IEEE Symposium on Large-Scale Data Analysis and Visualization (LDAV)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131542715","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 11

Graph-based seed scheduling for out-of-core FTLE and pathline computation 基于图的核外FTLE和路径计算种子调度

2013 IEEE Symposium on Large-Scale Data Analysis and Visualization (LDAV)

Pub Date : 2013-12-02 DOI: 10.1109/LDAV.2013.6675154

Chun-Ming Chen, Han-Wei Shen

As the size of scientific data sets continues to increase, performing effective data analysis and visualization becomes increasingly difficult. Desktop machines, still the scientists' favorite platform to perform analysis and visualization computation, usually do not have enough memory to load the entire data set all at once. For time-varying flow visualization, the Finite-Time Lyapunov Exponent (FTLE) allows one to glean insight into the existence of the Lagrangian Coherence Structures (LCS) by quantifying the separation of flows. To obtain high resolution FTLE fields, the computation of FTLE requires tracing particles from every grid point and at every time step. Because the size of the time-varying flow data can easily exceed the amount of available memory in the desktop machines, efficient out-of-core FTLE computation algorithms that minimize the I/O overhead are very much needed. To tackle this problem, one can perform a batch mode computation of particle tracing where the particles are organized into different groups, and at any time only one group of particles are advected in the time-varying field. Since tracing particles requires loading the necessary data blocks on demand along the flow paths, to maximize the usage of the data and minimize the I/O cost, an effective scheduling of particles becomes essential. The main challenge is to avoid reloading the same data blocks that were previously processed. In this paper, to solve the problem we model the flow as a directed weighted graph and predict the access dependency among the data blocks, i.e., the path of particles, using Markov chain. With the predicted path we devise an optimization method that groups the particles into different processing batches to minimize the total number of block accesses from the disk. Experimental results show that our scheduling algorithm performs better than algorithms based on a general space-filling ordering.

随着科学数据集规模的不断增加，进行有效的数据分析和可视化变得越来越困难。台式计算机仍然是科学家们最喜欢的进行分析和可视化计算的平台，但通常没有足够的内存来一次加载整个数据集。对于时变流动可视化，有限时间李雅普诺夫指数(FTLE)允许人们通过量化流动分离来深入了解拉格朗日相干结构(LCS)的存在。为了获得高分辨率的FTLE场，FTLE的计算需要在每个网格点和每个时间步长对粒子进行跟踪。由于时变流数据的大小很容易超过桌面计算机中的可用内存量，因此非常需要能够将I/O开销降至最低的高效外核FTLE计算算法。为了解决这一问题，可以执行粒子跟踪的批处理模式计算，其中粒子被组织成不同的组，并且在任何时候只有一组粒子在时变场中平流。由于跟踪粒子需要沿着流路径按需加载必要的数据块，为了最大化数据的使用并最小化I/O成本，有效的粒子调度变得至关重要。主要的挑战是避免重新加载先前处理过的相同数据块。为了解决这个问题，我们将流建模为一个有向加权图，并利用马尔可夫链预测数据块之间的访问依赖关系，即粒子的路径。根据预测的路径，我们设计了一种优化方法，将粒子分组到不同的处理批次中，以最小化磁盘的块访问总数。实验结果表明，该调度算法优于基于一般空间填充顺序的调度算法。

{"title":"Graph-based seed scheduling for out-of-core FTLE and pathline computation","authors":"Chun-Ming Chen, Han-Wei Shen","doi":"10.1109/LDAV.2013.6675154","DOIUrl":"https://doi.org/10.1109/LDAV.2013.6675154","url":null,"abstract":"As the size of scientific data sets continues to increase, performing effective data analysis and visualization becomes increasingly difficult. Desktop machines, still the scientists' favorite platform to perform analysis and visualization computation, usually do not have enough memory to load the entire data set all at once. For time-varying flow visualization, the Finite-Time Lyapunov Exponent (FTLE) allows one to glean insight into the existence of the Lagrangian Coherence Structures (LCS) by quantifying the separation of flows. To obtain high resolution FTLE fields, the computation of FTLE requires tracing particles from every grid point and at every time step. Because the size of the time-varying flow data can easily exceed the amount of available memory in the desktop machines, efficient out-of-core FTLE computation algorithms that minimize the I/O overhead are very much needed. To tackle this problem, one can perform a batch mode computation of particle tracing where the particles are organized into different groups, and at any time only one group of particles are advected in the time-varying field. Since tracing particles requires loading the necessary data blocks on demand along the flow paths, to maximize the usage of the data and minimize the I/O cost, an effective scheduling of particles becomes essential. The main challenge is to avoid reloading the same data blocks that were previously processed. In this paper, to solve the problem we model the flow as a directed weighted graph and predict the access dependency among the data blocks, i.e., the path of particles, using Markov chain. With the predicted path we devise an optimization method that groups the particles into different processing batches to minimize the total number of block accesses from the disk. Experimental results show that our scheduling algorithm performs better than algorithms based on a general space-filling ordering.","PeriodicalId":266607,"journal":{"name":"2013 IEEE Symposium on Large-Scale Data Analysis and Visualization (LDAV)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116614774","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 19

Comparative case study between D3 & Highcharts on Lustre metadata visualization D3和Highcharts在Lustre元数据可视化上的比较案例研究

2013 IEEE Symposium on Large-Scale Data Analysis and Visualization (LDAV)

Pub Date : 2013-12-02 DOI: 10.1109/LDAV.2013.6675172

Omar Eltayeby, Dwayne John, Pragnesh Patel, S. Simmerman

One of the challenging tasks in visual analytics is to target clustered time-series data sets, since it is important for data analysts to discover patterns changing over time while keeping their focus on particular subsets. A web-based application to monitor the Lustre file system for system administrators and operation teams has been developed using D3 and Highcharts. This application is a use case to compare those JavaScript libraries demonstrating the differences in capabilities of using them. The goal of this application is to provide time-series visuals of the Remote Procedure Calls (RPCs) and storage patterns of users on Kraken, a University of Tennessee High Performance Computing (HPC) resource in Oak Ridge National Laboratory (ORNL).

可视化分析中具有挑战性的任务之一是针对聚类时间序列数据集，因为对于数据分析师来说，在关注特定子集的同时发现随时间变化的模式是很重要的。我们使用D3和Highcharts开发了一个基于web的应用程序，为系统管理员和操作团队监控Lustre文件系统。这个应用程序是一个用例，用于比较这些JavaScript库，演示使用它们的能力的差异。该应用程序的目标是提供Kraken(橡树岭国家实验室(ORNL)的田纳西大学高性能计算(HPC)资源)上的远程过程调用(rpc)和用户存储模式的时间序列可视化。

引用次数: 11

Distributed parallel particle advection using work requesting 基于功请求的分布式平行粒子平流

2013 IEEE Symposium on Large-Scale Data Analysis and Visualization (LDAV)

Pub Date : 2013-12-02 DOI: 10.1109/LDAV.2013.6675152

Cornelius Müller, David Camp, B. Hentschel, C. Garth

Particle advection is an important vector field visualization technique that is difficult to apply to very large data sets in a distributed setting due to scalability limitations in existing algorithms. In this paper, we report on several experiments using work requesting dynamic scheduling which achieves balanced work distribution on arbitrary problems with minimal communication overhead. We present a corresponding prototype implementation, provide and analyze benchmark results, and compare our results to an existing algorithm.

粒子平流是一种重要的矢量场可视化技术，由于现有算法的可扩展性限制，难以应用于分布式环境下的大型数据集。在本文中，我们报告了几个使用工作请求动态调度的实验，以最小的通信开销实现了任意问题的平衡工作分配。我们提出了相应的原型实现，提供并分析了基准测试结果，并将我们的结果与现有算法进行了比较。

引用次数: 34

In-Situ visualization in fluid mechanics using Catalyst: A case study for Code Saturne 使用Catalyst的流体力学现场可视化:Code Saturne的案例研究

2013 IEEE Symposium on Large-Scale Data Analysis and Visualization (LDAV)

Pub Date : 2013-12-02 DOI: 10.1109/LDAV.2013.6675158

Benjamin Lorendeau, Y. Fournier, A. Ribés

Numerical simulations using supercomputers are producing an increasingly larger volume of data to be visualized. In this context, Catalyst is a prototype In-Situ visualization library developed by Kitware to help reduce the data post-treatment overhead. On the other side, Code Saturne is a Computational Fluid Dynamics code used at Eléctricité de France (EDF), one of the biggest electricity producers in Europe, for its large scale numerical simulations. In this article we present a study case where Catalyst is integrated into Code Saturne. We evaluate the feasibility and performance of this integration by running two test cases in one of our corporate supercomputers.

使用超级计算机的数值模拟正在产生越来越大的可视化数据量。在这种情况下，Catalyst是Kitware开发的一个原型原位可视化库，可以帮助减少数据后处理开销。另一方面，Code Saturne是一种计算流体动力学代码，用于欧洲最大的电力生产商之一法国电力公司(EDF)的大规模数值模拟。在本文中，我们将介绍一个将Catalyst集成到Code saturn中的研究案例。我们通过在公司的一台超级计算机上运行两个测试用例来评估这种集成的可行性和性能。

引用次数: 22

Chiffchaff: Observability and analytics to achieve high availability Chiffchaff:可观察性和分析，以实现高可用性

2013 IEEE Symposium on Large-Scale Data Analysis and Visualization (LDAV)

Pub Date : 2013-12-02 DOI: 10.1109/LDAV.2013.6675168

Winston Lee, A. Kejariwal, Bryce Yan

`Anywhere, Anytime and Any Device' is often used to characterize the next generation Internet. Achieving the above in light of the increasing use of the Internet worldwide, especially fueled by mobile Internet usage, and the exponential growth in the number of connected devices is non-trivial. In particular, the three As require development of infrastructure which is highly available, performant and scalable. Additionally, from a corporate standpoint, high efficiency is of utmost importance. To facilitate high availability, deep observability of physical, system and application metrics and analytics support, say for systematic capacity planning, is needed. Although there exist many commercial services to assist observability in the data center, public/ private cloud, they lack analytics support. To this end, we developed a framework at Twitter, called Chiffchaff, to drive capacity planning in light of growing user base. Specifically, the framework provides support for automatic mining of application metrics and subsequent visualization of trends (for example, Week-over-Week (WoW), Month-over-Month (MoM)), data distribution et cetera. Further, the framework enables deep diving into traffic patterns, which can be used to guide load balancing in shared systems. We illustrate the use of Chiffchaff with production traffic.

“任何地点、任何时间、任何设备”经常被用来描述下一代互联网。鉴于全球互联网的使用日益增加，尤其是移动互联网的使用，以及连接设备数量的指数级增长，实现上述目标并非微不足道。特别是，这三个a要求开发高可用性、高性能和可扩展的基础设施。此外，从企业的角度来看，高效率是最重要的。为了促进高可用性，需要物理、系统和应用程序度量的深度可观察性和分析支持，例如系统容量规划。尽管存在许多商业服务来帮助数据中心、公共/私有云中的可观察性，但它们缺乏分析支持。为此，我们在Twitter开发了一个名为Chiffchaff的框架，根据不断增长的用户基础来推动容量规划。具体来说，该框架提供了对应用程序度量的自动挖掘和随后趋势的可视化(例如，周比周(WoW)、月比月(MoM))、数据分布等的支持。此外，该框架支持深入了解流量模式，可用于指导共享系统中的负载平衡。我们用生产流量来说明chffchaff的使用。

{"title":"Chiffchaff: Observability and analytics to achieve high availability","authors":"Winston Lee, A. Kejariwal, Bryce Yan","doi":"10.1109/LDAV.2013.6675168","DOIUrl":"https://doi.org/10.1109/LDAV.2013.6675168","url":null,"abstract":"`Anywhere, Anytime and Any Device' is often used to characterize the next generation Internet. Achieving the above in light of the increasing use of the Internet worldwide, especially fueled by mobile Internet usage, and the exponential growth in the number of connected devices is non-trivial. In particular, the three As require development of infrastructure which is highly available, performant and scalable. Additionally, from a corporate standpoint, high efficiency is of utmost importance. To facilitate high availability, deep observability of physical, system and application metrics and analytics support, say for systematic capacity planning, is needed. Although there exist many commercial services to assist observability in the data center, public/ private cloud, they lack analytics support. To this end, we developed a framework at Twitter, called Chiffchaff, to drive capacity planning in light of growing user base. Specifically, the framework provides support for automatic mining of application metrics and subsequent visualization of trends (for example, Week-over-Week (WoW), Month-over-Month (MoM)), data distribution et cetera. Further, the framework enables deep diving into traffic patterns, which can be used to guide load balancing in shared systems. We illustrate the use of Chiffchaff with production traffic.","PeriodicalId":266607,"journal":{"name":"2013 IEEE Symposium on Large-Scale Data Analysis and Visualization (LDAV)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114815266","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 1

Trelliscope: A system for detailed visualization in the deep analysis of large complex data Trelliscope:用于对大型复杂数据进行深入分析的详细可视化系统

2013 IEEE Symposium on Large-Scale Data Analysis and Visualization (LDAV)

Pub Date : 2013-12-02 DOI: 10.1109/LDAV.2013.6675164

R. Hafen, Luke J. Gosink, J. Mcdermott, Karin D. Rodland, K. K. Dam, W. Cleveland

Trelliscope emanates from the Trellis Display framework for visualization and the Divide and Recombine (D&R) approach to analyzing large complex data. In Trellis, the data are broken up into subsets, a visualization method is applied to each subset, and the display result is an array of panels, one per subset. This is a powerful framework for visualization of data, both small and large. In D&R, the data are broken up into subsets, and any analytic method from statistics and machine learning is applied to each subset independently. Then the outputs are recombined. This provides not only a powerful framework for analysis, but also feasible and practical computations using distributed computational facilities. It enables deep analysis of the data: study of both data summaries as well as the detailed data at their finest granularity. This is critical to full understanding of the data. It also enables the analyst to program using an interactive high-level language for data analysis such as R, which allows the analyst to focus more on the data and less on code. In this paper we introduce Trelliscope, a system that scales Trellis to large complex data. It provides a way to create displays with a very large number of panels and an interactive viewer that allows the analyst to sort, filter, and sample the panels in a meaningful way. We discuss the underlying principles, design, and scalable architecture of Trelliscope, and illustrate its use on three analysis projects in proteomics, high intensity physics, and power systems engineering.

Trelliscope源于用于可视化的Trellis Display框架和用于分析大型复杂数据的Divide and recombination (D&R)方法。在Trellis中，数据被分解成子集，对每个子集应用可视化方法，显示结果是一个面板数组，每个子集一个。这是一个强大的数据可视化框架，无论大小。在D&R中，数据被分解成子集，统计学和机器学习中的任何分析方法都被独立地应用于每个子集。然后重新组合输出。这不仅提供了一个强大的分析框架，而且还提供了使用分布式计算设施进行可行和实用的计算。它可以对数据进行深入分析:既可以研究数据摘要，也可以研究最细粒度的详细数据。这对于充分理解数据至关重要。它还使分析人员能够使用交互式高级语言(如R)进行数据分析，从而使分析人员能够更多地关注数据，而不是代码。本文介绍了一个将网格扩展到大型复杂数据的系统Trelliscope。它提供了一种方法来创建具有大量面板和交互式查看器的显示，该查看器允许分析人员以有意义的方式对面板进行排序、过滤和采样。我们讨论了Trelliscope的基本原理、设计和可扩展架构，并说明了它在蛋白质组学、高强度物理和电力系统工程中的三个分析项目中的使用。

{"title":"Trelliscope: A system for detailed visualization in the deep analysis of large complex data","authors":"R. Hafen, Luke J. Gosink, J. Mcdermott, Karin D. Rodland, K. K. Dam, W. Cleveland","doi":"10.1109/LDAV.2013.6675164","DOIUrl":"https://doi.org/10.1109/LDAV.2013.6675164","url":null,"abstract":"Trelliscope emanates from the Trellis Display framework for visualization and the Divide and Recombine (D&R) approach to analyzing large complex data. In Trellis, the data are broken up into subsets, a visualization method is applied to each subset, and the display result is an array of panels, one per subset. This is a powerful framework for visualization of data, both small and large. In D&R, the data are broken up into subsets, and any analytic method from statistics and machine learning is applied to each subset independently. Then the outputs are recombined. This provides not only a powerful framework for analysis, but also feasible and practical computations using distributed computational facilities. It enables deep analysis of the data: study of both data summaries as well as the detailed data at their finest granularity. This is critical to full understanding of the data. It also enables the analyst to program using an interactive high-level language for data analysis such as R, which allows the analyst to focus more on the data and less on code. In this paper we introduce Trelliscope, a system that scales Trellis to large complex data. It provides a way to create displays with a very large number of panels and an interactive viewer that allows the analyst to sort, filter, and sample the panels in a meaningful way. We discuss the underlying principles, design, and scalable architecture of Trelliscope, and illustrate its use on three analysis projects in proteomics, high intensity physics, and power systems engineering.","PeriodicalId":266607,"journal":{"name":"2013 IEEE Symposium on Large-Scale Data Analysis and Visualization (LDAV)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2013-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123848568","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 22

下一页尾页

类型

全部化学•材料生命科学医学物理工程技术环境•农林材料科学地球科学法学管理学化学环境科学与生态学计算机科学教育学经济学农林科学人文科学生物学数学物理与天体物理心理学综合性期刊其他工业工程理学历史学农学文学信息工程

数据库

全部 ACS Publications Elsevier ieeexplore Springer The Royal Society of Chemistry Wiley

期刊

2013 IEEE Symposium on Large-Scale Data Analysis and Visualization (LDAV)

全部 Acc. Chem. Res. ACS Applied Bio Materials ACS Appl. Electron. Mater. ACS Appl. Energy Mater. ACS Appl. Mater. Interfaces ACS Appl. Nano Mater. ACS Appl. Polym. Mater. ACS BIOMATER-SCI ENG ACS Catal. ACS Cent. Sci. ACS Chem. Biol. ACS Chemical Health & Safety ACS Chem. Neurosci. ACS Comb. Sci. ACS Earth Space Chem. ACS Energy Lett. ACS Infect. Dis. ACS Macro Lett. ACS Mater. Lett. ACS Med. Chem. Lett. ACS Nano ACS Omega ACS Photonics ACS Sens. ACS Sustainable Chem. Eng. ACS Synth. Biol. Anal. Chem. BIOCHEMISTRY-US Bioconjugate Chem. BIOMACROMOLECULES Chem. Res. Toxicol. Chem. Rev. Chem. Mater. CRYST GROWTH DES ENERG FUEL Environ. Sci. Technol. Environ. Sci. Technol. Lett. Eur. J. Inorg. Chem. IND ENG CHEM RES Inorg. Chem. J. Agric. Food. Chem. J. Chem. Eng. Data J. Chem. Educ. J. Chem. Inf. Model. J. Chem. Theory Comput. J. Med. Chem. J. Nat. Prod. J PROTEOME RES J. Am. Chem. Soc. LANGMUIR MACROMOLECULES Mol. Pharmaceutics Nano Lett. Org. Lett. ORG PROCESS RES DEV ORGANOMETALLICS J. Org. Chem. J. Phys. Chem. J. Phys. Chem. A J. Phys. Chem. B J. Phys. Chem. C J. Phys. Chem. Lett. Analyst Anal. Methods Biomater. Sci. Catal. Sci. Technol. Chem. Commun. Chem. Soc. Rev. CHEM EDUC RES PRACT CRYSTENGCOMM Dalton Trans. Energy Environ. Sci. ENVIRON SCI-NANO ENVIRON SCI-PROC IMP ENVIRON SCI-WAT RES Faraday Discuss. Food Funct. Green Chem. Inorg. Chem. Front. Integr. Biol. J. Anal. At. Spectrom. J. Mater. Chem. A J. Mater. Chem. B J. Mater. Chem. C Lab Chip Mater. Chem. Front. Mater. Horiz. MEDCHEMCOMM Metallomics Mol. Biosyst. Mol. Syst. Des. Eng. Nanoscale Nanoscale Horiz. Nat. Prod. Rep. New J. Chem. Org. Biomol. Chem. Org. Chem. Front. PHOTOCH PHOTOBIO SCI PCCP Polym. Chem.

﹀