Light-Weight Data Management Solutions for Visualization and Dissemination of Massive Scientific Datasets - Position Paper

G. Agrawal, Yunde Su
{"title":"Light-Weight Data Management Solutions for Visualization and Dissemination of Massive Scientific Datasets - Position Paper","authors":"G. Agrawal, Yunde Su","doi":"10.1109/SC.Companion.2012.157","DOIUrl":null,"url":null,"abstract":"Many of the `big-data' challenges today are arising from increasing computing ability, as data collected from simulations has become extremely valuable for a variety of scientific endeavors. With growing computational capabilities of parallel machines, scientific simulations are being performed at finer spatial and temporal scales, leading to a data explosion. As a specific example, the Global Cloud-Resolving Model (GCRM) currently has a grid-cell size of 4 km, and already produces 1 petabyte of data for a 10 day simulation. Future plans include simulations with a grid-cell size of 1 km, which will increase the data generation 64 folds. Finer granularity of simulation data offers both an opportunity and a challenge. On one hand, it can allow understanding of underlying phenomenon and features in a way that would not be possible with coarser granularity. On the other hand, larger datasets are extremely difficult to store, manage, disseminate, analyze, and visualize. Neither the memory capacity of parallel machines, memory access speeds, nor disk bandwidths are increasing at the same rate as computing power, contributing to the difficulty in storing, managing, and analyzing these datasets. Simulation data is often disseminated widely, through portals like the Earth System Grid (ESG), and downloaded by researchers all over the world. Such dissemination efforts are hampered by dataset size growth, as wide area data transfer bandwidths are growing at a much slower pace. Finally, while visualizing datasets, human perception is inherently limited.","PeriodicalId":6346,"journal":{"name":"2012 SC Companion: High Performance Computing, Networking Storage and Analysis","volume":"144 1","pages":"1296-1300"},"PeriodicalIF":0.0000,"publicationDate":"2012-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 SC Companion: High Performance Computing, Networking Storage and Analysis","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SC.Companion.2012.157","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Many of today's 'big-data' challenges arise from increasing computing capability, as data collected from simulations has become extremely valuable for a variety of scientific endeavors. With the growing computational capabilities of parallel machines, scientific simulations are being performed at finer spatial and temporal scales, leading to a data explosion. As a specific example, the Global Cloud-Resolving Model (GCRM) currently has a grid-cell size of 4 km and already produces 1 petabyte of data for a 10-day simulation. Future plans include simulations with a grid-cell size of 1 km, which will increase data generation 64-fold. Finer granularity of simulation data offers both an opportunity and a challenge. On one hand, it can allow understanding of underlying phenomena and features in a way that would not be possible at coarser granularity. On the other hand, larger datasets are extremely difficult to store, manage, disseminate, analyze, and visualize. Neither the memory capacity of parallel machines, nor memory access speeds, nor disk bandwidths are increasing at the same rate as computing power, which compounds the difficulty of storing, managing, and analyzing these datasets. Simulation data is often disseminated widely, through portals like the Earth System Grid (ESG), and downloaded by researchers all over the world. Such dissemination efforts are hampered by dataset size growth, as wide-area data transfer bandwidths are growing at a much slower pace. Finally, when visualizing datasets, human perception is inherently limited.
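The abstract states the 64-fold growth without showing the arithmetic. A plausible back-of-envelope reading, sketched below in Python, is that refining the grid-cell size from 4 km to 1 km multiplies the number of horizontal grid cells by 4^2 = 16 and, under a CFL-style stability constraint, shortens the time step (and so multiplies the number of output steps) by a further factor of 4, giving 16 x 4 = 64. The inputs come from the abstract; the split into spatial and temporal factors is an assumption, not something the paper states.

    # Back-of-envelope check of the abstract's scaling claims.
    # Assumption (not stated in the paper): the 64-fold growth factors as
    # 4^2 = 16x more horizontal grid cells times a CFL-driven 4x reduction
    # in time step, with output frequency proportional to the time step.

    current_dx_km = 4.0      # current GCRM grid-cell size
    future_dx_km = 1.0       # planned grid-cell size
    current_output_pb = 1.0  # petabytes per 10-day simulation at 4 km

    refinement = current_dx_km / future_dx_km        # 4x finer per horizontal dimension
    spatial_factor = refinement ** 2                 # 16x more grid cells
    temporal_factor = refinement                     # 4x more (shorter) time steps
    total_factor = spatial_factor * temporal_factor  # 64x overall

    print(f"Data growth factor: {total_factor:.0f}x")                        # 64x
    print(f"Projected output: {current_output_pb * total_factor:.0f} PB per 10-day run")

    # Average write rate implied by the current 1 PB / 10-day figure,
    # using decimal units (1 PB = 10^6 GB):
    seconds = 10 * 24 * 3600
    print(f"Current average output rate: {current_output_pb * 1e6 / seconds:.2f} GB/s")  # ~1.16 GB/s

Under these assumptions, the same arithmetic implies roughly 64 PB per 10-day run at the planned resolution, with an average output rate above 70 GB/s, which is the storage-and-dissemination pressure this position paper responds to.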