Lessons Learned from Managing a Petabyte

J. Becla, Daniel L. Wang
Conference on Innovative Data Systems Research, 2005-01-20
DOI: 10.2172/839755 (https://doi.org/10.2172/839755)
Citations: 50

Abstract

The amount of data collected and stored by the average business doubles each year. Many commercial databases are already approaching hundreds of terabytes and, at this rate, will soon be managing petabytes. More data enables new functionality and capability, but the larger scale reveals new problems hidden in "smaller" terascale environments. This paper presents some of these new problems along with implemented solutions in the framework of a petabyte dataset for a large High Energy Physics experiment. Through experience with two persistence technologies, a commercial database and a file-based approach, we expose format-independent concepts and issues prevalent at this new scale of computing.