EMPRESS: Accelerating Scientific Discovery through Descriptive Metadata Management

ACM Transactions on Storage · Pub Date: 2022-12-12 · DOI: https://dl.acm.org/doi/10.1145/3523698 · IF 2.6 · Region 3 (Computer Science) · Q3 (Computer Science, Hardware & Architecture)

Margaret Lawson, William Gropp, Jay Lofstead



Abstract

High-performance computing scientists are producing unprecedented volumes of data that take a long time to load for analysis. However, many analyses require loading only the data that contain particular features of interest, and scientists have many approaches for identifying these features. Therefore, if scientists store information (descriptive metadata) about these identified features, subsequent analyses can use this information to read in only the data containing these features. This can greatly reduce the amount of data that scientists have to read in, thereby accelerating analysis. Despite the potential benefits of descriptive metadata management, no prior work has created a descriptive metadata system that helps scientists working with a wide range of applications and analyses restrict their reads to data containing features of interest. In this article, we present EMPRESS, the first such solution. EMPRESS offers all of the features needed to help accelerate discovery: it can accelerate analysis by up to 300×, supports a wide range of applications and analyses, is high-performing, is highly scalable, and requires minimal storage space. In addition, EMPRESS offers features required for a production-oriented system: scalable metadata consistency techniques, flexible system configurations, fault tolerance as a service, and portability.
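The core idea in the abstract, recording which pieces of a dataset contain a feature and then reading only those pieces, can be illustrated with a minimal sketch. This is not the EMPRESS API: `FeatureRecord`, `FeatureCatalog`, `read_chunks`, the chunk layout, and the feature names are all hypothetical, chosen only to show how a descriptive-metadata lookup might restrict a read.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureRecord:
    """Hypothetical descriptive-metadata entry: one feature seen in one chunk."""
    timestep: int
    chunk_id: int
    feature: str


class FeatureCatalog:
    """Toy in-memory catalog (an assumption, not how EMPRESS stores metadata)."""

    def __init__(self):
        self._records = []

    def tag(self, timestep, chunk_id, feature):
        # Record that an earlier analysis found `feature` in this chunk.
        self._records.append(FeatureRecord(timestep, chunk_id, feature))

    def chunks_with(self, feature):
        # Return the sorted, de-duplicated (timestep, chunk_id) keys for a feature.
        return sorted(
            {(r.timestep, r.chunk_id) for r in self._records if r.feature == feature}
        )


def read_chunks(dataset, keys):
    """Stand-in for a storage read: fetch only the requested chunks."""
    return {key: dataset[key] for key in keys}


# Toy dataset: 4 timesteps x 25 chunks, keyed by (timestep, chunk_id).
dataset = {(t, c): [float(t * 25 + c)] * 4 for t in range(4) for c in range(25)}

# A first analysis pass tags the chunks where features were identified.
catalog = FeatureCatalog()
catalog.tag(0, 3, "vortex")
catalog.tag(2, 17, "vortex")
catalog.tag(1, 8, "shock_front")

# A later analysis asks the catalog first, then reads only matching chunks.
wanted = catalog.chunks_with("vortex")
subset = read_chunks(dataset, wanted)
fraction_read = len(subset) / len(dataset)
```

In this toy setup the later analysis touches 2 of 100 chunks. The savings in a real system depend on how sparse the features are and on the cost of the metadata query itself, which the sketch does not model.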


Source journal

ACM Transactions on Storage (Computer Science, Hardware & Architecture; Computer Science, Software Engineering)

CiteScore: 4.20

Self-citation rate: 5.90%

Articles published: (not given)

Review time: >12 weeks

Journal description: The ACM Transactions on Storage (TOS) is a new journal that intends to publish original archival papers in the area of storage and closely related disciplines. Articles that appear in TOS will tend either to present new techniques and concepts or to report novel experiences and experiments with practical systems. Storage is a broad and multidisciplinary area that comprises network protocols, resource management, data backup, replication, recovery, devices, security, and the theory of data coding, densities, and low power. Potential synergies among these fields are expected to open up new research directions.