Quality-Aware Data Management for Large Scale Scientific Applications
Hongbo Zou, F. Zheng, M. Wolf, G. Eisenhauer, K. Schwan, H. Abbasi, Qing Liu, N. Podhorszki, S. Klasky
2012 SC Companion: High Performance Computing, Networking Storage and Analysis, pp. 816-820, published 2012-11-10. DOI: 10.1109/SC.Companion.2012.114
Citation count: 13
Abstract
Ever-larger simulations are generating unprecedented volumes of output data, prompting researchers to explore new "data staging" methods that buffer, use, and/or reduce such data online rather than simply writing it to disk. Leveraging the capabilities of data staging, this study explores the potential for data reduction through online compression, first using general-purpose compression techniques and then proposing use-specific methods that let users define simple data queries so that only the data identified by those queries is emitted. By applying online code generation and deployment to such dynamic data queries, end users can precisely specify the quality of information (QoI) of their output data, explicitly determining which data may be lost and which must be retained. General-purpose lossy compression methods do not provide this level of control. The paper also describes the key elements of a quality-aware data management system (QADMS) for high-end machines that this approach enables. Initial experimental results demonstrate that QADMS can effectively reduce data movement cost and improve quality of service (QoS) while meeting the QoI constraints specified by users.
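The abstract does not describe QADMS's query language or interfaces. The following is a minimal sketch, under stated assumptions, of the contrast it draws: a general-purpose lossless compressor (zlib is used here purely as a stand-in) must preserve every value, whereas a user-defined query lets a staging node emit only the selected elements, so the user decides exactly what is lost. The names `RangeQuery`, `emit_matching`, `pack`, and `pack_selected`, the (index, value) record layout, and the synthetic random field are hypothetical illustrations, not the paper's design.

```python
"""Sketch of query-driven data reduction at a staging node (all names hypothetical)."""
import random
import struct
import zlib
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class RangeQuery:
    """Hypothetical user query: keep only values within [low, high]."""
    low: float
    high: float

    def compile(self) -> Callable[[float], bool]:
        # Stand-in for online code generation: turn the declarative
        # query into an executable predicate at run time.
        low, high = self.low, self.high
        return lambda v: low <= v <= high


def pack(values: List[float]) -> bytes:
    # Serialize every value as little-endian float64 (8 bytes each).
    return struct.pack(f"<{len(values)}d", *values)


def emit_matching(values: List[float], query: RangeQuery) -> List[Tuple[int, float]]:
    # Only (index, value) pairs selected by the query leave the staging node;
    # all other values are deliberately dropped, so the loss is user-defined.
    keep = query.compile()
    return [(i, v) for i, v in enumerate(values) if keep(v)]


def pack_selected(selected: List[Tuple[int, float]]) -> bytes:
    # Each retained element costs 12 bytes: uint32 index + float64 value, no padding.
    return b"".join(struct.pack("<Id", i, v) for i, v in selected)


if __name__ == "__main__":
    # Synthetic output field (illustrative only, not simulation data).
    rng = random.Random(42)
    field = [rng.uniform(0.0, 100.0) for _ in range(10_000)]

    raw = pack(field)
    lossless = zlib.compress(raw, 9)  # general-purpose baseline keeps everything
    selected = emit_matching(field, RangeQuery(95.0, 100.0))
    reduced = pack_selected(selected)

    print(f"raw bytes:       {len(raw)}")
    print(f"zlib (lossless): {len(lossless)}")
    print(f"query-selected:  {len(reduced)} ({len(selected)} elements)")
```

Because the discarded values are exactly those outside the query, what survives is known in advance rather than spread as approximation error across all values, which is the kind of explicit control over lost versus retained data that the abstract attributes to query-based reduction. A real system would also need to handle multi-variable queries and the metadata needed to reconstruct context downstream.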