Hongbo Zou, F. Zheng, M. Wolf, G. Eisenhauer, K. Schwan, H. Abbasi, Qing Liu, N. Podhorszki, S. Klasky
2012 SC Companion: High Performance Computing, Networking Storage and Analysis, pp. 816-820, 10 November 2012. DOI: 10.1109/SC.Companion.2012.114
Quality-Aware Data Management for Large Scale Scientific Applications
Ever larger-scale simulations are generating unprecedented volumes of output data, leading researchers to explore 'data staging' methods that buffer, use, and/or reduce such data online rather than simply writing it to disk. Building on the capabilities of data staging, this study explores the potential for data reduction via online data compression, first using general-purpose compression techniques and then proposing use-specific methods that let users define simple data queries so that only the data those queries identify is emitted. By combining such dynamic data queries with online code generation and deployment, end users can precisely specify the quality of information (QoI) of their output data, explicitly determining which data may be lost and which must be retained; general-purpose lossy compression methods do not offer this level of control. The paper also describes the key elements of a quality-aware data management system (QADMS) for high-end machines that this approach enables. Initial experimental results show that QADMS can effectively reduce data movement costs and improve quality of service (QoS) while meeting user-specified QoI constraints.
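The abstract contrasts general-purpose compression, which shrinks bytes without knowing which values matter, with user-defined queries that emit only the selected data. The following minimal Python sketch illustrates that contrast under stated assumptions: the `DataQuery` type, the helper functions, the synthetic field, and the 0.5 threshold are all hypothetical, not the QADMS interface or the paper's online code-generation pipeline, and zlib merely stands in for "general compression techniques".

```python
import math
import struct
import zlib
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class DataQuery:
    """Hypothetical user query (illustration only, not the QADMS API)."""
    name: str
    predicate: Callable[[int, float], bool]  # (index, value) -> keep this point?


def synthetic_field(n: int) -> List[float]:
    """Deterministic stand-in for one simulation output variable."""
    return [math.sin(i * 0.01) * math.exp(-i / n) for i in range(n)]


def pack(values: List[float]) -> bytes:
    """Serialize as little-endian float64, like a raw dump to disk."""
    return struct.pack(f"<{len(values)}d", *values)


def general_compress(values: List[float]) -> bytes:
    """General-purpose lossless compression: it has no notion of which data matters."""
    return zlib.compress(pack(values), 6)


def apply_query(values: List[float], query: DataQuery) -> List[Tuple[int, float]]:
    """Emit only the (index, value) pairs the query selects.

    Selected values are kept exactly; all other values are dropped by the
    user's explicit rule, so the loss is fully specified by the query.
    """
    return [(i, v) for i, v in enumerate(values) if query.predicate(i, v)]


def pack_selection(selection: List[Tuple[int, float]]) -> bytes:
    """Serialize selected points as 12-byte (uint32 index, float64 value) records."""
    return b"".join(struct.pack("<Id", i, v) for i, v in selection)


if __name__ == "__main__":
    field = synthetic_field(100_000)
    raw = pack(field)
    compressed = general_compress(field)
    # Hypothetical query: keep only strong positive peaks.
    peaks = DataQuery("peaks", lambda i, v: v > 0.5)
    emitted = pack_selection(apply_query(field, peaks))
    print(f"raw bytes:               {len(raw)}")
    print(f"zlib bytes (lossless):   {len(compressed)}")
    print(f"query-emitted bytes:     {len(emitted)}")
```

The design point is that the query makes the loss explicit: values it selects survive bit-for-bit and everything else is discarded by the user's own rule, whereas zlib keeps everything and knows nothing about QoI. A lossy compressor might shrink the data further, but it would decide on its own which information degrades.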