Applying Chimera Virtual Data Concepts to Cluster Finding in the Sloan Sky Survey
J. Annis, Yong Zhao, Jens-S. Vöckler, M. Wilde, S. Kent, Ian T Foster
ACM/IEEE SC 2002 Conference (SC'02), 2002-11-16
DOI: https://doi.org/10.1109/SC.2002.10021
Citations: 102
Abstract
In many scientific disciplines, especially long-running, data-intensive collaborations, it is important to track all aspects of data capture, production, transformation, and analysis. In principle, one can then audit, validate, reproduce, and/or re-run various data transformations with corrections. We have recently proposed and prototyped the Chimera virtual data system, a new database-driven approach to this problem. We present here a major application study in which we apply Chimera to a challenging data analysis problem: the identification of galaxy clusters within the Sloan Digital Sky Survey. We describe the problem, its computational procedures, and the use of Chimera to plan and orchestrate a workflow of thousands of tasks on a data grid comprising hundreds of computers. This experience suggests that a general set of tools can indeed enhance the accuracy and productivity of scientific data reduction, and that further development and application of this paradigm will offer great value.