William Y. Arms, Selcuk Aya, Manuel Calimlim, Jim Cordes, J. Deneva, Pavel A. Dmitriev, J. Gehrke, L. Gibbons, C. D. Jones, V. Kuznetsov, D. Lifka, Mirek Riedewald, D. Riley, A. Ryd, G. Sharp
Three Case Studies of Large-Scale Data Flows
22nd International Conference on Data Engineering Workshops (ICDEW'06), published 2006-04-03
DOI: 10.1109/ICDEW.2006.148 (https://doi.org/10.1109/ICDEW.2006.148)
Citations: 7
Abstract
We survey three examples of large-scale scientific workflows that we are working with at Cornell: the Arecibo sky survey, the CLEO high-energy particle physics experiment, and the Web Lab project for enabling social science studies of the Internet. All three projects face the same general challenges: massive amounts of raw data, expensive processing steps, and the requirement to make raw data or data products available to users nation- or world-wide. However, there are several differences that prevent a one-size-fits-all approach to handling their data flows. Instead, current implementations are heavily tuned by domain and data management experts. We describe the three projects, and we outline research issues and opportunities to integrate Grid technology into these workflows.