{"title":"On Performance Modeling and Prediction in Support of Scientific Workflow Optimization","authors":"C. Wu, Vivek Datla","doi":"10.1109/SERVICES.2011.37","DOIUrl":null,"url":null,"abstract":"The computing modules in distributed scientific workflows must be mapped to computer nodes in shared network environments for optimal workflow performance. Finding a good workflow mapping scheme critically depends on an accurate prediction of the execution time of each individual computational module in the workflow. The time prediction of a scientific computation does not have a silver bullet as it is determined collectively by several dynamic system factors including concurrent loads, memory size, CPU speed, and also by the complexity of the computational program itself. This paper investigates the problem of modeling scientific computations and predicting their execution time based on a combination of both hardware and software properties. We employ statistical learning techniques to estimate the effective computational power of a given computer node at any point of time and estimate the total number of CPU cycles needed for executing a given computational program on any input data size. We analytically derive an upper bound of the estimation error for execution time prediction given the hardware and software properties. The proposed statistical analysis-based solution to performance modeling and prediction is validated and justified by experimental results measured on the computing nodes that vary significantly in terms of the hardware specifications.","PeriodicalId":429726,"journal":{"name":"2011 IEEE World Congress on Services","volume":"39 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-07-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"22","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 IEEE World Congress on Services","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SERVICES.2011.37","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 22
Abstract
The computing modules in distributed scientific workflows must be mapped to computer nodes in shared network environments for optimal workflow performance. Finding a good workflow mapping scheme critically depends on an accurate prediction of the execution time of each individual computational module in the workflow. There is no silver bullet for predicting the execution time of a scientific computation, as it is determined collectively by several dynamic system factors, including concurrent loads, memory size, and CPU speed, as well as by the complexity of the computational program itself. This paper investigates the problem of modeling scientific computations and predicting their execution time based on a combination of hardware and software properties. We employ statistical learning techniques to estimate the effective computational power of a given computer node at any point in time and to estimate the total number of CPU cycles needed to execute a given computational program on any input data size. We analytically derive an upper bound on the estimation error for execution time prediction given the hardware and software properties. The proposed statistical analysis-based solution to performance modeling and prediction is validated by experimental results measured on computing nodes that vary significantly in their hardware specifications.
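The abstract's two-part approach suggests a simple structure: fit a statistical model of the CPU cycles a program needs as a function of its input size, estimate the effective computational power (cycles per second actually available) of the target node, and predict execution time as the ratio of the two. The sketch below is a minimal illustration of that idea, not the authors' implementation; the polynomial fit, the utilization-based power estimate, and all numbers and function names are assumptions introduced here for illustration.

```python
# Illustrative sketch of cycle-count fitting and execution-time prediction.
# Assumptions (not from the paper): a polynomial cycle model and a simple
# utilization-based estimate of effective computational power.

import numpy as np


def fit_cycle_model(input_sizes, measured_cycles, degree=2):
    """Least-squares fit of required CPU cycles vs. input size.

    Returns polynomial coefficients (highest degree first).
    """
    return np.polyfit(input_sizes, measured_cycles, deg=degree)


def predict_cycles(coeffs, input_size):
    """Evaluate the fitted cycle-count model at a new input size."""
    return np.polyval(coeffs, input_size)


def effective_power(nominal_hz, background_load):
    """Crude estimate of cycles/second available to our job on a shared node.

    The paper estimates this quantity with statistical learning; here we
    simply discount the nominal clock rate by the concurrent load.
    """
    return nominal_hz * max(0.0, 1.0 - background_load)


def predict_runtime_seconds(coeffs, input_size, nominal_hz, background_load):
    """Predicted time = estimated cycles needed / effective cycles per second."""
    cycles_needed = predict_cycles(coeffs, input_size)
    return cycles_needed / effective_power(nominal_hz, background_load)


if __name__ == "__main__":
    # Hypothetical profiling data: (input size, cycles consumed) pairs.
    sizes = np.array([1e4, 5e4, 1e5, 5e5, 1e6])
    cycles = np.array([2.1e8, 1.1e9, 2.3e9, 1.2e10, 2.5e10])

    coeffs = fit_cycle_model(sizes, cycles, degree=1)

    # Predict run time for a new input on a 3 GHz node that is 40% busy.
    t = predict_runtime_seconds(coeffs, 2e5, nominal_hz=3.0e9, background_load=0.4)
    print(f"predicted execution time: {t:.2f} s")
```

In this toy version, the estimation error of the predicted time is driven by how well the fitted cycle model and the effective-power estimate match reality, which is the quantity the paper bounds analytically.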