Francisco Rodrigo Duro, Francisco Javier García Blas, Florin Isaila, J. Wozniak, J. Carretero, R. Ross
{"title":"在内存对象存储上灵活的数据感知调度工作流","authors":"Francisco Rodrigo Duro, Francisco Javier García Blas, Florin Isaila, J. Wozniak, J. Carretero, R. Ross","doi":"10.1109/CCGrid.2016.40","DOIUrl":null,"url":null,"abstract":"This paper explores novel techniques for improving the performance of many-task workflows based on the Swift scripting language. We propose novel programmer options for automated distributed data placement and task scheduling. These options trigger a data placement mechanism used for distributing intermediate workflow data over the servers of Hercules, a distributed key-value store that can be used to cache file system data. We demonstrate that these new mechanisms can significantly improve the aggregated throughput of many-task workflows with up to 86x, reduce the contention on the shared file system, exploit the data locality, and trade off locality and load balance.","PeriodicalId":103641,"journal":{"name":"2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"13","resultStr":"{\"title\":\"Flexible Data-Aware Scheduling for Workflows over an In-memory Object Store\",\"authors\":\"Francisco Rodrigo Duro, Francisco Javier García Blas, Florin Isaila, J. Wozniak, J. Carretero, R. Ross\",\"doi\":\"10.1109/CCGrid.2016.40\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper explores novel techniques for improving the performance of many-task workflows based on the Swift scripting language. We propose novel programmer options for automated distributed data placement and task scheduling. These options trigger a data placement mechanism used for distributing intermediate workflow data over the servers of Hercules, a distributed key-value store that can be used to cache file system data. We demonstrate that these new mechanisms can significantly improve the aggregated throughput of many-task workflows with up to 86x, reduce the contention on the shared file system, exploit the data locality, and trade off locality and load balance.\",\"PeriodicalId\":103641,\"journal\":{\"name\":\"2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)\",\"volume\":\"37 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-05-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"13\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CCGrid.2016.40\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CCGrid.2016.40","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Flexible Data-Aware Scheduling for Workflows over an In-memory Object Store
This paper explores novel techniques for improving the performance of many-task workflows based on the Swift scripting language. We propose novel programmer options for automated distributed data placement and task scheduling. These options trigger a data placement mechanism used for distributing intermediate workflow data over the servers of Hercules, a distributed key-value store that can be used to cache file system data. We demonstrate that these new mechanisms can significantly improve the aggregated throughput of many-task workflows with up to 86x, reduce the contention on the shared file system, exploit the data locality, and trade off locality and load balance.