Performance analysis of scheduling and replication algorithms on Grid Datafarm architecture for high-energy physics applications
A. Takefusa, O. Tatebe, S. Matsuoka, Y. Morita
Proceedings of the 12th IEEE International Symposium on High Performance Distributed Computing, 2003 (published 2003-06-22)
DOI: 10.1109/HPDC.2003.1210014 (https://doi.org/10.1109/HPDC.2003.1210014)
Data Grid is a Grid for ubiquitous access to and analysis of large-scale data. Because Data Grid is in the early stages of development, the performance of its petabyte-scale models in a realistic data processing setting has not been well investigated. By enhancing our Bricks Grid simulator to accommodate Data Grid scenarios, we investigate and compare the performance of different Data Grid models. These are categorized mainly as either central or tier models; they employ various scheduling and replication strategies under realistic assumptions of job processing for CERN LHC experiments on the Grid Datafarm system. Our results show that the central model is efficient, but that the tier model, with its greater resources and its speculative class of background replication policies, is quite effective and achieves higher performance, even though each tier is smaller than the central model.