Ali Saleh, R. Javidan, Mohammad Taghi FatehiKhajeh
{"title":"数据网格的四阶段数据复制算法","authors":"Ali Saleh, R. Javidan, Mohammad Taghi FatehiKhajeh","doi":"10.14419/JACST.V4I1.4009","DOIUrl":null,"url":null,"abstract":"Nowadays, scientific applications generate a huge amount of data in terabytes or petabytes. Data grids currently proposed solutions to large scale data management problems including efficient file transfer and replication. Data is typically replicated in a Data Grid to improve the job response time and data availability. A reasonable number and right locations for replicas has become a challenge in the Data Grid. In this paper, a four-phase dynamic data replication algorithm based on Temporal and Geographical locality is proposed. It includes: 1) evaluating and identifying the popular data and triggering a replication operation when the popularity data passes a dynamic threshold; 2) analyzing and modeling the relationship between system availability and the number of replicas, and calculating a suitable number of new replicas; 3) evaluating and identifying the popular data in each site, and placing replicas among them; 4) removing files with least cost of average access time when encountering insufficient space for replication. The algorithm was tested using a grid simulator, OptorSim developed by European Data Grid Projects. The simulation results show that the proposed algorithm has better performance in comparison with other algorithms in terms of job execution time, effective network usage and percentage of storage filled.","PeriodicalId":445404,"journal":{"name":"Journal of Advanced Computer Science and Technology","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-04-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"A four-phase data replication algorithm for data grid\",\"authors\":\"Ali Saleh, R. Javidan, Mohammad Taghi FatehiKhajeh\",\"doi\":\"10.14419/JACST.V4I1.4009\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Nowadays, scientific applications generate a huge amount of data in terabytes or petabytes. Data grids currently proposed solutions to large scale data management problems including efficient file transfer and replication. Data is typically replicated in a Data Grid to improve the job response time and data availability. A reasonable number and right locations for replicas has become a challenge in the Data Grid. In this paper, a four-phase dynamic data replication algorithm based on Temporal and Geographical locality is proposed. It includes: 1) evaluating and identifying the popular data and triggering a replication operation when the popularity data passes a dynamic threshold; 2) analyzing and modeling the relationship between system availability and the number of replicas, and calculating a suitable number of new replicas; 3) evaluating and identifying the popular data in each site, and placing replicas among them; 4) removing files with least cost of average access time when encountering insufficient space for replication. The algorithm was tested using a grid simulator, OptorSim developed by European Data Grid Projects. The simulation results show that the proposed algorithm has better performance in comparison with other algorithms in terms of job execution time, effective network usage and percentage of storage filled.\",\"PeriodicalId\":445404,\"journal\":{\"name\":\"Journal of Advanced Computer Science and Technology\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-04-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Advanced Computer Science and Technology\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.14419/JACST.V4I1.4009\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Advanced Computer Science and Technology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.14419/JACST.V4I1.4009","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A four-phase data replication algorithm for data grid
Nowadays, scientific applications generate a huge amount of data in terabytes or petabytes. Data grids currently proposed solutions to large scale data management problems including efficient file transfer and replication. Data is typically replicated in a Data Grid to improve the job response time and data availability. A reasonable number and right locations for replicas has become a challenge in the Data Grid. In this paper, a four-phase dynamic data replication algorithm based on Temporal and Geographical locality is proposed. It includes: 1) evaluating and identifying the popular data and triggering a replication operation when the popularity data passes a dynamic threshold; 2) analyzing and modeling the relationship between system availability and the number of replicas, and calculating a suitable number of new replicas; 3) evaluating and identifying the popular data in each site, and placing replicas among them; 4) removing files with least cost of average access time when encountering insufficient space for replication. The algorithm was tested using a grid simulator, OptorSim developed by European Data Grid Projects. The simulation results show that the proposed algorithm has better performance in comparison with other algorithms in terms of job execution time, effective network usage and percentage of storage filled.