Data placement strategy for massive data applications based on FCA approach
Authors: Zaki Brahmi, Sahar Mili, Rihab Derouiche
Published in: 2016 IEEE/ACS 13th International Conference of Computer Systems and Applications (AICCSA), 2016-11-01
DOI: https://doi.org/10.1109/AICCSA.2016.7945616
Massive data applications such as e-science applications are characterized by complex processing of large amounts of data, which must be stored in distributed data centers. When one task needs several datasets from different data centers, moving these data can take a long time and consume a lot of energy. Moreover, when many data centers are involved in executing the tasks, the total data movement and the execution time increase dramatically and become a bottleneck, since the data centers have limited bandwidth. Thus, a good data placement strategy is needed to minimise data movement between data centers and reduce the energy consumed. Much existing research addresses data placement strategies that distribute data in ways that are advantageous for application execution. In this paper, our data placement strategy aims to group as many data and tasks as possible in a minimal number of data centers. It is based on the Formal Concept Analysis (FCA) approach, because the FCA notion of a concept matches this aim: a concept faithfully represents a group of tasks together with the data required for their execution. The strategy consists of four steps: 1) hierarchical organization of tasks using Formal Concept Analysis, 2) selection of candidate concepts, 3) assignment of data to the appropriate data centers, and 4) data replication. Simulations show that our strategy can effectively reduce the data movement and the average query spans compared to the genetic approach.
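To make the FCA notion of a concept concrete, the following is a minimal sketch of how formal concepts can be enumerated from a task-to-dataset relation. It is not the authors' implementation: the context `CONTEXT`, the task and dataset names, and the function names `extent`, `intent`, and `formal_concepts` are all hypothetical, and the brute-force enumeration is only practical for very small inputs. The later steps of the strategy (candidate selection, assignment to data centers, replication) are not shown.

```python
from itertools import combinations

# Hypothetical context (not from the paper): the datasets each task reads.
CONTEXT = {
    "t1": {"d1", "d2"},
    "t2": {"d2", "d3"},
    "t3": {"d1", "d2", "d3"},
}


def extent(context, datasets):
    """Return the tasks that need every dataset in `datasets`."""
    return frozenset(t for t, ds in context.items() if datasets <= ds)


def intent(context, tasks):
    """Return the datasets needed by every task in `tasks`."""
    # Start from all datasets and intersect with each task's requirements.
    result = frozenset().union(*context.values())
    for t in tasks:
        result &= context[t]
    return frozenset(result)


def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every subset of tasks.

    This is exponential in the number of tasks; it is an illustration only.
    """
    tasks = sorted(context)
    concepts = set()
    for r in range(len(tasks) + 1):
        for subset in combinations(tasks, r):
            shared = intent(context, subset)   # datasets common to the subset
            closed = extent(context, shared)   # every task that needs them all
            concepts.add((closed, shared))
    return concepts


if __name__ == "__main__":
    # Print concepts from the largest group of tasks to the smallest.
    for tasks, datasets in sorted(formal_concepts(CONTEXT), key=lambda c: -len(c[0])):
        print(sorted(tasks), sorted(datasets))
```

Each pair groups a maximal set of tasks with exactly the datasets they all share. For this toy context there are four such pairs, for example tasks `t1` and `t3` with datasets `d1` and `d2`. Groups of this kind are what a placement strategy could try to keep together in one data center.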