Mincheng Chen, Jingling Yuan, Lin Li, Dongling Liu, Tao Li
A Fast Heuristic Attribute Reduction Algorithm Using Spark
2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS), published 2017-06-01
DOI: 10.1109/ICDCS.2017.38 (https://doi.org/10.1109/ICDCS.2017.38)
Citations: 8
Abstract
Energy data, which consists of energy consumption statistics and other related data from green data centers, is growing dramatically. This data has great value, but many of its attributes are redundant or unnecessary, so attribute reduction is regarded as a critical step. However, many existing attribute reduction algorithms are computationally time-consuming. To address these issues, we extend the methodology of rough sets to construct a knowledge representation system for data center energy consumption. Taking advantage of in-memory computing, we propose an attribute reduction algorithm for energy data using Spark. The algorithm uses a heuristic formula that measures attribute significance to reduce the search space, and an efficient algorithm for simplifying the energy consumption decision table, both of which further improve computational efficiency. Experimental results show that our algorithm achieves up to a 0.28X performance improvement over the traditional attribute reduction algorithm using Spark.
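
To make the idea of heuristic attribute reduction concrete, the following is a minimal single-machine sketch in plain Python, not the Spark algorithm proposed in the paper. The abstract does not state the heuristic formula, so the sketch assumes the classical rough-set dependency degree (the fraction of rows in the positive region) as the significance measure and adds attributes greedily. The function names, the attribute names (`cpu`, `temp`, `rack`), and the toy decision table are all hypothetical.

```python
from collections import defaultdict


def dependency(rows, attrs, decision):
    """Rough-set dependency degree: the fraction of rows whose equivalence
    class under `attrs` maps to exactly one decision value."""
    classes = defaultdict(lambda: [0, set()])
    for row in rows:
        key = tuple(row[a] for a in attrs)
        entry = classes[key]
        entry[0] += 1                  # size of the equivalence class
        entry[1].add(row[decision])    # decision values seen in the class
    consistent = sum(n for n, decisions in classes.values() if len(decisions) == 1)
    return consistent / len(rows)


def greedy_reduct(rows, condition_attrs, decision):
    """Greedy forward selection (assumed heuristic): repeatedly add the
    attribute that raises the dependency degree the most, until the subset
    discerns decisions as well as the full attribute set does."""
    target = dependency(rows, condition_attrs, decision)
    reduct = []
    remaining = list(condition_attrs)
    while dependency(rows, reduct, decision) < target:
        best = max(remaining, key=lambda a: dependency(rows, reduct + [a], decision))
        reduct.append(best)
        remaining.remove(best)
    return reduct


# Hypothetical energy consumption decision table: "energy" is the decision.
rows = [
    {"cpu": "high", "temp": "hot",  "rack": "A", "energy": "high"},
    {"cpu": "high", "temp": "mild", "rack": "B", "energy": "high"},
    {"cpu": "low",  "temp": "hot",  "rack": "A", "energy": "low"},
    {"cpu": "low",  "temp": "mild", "rack": "B", "energy": "low"},
]

reduct = greedy_reduct(rows, ["cpu", "temp", "rack"], "energy")
print(reduct)
```

In this toy table `cpu` alone determines `energy`, so `temp` and `rack` are dropped as redundant. In a distributed setting, the per-class counting inside `dependency` is the kind of step that could plausibly be expressed as a keyed aggregation, but how the paper maps it onto Spark is not described in the abstract.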