Towards an Energy Complexity Model for Distributed Data Processing Algorithms
Authors: Jie Song; Xingchen Zhao; Chaopeng Guo; Yu Gu; Ge Yu
Journal: IEEE Transactions on Big Data, vol. 9, no. 6, pp. 1510-1524, published 2023-06-08
DOI: 10.1109/TBDATA.2023.3284259
URL: https://ieeexplore.ieee.org/document/10146456/
Modern data centers serve as infrastructure in the era of Big Data, and big data processing applications are their major computing workload. Electricity accounts for about 50% of data centers’ operational costs. The energy consumed by running distributed data processing algorithms in a data center is therefore starting to attract attention from both academia and industry. Most works study energy consumption from the hardware perspective, and only a few from the algorithm perspective. A general, hardware-independent energy evaluation model for algorithms is in demand. With such a model, algorithm designers can evaluate energy consumption, compare energy consumption features, and facilitate energy consumption optimization of distributed data processing algorithms. Inspired by the time complexity model, we propose an energy complexity model that describes how an algorithm's energy consumption grows with its input size. We argue that a good algorithm, especially for processing Big Data, should have a ‘small’ energy complexity. We define $E(n)$ to represent the functional relationship that associates an algorithm's input size $n$ with its notional energy consumption $E$. Based on the well-known abstract Bulk Synchronous Parallel (BSP) computer and programming model, we present a complete $E(n)$ solution, including abstraction, generalization, quantification, derivation, comparison, analysis, examples, verification, and applications. Comprehensive experimental analysis shows that the proposed energy complexity model is practical and, interestingly, not equivalent to time complexity.
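The claim that energy complexity need not coincide with time complexity can be illustrated with a toy example. The sketch below is not the paper's $E(n)$ definition, which the abstract does not state. It uses the standard BSP superstep cost (maximum local work, plus g times the maximum communication volume, plus the barrier latency l) for time, and assumes, purely for illustration, that notional energy adds up over all processors. The names `Superstep`, `bsp_time`, `notional_energy`, `e_work`, and `e_comm` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Superstep:
    work: List[float]  # local computation units, one entry per processor
    comm: List[float]  # words sent or received, one entry per processor


def bsp_time(steps: List[Superstep], g: float, l: float) -> float:
    """Standard BSP cost: the slowest processor determines each superstep."""
    return sum(max(s.work) + g * max(s.comm) + l for s in steps)


def notional_energy(steps: List[Superstep], e_work: float, e_comm: float) -> float:
    """ASSUMPTION (illustrative only): energy is summed over all processors."""
    return sum(e_work * sum(s.work) + e_comm * sum(s.comm) for s in steps)


# Two single-superstep schedules on four processors.
balanced = [Superstep(work=[4, 4, 4, 4], comm=[2, 2, 2, 2])]
skewed = [Superstep(work=[4, 1, 1, 1], comm=[2, 0, 0, 0])]

# Same BSP time, because the busiest processor is identical in both.
print(bsp_time(balanced, g=1.0, l=1.0), bsp_time(skewed, g=1.0, l=1.0))
# Different notional energy, because the totals differ.
print(notional_energy(balanced, 1.0, 1.0), notional_energy(skewed, 1.0, 1.0))
```

Under these assumptions both schedules take 7.0 time units, while their notional energy is 24.0 and 9.0 respectively: a maximum-based time measure and a sum-based energy measure can rank the same schedules differently.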
About the journal:
The IEEE Transactions on Big Data publishes peer-reviewed articles focusing on big data. These articles present innovative research ideas and application results across disciplines, including novel theories, algorithms, and applications. Research areas cover a wide range, such as big data analytics, visualization, curation, management, semantics, infrastructure, standards, performance analysis, intelligence extraction, scientific discovery, security, privacy, and legal issues specific to big data. The journal also prioritizes applications of big data in fields generating massive datasets.