The Elastic Node: An Experimentation Platform for Hardware Accelerator Research in the Internet of Things
Gregor Schiele, Alwyn Burger, Christopher Cichiwskyj
2019 IEEE International Conference on Autonomic Computing (ICAC). DOI: 10.1109/ICAC.2019.00020
While adaptive hardware acceleration shows huge potential for autonomic IoT applications, developing and experimenting with accelerators in embedded environments is still very challenging. For this reason, we developed a novel experimentation platform, the Elastic Node Platform, which we present in this paper. It consists of a wireless embedded device with an 8-bit micro-controller and a low-energy embedded FPGA, combined with a minimal abstraction middleware. The main goal of our platform is to empower researchers and software developers without hardware design knowledge to experiment with adaptive hardware acceleration. We explain our design, show how to use it for developing experiments, and evaluate its performance.
Speeding up Deep Learning with Transient Servers
Shijian Li, R. Walls, Lijie Xu, Tian Guo
2019 IEEE International Conference on Autonomic Computing (ICAC). DOI: 10.1109/ICAC.2019.00024
Distributed training frameworks, like TensorFlow, have been proposed as a means to reduce the training time of deep learning models by using a cluster of GPU servers. While such speedups are often desirable (e.g., for rapidly evaluating new model designs), they often come with significantly higher monetary costs due to sublinear scalability. In this paper, we investigate the feasibility of using training clusters composed of cheaper transient GPU servers to get the benefits of distributed training without the high costs. We conduct the first large-scale empirical analysis, launching more than a thousand GPU servers of various capacities, aimed at understanding the characteristics of transient GPU servers and their impact on distributed training performance. Our study demonstrates the potential of transient servers, with a speedup of 7.7x and monetary savings of more than 62.9% for some cluster configurations. We also identify a number of important challenges and opportunities for redesigning distributed training frameworks to be transient-aware. For example, the dynamic cost and availability characteristics of transient servers suggest the need for frameworks to dynamically change cluster configurations to best take advantage of current conditions.