Flow-based microfluidic biochips (FBMBs) have attracted much attention over the past decade. On such a micrometer-scale platform, various biochemical applications, also called bioas-says, can be processed concurrently and automatically. To improve execution efficiency and reduce fabrication cost, a distributed channel-storage architecture (DCSA) can be implemented on this platform, where fluid samples can be cached temporarily in flow channels close to components. Although DCSA can improve the execution efficiency of FBMBs significantly, it requires a careful arrangement of fluid samples to enable the channels to fulfill the dual functions of transportation and caching. In this paper, we formulate the first flow-layer physical design problem considering DCSA, and propose a top-down synthesis algorithm to generate efficient solutions considering execution efficiency, washing, and resource usage simultaneously. Experimental results demonstrate that the proposed algorithm leads to a shorter execution time, less flow-channel length, and a higher efficiency of on-chip resource utilization for biochemical applications compared with a direct approach to incorporate distributed storage into existing frameworks.
{"title":"Physical Synthesis of Flow-Based Microfluidic Biochips Considering Distributed Channel Storage","authors":"Zhisheng Chen, Xing Huang, Wenzhong Guo, Bing Li, Tsung-Yi Ho, Ulf Schlichtmann","doi":"10.23919/DATE.2019.8715269","DOIUrl":"https://doi.org/10.23919/DATE.2019.8715269","url":null,"abstract":"Flow-based microfluidic biochips (FBMBs) have attracted much attention over the past decade. On such a micrometer-scale platform, various biochemical applications, also called bioas-says, can be processed concurrently and automatically. To improve execution efficiency and reduce fabrication cost, a distributed channel-storage architecture (DCSA) can be implemented on this platform, where fluid samples can be cached temporarily in flow channels close to components. Although DCSA can improve the execution efficiency of FBMBs significantly, it requires a careful arrangement of fluid samples to enable the channels to fulfill the dual functions of transportation and caching. In this paper, we formulate the first flow-layer physical design problem considering DCSA, and propose a top-down synthesis algorithm to generate efficient solutions considering execution efficiency, washing, and resource usage simultaneously. Experimental results demonstrate that the proposed algorithm leads to a shorter execution time, less flow-channel length, and a higher efficiency of on-chip resource utilization for biochemical applications compared with a direct approach to incorporate distributed storage into existing frameworks.","PeriodicalId":445778,"journal":{"name":"2019 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"107 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123246042","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2019-03-25DOI: 10.23919/DATE.2019.8715108
A. O. Glova, Itir Akgun, Shuangchen Li, Xing Hu, Yuan Xie
Homomorphic encryption is a promising technology for enabling various privacy-preserving applications such as secure biomarker search. However, current implementations are not practical due to large performance overheads. A homomorphic encryption scheme has recently been proposed that allows bitwise comparison without the computationally-intensive multiplication and bootstrapping operations. Even so, this scheme still suffers from memory-bound performance bottleneck due to large ciphertext expansion. In this work, we propose HEGA, a near-data processing architecture that leverages this scheme with 3D-stacked memory to accelerate privacy-preserving biomarker search. We observe that homomorphic encryption-based search, like other emerging applications, can greatly benefit from the large throughput, capacity, and energy savings of 3D-stacked memory-based near-data processing architectures. Our near-data acceleration solution can speed up biomarker search by 6.3 × with 5.7× energy savings compared to an 8-core Intel Xeon processor.
{"title":"Near-Data Acceleration of Privacy-Preserving Biomarker Search with 3D-Stacked Memory","authors":"A. O. Glova, Itir Akgun, Shuangchen Li, Xing Hu, Yuan Xie","doi":"10.23919/DATE.2019.8715108","DOIUrl":"https://doi.org/10.23919/DATE.2019.8715108","url":null,"abstract":"Homomorphic encryption is a promising technology for enabling various privacy-preserving applications such as secure biomarker search. However, current implementations are not practical due to large performance overheads. A homomorphic encryption scheme has recently been proposed that allows bitwise comparison without the computationally-intensive multiplication and bootstrapping operations. Even so, this scheme still suffers from memory-bound performance bottleneck due to large ciphertext expansion. In this work, we propose HEGA, a near-data processing architecture that leverages this scheme with 3D-stacked memory to accelerate privacy-preserving biomarker search. We observe that homomorphic encryption-based search, like other emerging applications, can greatly benefit from the large throughput, capacity, and energy savings of 3D-stacked memory-based near-data processing architectures. Our near-data acceleration solution can speed up biomarker search by 6.3 × with 5.7× energy savings compared to an 8-core Intel Xeon processor.","PeriodicalId":445778,"journal":{"name":"2019 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129917480","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2019-03-25DOI: 10.23919/DATE.2019.8715273
Mikel Fernández, Gabriel Fernandez, J. Abella, F. Cazorla
The ability to produce early guaranteed performance (worst-case execution time) estimates for multicores, i.e. before software from different providers gets integrated onto the same critical system, is pivotal. This helps reducing lately-detected costly-to-handle timing violations. An existing methodology creates ‘copy’ (surrogate) applications from the execution in isolation of each target application. Surrogate applications can be used to upperbound multicore contention delay, and hence WCET estimates in multicores. However, this methodology has only been shown to work on a simulation environment. In this paper we show the work we have carried out to adapt this technology to a real multicore processor for the space domain.
{"title":"Multicore Early Design Stage Guaranteed Performance Estimates for the Space Domain","authors":"Mikel Fernández, Gabriel Fernandez, J. Abella, F. Cazorla","doi":"10.23919/DATE.2019.8715273","DOIUrl":"https://doi.org/10.23919/DATE.2019.8715273","url":null,"abstract":"The ability to produce early guaranteed performance (worst-case execution time) estimates for multicores, i.e. before software from different providers gets integrated onto the same critical system, is pivotal. This helps reducing lately-detected costly-to-handle timing violations. An existing methodology creates ‘copy’ (surrogate) applications from the execution in isolation of each target application. Surrogate applications can be used to upperbound multicore contention delay, and hence WCET estimates in multicores. However, this methodology has only been shown to work on a simulation environment. In this paper we show the work we have carried out to adapt this technology to a real multicore processor for the space domain.","PeriodicalId":445778,"journal":{"name":"2019 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129977133","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2019-03-25DOI: 10.23919/DATE.2019.8715167
Patsy Cadareanu, C. N.Reddy, C. G. Almudever, A. Khanna, A. Raychowdhury, S. Datta, K. Bertels, Vijayakrishan Narayanan, M. Ventra, P. Gaillardon
Innovative and new computing paradigms must be considered as we reach the limits of von Neumann computing caused by the growth in necessary data processing. This paper provides an introduction to three emerging computing models that have established themselves as likely post-CMOS and post-von Neumann solutions. The first of these ideas is quantum computing, for which we discuss the challenges and potential of quantum computer architectures. Next, a computational system using intrinsic oscillators is introduced and an example is provided which shows its superiority in comparison to a typical von Neumann computational system. Finally, digital memcomputing using self-organizing logic gates is explained and then discussed as a method for optimization problems and machine learning.
{"title":"Rebooting Our Computing Models","authors":"Patsy Cadareanu, C. N.Reddy, C. G. Almudever, A. Khanna, A. Raychowdhury, S. Datta, K. Bertels, Vijayakrishan Narayanan, M. Ventra, P. Gaillardon","doi":"10.23919/DATE.2019.8715167","DOIUrl":"https://doi.org/10.23919/DATE.2019.8715167","url":null,"abstract":"Innovative and new computing paradigms must be considered as we reach the limits of von Neumann computing caused by the growth in necessary data processing. This paper provides an introduction to three emerging computing models that have established themselves as likely post-CMOS and post-von Neumann solutions. The first of these ideas is quantum computing, for which we discuss the challenges and potential of quantum computer architectures. Next, a computational system using intrinsic oscillators is introduced and an example is provided which shows its superiority in comparison to a typical von Neumann computational system. Finally, digital memcomputing using self-organizing logic gates is explained and then discussed as a method for optimization problems and machine learning.","PeriodicalId":445778,"journal":{"name":"2019 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"82 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123276941","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2019-03-25DOI: 10.23919/DATE.2019.8714974
Martin Rapp, A. Pathania, T. Mitra, J. Henkel
Performance of a task running on a many-core with distributed shared Last-Level Cache (LLC) strongly depends on two factors: the power budget needed to guarantee thermally safe operation and the LLC latency. The task’s thread-to-core mapping determines both the factors. Arrival and departure of tasks on a many-core deployed in an open system can change its state significantly in terms of available cores and power budget. Task migrations can thereupon be used as a tool to keep the many-core operating at the peak performance. Furthermore, the relative impacts of power budget and LLC latency on a task’s performance can change with its different execution phases mandating its migration on-the-fly.We propose the first run-time algorithm PCMig that increases the performance of a many-core with distributed shared LLC by migrating tasks based on their phases and the many-core’s state. PCMig is based on a performance-prediction model that predicts the performance impact of migrations. PCMig results in up to 16 % reduction in the average response time compared to the state-of-the-art.
{"title":"Prediction-Based Task Migration on S-NUCA Many-Cores","authors":"Martin Rapp, A. Pathania, T. Mitra, J. Henkel","doi":"10.23919/DATE.2019.8714974","DOIUrl":"https://doi.org/10.23919/DATE.2019.8714974","url":null,"abstract":"Performance of a task running on a many-core with distributed shared Last-Level Cache (LLC) strongly depends on two factors: the power budget needed to guarantee thermally safe operation and the LLC latency. The task’s thread-to-core mapping determines both the factors. Arrival and departure of tasks on a many-core deployed in an open system can change its state significantly in terms of available cores and power budget. Task migrations can thereupon be used as a tool to keep the many-core operating at the peak performance. Furthermore, the relative impacts of power budget and LLC latency on a task’s performance can change with its different execution phases mandating its migration on-the-fly.We propose the first run-time algorithm PCMig that increases the performance of a many-core with distributed shared LLC by migrating tasks based on their phases and the many-core’s state. PCMig is based on a performance-prediction model that predicts the performance impact of migrations. PCMig results in up to 16 % reduction in the average response time compared to the state-of-the-art.","PeriodicalId":445778,"journal":{"name":"2019 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133053437","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2019-03-25DOI: 10.23919/DATE.2019.8714987
Pascal Fradet, A. Girault, R. Krishnaswamy, X. Nicollin, Arash Shafiei
Dataflow Models of Computation (MoCs) are widely used in embedded systems, including multimedia processing, digital signal processing, telecommunications, and automatic control. In a dataflow MoC, an application is specified as a graph of actors connected by FIFO channels. One of the most popular dataflow MoCs, Synchronous Dataflow (SDF), provides static analyses to guarantee boundedness and liveness, which are key properties for embedded systems. However, SDF (and most of its variants) lacks the capability to express the dynamism needed by modern streaming applications. In particular, the applications mentioned above have a strong need for reconfigurability to accommodate changes in the input data, the control objectives, or the environment.We address this need by proposing a new MoC called Reconfigurable Dataflow (RDF). RDF extends SDF with transformation rules that specify how the topology and actors of the graph may be reconfigured. Starting from an initial RDF graph and a set of transformation rules, an arbitrary number of new RDF graphs can be generated at runtime. A key feature of RDF is that it can be statically analyzed to guarantee that all possible graphs generated at runtime will be consistent and live. We introduce the RDF MoC, describe its associated static analyses, and outline its implementation.
{"title":"RDF: Reconfigurable Dataflow","authors":"Pascal Fradet, A. Girault, R. Krishnaswamy, X. Nicollin, Arash Shafiei","doi":"10.23919/DATE.2019.8714987","DOIUrl":"https://doi.org/10.23919/DATE.2019.8714987","url":null,"abstract":"Dataflow Models of Computation (MoCs) are widely used in embedded systems, including multimedia processing, digital signal processing, telecommunications, and automatic control. In a dataflow MoC, an application is specified as a graph of actors connected by FIFO channels. One of the most popular dataflow MoCs, Synchronous Dataflow (SDF), provides static analyses to guarantee boundedness and liveness, which are key properties for embedded systems. However, SDF (and most of its variants) lacks the capability to express the dynamism needed by modern streaming applications. In particular, the applications mentioned above have a strong need for reconfigurability to accommodate changes in the input data, the control objectives, or the environment.We address this need by proposing a new MoC called Reconfigurable Dataflow (RDF). RDF extends SDF with transformation rules that specify how the topology and actors of the graph may be reconfigured. Starting from an initial RDF graph and a set of transformation rules, an arbitrary number of new RDF graphs can be generated at runtime. A key feature of RDF is that it can be statically analyzed to guarantee that all possible graphs generated at runtime will be consistent and live. We introduce the RDF MoC, describe its associated static analyses, and outline its implementation.","PeriodicalId":445778,"journal":{"name":"2019 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131337476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2019-03-25DOI: 10.23919/DATE.2019.8715289
Aman Goel, K. Sakallah
IC3-based algorithms have emerged as effective scalable approaches for hardware model checking. In this paper we evaluate six implementations of IC3-based model checkers on a diverse set of publicly-available and proprietary industrial Verilog RTL designs. Four of the six verifiers we examined operate at the bit level and two employ abstraction to take advantage of word-level RTL semantics. Overall, the word-level verifier employing data abstraction outperformed the others, especially on the large industrial designs. The analysis helped us identify several key insights on the techniques underlying these tools, their strengths and weaknesses, differences and commonalities, and opportunities for improvement.
{"title":"Empirical Evaluation of IC3-Based Model Checking Techniques on Verilog RTL Designs","authors":"Aman Goel, K. Sakallah","doi":"10.23919/DATE.2019.8715289","DOIUrl":"https://doi.org/10.23919/DATE.2019.8715289","url":null,"abstract":"IC3-based algorithms have emerged as effective scalable approaches for hardware model checking. In this paper we evaluate six implementations of IC3-based model checkers on a diverse set of publicly-available and proprietary industrial Verilog RTL designs. Four of the six verifiers we examined operate at the bit level and two employ abstraction to take advantage of word-level RTL semantics. Overall, the word-level verifier employing data abstraction outperformed the others, especially on the large industrial designs. The analysis helped us identify several key insights on the techniques underlying these tools, their strengths and weaknesses, differences and commonalities, and opportunities for improvement.","PeriodicalId":445778,"journal":{"name":"2019 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":" 14","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113946125","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2019-03-25DOI: 10.23919/DATE.2019.8715181
Alma Pröbstl, Sangyoung Park, S. Steinhorst, S. Chakraborty
The smart energy grid features real-time monitoring of electricity usage such that it can control the generation and distribution of electricity as well as utilize dynamic pricing in response to the demands. For this purpose, smart metering systems continuously monitor the electricity usage of customers, and report it back to the Utility Provider (UP). This raises privacy concerns regarding the undesired exposure of human activity and time-of-use of home appliances. Photovoltaics (PV) and a residential Electrical Energy Storage (EES) have proven to be effective in mitigating the privacy concerns. However, this comes at several costs: Installation of PV and EES, their subsequent aging and the possibly increased electricity cost. We quantify the trade-off between privacy exposure and financial costs by formulating a stochastic dynamic programming problem. Our analysis shows that i) there is a quantifiable trade-off between the financial cost and privacy leakage, ii) proper control of the system is crucial for both metrics, iii) a strategy solely focusing on privacy results in high financial costs, and iv) that for a typical residential setting, the costs for a trade-off solution lie in the range of 600 US$-1700 US$. As the load flattening has a peak shaving effect desirable for UPs, increasing privacy is mutually beneficial for both, customers and UPs.
{"title":"Cost/Privacy Co-optimization in Smart Energy Grids","authors":"Alma Pröbstl, Sangyoung Park, S. Steinhorst, S. Chakraborty","doi":"10.23919/DATE.2019.8715181","DOIUrl":"https://doi.org/10.23919/DATE.2019.8715181","url":null,"abstract":"The smart energy grid features real-time monitoring of electricity usage such that it can control the generation and distribution of electricity as well as utilize dynamic pricing in response to the demands. For this purpose, smart metering systems continuously monitor the electricity usage of customers, and report it back to the Utility Provider (UP). This raises privacy concerns regarding the undesired exposure of human activity and time-of-use of home appliances. Photovoltaics (PV) and a residential Electrical Energy Storage (EES) have proven to be effective in mitigating the privacy concerns. However, this comes at several costs: Installation of PV and EES, their subsequent aging and the possibly increased electricity cost. We quantify the trade-off between privacy exposure and financial costs by formulating a stochastic dynamic programming problem. Our analysis shows that i) there is a quantifiable trade-off between the financial cost and privacy leakage, ii) proper control of the system is crucial for both metrics, iii) a strategy solely focusing on privacy results in high financial costs, and iv) that for a typical residential setting, the costs for a trade-off solution lie in the range of 600 US$-1700 US$. As the load flattening has a peak shaving effect desirable for UPs, increasing privacy is mutually beneficial for both, customers and UPs.","PeriodicalId":445778,"journal":{"name":"2019 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"84 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122939309","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2019-03-25DOI: 10.23919/DATE.2019.8714822
L. Auer, Christian Skubich, Matthias Hiller
New IoT applications are demanding for more and more performance in embedded devices while their deployment and operation poses strict power constraints. We present the security concept for a customizable Internet of Things (IoT) platform based on the RISC-V ISA and developed by several Fraunhofer Institutes. It integrates a range of peripherals with a scalable computing subsystem as a three dimensional System-in-Package (3D-SiP).The security features aim for a medium security level and target the requirements of the IoT market. Our security architecture extends given implementations to enable secure deployment, operation, and update. Core security features are secure boot, an authenticated watchdog timer, and key management.The Universal Sensor Platform (USeP) SoC is developed for GLOBALFOUNDRIES’ 22FDX technology and aims to provide a platform for Small and Medium-sized Enterprises (SMEs) that typically do not have access to advanced microelectronics and integration know-how, and are therefore limited to Commercial Off-The-Shelf (COTS) products.
{"title":"A Security Architecture for RISC-V based IoT Devices","authors":"L. Auer, Christian Skubich, Matthias Hiller","doi":"10.23919/DATE.2019.8714822","DOIUrl":"https://doi.org/10.23919/DATE.2019.8714822","url":null,"abstract":"New IoT applications are demanding for more and more performance in embedded devices while their deployment and operation poses strict power constraints. We present the security concept for a customizable Internet of Things (IoT) platform based on the RISC-V ISA and developed by several Fraunhofer Institutes. It integrates a range of peripherals with a scalable computing subsystem as a three dimensional System-in-Package (3D-SiP).The security features aim for a medium security level and target the requirements of the IoT market. Our security architecture extends given implementations to enable secure deployment, operation, and update. Core security features are secure boot, an authenticated watchdog timer, and key management.The Universal Sensor Platform (USeP) SoC is developed for GLOBALFOUNDRIES’ 22FDX technology and aims to provide a platform for Small and Medium-sized Enterprises (SMEs) that typically do not have access to advanced microelectronics and integration know-how, and are therefore limited to Commercial Off-The-Shelf (COTS) products.","PeriodicalId":445778,"journal":{"name":"2019 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"539 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121457282","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2019-03-25DOI: 10.23919/DATE.2019.8715033
Arman Iranfar, A. Pahlevan, Marina Zapater, David Atienza Alonso
The power density and, consequently, power hungriness of server processors is growing by the day. Traditional air cooling systems fail to cope with such high heat densities, whereas single-phase liquid-cooling still requires high mass flow-rate, high pumping power, and large facility size. On the contrary, in a micro-scale gravity-driven thermosyphon attached on top of a processor, the refrigerant, absorbing the heat, turns into a two-phase mixture. The vapor-liquid mixture exchanges heat with a coolant at the condenser side, turns back to liquid state, and descends thanks to gravity, eliminating the need for pumping power. However, similar to other cooling technologies, thermosyphon efficiency can considerably vary with respect to workload performance requirements and thermal profile, in addition to the platform features, such as packaging and die floorplan. In this work, we first address the workload- and platform-aware design of a two-phase thermosyphon. Then, we propose a thermal-aware workload mapping strategy considering the potential and limitations of a two-phase thermosyphon to further minimize hot spots and spatial thermal gradients. Our experiments, performed on an 8-core Intel Xeon E5 CPU reveal, on average, up to 10°C reduction in thermal hot spots, and 45% reduction in the maximum spatial thermal gradient on the die. Moreover, our design and mapping strategy are able to decrease the chiller cooling power at least by 45%.
{"title":"Enhancing Two-Phase Cooling Efficiency through Thermal-Aware Workload Mapping for Power-Hungry Servers","authors":"Arman Iranfar, A. Pahlevan, Marina Zapater, David Atienza Alonso","doi":"10.23919/DATE.2019.8715033","DOIUrl":"https://doi.org/10.23919/DATE.2019.8715033","url":null,"abstract":"The power density and, consequently, power hungriness of server processors is growing by the day. Traditional air cooling systems fail to cope with such high heat densities, whereas single-phase liquid-cooling still requires high mass flow-rate, high pumping power, and large facility size. On the contrary, in a micro-scale gravity-driven thermosyphon attached on top of a processor, the refrigerant, absorbing the heat, turns into a two-phase mixture. The vapor-liquid mixture exchanges heat with a coolant at the condenser side, turns back to liquid state, and descends thanks to gravity, eliminating the need for pumping power. However, similar to other cooling technologies, thermosyphon efficiency can considerably vary with respect to workload performance requirements and thermal profile, in addition to the platform features, such as packaging and die floorplan. In this work, we first address the workload- and platform-aware design of a two-phase thermosyphon. Then, we propose a thermal-aware workload mapping strategy considering the potential and limitations of a two-phase thermosyphon to further minimize hot spots and spatial thermal gradients. Our experiments, performed on an 8-core Intel Xeon E5 CPU reveal, on average, up to 10°C reduction in thermal hot spots, and 45% reduction in the maximum spatial thermal gradient on the die. Moreover, our design and mapping strategy are able to decrease the chiller cooling power at least by 45%.","PeriodicalId":445778,"journal":{"name":"2019 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"217 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128934555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}