The RSA cryptographic algorithm, in service as a security tool for many years, has long since achieved cryptographic and market maturity. However, like all cryptographic algorithms, RSA implementations, since the discovery and wide spread of Side Channel Attacks (SCAs), have been susceptible to a wide variety of attacks that target the hardware structure rather than the algorithm itself. While a wide range of countermeasures can be applied to the RSA structure in order to protect the algorithm from SCAs, combining several such measures in order to guarantee an SCA-resistant RSA design is not an easy job. There are many incompatibility issues among SCA protection methods, as well as an extensive performance cost added to an SCA-secure RSA implementation. In this paper, we address some very popular and potent SCAs against RSA, such as Fault Attacks (FA), Simple Power Attacks (SPA), Doubling Attacks (DA) and Differential Power Attacks (DPA), and propose an algorithmic modification of RSA based on the Chinese Remainder Theorem (CRT) that can thwart those attacks. We describe an implementation approach based on Montgomery modular multiplication and propose a hardware architecture for an SCA-resistant CRT RSA that is structured on our proposed algorithm. The designed architecture is implemented in FPGA technology, and results on its time and space complexity are extracted and evaluated. Keywords: Public Key Cryptography, VLSI Design, Side Channel Attack Resistance, Modular Exponentiation.
{"title":"Efficient CRT RSA with SCA Countermeasures","authors":"A. Fournaris, O. Koufopavlou","doi":"10.1109/DSD.2011.81","DOIUrl":"https://doi.org/10.1109/DSD.2011.81","url":null,"abstract":"RSA cryptographic algorithm, working as a security tool for many years, has long achieved cryptographic and market maturity. However, as all crypto algorithms, RSA implementations, after the discovery and wide spread of Side Channel Attacks (SCA), are susceptible to a wide variety of different attacks that target the hardware structure rather than the algorithm itself. While there are a wide range of countermeasures that can be applied on the RSA structure in order to protect the algorithm from SCAs, combining several such measures in order to guarantee an SCA resistant RSA design is not an easy job. There are many incompatibility issues among SCA protection methods as well as an extensive performance cost added to an SCA secure RSA implementation. In this paper, we address some very popular and potent SCAs against RSA like Fault attacks (FA), Simple Power attacks (SPA), Doubling attacks (DA) and Differential Power attacks (DPA), and propose an algorithmic modification of RSA based on Chinese Remainder Theorem (CRT) that can thwart those attacks. We describe an implementation approach based on Montgomery modular multiplication and propose a hardware architecture for a SCA resistant CRT RSA that is structured on our proposed algorithm. The designed architecture is imPublic Key Cryptography, VLSI Design, Side Channel Attack Resistance, Modular Exponentiation, plemented in FPGA technology and results on its time and space complexity are extracted and evaluated.","PeriodicalId":267187,"journal":{"name":"2011 14th Euromicro Conference on Digital System Design","volume":"70 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127348559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents a scalable asynchronous distributed control network. The control circuit allows for true asynchronous operation of all digital resources and, as a result of its scalable distributed topology, allows unlimited resource sharing. We start from the description of a data flow graph and, using traditional scheduling algorithms, generate an asynchronous distributed control network and the asynchronous data path. The distributed controllers are implemented such that they can be created by connecting a small number of pre-designed sub-controllers, which are presented in this paper. Prototype IP blocks of these sub-controller circuits have been designed in a 90 nm ASIC design process. To prove the effectiveness of our method, we present some key performance parameters: area and power under timing constraints.
{"title":"A Scalable Distributed Asynchronous Control Network for High Level Synthesis of Digital Circuits","authors":"T. V. Leeuwen, R. V. Leuken","doi":"10.1109/DSD.2011.114","DOIUrl":"https://doi.org/10.1109/DSD.2011.114","url":null,"abstract":"This paper presents a scalable asynchronous distributed control network. The control circuit allows for true asynchronous operation of all digital resources and as a result of its scalable distributed topology allows unlimited resource sharing. We start with the description of a data flow graph, and using traditional scheduling algorithms, generate an asynchronous distributed control network and the asynchronous data path. The distributed controllers are implemented such that they can be created by connecting a small number of pre-designed sub-controllers which are presented in this paper. Prototype IP-blocks of these sub-controller circuits have been designed in a 90nm ASIC design process. To prove the effectiveness of our method, we present some key performance parameters: area and power under timing constraints.","PeriodicalId":267187,"journal":{"name":"2011 14th Euromicro Conference on Digital System Design","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132077182","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Energy-efficient execution of applications is important for many reasons, e.g. time between battery charges and device temperature. Voltage and Frequency Scaling (VFS) enables applications to be run at lower frequencies on hardware resources, thereby consuming less power. Real-time applications have deadlines that must be met, otherwise their output is devalued. Dataflow modelling of real-time applications enables off-line verification of the application's temporal requirements. In this paper we describe a method to reduce the combined static and dynamic energy consumption using a Dynamic VFS (DVFS) technique for dataflow-modelled real-time applications that may be mapped onto multiple hardware resources. We achieve this by using an application's static slack to perform DVFS while still satisfying the application's temporal requirements. We show that by formulating a dataflow-modelled application and its mapping as a convex optimisation problem, with energy consumption as the objective function, the problem can be solved with a generic convex optimisation solver, producing an energy-optimal constant frequency per application task. Our method allows task frequencies to be constrained such that, e.g., one frequency per application or per processor may be achieved.
{"title":"Power Minimisation for Real-Time Dataflow Applications","authors":"Andrew Nelson, Orlando Moreira, A. Molnos, S. Stuijk, B. T. Nguyen, K. Goossens","doi":"10.1109/DSD.2011.19","DOIUrl":"https://doi.org/10.1109/DSD.2011.19","url":null,"abstract":"Energy efficient execution of applications is important for many reasons, e.g. time between battery charges, device temperature. Voltage and Frequency Scaling (VFS) enables applications to be run at lower frequencies on hardware resources thereby consuming less power. Real-time applications have deadlines that must be met otherwise their output is devalued. Dataflow modelling of real-time applications enables off-line verification of the application's temporal requirements. In this paper we describe a method to reduce the combined static and dynamic energy consumption using a Dynamic VFS (DVFS) technique for dataflow modelled real-time applications that may be mapped onto multiple hardware resources. We achieve this by using an application's static slack in order to perform DVFS while still satisfying the application's temporal requirements. We show that by formulating a dataflow modelled application and its mapping as a convex optimisation problem, with energy consumption as the objective function, the problem can be solved with a generic convex optimisation solver, producing an energy optimal constant frequency per application task. Our method allows task frequencies to be constrained such that, e.g. one frequency per application or per processor may be achieved.","PeriodicalId":267187,"journal":{"name":"2011 14th Euromicro Conference on Digital System Design","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131677584","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents a performance and reliability evaluation of deterministic and adaptive fault-tolerant routing algorithms used in Network-on-Chip (NoC) designs. The investigated methods have a multi-level fault-tolerance capability and can therefore be evaluated separately. To illustrate the effectiveness of these methods, we conduct appropriate simulations on different applications for performance evaluation. For reliability assessment, however, we propose an analytical approach based on combinatorial reliability models to show the effect of fault-tolerant routing algorithms on overall NoC reliability.
{"title":"Evaluation of Fault-Tolerant Routing Methods for NoC Architectures","authors":"M. Valinataj","doi":"10.1109/DSD.2011.63","DOIUrl":"https://doi.org/10.1109/DSD.2011.63","url":null,"abstract":"This paper presents performance and reliability evaluation of deterministic and adaptive fault-tolerant routing algorithms used in Network-on-Chip (NoC) designs. The investigated methods have a multi-level fault-tolerance capability and therefore can be separately evaluated. To illustrate the effectiveness of these methods, we conduct appropriate simulations on different applications for performance evaluation. But, for reliability assessment, we propose an analytical approach based on combinatorial reliability models to show the effect of fault-tolerant routing algorithms on overall NoC reliability.","PeriodicalId":267187,"journal":{"name":"2011 14th Euromicro Conference on Digital System Design","volume":"300 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120880666","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Programmable data-parallel embedded systems are typically associated with tasks such as image processing, video decoding, and software-defined radio. This talk is particularly focused on designs for resource-constrained mobile and consumer devices. Today, heterogeneous multi-core designs are hailed as the solution, and many research teams claim to work on this topic. However, the heterogeneous processing often stays at the level of combining many RISCs with many DSPs or similarly adapted processors, which should actually still be classified as homogeneous. In order to really compete with hardwired designs, extremely high efficiency is required. In this talk, we will show how the required levels of efficiency are obtained by building systems which consist of limited sets of highly parallel purpose-built processors, and by ensuring that these systems are programmed to efficiently utilize the available compute resources.
{"title":"The Future of Data-Parallel Embedded Systems (Abstract)","authors":"M. Lindwer","doi":"10.1109/DSD.2011.118","DOIUrl":"https://doi.org/10.1109/DSD.2011.118","url":null,"abstract":"Programmable data-parallel embedded systems are typically associated with tasks such as image processing, video decoding, and software-defined radio. This talk is particularly focused on designs for resource-constrained mobile and consumer devices. Today, heterogeneous multi-core designs are hailed as the solution, and many research teams claim to work on this topic. However, the heterogeneous processing often stays at the level of combining many RISCs with many DSPs or similarly adapted processors, which should actually still be classified as a homogeneous. In order to really compete with hardwired designs, extremely high efficiency is required. In this talk, we will show how the required levels of efficiency are obtained by building systems which consist of limited sets of highly parallel purpose-built processors, and by ensuring that these systems are programmed to efficiently utilize the available compute resources.","PeriodicalId":267187,"journal":{"name":"2011 14th Euromicro Conference on Digital System Design","volume":"272 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115899459","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Leveraging utilization of the shared caches of multicore processors is one of the heavily studied topics of today's chip multiprocessing community. At the center of this problem is providing a scheduling mechanism that maximizes throughput by reducing the miss rates of shared caches while preserving fairness of processor usage. Proposed scheduling algorithms in this field usually take advantage of thread-level properties of software, providing modifications at the operating system level. In our study we approach the problem from a different perspective and use software models to guide the operating system in effectively mapping a software's objects onto processor cores. In object-oriented software, objects collaborate to fulfil jobs and may operate on common data. Our scheduling method takes class dependencies into account and tries to schedule objects of coupled classes onto cores that share a common cache. This paper presents case studies on implementations of three software design patterns (Strategy, Visitor and Observer) and an image-filtering software implementation. In our experiments we use our cache-aware scheduler to guide Linux's Completely Fair Scheduler (CFS) towards more cache-aware schedules and decrease running time by around 10%. Our results suggest that guiding/restricting the operating system's scheduler using the class-relational information present in the object-oriented software model can be fruitful in increasing software performance on multicore processors.
{"title":"Model Driven Cache-Aware Scheduling of Object Oriented Software for Chip Multiprocessors","authors":"T. Ovatman, F. Buzluca","doi":"10.1109/DSD.2011.96","DOIUrl":"https://doi.org/10.1109/DSD.2011.96","url":null,"abstract":"Leveraging utilization of the shared caches of multicore processors is one of the heavily studied topics of today's chip multiprocessing community. Providing a scheduling mechanism that maximizes throughput by reducing miss-rates of shared caches and preserves the fairness of processor usage is in the center of this problem. Proposed scheduling algorithms in this field usually take advantage of thread level properties of software providing modifications at operating system level. In our study we choose to approach the problem from a different perspective and use software models to guide operating system to effectively map software's objects onto processor cores. In an object oriented software objects collaborate on fulfilling jobs and they may operate on common data. Our scheduling method takes class dependencies into account and tries to schedule objects of coupled classes onto cores that share the common cache. This paper presents case studies on implementations of three software design patterns(Strategy, Visitor and Observer) and an image filtering software implementation. During our experiments we use our cache-aware scheduler in guiding Linux's completely fair scheduler (CFS) to perform more cache-aware schedules and decrease running time around 10. Our results promise that guiding/restricting operating system's scheduler using class-relational information present in the object oriented software model can be fruitful in increasing software performance on multicore processors.","PeriodicalId":267187,"journal":{"name":"2011 14th Euromicro Conference on Digital System Design","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125858610","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this paper, we present a flexible network-on-chip topology: NR-Mesh (Nearest neighbor Mesh). The topology gives an end node the choice to inject a message through different neighboring routers, thereby reducing hop count and saving latency. At the receiver side, a message may be delivered to the end node through different routers, thus reducing hop count further and increasing flexibility when routing messages. This flexibility allows more network components to be put in switch-off mode, thus enabling power-aware routing algorithms. Additional benefits are reduced congestion/contention levels in the network, support for efficient broadcast operations, savings in power consumption, and partial fault tolerance. Our second contribution is a power management technique for the adaptive routing. This technique turns router ports and their attached links on and off depending on traffic conditions. The power management technique is able to achieve significant power savings when there is low traffic in the network. We further compare the new topology with the 2D-Mesh, using either deterministic or adaptive routing. When compared with the 2D-Mesh using deterministic routing, executing real applications in a full-system simulation platform, the NR-Mesh topology using adaptive routing obtains significant savings: on average, a 7% reduction in execution time and a 75% reduction in network energy consumption for a 16-node CMP system. Similar numbers are achieved for a 32-node CMP system.
{"title":"Towards an Efficient NoC Topology through Multiple Injection Ports","authors":"Jesús Camacho Villanueva, J. Flich, J. Duato, H. Eberle, W. Olesinski","doi":"10.1109/DSD.2011.25","DOIUrl":"https://doi.org/10.1109/DSD.2011.25","url":null,"abstract":"In this paper, we present a flexible network on-chip topology: NR-Mesh (Nearest neighbor Mesh). The topology gives an end node the choice to inject a message through different neighboring routers, thereby reducing hop count and saving latency. At the receiver side, a message may be delivered to the end node through different routers, thus reducing hop count further and increasing flexibility when routing messages. This flexibility allows for maximizing network components to be in switch off mode, thus enabling power aware routing algorithms. Additional benefits are reduced congestion/contention levels in the network, support for efficient broadcast operations, savings in power consumption, and partial fault-tolerance. Our second contribution is a power management technique for the adaptive routing. This technique turns router ports and their attached links on and off depending on traffic conditions. The power management technique is able to achieve significant power savings when there is low traffic in the network. We further compare the new topology with the 2D-Mesh, using either deterministic or adaptive routing. When compared with the 2D-Mesh using deterministic routing, executing real applications in a full system simulation platform, the NR-Mesh topology using adaptive routing is able to obtain significant savings, 7% of reduction in execution time and 75% in energy consumption at the network on average for a 16-Node CMP System. Similar numbers are achieved for a 32-Node CMP system.","PeriodicalId":267187,"journal":{"name":"2011 14th Euromicro Conference on Digital System Design","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124187579","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The purpose of this paper is to estimate the cost of utilizing under-populated, or sparse, networks on chip (NOC) for chip multiprocessors (CMP). In under-populated NOCs, only a portion of the nodes are sources and sinks, whereas the rest are simple intermediate nodes that increase communication bandwidth. Compared to dense NOCs, where all nodes can be sources and sinks of communication, under-populated NOCs can be scaled so that any degree of communication frequency of the nodes can be supported. The drawback of under-populated NOCs is a larger network area and a larger logical diameter. GPGPU-style stream-based or high-throughput CMPs can be used to hide the effect of longer latencies. In this paper, we present layouts for mesh-based under-populated networks and calculate their wire length distributions and overall area. Moreover, we present energy consumption calculations for such networks and show that while the network part of a CMP system based on under-populated NOCs can play a major role in chip area and energy consumption, its share can be pushed down by increasing the number of dimensions and using meshes instead of tori. We also compare various multidimensional sparse mesh layouts and conclude that the 3-dimensional and 4-dimensional sparse meshes are the most attractive ones for throughput computing.
{"title":"Cost of Sparse Mesh Layouts Supporting Throughput Computing","authors":"M. Forsell, V. Leppänen, M. Penttonen","doi":"10.1109/DSD.2011.46","DOIUrl":"https://doi.org/10.1109/DSD.2011.46","url":null,"abstract":"The purpose of this paper is to estimate the cost of utilizing under populated, or sparse, networks on chip (NOC) for chip multiprocessors (CMP). In under-populated NOCs, only a portion of nodes are sources and sinks whereas the rest are simple intermediate nodes increasing communication bandwidth. Compared to dense NOCs, where all nodes can be sources and sinks of communication, the under populated NOCs can be scaled so that any degree of communication frequency of nodes can be supported. The drawback of under populated NOCs is larger network area and bigger logical diameter. GPGPU-style stream-based or high-throughput CMPs can be used to hide the effect of longer latencies. In this paper, we present layouts for mesh-based under populated networks, calculate their wire length distributions and the overall area. Moreover, we present energy consumption calculations for such networks, and show that while the network part of a CMP system based on under populated NOCs can play a major role when considering the chip area and energy consumption, it can be pushed down by increasing the number of dimensions and using meshes instead of tori. We also compare various multidimensional sparse mesh-layouts and conclude the 3-dimensional and 4-dimensional sparse meshes to be the most attractive ones for throughput computing.","PeriodicalId":267187,"journal":{"name":"2011 14th Euromicro Conference on Digital System Design","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114577554","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper proposes integrating mutation analysis into model checking to improve coverage metrics of digital circuits. In contrast to traditional mutation testing, where mutant faults are generated and injected into the code description of the model, we apply a series of newly defined mutation operators directly to the model properties rather than to the model code. We claim that any mutant properties that are generated from the initial properties and validated by the model checker should be considered new properties that were missed during the initial verification procedure. Therefore, adding these newly identified properties to the existing list of properties improves the coverage metric of the formal verification and consequently leads to a more reliable design. Preliminary simulation results of applying this approach to a 4x4 Booth multiplier with 6 and 8 initial properties demonstrate a 40% and 45% coverage improvement, respectively, compared to the initial coverage metric.
{"title":"Mutant Fault Injection in Functional Properties of a Model to Improve Coverage Metrics","authors":"A. Abbasinasab, M. Mohammadi, S. Mohammadi, S. Yanushkevich, Michael R. Smith","doi":"10.1109/DSD.2011.57","DOIUrl":"https://doi.org/10.1109/DSD.2011.57","url":null,"abstract":"This paper proposes integrating mutation analysis into model checking to improve coverage metrics of digital circuits. In contrast to traditional mutation testing where mutant faults are generated and injected into the code description of the model, we apply a series of newly defined mutation operators directly to the model properties rather than to the model code. We claim that any mutant properties that are generated from the initial properties and validated by the model checker should be considered as new properties that have been missed during the initial verification procedure. Therefore, adding these newly identified properties to the existing list of properties improves the coverage metric of the formal verification and consequently lead to a more reliable design. Preliminary simulation results of applying this approach to a 4x4 Booth-Multiplier with 6 and 8 initial properties, demonstrates a 40% and 45% coverage improvement respectively compared to the initial coverage metric.","PeriodicalId":267187,"journal":{"name":"2011 14th Euromicro Conference on Digital System Design","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129175098","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In high-frequency FPGAs, with shrinking technology scale, decreasing threshold voltages, and large numbers of unused resources, leakage power makes a considerable contribution to total power consumption. On the other hand, process variation, an important challenge in nano-scale technologies, has a great impact on the leakage power of FPGAs. The reconfigurability of FPGAs provides a unique opportunity to mitigate these challenges by extracting each chip's unique variation map. In this paper, a per-chip process variation-aware placement (VMAP) algorithm is proposed to reduce the leakage power of FPGAs using the extracted variation map without neglecting dynamic power consumption. VMAP is adaptive to the different process variation maps of various FPGA chips. Experimental results on the attempted benchmarks show that the power-delay-product (PDP) cost is reduced by 7.2% with VMAP compared with conventional placement algorithms, with less than 16.8% standard deviation across different variation maps.
{"title":"VMAP: A Variation Map-Aware Placement Algorithm for Leakage Power Reduction in FPGAs","authors":"Behzad Salami, M. S. Zamani, A. Jahanian","doi":"10.1109/DSD.2011.15","DOIUrl":"https://doi.org/10.1109/DSD.2011.15","url":null,"abstract":"In high frequency FPGAs with technology scale shrinking and threshold voltage value decreasing and based on existing large numbers of unused resources, leakage power has a considerable contribution in total power consumption. On the other hand, process variation, as an important challenge in nano-scale technologies, has a great impact on leakage power of FPGAs. Reconfigurability of FPGAs makes an unique opportunity to mitigate these challenges by their unique variation map extraction. In this paper, a per-chip process variation-aware placement (VMAP) algorithm is proposed to reduce the leakage power of FPGAs using the extracted variation map without neglecting dynamic power consumption. VMAP is adaptive to different process variation maps of various FPGA chips. Experimental results on attempted benchmarks show that power-delay-product (PDP) cost is reduced by 7.2% in the VMAP compared with conventional placement algorithms, with less than 16.8% standard deviation for different variation maps.","PeriodicalId":267187,"journal":{"name":"2011 14th Euromicro Conference on Digital System Design","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127804690","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}