
2011 14th Euromicro Conference on Digital System Design: Latest Publications

Efficient CRT RSA with SCA Countermeasures
Pub Date : 2011-08-31 DOI: 10.1109/DSD.2011.81
A. Fournaris, O. Koufopavlou
The RSA cryptographic algorithm, in use as a security tool for many years, has long achieved cryptographic and market maturity. However, like all cryptographic algorithms, RSA implementations have, since the discovery and spread of Side Channel Attacks (SCA), been susceptible to a wide variety of attacks that target the hardware structure rather than the algorithm itself. While a wide range of countermeasures can be applied to the RSA structure to protect the algorithm from SCAs, combining several such measures to guarantee an SCA-resistant RSA design is not an easy job. There are many incompatibility issues among SCA protection methods, as well as an extensive performance cost added to an SCA-secure RSA implementation. In this paper, we address some very popular and potent SCAs against RSA, such as Fault attacks (FA), Simple Power attacks (SPA), Doubling attacks (DA) and Differential Power attacks (DPA), and propose an algorithmic modification of RSA based on the Chinese Remainder Theorem (CRT) that can thwart those attacks. We describe an implementation approach based on Montgomery modular multiplication and propose a hardware architecture for an SCA-resistant CRT RSA that is structured on our proposed algorithm. The designed architecture is implemented in FPGA technology and results on its time and space complexity are extracted and evaluated.
Keywords: Public Key Cryptography, VLSI Design, Side Channel Attack Resistance, Modular Exponentiation
Citations: 4
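The abstract above refers to CRT-based RSA and to fault attacks. As a rough illustration only, the sketch below shows textbook CRT RSA with Garner recombination and a generic "verify before release" check. It is not the authors' modified algorithm, it uses Python's built-in `pow` rather than Montgomery multiplication, and it offers no power-analysis protection. The function name `crt_rsa_sign` and the toy primes in the usage note are assumptions made for this example.

```python
def crt_rsa_sign(m, p, q, e):
    """Compute m^d mod (p*q) via CRT and re-check the result before returning it.

    Textbook sketch only: not constant-time and not the paper's scheme.
    """
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))   # private exponent (needs Python 3.8+)
    dp, dq = d % (p - 1), d % (q - 1)   # reduced exponents for each prime
    q_inv = pow(q, -1, p)               # q^{-1} mod p

    sp = pow(m, dp, p)                  # half-size exponentiation mod p
    sq = pow(m, dq, q)                  # half-size exponentiation mod q

    # Garner recombination: s = sq (mod q) and s = sp (mod p)
    h = (q_inv * (sp - sq)) % p
    s = sq + h * q

    # Generic fault check (an assumption for illustration): a faulty sp or sq
    # would make s^e differ from m, so the result is withheld in that case.
    if pow(s, e, n) != m % n:
        raise ValueError("fault detected; result withheld")
    return s
```

With toy parameters such as `p = 61`, `q = 53`, `e = 17`, the call `crt_rsa_sign(65, 61, 53, 17)` should agree with the direct computation `pow(65, d, 3233)` for the matching private exponent `d`. Real deployments use primes of 1024 bits or more.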
A Scalable Distributed Asynchronous Control Network for High Level Synthesis of Digital Circuits
Pub Date : 2011-08-31 DOI: 10.1109/DSD.2011.114
T. V. Leeuwen, R. V. Leuken
This paper presents a scalable asynchronous distributed control network. The control circuit allows for true asynchronous operation of all digital resources and, as a result of its scalable distributed topology, allows unlimited resource sharing. We start with the description of a data flow graph and, using traditional scheduling algorithms, generate an asynchronous distributed control network and the asynchronous data path. The distributed controllers are implemented such that they can be created by connecting a small number of pre-designed sub-controllers, which are presented in this paper. Prototype IP blocks of these sub-controller circuits have been designed in a 90nm ASIC design process. To prove the effectiveness of our method, we present some key performance parameters: area and power under timing constraints.
Citations: 0
Power Minimisation for Real-Time Dataflow Applications
Pub Date : 2011-08-31 DOI: 10.1109/DSD.2011.19
Andrew Nelson, Orlando Moreira, A. Molnos, S. Stuijk, B. T. Nguyen, K. Goossens
Energy-efficient execution of applications is important for many reasons, e.g. time between battery charges and device temperature. Voltage and Frequency Scaling (VFS) enables applications to be run at lower frequencies on hardware resources, thereby consuming less power. Real-time applications have deadlines that must be met, otherwise their output is devalued. Dataflow modelling of real-time applications enables off-line verification of the application's temporal requirements. In this paper we describe a method to reduce the combined static and dynamic energy consumption using a Dynamic VFS (DVFS) technique for dataflow-modelled real-time applications that may be mapped onto multiple hardware resources. We achieve this by using an application's static slack to perform DVFS while still satisfying the application's temporal requirements. We show that by formulating a dataflow-modelled application and its mapping as a convex optimisation problem, with energy consumption as the objective function, the problem can be solved with a generic convex optimisation solver, producing an energy-optimal constant frequency per application task. Our method allows task frequencies to be constrained such that, e.g., one frequency per application or per processor may be achieved.
Citations: 38
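To make the idea of trading static slack for lower frequency concrete, here is a deliberately simplified toy model, not the paper's formulation. It assumes tasks run back to back on one processor, that each cycle executed at frequency `f` costs `f**2` units of dynamic energy, and that static energy is ignored. Under those assumptions the problem "minimise total energy subject to finishing by the deadline" is convex, and its optimum is a single common frequency that uses up all the slack. All function names are hypothetical.

```python
def constant_frequency(cycles, deadline):
    """Lowest common frequency at which all tasks still finish by the deadline."""
    return sum(cycles) / deadline


def execution_time(cycles, freqs):
    """Total time when task i needs cycles[i] cycles and runs at freqs[i]."""
    return sum(c / f for c, f in zip(cycles, freqs))


def dynamic_energy(cycles, freqs):
    """Toy dynamic energy: every cycle at frequency f costs f**2 units."""
    return sum(c * f ** 2 for c, f in zip(cycles, freqs))


cycles = [200, 400, 100]        # work per task, in cycles (made-up numbers)
deadline = 10                   # time budget, in the same time unit as 1/f

f_opt = constant_frequency(cycles, deadline)
uniform = [f_opt] * len(cycles)
uneven = [100, 100, 25]         # another schedule that also meets the deadline

print(execution_time(cycles, uniform), dynamic_energy(cycles, uniform))
print(execution_time(cycles, uneven), dynamic_energy(cycles, uneven))
```

Both schedules take the full time budget, but the uniform one spends noticeably less energy in this toy model. The paper's setting is richer (static energy, dataflow dependencies, multiple processors), which is why it relies on a generic convex solver rather than a closed form.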
Evaluation of Fault-Tolerant Routing Methods for NoC Architectures
Pub Date : 2011-08-31 DOI: 10.1109/DSD.2011.63
M. Valinataj
This paper presents a performance and reliability evaluation of deterministic and adaptive fault-tolerant routing algorithms used in Network-on-Chip (NoC) designs. The investigated methods have a multi-level fault-tolerance capability and can therefore be evaluated separately. To illustrate the effectiveness of these methods, we conduct simulations on different applications for performance evaluation. For reliability assessment, we propose an analytical approach based on combinatorial reliability models to show the effect of fault-tolerant routing algorithms on overall NoC reliability.
Citations: 8
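Combinatorial reliability models, as mentioned in the abstract above, are usually built from series and parallel compositions of component reliabilities. The sketch below shows only those two standard building blocks under the assumption of independent failures. The per-router reliability of 0.9 and the two-path scenario are made-up illustrations, not figures or models from the paper.

```python
from math import prod


def series(reliabilities):
    """All components must work: multiply the individual reliabilities."""
    return prod(reliabilities)


def parallel(reliabilities):
    """At least one alternative must work: 1 minus the chance that all fail."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


# A packet that must cross three routers on a single fixed path.
single_path = series([0.9, 0.9, 0.9])

# A fault-tolerant routing scheme that can fall back to a second,
# disjoint three-router path (independence is an assumption here).
two_paths = parallel([single_path, single_path])

print(round(single_path, 6), round(two_paths, 6))
```

The comparison shows why an alternative path raises end-to-end reliability even when each router is unchanged.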
The Future of Data-Parallel Embedded Systems (Abstract)
Pub Date : 2011-08-31 DOI: 10.1109/DSD.2011.118
M. Lindwer
Programmable data-parallel embedded systems are typically associated with tasks such as image processing, video decoding, and software-defined radio. This talk focuses in particular on designs for resource-constrained mobile and consumer devices. Today, heterogeneous multi-core designs are hailed as the solution, and many research teams claim to work on this topic. However, the heterogeneous processing often stays at the level of combining many RISCs with many DSPs or similarly adapted processors, which should actually still be classified as homogeneous. In order to really compete with hardwired designs, extremely high efficiency is required. In this talk, we will show how the required levels of efficiency are obtained by building systems which consist of limited sets of highly parallel purpose-built processors, and by ensuring that these systems are programmed to efficiently utilize the available compute resources.
Citations: 1
Model Driven Cache-Aware Scheduling of Object Oriented Software for Chip Multiprocessors
Pub Date : 2011-08-31 DOI: 10.1109/DSD.2011.96
T. Ovatman, F. Buzluca
Improving the utilization of the shared caches of multicore processors is one of the heavily studied topics in today's chip multiprocessing community. Providing a scheduling mechanism that maximizes throughput by reducing the miss rates of shared caches while preserving the fairness of processor usage is at the center of this problem. Scheduling algorithms proposed in this field usually take advantage of thread-level properties of software and provide modifications at the operating system level. In our study we approach the problem from a different perspective and use software models to guide the operating system in mapping the software's objects onto processor cores effectively. In object-oriented software, objects collaborate on fulfilling jobs and may operate on common data. Our scheduling method takes class dependencies into account and tries to schedule objects of coupled classes onto cores that share a common cache. This paper presents case studies on implementations of three software design patterns (Strategy, Visitor and Observer) and an image filtering software implementation. During our experiments we use our cache-aware scheduler to guide Linux's Completely Fair Scheduler (CFS) towards more cache-aware schedules and decrease running time around 10. Our results suggest that guiding or restricting the operating system's scheduler using class-relational information present in the object-oriented software model can be fruitful in increasing software performance on multicore processors.
Citations: 1
Towards an Efficient NoC Topology through Multiple Injection Ports
Pub Date : 2011-08-31 DOI: 10.1109/DSD.2011.25
Jesús Camacho Villanueva, J. Flich, J. Duato, H. Eberle, W. Olesinski
In this paper, we present a flexible network-on-chip topology: NR-Mesh (Nearest neighbor Mesh). The topology gives an end node the choice to inject a message through different neighboring routers, thereby reducing hop count and saving latency. At the receiver side, a message may be delivered to the end node through different routers, thus reducing hop count further and increasing flexibility when routing messages. This flexibility maximizes the number of network components that can be switched off, thus enabling power-aware routing algorithms. Additional benefits are reduced congestion and contention levels in the network, support for efficient broadcast operations, savings in power consumption, and partial fault tolerance. Our second contribution is a power management technique for adaptive routing. This technique turns router ports and their attached links on and off depending on traffic conditions, and achieves significant power savings when there is low traffic in the network. We further compare the new topology with the 2D-Mesh, using either deterministic or adaptive routing. Compared with the 2D-Mesh using deterministic routing, when executing real applications on a full-system simulation platform, the NR-Mesh topology using adaptive routing obtains significant savings: on average, a 7% reduction in execution time and a 75% reduction in network energy consumption for a 16-node CMP system. Similar numbers are achieved for a 32-node CMP system.
Citations: 7
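The hop-count benefit of letting a node inject and eject through several neighboring routers can be illustrated with a small calculation. The sketch below assumes a plain 2D mesh with minimal (Manhattan-distance) routing between routers and a hypothetical attachment pattern in which each end node is wired to four surrounding routers. The abstract does not specify the exact NR-Mesh wiring, so the coordinates are illustrative only.

```python
def mesh_hops(a, b):
    """Router-to-router hops under minimal routing in a 2D mesh."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def min_hops(src_routers, dst_routers):
    """Best hop count when the source may inject at any of its routers and
    the destination may receive from any of its routers."""
    return min(mesh_hops(s, d) for s in src_routers for d in dst_routers)


# Hypothetical attachment: each end node is connected to four nearby routers.
src_routers = [(0, 0), (1, 0), (0, 1), (1, 1)]
dst_routers = [(3, 3), (4, 3), (3, 4), (4, 4)]

single_port = mesh_hops((0, 0), (4, 4))   # one fixed router per node
multi_port = min_hops(src_routers, dst_routers)

print(single_port, multi_port)
```

In this made-up placement, choosing the injection and ejection routers closest to each other halves the number of router-to-router hops.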
Cost of Sparse Mesh Layouts Supporting Throughput Computing
Pub Date : 2011-08-31 DOI: 10.1109/DSD.2011.46
M. Forsell, V. Leppänen, M. Penttonen
The purpose of this paper is to estimate the cost of utilizing under-populated, or sparse, networks on chip (NOCs) for chip multiprocessors (CMPs). In under-populated NOCs, only a portion of the nodes are sources and sinks, whereas the rest are simple intermediate nodes that increase communication bandwidth. Compared to dense NOCs, where all nodes can be sources and sinks of communication, under-populated NOCs can be scaled so that any degree of communication frequency of nodes can be supported. The drawbacks of under-populated NOCs are a larger network area and a bigger logical diameter. GPGPU-style stream-based or high-throughput CMPs can be used to hide the effect of longer latencies. In this paper, we present layouts for mesh-based under-populated networks and calculate their wire length distributions and overall area. Moreover, we present energy consumption calculations for such networks, and show that while the network part of a CMP system based on under-populated NOCs can play a major role in chip area and energy consumption, its share can be pushed down by increasing the number of dimensions and using meshes instead of tori. We also compare various multidimensional sparse mesh layouts and conclude that 3-dimensional and 4-dimensional sparse meshes are the most attractive ones for throughput computing.
Citations: 3
Mutant Fault Injection in Functional Properties of a Model to Improve Coverage Metrics
Pub Date : 2011-08-31 DOI: 10.1109/DSD.2011.57
A. Abbasinasab, M. Mohammadi, S. Mohammadi, S. Yanushkevich, Michael R. Smith
This paper proposes integrating mutation analysis into model checking to improve coverage metrics of digital circuits. In contrast to traditional mutation testing, where mutant faults are generated and injected into the code description of the model, we apply a series of newly defined mutation operators directly to the model properties rather than to the model code. We claim that any mutant properties that are generated from the initial properties and validated by the model checker should be considered new properties that were missed during the initial verification procedure. Therefore, adding these newly identified properties to the existing list of properties improves the coverage metric of the formal verification and consequently leads to a more reliable design. Preliminary simulation results of applying this approach to a 4x4 Booth multiplier with 6 and 8 initial properties demonstrate a 40% and 45% coverage improvement, respectively, compared to the initial coverage metric.
Citations: 2
VMAP: A Variation Map-Aware Placement Algorithm for Leakage Power Reduction in FPGAs
Pub Date : 2011-08-31 DOI: 10.1109/DSD.2011.15
Behzad Salami, M. S. Zamani, A. Jahanian
In high-frequency FPGAs, as technology scales shrink and threshold voltages decrease, and given the large number of unused resources, leakage power contributes considerably to total power consumption. Process variation, an important challenge in nano-scale technologies, also has a great impact on the leakage power of FPGAs. The reconfigurability of FPGAs offers a unique opportunity to mitigate these challenges by extracting a variation map for each chip. In this paper, a per-chip process variation-aware placement algorithm (VMAP) is proposed that uses the extracted variation map to reduce the leakage power of FPGAs without neglecting dynamic power consumption. VMAP adapts to the different process variation maps of various FPGA chips. Experimental results on the attempted benchmarks show that VMAP reduces the power-delay product (PDP) cost by 7.2% compared with conventional placement algorithms, with a standard deviation of less than 16.8% across different variation maps.
Citations: 1