2011 14th Euromicro Conference on Digital System Design: Latest Papers

Efficient CRT RSA with SCA Countermeasures
Pub Date : 2011-08-31 DOI: 10.1109/DSD.2011.81
A. Fournaris, O. Koufopavlou
The RSA cryptographic algorithm, in use as a security tool for many years, has long achieved cryptographic and market maturity. However, like all cryptographic algorithms, RSA implementations have, since the discovery and spread of Side Channel Attacks (SCA), been susceptible to a wide variety of attacks that target the hardware structure rather than the algorithm itself. While a wide range of countermeasures can be applied to the RSA structure to protect the algorithm from SCAs, combining several such measures to guarantee an SCA-resistant RSA design is not easy. There are many incompatibility issues among SCA protection methods, and an SCA-secure RSA implementation incurs an extensive performance cost. In this paper, we address some very popular and potent SCAs against RSA, such as Fault attacks (FA), Simple Power attacks (SPA), Doubling attacks (DA) and Differential Power attacks (DPA), and propose an algorithmic modification of RSA based on the Chinese Remainder Theorem (CRT) that can thwart those attacks. We describe an implementation approach based on Montgomery modular multiplication and propose a hardware architecture for an SCA-resistant CRT RSA that is structured on our proposed algorithm. The designed architecture is implemented in FPGA technology, and results on its time and space complexity are extracted and evaluated.
Keywords: Public Key Cryptography, VLSI Design, Side Channel Attack Resistance, Modular Exponentiation
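As a rough illustration of the CRT RSA setting this abstract refers to, the sketch below shows textbook CRT exponentiation with Garner recombination and a verify-before-release check, a well-known generic defence against fault attacks. It is not the authors' modified algorithm (the abstract does not spell it out), it uses Python's built-in `pow` rather than Montgomery multiplication, and it offers no protection against power analysis. The function name, the `inject_fault` switch and the toy primes are assumptions made purely for illustration.

```python
def crt_rsa_sign(m, p, q, d, e, inject_fault=False):
    """Textbook CRT RSA private-key operation with a final consistency check.

    Hypothetical helper for illustration only; not constant-time and not
    resistant to power analysis.
    """
    n = p * q
    dp, dq = d % (p - 1), d % (q - 1)   # reduced private exponents
    q_inv = pow(q, -1, p)               # q^-1 mod p (Python 3.8+)

    sp = pow(m, dp, p)                  # half-size exponentiation mod p
    sq = pow(m, dq, q)                  # half-size exponentiation mod q

    if inject_fault:
        sp ^= 1                         # simulate a single-bit fault in one branch

    # Garner recombination: s = sq + q * ((sp - sq) * q^-1 mod p)
    h = (q_inv * (sp - sq)) % p
    s = sq + h * q

    # Verify with the public exponent before releasing the result, so a
    # faulty half-exponentiation never leaves the device.
    if pow(s, e, n) != m % n:
        raise ValueError("fault detected: result withheld")
    return s
```

Calling `crt_rsa_sign(65, 61, 53, 2753, 17)` with these toy parameters returns the same value as `pow(65, 2753, 3233)`, while passing `inject_fault=True` raises `ValueError` instead of returning a faulty result.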
Citations: 4
A Scalable Distributed Asynchronous Control Network for High Level Synthesis of Digital Circuits
Pub Date : 2011-08-31 DOI: 10.1109/DSD.2011.114
T. V. Leeuwen, R. V. Leuken
This paper presents a scalable asynchronous distributed control network. The control circuit allows for true asynchronous operation of all digital resources and as a result of its scalable distributed topology allows unlimited resource sharing. We start with the description of a data flow graph, and using traditional scheduling algorithms, generate an asynchronous distributed control network and the asynchronous data path. The distributed controllers are implemented such that they can be created by connecting a small number of pre-designed sub-controllers which are presented in this paper. Prototype IP-blocks of these sub-controller circuits have been designed in a 90nm ASIC design process. To prove the effectiveness of our method, we present some key performance parameters: area and power under timing constraints.
Citations: 0
Power Minimisation for Real-Time Dataflow Applications
Pub Date : 2011-08-31 DOI: 10.1109/DSD.2011.19
Andrew Nelson, Orlando Moreira, A. Molnos, S. Stuijk, B. T. Nguyen, K. Goossens
Energy-efficient execution of applications is important for many reasons, e.g. time between battery charges and device temperature. Voltage and Frequency Scaling (VFS) enables applications to be run at lower frequencies on hardware resources, thereby consuming less power. Real-time applications have deadlines that must be met; otherwise their output is devalued. Dataflow modelling of real-time applications enables off-line verification of the application's temporal requirements. In this paper we describe a method to reduce the combined static and dynamic energy consumption using a Dynamic VFS (DVFS) technique for dataflow-modelled real-time applications that may be mapped onto multiple hardware resources. We achieve this by using an application's static slack to perform DVFS while still satisfying the application's temporal requirements. We show that by formulating a dataflow-modelled application and its mapping as a convex optimisation problem, with energy consumption as the objective function, the problem can be solved with a generic convex optimisation solver, producing an energy-optimal constant frequency per application task. Our method allows task frequencies to be constrained such that, e.g., one frequency per application or per processor may be achieved.
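To make the convexity argument concrete, here is a minimal sketch for a deliberately simplified case: tasks that run back to back on one processor and must finish by a single deadline. It assumes a per-cycle energy of `k * f**2 + p_static / f` (dynamic plus static), which is a common textbook model and not necessarily the one used in the paper; the paper's formulation covers dataflow graphs mapped onto multiple resources and relies on a generic convex solver. Under this assumed model the optimum has a closed form: every task runs at the same frequency, the larger of the frequency that just meets the deadline and the frequency that minimises energy per cycle. All names and parameters are hypothetical.

```python
def optimal_constant_frequency(cycles, deadline, k=1.0, p_static=0.0):
    """Energy-optimal common frequency for a sequential chain of tasks.

    Assumed model (not from the paper): energy per cycle at frequency f is
    k * f**2 + p_static / f, and the chain must finish within `deadline`.
    """
    f_deadline = sum(cycles) / deadline                 # slowest speed that is still on time
    f_critical = (p_static / (2.0 * k)) ** (1.0 / 3.0)  # minimiser of energy per cycle
    return max(f_deadline, f_critical)


def energy(cycles, freqs, k=1.0, p_static=0.0):
    """Total energy of running task i (cycles[i]) at frequency freqs[i]."""
    return sum(c * (k * f * f + p_static / f) for c, f in zip(cycles, freqs))
```

For two tasks of 100 and 300 cycles and a deadline of 4 time units, the common frequency is 100 cycles per time unit; any other pair of frequencies that also finishes exactly at the deadline, such as 200 and 300/3.5, consumes more energy under this model.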
Citations: 38
Towards an Efficient NoC Topology through Multiple Injection Ports
Pub Date : 2011-08-31 DOI: 10.1109/DSD.2011.25
Jesús Camacho Villanueva, J. Flich, J. Duato, H. Eberle, W. Olesinski
In this paper, we present a flexible network-on-chip topology: NR-Mesh (Nearest neighbor Mesh). The topology gives an end node the choice to inject a message through different neighboring routers, thereby reducing hop count and saving latency. At the receiver side, a message may be delivered to the end node through different routers, thus reducing hop count further and increasing flexibility when routing messages. This flexibility allows a maximal number of network components to be switched off, thus enabling power-aware routing algorithms. Additional benefits are reduced congestion/contention levels in the network, support for efficient broadcast operations, savings in power consumption, and partial fault tolerance. Our second contribution is a power management technique for adaptive routing. This technique turns router ports and their attached links on and off depending on traffic conditions, and achieves significant power savings when there is low traffic in the network. We further compare the new topology with the 2D-Mesh, using either deterministic or adaptive routing. Compared with the 2D-Mesh using deterministic routing, and executing real applications on a full-system simulation platform, the NR-Mesh topology using adaptive routing obtains significant savings: on average a 7% reduction in execution time and a 75% reduction in network energy consumption for a 16-node CMP system. Similar numbers are achieved for a 32-node CMP system.
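The hop-count benefit of multiple injection and ejection ports can be sketched with a small model. The attachment rule used here is an assumption, since the abstract does not define it: each end node may use its own router plus the routers directly north, south, east and west of it, where they exist. Hops are counted as router-to-router links along a shortest path, and all function names are hypothetical.

```python
from itertools import product


def attached_routers(node, width, height, multi_port):
    """Routers an end node can inject into or eject from (assumed attachment rule)."""
    x, y = node
    candidates = [(x, y)]
    if multi_port:
        candidates += [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    # Keep only routers that actually exist in the width x height mesh.
    return [(cx, cy) for cx, cy in candidates
            if cx in range(width) and cy in range(height)]


def min_hops(src, dst, width, height, multi_port):
    """Fewest router-to-router hops over all injection/ejection router choices."""
    sources = attached_routers(src, width, height, multi_port)
    sinks = attached_routers(dst, width, height, multi_port)
    return min(abs(sx - dx) + abs(sy - dy)
               for (sx, sy), (dx, dy) in product(sources, sinks))
```

In a 4x4 mesh, a corner-to-corner message from node (0, 0) to node (3, 3) needs 6 hops with a single port per node and 4 hops under the assumed multi-port attachment.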
Citations: 7
Cost of Sparse Mesh Layouts Supporting Throughput Computing
Pub Date : 2011-08-31 DOI: 10.1109/DSD.2011.46
M. Forsell, V. Leppänen, M. Penttonen
The purpose of this paper is to estimate the cost of utilizing under-populated, or sparse, networks on chip (NOC) for chip multiprocessors (CMP). In under-populated NOCs, only a portion of the nodes are sources and sinks, whereas the rest are simple intermediate nodes that increase communication bandwidth. Compared to dense NOCs, where all nodes can be sources and sinks of communication, under-populated NOCs can be scaled so that any degree of communication frequency of nodes can be supported. The drawback of under-populated NOCs is a larger network area and a bigger logical diameter. GPGPU-style stream-based or high-throughput CMPs can be used to hide the effect of longer latencies. In this paper, we present layouts for mesh-based under-populated networks and calculate their wire length distributions and overall area. Moreover, we present energy consumption calculations for such networks, and show that while the network part of a CMP system based on under-populated NOCs can play a major role in chip area and energy consumption, its share can be pushed down by increasing the number of dimensions and using meshes instead of tori. We also compare various multidimensional sparse mesh layouts and conclude that the 3-dimensional and 4-dimensional sparse meshes are the most attractive ones for throughput computing.
Citations: 3
The Future of Data-Parallel Embedded Systems (Abstract)
Pub Date : 2011-08-31 DOI: 10.1109/DSD.2011.118
M. Lindwer
Programmable data-parallel embedded systems are typically associated with tasks such as image processing, video decoding, and software-defined radio. This talk is particularly focused on designs for resource-constrained mobile and consumer devices. Today, heterogeneous multi-core designs are hailed as the solution, and many research teams claim to work on this topic. However, the heterogeneous processing often stays at the level of combining many RISCs with many DSPs or similarly adapted processors, which should actually still be classified as homogeneous. In order to really compete with hardwired designs, extremely high efficiency is required. In this talk, we will show how the required levels of efficiency are obtained by building systems which consist of limited sets of highly parallel purpose-built processors, and by ensuring that these systems are programmed to efficiently utilize the available compute resources.
Citations: 1
Evaluation of Fault-Tolerant Routing Methods for NoC Architectures
Pub Date : 2011-08-31 DOI: 10.1109/DSD.2011.63
M. Valinataj
This paper presents a performance and reliability evaluation of deterministic and adaptive fault-tolerant routing algorithms used in Network-on-Chip (NoC) designs. The investigated methods have a multi-level fault-tolerance capability and therefore can be evaluated separately. To illustrate the effectiveness of these methods, we conduct simulations on different applications for performance evaluation. For reliability assessment, however, we propose an analytical approach based on combinatorial reliability models to show the effect of fault-tolerant routing algorithms on overall NoC reliability.
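Combinatorial reliability models are built from two basic compositions, series and parallel, and the sketch below shows how they might be applied to a routed path. It assumes independent component failures and is a generic illustration rather than the paper's model: a message that must traverse several routers is a series system, and a fault-tolerant routing scheme that can fall back on a disjoint alternative path is approximated as a parallel combination of two series systems. The function names and reliability values are hypothetical.

```python
from math import prod  # Python 3.8+


def series(reliabilities):
    """All components must work: R = product of r_i."""
    return prod(reliabilities)


def parallel(reliabilities):
    """At least one component must work: R = 1 - product of (1 - r_i)."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


def two_path_reliability(primary_path, backup_path):
    """Delivery succeeds if either of two disjoint router paths is fully working."""
    return parallel([series(primary_path), series(backup_path)])
```

With two routers of reliability 0.9 on each path, a single path gives roughly 0.81, while allowing a disjoint backup path raises the figure to roughly 0.964 under the independence assumption.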
Citations: 8
Model Driven Cache-Aware Scheduling of Object Oriented Software for Chip Multiprocessors
Pub Date : 2011-08-31 DOI: 10.1109/DSD.2011.96
T. Ovatman, F. Buzluca
Making good use of the shared caches of multicore processors is one of the heavily studied topics in today's chip multiprocessing community. At the center of this problem is a scheduling mechanism that maximizes throughput by reducing the miss rates of shared caches while preserving the fairness of processor usage. Scheduling algorithms proposed in this field usually take advantage of thread-level properties of software and require modifications at the operating system level. In our study we approach the problem from a different perspective and use software models to guide the operating system in mapping software objects onto processor cores effectively. In object-oriented software, objects collaborate to fulfil jobs and may operate on common data. Our scheduling method takes class dependencies into account and tries to schedule objects of coupled classes onto cores that share a common cache. This paper presents case studies on implementations of three software design patterns (Strategy, Visitor and Observer) and an image filtering software implementation. During our experiments we use our cache-aware scheduler to guide Linux's completely fair scheduler (CFS) to perform more cache-aware schedules and decrease running time around 10. Our results suggest that guiding or restricting the operating system's scheduler using class-relational information present in the object-oriented software model can be fruitful in increasing software performance on multicore processors.
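The idea of placing objects of coupled classes on cores that share a cache can be sketched as a two-step procedure: group classes that depend on each other, then give each group one cache domain. This is only an illustration under assumed inputs; the abstract does not describe how the authors derive groups or how they steer CFS. Here a "cache domain" is a hypothetical tuple of core ids that share a cache, dependencies are undirected class pairs, and groups are placed largest first on the least-loaded domain.

```python
def group_coupled_classes(classes, dependencies):
    """Connected components of the (undirected) class dependency graph."""
    neighbours = {c: set() for c in classes}
    for a, b in dependencies:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen, groups = set(), []
    for start in classes:
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], []
        while stack:                      # iterative depth-first traversal
            current = stack.pop()
            group.append(current)
            for other in sorted(neighbours[current]):
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        groups.append(sorted(group))
    return groups


def assign_to_cache_domains(groups, cache_domains):
    """Map every class to one cache domain, keeping each group together."""
    load = {i: 0 for i in range(len(cache_domains))}
    placement = {}
    for group in sorted(groups, key=len, reverse=True):
        target = min(load, key=load.get)  # least-loaded domain so far
        for cls in group:
            placement[cls] = cache_domains[target]
        load[target] += len(group)
    return placement
```

On Linux, a placement like this could in principle be enforced through CPU affinity, for example with Python's `os.sched_setaffinity`; whether the paper uses affinity or another mechanism is not stated in the abstract.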
Citations: 1
How a Symmetry Metric Assists Side-Channel Evaluation - A Novel Model Verification Method for Power Analysis
Pub Date : 2011-08-31 DOI: 10.1109/DSD.2011.91
Annelie Heuser, M. Kasper, W. Schindler, Marc Stöttinger
Side-channel analysis has become an important field of research for the semiconductor industry and for the academic sector as well. Of particular interest is constructive side-channel analysis as it supports a target-oriented associated design process. The main goal is to increase the side-channel resistance of cryptographic implementations within the design phase by a combination of advanced stochastic methods with design methods, tools, and countermeasures. In this contribution we present a new enhanced tool that utilizes symmetry properties to assist the side-channel evaluation of cryptographic implementations. This technique applies a symmetry metric, which is introduced as an engineering tool to verify the suitability of the leakage model in the evaluation phase of security-sensitive designs. Additionally, this approach also supports the designer in the selection of appropriate time instants.
Citations: 15
Energy Behaviour of NUCA Caches in CMPs
Pub Date : 2011-08-31 DOI: 10.1109/DSD.2011.99
A. Bardine, P. Foglia, Francesco Panicucci, M. Solinas, J. Sahuquillo
Advances in semiconductor technology now make it possible to design Chip Multiprocessor (CMP) systems equipped with huge on-chip Last Level Caches. Because of the wire delay problem, traditional cache memories with a uniform access time would result in unacceptable response latencies. The NUCA (Non Uniform Cache Access) architecture has been proposed as a viable solution to hide the adverse impact of wire delay on performance. Many previous studies have focused on the effectiveness of NUCA architectures, but the study of the energy and power aspects of NUCA caches is still limited. In this work, we present an energy model specifically suited for NUCA-based CMP systems, together with a methodology for using the model to evaluate NUCA energy consumption. Moreover, we present a performance and energy dissipation analysis for two 8-core CMP systems, one with an S-NUCA and one with a D-NUCA. Experimental results show that, as in the monolithic processor, static power dominates the total power budget in the CMP system.
Citations: 7