
Latest publications from the 2014 Design, Automation & Test in Europe Conference & Exhibition (DATE)

A novel model for system-level decision making with combined ASP and SMT solving
Pub Date : 2014-03-24 DOI: 10.7873/DATE.2014.230
Alexander Biewer, J. Gladigau, C. Haubelt
In this paper, we present a novel model enabling system-level decision making for time-triggered many-core architectures in automotive systems. The proposed application model includes shared data entities that need to be bound to memories during decision making. As a key enabler to our approach, we explicitly separate computation and shared memory communication over a network-on-chip (NoC). To deal with contention on a NoC, we model the necessary basis to implement a time-triggered schedule that guarantees freedom from interference. We compute fundamental design decisions, namely (a) spatial binding, (b) multi-hop routing, and (c) time-triggered scheduling, by a novel coupling of answer set programming (ASP) with satisfiability modulo theories (SMT) solvers. First results of an automotive case study demonstrate the applicability of our method for complex real-world applications.
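As a rough illustration of the SMT half of the paper's ASP+SMT coupling (a generic sketch, not the authors' actual model), the following Python snippet uses the z3 solver to ask whether a few messages sharing one NoC link can be given non-overlapping, time-triggered offsets within a hyperperiod; all message durations and the hyperperiod length are invented values.

```python
# A minimal sketch (not the paper's model): check with an SMT solver whether three
# messages sharing one NoC link admit non-overlapping, time-triggered offsets
# within a hyperperiod. Durations and the hyperperiod are illustrative values.
from z3 import Int, Solver, Or, sat

HYPERPERIOD = 100                              # illustrative schedule length (time units)
durations = {"m0": 12, "m1": 30, "m2": 25}     # hypothetical transmission times

offsets = {m: Int(f"offset_{m}") for m in durations}
s = Solver()

for m, d in durations.items():
    s.add(offsets[m] >= 0, offsets[m] + d <= HYPERPERIOD)

# Messages on the same link must not overlap (freedom from interference).
msgs = list(durations)
for i in range(len(msgs)):
    for j in range(i + 1, len(msgs)):
        a, b = msgs[i], msgs[j]
        s.add(Or(offsets[a] + durations[a] <= offsets[b],
                 offsets[b] + durations[b] <= offsets[a]))

if s.check() == sat:
    model = s.model()
    print({m: model[offsets[m]] for m in msgs})
else:
    print("no feasible time-triggered schedule")
```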
Citations: 4
Time-critical computing on a single-chip massively parallel processor
Pub Date : 2014-03-24 DOI: 10.7873/DATE.2014.110
B. Dinechin, D. V. Amstel, Marc Poulhiès, Guillaume Lager
The requirement of high performance computing at low power can be met by the parallel execution of an application on a possibly large number of programmable cores. However, the lack of accurate timing properties may prevent parallel execution from being applicable to time-critical applications. We illustrate how this problem has been addressed by suitably designing the architecture, implementation, and programming model, of the Kalray MPPA®-256 single-chip many-core processor. The MPPA® -256 (Multi-Purpose Processing Array) processor integrates 256 processing engine (PE) cores and 32 resource management (RM) cores on a single 28nm CMOS chip. These VLIW cores are distributed across 16 compute clusters and 4 I/O subsystems, each with a locally shared memory. On-chip communication and synchronization are supported by an explicitly addressed dual network-on-chip (NoC), with one node per compute cluster and 4 nodes per I/O subsystem. Off-chip interfaces include DDR, PCI and Ethernet, and a direct access to the NoC for low-latency processing of data streams. The key architectural features that support time-critical applications are timing compositional cores, independent memory banks inside the compute clusters, and the data NoC whose guaranteed services are determined by network calculus. The programming model provides communicators that effectively support distributed computing primitives such as remote writes, barrier synchronizations, active messages, and communication by sampling. POSIX time functions expose synchronous clocks inside compute clusters and mesosynchronous clocks across the MPPA®-256 processor.
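The guaranteed services of the data NoC are stated to come from network calculus. As a generic reminder of the kind of bound involved (textbook network calculus, not Kalray's actual analysis), a flow constrained by a token bucket with burst b and rate r crossing a rate-latency server with rate R and latency T has worst-case delay T + b/R and worst-case backlog b + rT. The sketch below evaluates these formulas with made-up flit-level parameters.

```python
# Generic network-calculus bound, not Kalray's actual NoC analysis: a flow with
# token-bucket arrival curve alpha(t) = b + r*t crossing a rate-latency server
# beta(t) = R * max(t - T, 0) has worst-case delay T + b/R and backlog b + r*T,
# provided the stability condition r <= R holds. Parameter values are illustrative.

def delay_and_backlog_bound(b_flits, r_flits_per_cycle, R_flits_per_cycle, T_cycles):
    assert r_flits_per_cycle <= R_flits_per_cycle, "flow rate must not exceed service rate"
    delay = T_cycles + b_flits / R_flits_per_cycle
    backlog = b_flits + r_flits_per_cycle * T_cycles
    return delay, backlog

# Hypothetical numbers: 16-flit bursts, 0.25 flits/cycle sustained rate, and a
# router that serves 1 flit/cycle after a 3-cycle latency.
print(delay_and_backlog_bound(b_flits=16, r_flits_per_cycle=0.25,
                              R_flits_per_cycle=1.0, T_cycles=3))
```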
Citations: 174
Energy-efficient scheduling for memory-intensive GPGPU workloads
Pub Date : 2014-03-24 DOI: 10.7873/DATE.2014.032
Seokwoo Song, Minseok Lee, John Kim, Woong Seo, Yeon-Gon Cho, Soojung Ryu
High performance for a GPGPU workload is obtained by maximizing parallelism and fully utilizing the available resources. However, this is not necessarily energy efficient, especially for memory-intensive GPGPU workloads. In this work, we propose Throttle CTA (cooperative-thread array) Scheduling (TCS), in which we leverage two types of throttling - throttling the number of active cores and throttling of warp execution in the cores - to improve energy efficiency for memory-intensive GPGPU workloads. The algorithm requires the global CTA or thread block scheduler to reduce the number of cores with assigned thread blocks while leveraging the local warp scheduler to throttle memory requests for some of the cores to further reduce power consumption. The proposed TCS scheduling does not require off-line analysis but can be done dynamically during execution. Instead of relying on conventional metrics such as misses-per-kilo-instruction (MPKI), we leverage the memory access latency metric to determine the memory intensity of the workloads. Our evaluations show that TCS reduces energy by up to 48% (38% on average) across different memory-intensive workloads while having very little impact on performance for compute-intensive workloads.
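A possible way to picture the two throttling knobs is the hypothetical control loop below: it classifies a kernel by its measured average memory access latency and then shrinks both the set of cores receiving thread blocks and the warp-issue budget on the remaining cores. The threshold, scaling factors, and knob names are all invented for illustration and are not taken from the paper or from any real GPU driver interface.

```python
# Hypothetical software model of the two throttling knobs described in the abstract:
# classify the workload by average memory access latency, then reduce the number of
# cores that receive thread blocks and cap warp issue on the cores that stay active.
# Thresholds and scaling factors are illustrative, not taken from the paper.

def tcs_decision(avg_mem_latency_cycles, total_cores,
                 latency_threshold=400, core_scale=0.5, warp_cap=0.5):
    if avg_mem_latency_cycles < latency_threshold:
        # Compute-intensive: keep all cores and full warp issue.
        return {"active_cores": total_cores, "warp_issue_fraction": 1.0}
    # Memory-intensive: shrink the set of cores given thread blocks, and
    # additionally cap warp issue on the cores that remain active.
    active = max(1, int(total_cores * core_scale))
    return {"active_cores": active, "warp_issue_fraction": warp_cap}

print(tcs_decision(avg_mem_latency_cycles=650, total_cores=16))
# -> {'active_cores': 8, 'warp_issue_fraction': 0.5}
```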
Citations: 13
Improving STT-MRAM density through multibit error correction
Pub Date : 2014-03-24 DOI: 10.7873/DATE2014.195
Brandon Del Bel, Jongyeon Kim, C. Kim, S. Sapatnekar
STT-MRAMs are prone to data corruption due to inadvertent bit flips. Traditional methods enhance robustness at the cost of area/energy by using larger cell sizes to improve the thermal stability of the MTJ cells. This paper employs multibit error correction with DRAM-style refreshing to mitigate errors and provides a methodology for determining the optimal level of correction. A detailed analysis demonstrates that the reduction in nonvolatility requirements afforded by strong error correction translates to significantly lower area for the memory array compared to simpler ECC schemes, even when accounting for the increased overhead of error correction.
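To see why stronger correction relaxes the per-cell retention requirement, one can use a simple binomial model (an illustrative assumption of independent bit flips, not the paper's detailed analysis): if each bit flips with probability p within a refresh interval, the probability that an n-bit word accumulates more than t errors drops sharply with t, so a larger t tolerates a larger p and hence smaller, less thermally stable cells. The sketch below picks the smallest t meeting a hypothetical failure target.

```python
# Illustrative binomial model (assumes independent bit flips within a refresh
# interval, a simplification that is not the paper's analysis): probability that
# an n-bit word accumulates more than t errors, and the smallest t that meets a
# target word failure probability.
from math import comb

def word_failure_prob(n_bits, t_correctable, p_bit_flip):
    return sum(comb(n_bits, k) * p_bit_flip**k * (1 - p_bit_flip)**(n_bits - k)
               for k in range(t_correctable + 1, n_bits + 1))

def min_correction_level(n_bits, p_bit_flip, target_failure_prob, t_max=8):
    for t in range(t_max + 1):
        if word_failure_prob(n_bits, t, p_bit_flip) <= target_failure_prob:
            return t
    return None

# Hypothetical numbers: 64-bit words, per-bit flip probability of 1e-4 per
# refresh interval, target word failure probability of 1e-12 per interval.
print(min_correction_level(n_bits=64, p_bit_flip=1e-4, target_failure_prob=1e-12))
```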
Citations: 54
Resolving the memory bottleneck for single supply near-threshold computing
Pub Date : 2014-03-24 DOI: 10.7873/DATE.2014.215
T. Gemmeke, M. Sabry, J. Stuijt, P. Raghavan, F. Catthoor, David Atienza Alonso
This paper focuses on a review of state-of-the-art memory designs and new design methods for near-threshold computing (NTC). In particular, it presents new ways to design reliable low-voltage NTC memories cost-effectively by reusing available cell libraries, or by adding a digital wrapper around existing commercially available memories. The approach is based on modeling at system level supported by silicon measurement on a test chip in a 40nm low-power processing technology. Advanced monitoring, control and run-time error mitigation schemes enable the operation of these memories at the same optimal near-Vt voltage level as the digital logic. Reliability degradation is thus overcome and this opens the way to solve the memory bottleneck in NTC systems. Starting from the available 40 nm silicon measurements, the analysis is extended to future 14 and 10 nm technology nodes.
Citations: 15
Energy efficient data flow transformation for Givens Rotation based QR Decomposition
Pub Date : 2014-03-24 DOI: 10.7873/DATE.2014.224
Namita Sharma, P. Panda, Min Li, Prashant Agrawal, F. Catthoor
QR Decomposition (QRD) is a typical matrix decomposition algorithm that shares many common features with other algorithms such as LU and Cholesky decomposition. The principle can be realized in a large number of valid processing sequences that differ significantly in the number of memory accesses and computations, and hence, the overall implementation energy. With modern low power embedded processors evolving towards register files with wide memory interfaces and vector functional units (FUs), the data flow in matrix decomposition algorithms needs to be carefully devised to achieve energy efficient implementation. In this paper, we present an efficient data flow transformation strategy for the Givens Rotation based QRD that optimizes data memory accesses. We also explore different possible implementations for QRD of multiple matrices using the SIMD feature of the processor. With the proposed data flow transformation, a reduction of up to 36% is achieved in the overall energy over conventional QRD sequences.
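For reference, a plain (untransformed) Givens rotation QRD is sketched below in NumPy: each rotation mixes two rows to zero one subdiagonal element, and the loop order fixes the memory-access pattern that the paper's data flow transformation reorders. This is the textbook sequence, not the authors' energy-optimized one.

```python
# Textbook Givens-rotation QR decomposition (the baseline sequence, not the
# paper's transformed one): each 2x2 rotation mixes rows i-1 and i to zero
# R[i, j], so the loop order below directly determines the memory-access pattern
# that the proposed data flow transformation rearranges.
import numpy as np

def givens_qr(A):
    m, n = A.shape
    R = A.astype(float).copy()
    Q = np.eye(m)
    for j in range(n):
        for i in range(m - 1, j, -1):        # zero column j from the bottom up
            a, b = R[i - 1, j], R[i, j]
            r = np.hypot(a, b)
            if r == 0.0:
                continue
            c, s = a / r, b / r
            G = np.array([[c, s], [-s, c]])  # rotation acting on rows i-1 and i
            R[[i - 1, i], j:] = G @ R[[i - 1, i], j:]
            Q[:, [i - 1, i]] = Q[:, [i - 1, i]] @ G.T
    return Q, R

A = np.random.default_rng(0).random((5, 3))
Q, R = givens_qr(A)
print(np.allclose(Q @ R, A), np.allclose(np.tril(R, -1), 0.0))
```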
Citations: 4
Spatial pattern prediction based management of faulty data caches
Pub Date : 2014-03-24 DOI: 10.7873/DATE.2014.073
G. Keramidas, Michail Mavropoulos, Anna Karvouniari, D. Nikolos
Technology scaling leads to significant faulty bit rates in on-chip caches. In this work, we propose a methodology to mitigate the impact of defective bits (due to permanent faults) in first-level set-associative data caches. Our technique assumes that faulty caches are enhanced with the ability of disabling their defective parts at cache subblock granularity. Our experimental findings reveal that while the occurrence of hard-errors in faulty caches may have a significant impact in performance, a lot of room for improvement exists, if someone is able to take into account the spatial reuse patterns of the to-be-referenced blocks (not all the data fetched into the cache is accessed). To this end, we propose frugal PC-indexed spatial predictors (with very small storage requirements) to orchestrate the (re)placement decisions among the fully and partially unusable faulty blocks. Using cycle-accurate simulations, a wide range of scientific applications, and a plethora of cache fault maps, we showcase that our approach is able to offer significant benefits in cache performance.
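The sketch below is a tiny software model of a PC-indexed spatial footprint predictor of the general kind the abstract refers to: per missing PC it remembers which subblocks of the fetched block were actually touched and reuses that bitmap as the prediction on the next miss from the same PC. Table organization, sizes, and the faulty-block (re)placement policy that is the paper's actual contribution are not modeled; everything here is illustrative.

```python
# Tiny software model of a PC-indexed spatial footprint predictor (illustrative only;
# table sizes, training policy, and the faulty-block placement logic of the paper are
# not modeled). Each entry remembers which subblocks were touched the last time a
# block was fetched by the same PC, and that bitmap is the prediction on a new miss.

SUBBLOCKS_PER_BLOCK = 8

class SpatialPredictor:
    def __init__(self):
        self.table = {}          # PC -> bitmask of subblocks predicted useful
        self.training = {}       # block address -> (PC, bitmask observed so far)

    def on_miss(self, pc, block_addr):
        self.training[block_addr] = (pc, 0)
        # Predict the footprint seen last time this PC missed; default: all subblocks.
        return self.table.get(pc, (1 << SUBBLOCKS_PER_BLOCK) - 1)

    def on_access(self, block_addr, subblock):
        if block_addr in self.training:
            pc, mask = self.training[block_addr]
            self.training[block_addr] = (pc, mask | (1 << subblock))

    def on_eviction(self, block_addr):
        if block_addr in self.training:
            pc, mask = self.training.pop(block_addr)
            self.table[pc] = mask    # train with the observed footprint

pred = SpatialPredictor()
pred.on_miss(pc=0x400A10, block_addr=0x1000)
pred.on_access(0x1000, 0); pred.on_access(0x1000, 1)
pred.on_eviction(0x1000)
print(bin(pred.on_miss(pc=0x400A10, block_addr=0x2000)))   # -> 0b11
```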
Citations: 10
Temperature aware energy-reliability trade-offs for mapping of throughput-constrained applications on multimedia MPSoCs
Pub Date : 2014-03-24 DOI: 10.7873/DATE.2014.115
Anup Das, Akash Kumar, B. Veeravalli
This paper proposes a design-time (offline) analysis technique to determine application task mapping and scheduling on a multiprocessor system and the voltage and frequency levels of all cores (offline DVFS) that minimize application computation and communication energy, simultaneously minimizing processor aging. The proposed technique incorporates (1) the effect of the voltage and frequency on the temperature of a core; (2) the effect of neighboring cores' voltage and frequency on the temperature (spatial effect); (3) pipelined execution and cyclic dependencies among tasks; and (4) the communication energy component which often constitutes a significant fraction of the total energy for multimedia applications. The temperature model proposed here can be easily integrated in the design space exploration for multiprocessor systems. Experiments conducted with MPEG-4 decoder on a real system demonstrate that the temperature using the proposed model is within 5% of the actual temperature clearly demonstrating its accuracy. Further, the overall optimization technique achieves 40% savings in energy consumption with 6% increase in system lifetime.
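Modeling points (1) and (2) can be captured by a linear steady-state thermal model in which each core's temperature is ambient plus a weighted sum of all cores' power, with the diagonal weights covering self-heating and the off-diagonal weights the spatial effect of neighbors. The sketch below is such a generic model with invented thermal resistances and a simplified V/f-to-power formula; it is not the calibrated model used in the paper.

```python
# Generic steady-state thermal model (not the paper's calibrated model): each core's
# temperature is ambient plus a weighted sum of all cores' power, where the diagonal
# of B captures self-heating and the off-diagonals the spatial (neighbor) effect.
# All coefficients, voltages, and frequencies below are invented for illustration.
import numpy as np

T_AMB = 45.0                                   # ambient/package temperature, deg C
B = np.array([[8.0, 2.0, 1.0],                 # thermal resistances, deg C per watt
              [2.0, 8.0, 2.0],
              [1.0, 2.0, 8.0]])

def core_power(v, f_ghz, c_eff=1.2, p_leak=0.3):
    # Simplified dynamic + leakage power: P = C_eff * V^2 * f + P_leak * V.
    return c_eff * v**2 * f_ghz + p_leak * v

def temperatures(voltages, freqs_ghz):
    P = np.array([core_power(v, f) for v, f in zip(voltages, freqs_ghz)])
    return T_AMB + B @ P

# Raising only core 1's V/f also warms cores 0 and 2 through the off-diagonal terms.
print(temperatures([0.9, 0.9, 0.9], [1.0, 1.0, 1.0]))
print(temperatures([0.9, 1.1, 0.9], [1.0, 1.6, 1.0]))
```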
Citations: 44
PUF modeling attacks: An introduction and overview
Pub Date : 2014-03-24 DOI: 10.7873/DATE.2014.361
U. Rührmair, J. Sölter
Machine learning (ML) based modeling attacks are the currently most relevant and effective attack form for so-called Strong Physical Unclonable Functions (Strong PUFs). We provide an overview of this method in this paper: We discuss (i) the basic conditions under which it is applicable; (ii) the ML algorithms that have been used in this context; (iii) the latest and most advanced results; (iv) the right interpretation of existing results; and (v) possible future research directions.
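The canonical example of such an attack is modeling a 64-stage Arbiter PUF: its response is the sign of a linear function of a parity-transformed challenge, so logistic regression over collected challenge-response pairs recovers a near-perfect surrogate. The sketch below simulates this textbook setup with plain NumPy gradient descent; it illustrates the general attack principle rather than the specific ML algorithms and result figures surveyed in the paper.

```python
# Standard Arbiter-PUF modeling attack (a generic textbook illustration, not the
# specific ML setups or results surveyed in the paper): responses are
# sign(w . phi(challenge)) for the usual parity feature map phi, so logistic
# regression on observed challenge-response pairs learns a surrogate model.
import numpy as np

rng = np.random.default_rng(0)
N_STAGES, N_TRAIN, N_TEST = 64, 6000, 2000

def phi(challenges):
    # Parity features: phi_i = prod_{j >= i} (1 - 2*c_j), plus a bias feature.
    parity = np.cumprod((1 - 2 * challenges)[:, ::-1], axis=1)[:, ::-1]
    return np.hstack([parity, np.ones((challenges.shape[0], 1))])

w_true = rng.normal(size=N_STAGES + 1)                 # hidden delay parameters
C = rng.integers(0, 2, size=(N_TRAIN + N_TEST, N_STAGES))
X = phi(C)
y = (X @ w_true > 0).astype(float)                     # simulated CRP responses

X_tr, y_tr, X_te, y_te = X[:N_TRAIN], y[:N_TRAIN], X[N_TRAIN:], y[N_TRAIN:]
w = np.zeros(N_STAGES + 1)
for _ in range(2000):                                  # batch gradient descent
    z = np.clip(X_tr @ w, -30.0, 30.0)
    p = 1.0 / (1.0 + np.exp(-z))
    w -= 0.1 * X_tr.T @ (p - y_tr) / N_TRAIN

accuracy = np.mean((X_te @ w > 0) == (y_te > 0.5))
print(f"surrogate model accuracy on held-out CRPs: {accuracy:.3f}")
```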
Citations: 91
Advanced system on a chip design based on controllable-polarity FETs
Pub Date : 2014-03-24 DOI: 10.7873/DATE.2014.248
P. Gaillardon, L. Amarù, Jian Zhang, G. Micheli
Field-Effect Transistors (FETs) with on-line controllable-polarity are promising candidates to support next generation System-on-Chip (SoC). Thanks to their enhanced functionality, controllable-polarity FETs enable a superior design of critical components in a SoC, such as processing units and memories, while also providing native solutions to control power consumption. In this paper, we present the efficient design of a SoC core with controllable-polarity FET. Processing units are speeded-up at the datapath level, as arithmetic operations require fewer physical resources than in standard CMOS. Power consumption is decreased via embedded power-gating techniques and tunable high-performance/low-power devices operation. Memory cells are made smaller by merging the access interface with the storage circuitry. We foresee the advantages deriving from these techniques, by evaluating their impact on the design of SoC for a contemporary telecommunication application. Using a 22-nm vertically-stacked silicon nanowire technology, a coarse-grain evaluation at the block level estimates a delay and power reduction of 20% and 19% respectively, at a cost of a moderate area overhead of 15%, with respect to a state-of-art FinFET technology.
Citations: 35