
International Symposium on Quality Electronic Design (ISQED): Latest Publications

On the selection of adder unit in energy efficient vector processing
Pub Date : 2013-03-04 DOI: 10.1109/ISQED.2013.6523602
Ivan Ratković, Oscar Palomar, Milan Stanic, O. Unsal, A. Cristal, M. Valero
Vector processors are a very promising solution for mobile devices and servers due to their inherently energy-efficient way of exploiting data-level parallelism. Previous research on vector architectures predominantly focused on performance, so vector processors require a new design space exploration to achieve low power. In this paper, we present a design space exploration of the adder unit for vector processors (VA), as it is one of the crucial components in the core design with a non-negligible impact on overall performance and power. For this interrelated circuit-architecture exploration, we developed a novel framework with both architectural- and circuit-level tools. Our framework includes both design-related (e.g. adder family type) and vector architecture-related parameters (e.g. vector length). Finally, we present guidelines on the selection of the most appropriate VA for different types of vector processors according to different sets of metrics of interest. For example, we found that 2-lane configurations are more EDP (Energy×Delay)-efficient than single-lane configurations for low-end mobile processors.
Citations: 8
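The EDP (Energy×Delay) metric named in the abstract above can be illustrated with a minimal Python sketch. The `Config` class, the helper names, and all energy and delay numbers are hypothetical placeholders, not results from the paper; the sketch only shows how lane configurations would be ranked once such measurements exist.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """One candidate vector-adder configuration (all values are made up)."""
    name: str
    energy_nj: float  # hypothetical energy for a fixed workload, in nJ
    delay_ns: float   # hypothetical execution time for the same workload, in ns


def edp(cfg: Config) -> float:
    """Energy-delay product: lower is better."""
    return cfg.energy_nj * cfg.delay_ns


def most_edp_efficient(configs):
    """Return the configuration with the smallest energy-delay product."""
    return min(configs, key=edp)


# Illustrative numbers only: the 2-lane design spends more energy
# but finishes sooner, so its product is smaller.
configs = [
    Config("1-lane", energy_nj=10.0, delay_ns=8.0),
    Config("2-lane", energy_nj=12.0, delay_ns=5.0),
]
best = most_edp_efficient(configs)
print(best.name, edp(best))
```

Because EDP multiplies the two quantities, a configuration can win despite higher energy if the delay reduction is proportionally larger, which is the kind of trade-off the guidelines in the paper address.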
A system-level solution for managing spatial temperature gradients in thinned 3D ICs
Pub Date : 2013-03-04 DOI: 10.1109/ISQED.2013.6523595
A. Annamalai, Raghavan Kumar, Arunkumar Vijayakumar, S. Kundu
As conventional CMOS technology is approaching scaling limits, the shift in trend towards stacked 3D Integrated Circuits (3D IC) is gaining more importance. 3D ICs offer reduced power dissipation, higher integration density, heterogeneous stacking and reduced interconnect delays. In a 3D IC stack, all but the bottom tier are thinned down to enable through-silicon vias (TSV). However, the thinning of the substrate increases the lateral thermal resistance resulting in higher intra-layer temperature gradients potentially leading to performance degradation and even functional errors. In this work, we study the effect of thinning the substrate on temperature profile of various tiers in 3D ICs. Our simulation results show that the intra-layer temperature gradient can be as high as 57°C. Often, the conventional static solutions lead to highly inefficient design. To this end, we present a system-level situation-aware integrated scheme that performs opportunistic thread migration and dynamic voltage and frequency scaling (DVFS) to effectively manage thermal violations while increasing the system throughput relative to stand-alone schemes.
Citations: 5
On the interactions between real-time scheduling and inter-thread cached interferences for multicore processors
Pub Date : 2013-03-04 DOI: 10.1109/ISQED.2013.6523676
Yiqiang Ding, Wei Zhang
In a multicore platform, the inter-thread cache interferences can significantly affect the worst-case execution time (WCET) of each real-time task, which is crucial for schedulability analysis. At the same time, the worst-case cache interferences are dependent on how tasks are scheduled to run on different cores, thus creating a circular dependence. In this paper, we present an offline real-time scheduling approach on multicore processors by considering the worst-case inter-thread interferences on shared L2 caches. Our scheduling approach uses a greedy heuristic to generate safe schedules while minimizing the worst-case inter-thread shared L2 cache interferences and WCET. The experimental results demonstrate that the proposed approach can reduce the utilization of the resulting schedule by about 12% on average compared to the cyclic multicore scheduling approaches in our theoretical model.
Citations: 0
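The circular dependence described in the scheduling abstract above (task placement determines cache interference, and interference determines WCET) can be made concrete with a toy greedy placement in Python. This is not the authors' algorithm: the cost model, the `greedy_assign` function, and the task data are all assumptions for illustration. The sketch assumes tasks on the same core run sequentially and do not interfere, while tasks on different cores add a pairwise penalty through the shared L2 cache.

```python
def greedy_assign(wcet, interference, n_cores):
    """Place tasks on cores one at a time, longest task first.

    wcet:         dict mapping task name -> base WCET (no interference)
    interference: dict mapping frozenset({a, b}) -> extra time when tasks
                  a and b sit on different cores (hypothetical model)
    Returns a list of per-core task lists.
    """
    cores = [[] for _ in range(n_cores)]
    for task in sorted(wcet, key=wcet.get, reverse=True):

        def cost(core_index):
            # Load of this core if the task is added to it.
            load = sum(wcet[t] for t in cores[core_index]) + wcet[task]
            # Penalty from every task already placed on another core.
            penalty = sum(
                interference.get(frozenset((task, other)), 0)
                for k, members in enumerate(cores)
                if k != core_index
                for other in members
            )
            return load + penalty

        best_core = min(range(n_cores), key=cost)
        cores[best_core].append(task)
    return cores


# Made-up task set: A and B interfere heavily, so they end up on one core.
wcet = {"A": 5, "B": 4, "C": 3}
interference = {
    frozenset(("A", "B")): 6,
    frozenset(("B", "C")): 1,
}
schedule = greedy_assign(wcet, interference, n_cores=2)
print(schedule)
```

A greedy pass like this makes one locally cheapest choice per task and never revisits it, which keeps the search cheap but gives no optimality guarantee.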
Enabling sizing for enhancing the static noise margins
Pub Date : 2013-03-04 DOI: 10.1109/ISQED.2013.6523623
Valeriu Beiu, A. Beg, W. Ibrahim, F. Kharbash, M. Alioto
This paper suggests a transistor sizing method for classical CMOS gates implemented in advanced technology nodes and operating at low voltages. The method relies on upsizing the length (L) of all transistors uniformly, and balancing the voltage transfer curves (VTCs) to maximize the static noise margins (SNMs). We use the most well-known CMOS gates (INV, NAND-2, NOR-2) for introducing the novel sizing method, as well as for validating the concept and evaluating its performance. The results show that sizing has not entirely exhausted its potential, making it possible to go beyond the well-established delay-power tradeoff, as sizing can increase SNMs by: (i) adjusting the threshold voltages (VTH) and their variations (σVTH); and (ii) balancing the VTCs. Simulation results show that this sizing method enables more reliable (i.e., noise-robust and variation-tolerant) CMOS gates, which could operate correctly at very low supply voltages, hence leading to ultra-low voltage/power circuits.
Citations: 16
Fast reliability exploration for embedded processors via high-level fault injection
Pub Date : 2013-03-04 DOI: 10.1109/ISQED.2013.6523621
Z. Wang, Chao Chen, A. Chattopadhyay
The downscaling of technology features has brought an important design criterion, reliability, into prime consideration for system developers. Due to effects like external radiation and temperature gradients, CMOS devices are no longer guaranteed to function flawlessly. Allowing errors to occur is also helpful, as that increases the power budget. The power-reliability trade-off compounds the system design challenge by adding another metric, for which an efficient design exploration framework is needed. In this work, we present a high-level design framework extended with the capability of fault injection, an important ingredient of reliability-driven design. Compared to the traditional HDL-based fault injection, the proposed fault injection during instruction-set simulation is significantly faster without any notable loss of accuracy. The fault injection framework also allows quick exploration of fault prevention measures with the aid of both software and hardware techniques. We demonstrate the efficacy of our approach by a case study with a RISC processor customized for cryptographic applications, where fault protection plays a major role. We also benchmark our framework against a state-of-the-art HDL-based fault injection framework.
Citations: 13
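Fault injection during instruction-set simulation, as described in the abstract above, amounts to corrupting architectural state at a chosen step and comparing the outcome with a fault-free run. The Python sketch below is a toy illustration under stated assumptions: the three-instruction mini ISA, the `run` and `flip_bit` helpers, and the `(step, register, bit)` fault format are invented here and do not reflect the authors' framework.

```python
def flip_bit(value, bit, width=32):
    """Return `value` with one bit inverted, truncated to `width` bits."""
    return (value ^ (1 << bit)) & ((1 << width) - 1)


def run(program, fault=None):
    """Simulate a toy register machine with four 32-bit registers.

    program: list of (op, dst, a, b) tuples; supported ops are
             "li"  (dst = immediate a),
             "add" (dst = regs[a] + regs[b]),
             "xor" (dst = regs[a] ^ regs[b]).
    fault:   optional (step, reg, bit); the bit is flipped in the register
             file just before the instruction at `step` executes.
    """
    regs = [0] * 4
    for step, (op, dst, a, b) in enumerate(program):
        if fault is not None and fault[0] == step:
            _, reg, bit = fault
            regs[reg] = flip_bit(regs[reg], bit)
        if op == "li":
            regs[dst] = a & 0xFFFFFFFF
        elif op == "add":
            regs[dst] = (regs[a] + regs[b]) & 0xFFFFFFFF
        elif op == "xor":
            regs[dst] = regs[a] ^ regs[b]
        else:
            raise ValueError(f"unknown op: {op}")
    return regs


program = [
    ("li", 0, 5, None),   # r0 = 5
    ("li", 1, 3, None),   # r1 = 3
    ("add", 2, 0, 1),     # r2 = r0 + r1
]
golden = run(program)
faulty = run(program, fault=(2, 0, 1))  # flip bit 1 of r0 before the add
print(golden, faulty, golden != faulty)
```

Comparing `golden` with `faulty` classifies the injection as masked or as a visible error; repeating this over many `(step, reg, bit)` choices gives a fault-injection campaign at simulator speed.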
Clustering techniques and statistical fault injection for selective mitigation of SEUs in flip-flops
Pub Date : 2013-03-04 DOI: 10.1109/ISQED.2013.6523691
A. Evans, M. Nicolaidis, Shi-Jie Wen, Thiago Asis
In large SoCs, managing the effects of soft errors in flip-flops is essential; however, selective mitigation is necessary to minimize the area and power costs. The identification of the optimal set of flip-flops to protect typically requires compute-intensive fault-injection campaigns. We present new techniques which group similar flip-flops into clusters to significantly reduce the number of fault injections. The number of required fault injections can be significantly lower than the total number of flip-flops: in one industrial design with over 100,000 flip-flops, by simulating only 2,100 fault injections, the technique identified a set of 4.1% of the flip-flops which, when protected, reduced the critical failure rate by a factor of 7.
Citations: 11
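The idea in the abstract above, grouping similar flip-flops so that a few injections stand in for many, can be sketched in Python. The abstract does not say how similarity is defined, so the name-based grouping used here (flip-flops that differ only in a trailing bit index form one cluster) is purely an assumption, as are the function names and the example netlist.

```python
import re
from collections import defaultdict


def cluster_by_name(flop_names):
    """Group flip-flops whose names differ only in a trailing [index]."""
    clusters = defaultdict(list)
    for name in flop_names:
        key = re.sub(r"\[\d+\]$", "", name)  # strip a trailing bit index
        clusters[key].append(name)
    return dict(clusters)


def plan_injections(clusters, per_cluster=1):
    """Choose the first `per_cluster` members of each cluster as targets."""
    return {key: members[:per_cluster] for key, members in clusters.items()}


# Hypothetical flip-flop names, not taken from any real design.
flops = [
    "core/pc[0]", "core/pc[1]", "core/pc[2]",
    "bus/valid",
    "bus/data[0]", "bus/data[1]",
]
clusters = cluster_by_name(flops)
targets = plan_injections(clusters)
n_injections = sum(len(members) for members in targets.values())
print(f"{n_injections} injections cover {len(flops)} flip-flops")
```

The result measured for each representative would then be attributed to every member of its cluster, which is where the saving over per-flip-flop injection comes from.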
Application-driven power efficient ALU design methodology for modern microprocessors
Pub Date : 2013-03-04 DOI: 10.1109/ISQED.2013.6523608
Na Gong, Jinhui Wang, R. Sridhar
In this paper, we propose an application-driven ALU design methodology to achieve high level of power efficiency for modern microprocessors. We introduce a PN selection algorithm (PNSA) which enables designers to select power efficient dynamic modules for different applications, based on the detailed analysis of dynamic circuits. Experimental results on ISCAS85 and 74X-Series benchmark circuits show that the power consumption of 8-bit ALU based on this approach can be reduced by 54%-60% for different frequency levels as compared to the conventional dynamic ALU design, demonstrating the effectiveness of the proposed method on application-driven custom ALU design.
Citations: 15
Energy-aware coarse-grained reconfigurable architectures using dynamically reconfigurable isolation cells
Pub Date : 2013-03-04 DOI: 10.1109/ISQED.2013.6523597
Syed M. A. H. Jafri, Ozan Bag, A. Hemani, Nasim Farahini, K. Paul, J. Plosila, H. Tenhunen
This paper presents a self adaptive architecture to enhance the energy efficiency of coarse-grained reconfigurable architectures (CGRAs). Today, platforms host multiple applications, with arbitrary inter-application communication and concurrency patterns. Each application itself can have multiple versions (implementations with different degree of parallelism) and the optimal version can only be determined at runtime. For such scenarios, traditional worst case designs and compile time mapping decisions are neither optimal nor desirable. Existing solutions to this problem employ costly dedicated hardware to configure the operating point at runtime (using DVFS). As an alternative to dedicated hardware, we propose exploiting the reconfiguration features of modern CGRAs. Our solution relies on dynamically reconfigurable isolation cells (DRICs) and autonomous parallelism, voltage, and frequency selection algorithm (APVFS). The DRICs reduce the overheads of DVFS circuitry by configuring the existing resources as isolation cells. APVFS ensures high efficiency by dynamically selecting the parallelism, voltage and frequency trio, which consumes minimum power to meet the deadlines on available resources. Simulation results using representative applications (Matrix multiplication, FIR, and FFT) showed up to 23% and 51% reduction in power and energy, respectively, compared to traditional DVFS designs. Synthesis results have confirmed significant reduction in area overheads compared to state of the art DVFS methods.
Citations: 23
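The selection step attributed to APVFS in the abstract above (pick the parallelism, voltage, and frequency combination that uses the least power while meeting the deadline on the available resources) can be sketched as an exhaustive search in Python. This is a simplified model, not the published algorithm: the function name, the ideal-speedup timing model, and the relative power estimate p·V²·f with capacitance normalized to 1 are all assumptions.

```python
from itertools import product


def select_operating_point(work_cycles, deadline_s, free_units,
                           parallelism_options, vf_pairs):
    """Return (power, parallelism, voltage, frequency) or None.

    work_cycles:         total cycles of the application at parallelism 1
    deadline_s:          time budget in seconds
    free_units:          processing units currently available
    parallelism_options: candidate degrees of parallelism
    vf_pairs:            allowed (voltage in V, frequency in Hz) pairs
    """
    best = None
    for p, (v, f) in product(parallelism_options, vf_pairs):
        if p > free_units:
            continue  # not enough resources for this version
        time_s = work_cycles / (p * f)  # assumes ideal linear speedup
        if time_s > deadline_s:
            continue  # would miss the deadline
        power = p * v * v * f  # relative dynamic power estimate
        if best is None or power < best[0]:
            best = (power, p, v, f)
    return best


# Made-up workload: 1e6 cycles, 12 ms deadline, two free units.
choice = select_operating_point(
    work_cycles=1e6,
    deadline_s=0.012,
    free_units=2,
    parallelism_options=[1, 2],
    vf_pairs=[(1.0, 100e6), (0.8, 50e6)],
)
print(choice)
```

In this example the parallel version at the lower voltage and frequency meets the deadline with less estimated power than the serial version at full speed, which is the kind of trade the parallelism, voltage, and frequency trio is meant to capture.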
CMOS inverter delay model based on DC transfer curve for slow input
Pub Date : 2013-03-04 DOI: 10.1109/ISQED.2013.6523679
F. Marranghello, A. Reis, R. Ribas
This work presents a novel approach to estimate the CMOS inverter delay. The proposed delay model uses the DC transfer curve in order to predict the inverter behavior for slow input transitions rather than estimating the discharging time. Moreover, the only required empirical parameters are those used to calibrate the transistor model. Results are in very good agreement with HSPICE simulations based on the BSIM4 transistor model, over a wide range of input slopes and output loads. Comparisons to previous works show that the new delay model offers improved modeling with a good trade-off between simplicity and accuracy. The average error is close to 3%, and the worst-case error is smaller than 10%.
Citations: 9
Effective thermal control techniques for liquid-cooled 3D multi-core processors
Pub Date : 2013-03-04 DOI: 10.1109/ISQED.2013.6523583
Yue Hu, Shaoming Chen, Lu Peng, Edward Song, Jin-Woo Choi
Microchannel liquid cooling shows great potential in cooling 3D processors. However, the cooling of 3D processors is limited due to design-time and run-time challenges. Moreover, in new technologies, the processor power density is continually increasing and this will bring more serious challenges to liquid cooling. In this paper, we propose two thermal control techniques: 1) Core Vertically Placed (CVP) technique. According to the architecture of a processor core, two schemes are given for placing a core vertically onto multiple layers. The 3D processor with the CVP technique can be better cooled since its separate hotspot blocks have a larger total contact area with the cooler surroundings. 2) Thermoelectric cooling (TEC) technique. We propose to incorporate the TEC technique into the liquid-cooled 3D processor to enhance the cooling of hotspots. Our experiments show the CVP technique reduces the maximum temperature by up to 29.58 °C, and by 16.64 °C on average, compared with the baseline design. Moreover, the TEC technique effectively cools down a hotspot from 96.86 °C to 78.60 °C.
Citations: 6