2014 Design, Automation & Test in Europe Conference & Exhibition (DATE): Latest Papers

ARO-PUF: An aging-resistant ring oscillator PUF design

2014 Design, Automation & Test in Europe Conference & Exhibition (DATE)

Pub Date : 2014-03-24 DOI: 10.7873/DATE.2014.082

Md. Tauhidur Rahman, Domenic Forte, J. Fahrny, M. Tehranipoor

Physically Unclonable Functions (PUFs) have emerged as a security block with the potential to generate chip-specific identifiers and cryptographic keys. However, it has been shown that the stability of these identifiers and keys is heavily impacted by aging and environmental variations. Previous techniques have mostly focused on improving PUF robustness against supply noise and temperature, but aging has been largely neglected. In this paper, we propose a new aging-resistant design for the popular ring-oscillator PUF (RO-PUF). Simulation results demonstrate that our aging-resistant RO-PUF (called ARO-PUF) can produce unique, random, and more reliable keys. On average, only 7.7% of an ARO-PUF's bits flip due to aging over a 10-year operating period, compared with 32% for a conventional RO-PUF. The ARO-PUF shows an average inter-chip Hamming distance (HD) of 49.67%, which is close to the ideal value of 50% and better than the conventional RO-PUF (~45%). With its lower error rate, the ARO-PUF offers a ~24X area reduction for a 128-bit key because of reduced ECC complexity and a smaller PUF footprint.
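The two quality metrics quoted in this abstract, inter-chip Hamming distance (uniqueness) and the fraction of bits that flip with age (reliability), can be illustrated with a minimal Python sketch. The function names and the 8-bit responses below are made up for illustration; they are not data or code from the paper.

```python
from itertools import combinations


def hamming_fraction(a: str, b: str) -> float:
    """Fraction of differing bit positions between two equal-length bit strings."""
    if len(a) != len(b):
        raise ValueError("responses must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def inter_chip_hd(responses: list[str]) -> float:
    """Mean pairwise Hamming fraction across chips (ideal value: 0.5)."""
    pairs = list(combinations(responses, 2))
    return sum(hamming_fraction(a, b) for a, b in pairs) / len(pairs)


def bit_flip_rate(fresh: str, aged: str) -> float:
    """Fraction of one chip's bits that changed between a fresh and an aged reading."""
    return hamming_fraction(fresh, aged)


# Hypothetical 8-bit responses from three chips (illustrative values only).
chips = ["10110010", "01100111", "11011000"]
print(f"inter-chip HD: {inter_chip_hd(chips):.2%}")
print(f"bit flips after aging: {bit_flip_rate('10110010', '10110110'):.2%}")
```

A lower bit-flip rate means fewer errors for the error-correcting code to repair, which is the link the abstract draws between reliability and ECC area.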

Citations: 96

On-device objective-C application optimization framework for high-performance mobile processors

2014 Design, Automation & Test in Europe Conference & Exhibition (DATE)

Pub Date : 2014-03-24 DOI: 10.7873/DATE.2014.098

Garo Bournoutian, A. Orailoglu

Smartphones provide applications that are increasingly similar to interactive desktop programs, with rich graphics and animations. To simplify the creation of these interactive applications, mobile operating systems employ high-level object-oriented programming languages and shared libraries to manipulate the device's peripherals and provide common user-interface frameworks. Dynamic dispatch and polymorphism allow for robust and extensible application coding. Unfortunately, dynamic dispatch also introduces significant overheads during method calls, which directly impact execution time. Furthermore, since these applications rely heavily on shared libraries and helper routines, the number of such method calls is higher than in typical desktop-based programs. Optimizing these method calls centrally, before consumers download the application onto a given phone, is made difficult by the large diversity of hardware and operating system versions that the application could run on. This paper proposes a methodology to tailor a given Objective-C application and its associated device-specific shared library codebase using on-device post-compilation code optimization and transformation. In doing so, many polymorphic sites can be resolved statically, improving the overall application performance.

Citations: 7

A power-efficient reconfigurable architecture using PCM configuration technology

2014 Design, Automation & Test in Europe Conference & Exhibition (DATE)

Pub Date : 2014-03-24 DOI: 10.7873/DATE.2014.349

A. Ahari, H. Asadi, Behnam Khaleghi, M. Tahoori

The promising advantages of resistive Non-Volatile Memories (NVMs) have drawn considerable attention to them as replacements for existing volatile memory technologies. While NVMs have primarily been studied for use in the memory hierarchy, they can also provide benefits in Field-Programmable Gate Arrays (FPGAs). One major limitation of employing NVMs in FPGAs is the significant power and area overhead imposed by the Peripheral Circuitry (PC) of NVM configuration bits. In this paper, we investigate the applicability of different NVM technologies for the configuration bits of FPGAs and propose a power-efficient reconfigurable architecture based on Phase Change Memory (PCM). The proposed PCM-based architecture has been evaluated at different technology nodes and compared to the SRAM-based FPGA architecture. Power and Power Delay Product (PDP) estimations of the proposed architecture show up to 37.7% and 35.7% improvements over SRAM-based FPGAs, respectively, with less than 3.2% performance overhead.
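As a rough consistency check on how these figures relate, the Power Delay Product is power multiplied by delay, so a power saving and a delay overhead combine multiplicatively. The sketch below assumes that the quoted "up to" figures describe the same design point, which the abstract does not state; the function name is illustrative.

```python
def pdp_improvement(power_saving: float, delay_overhead: float) -> float:
    """Relative PDP improvement for a fractional power saving and delay overhead.

    PDP = power * delay, so the new-to-old PDP ratio is
    (1 - power_saving) * (1 + delay_overhead).
    """
    ratio = (1.0 - power_saving) * (1.0 + delay_overhead)
    return 1.0 - ratio


# Figures quoted in the abstract, assumed here to belong to one configuration.
print(f"{pdp_improvement(0.377, 0.032):.1%}")
```

Under that assumption, a 37.7% power saving with a 3.2% delay overhead works out to about the 35.7% PDP improvement the abstract reports.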

Citations: 4

A tree arbiter cell for high speed resource sharing in asynchronous environments

2014 Design, Automation & Test in Europe Conference & Exhibition (DATE)

Pub Date : 2014-03-24 DOI: 10.7873/DATE.2014.308

S. R. Naqvi, A. Steininger

We present a novel tree arbiter cell that allows pipelined processing of asynchronous requests. In this way it can achieve significantly lower delay in the critical case of frequent requests coming from different clients. We elaborate the extension needed to cascade this cell in a tree-like fashion, and we show by theoretical analysis that in this configuration our cell provides better fairness than the standard approach. We implement our approach and quantitatively compare its performance properties with related work in a gate-level simulation. In our sample asynchronous Network-on-Chip application, our new cell increases the throughput of three different designs available in the literature by approximately 61.28%, 69.24%, and 186.85%, respectively.

Citations: 6

Bias Temperature Instability analysis of FinFET based SRAM cells

2014 Design, Automation & Test in Europe Conference & Exhibition (DATE)

Pub Date : 2014-03-24 DOI: 10.7873/DATE.2014.044

Seyab Khan, I. Agbo, S. Hamdioui, H. Kukner, B. Kaczer, P. Raghavan, F. Catthoor

Bias Temperature Instability (BTI) poses a major reliability challenge for today's and future semiconductor devices because it degrades their performance. This paper provides a comprehensive analysis of the impact of BTI, in terms of time-dependent degradation, on FinFET-based SRAM cells. The evaluation metrics are read Static Noise Margin (SNM), hold SNM, and Write Trip Point (WTP), while the aspects investigated include the dependence of BTI impact on supply voltage, cell strength, and design style (6-transistor versus 8-transistor cells). A comparison between the degradation of FinFET-based and planar CMOS-based SRAM cells is also covered. Simulations of FinFET-based cells for 10^8 seconds of operation under nominal Vdd show that read SNM degradation is 16.72%, which is 1.17× faster than hold SNM, while WTP improves by 6.82%. In addition, a supply voltage increment of 25% reduces the read SNM degradation by 40%, while strengthening the cell pull-down transistors by 1.5× reduces the degradation by only 22%. Moreover, the results reveal that the 8T cell degrades 1.31× faster than the 6T cell, and that FinFET cells are more vulnerable (~2×) to BTI degradation than planar CMOS cells.

Citations: 59

Assessing the energy break-even point between an optical NoC architecture and an aggressive electronic baseline

2014 Design, Automation & Test in Europe Conference & Exhibition (DATE)

Pub Date : 2014-03-24 DOI: 10.7873/DATE.2014.321

L. Ramini, Alberto Ghiribaldi, P. Grani, S. Bartolini, H. Tatenguem, D. Bertozzi

Many cross-benchmarking results reported in the open literature raise optimistic expectations about the use of optical networks-on-chip (ONoCs) for high-performance and low-power on-chip communication. However, most of those previous works ultimately fail to make a compelling case for chip-level nanophotonic NoCs, mainly because they lack aggressive electronic NoC (ENoC) baselines and analyze the ONoC's physical and architecture layers with poor accuracy. This paper aims to provide the guidelines and minimum requirements for emerging nanophotonic technology to become of practical relevance. The key differentiating factor of this work is that it contrasts ONoC solutions with an aggressive ENoC architecture that has realistic complexity, performance, and power figures and is synthesized on an industrial 40nm low-power technology. At the same time, key physical design issues and network interface architecture requirements for the ONoC under test are carefully assessed, thus paving the way for a well-grounded definition of the requirements for the emerging ONoC technology to achieve the energy break-even point with respect to pure electronic interconnect solutions in future multi- and many-core systems.

Citations: 21

Integrated microfluidic power generation and cooling for bright silicon MPSoCs

2014 Design, Automation & Test in Europe Conference & Exhibition (DATE)

Pub Date : 2014-03-24 DOI: 10.7873/DATE.2014.147

M. Sabry, A. Sridhar, David Atienza Alonso, P. Ruch, B. Michel

The soaring demand for computing power in our digital information age has produced, as an undesirable side effect, a surge in power consumption and heat density for Multiprocessor Systems-on-Chip (MPSoCs). The resulting temperature rise creates operating conditions that already preclude running all the cores at maximum performance levels, in order to prevent system overheating and failures. With more power demands, MPSoCs will face a power delivery wall due to the reliability limitations of the underlying power delivery medium. Thus, state-of-the-art power and cooling delivery solutions are reaching their performance limits, and it will no longer be possible to power up all the available on-chip cores simultaneously (a situation known as dark silicon). In this paper we investigate a recently proposed disruptive approach to overcome the prevailing worst-case power and cooling provisioning paradigms for MPSoCs. This approach integrates the MPSoC with an on-chip microfluidic fuel cell network for joint cooling and power supply (i.e., localized power generation and delivery). By providing an alternative means of power delivery integrated with cooling, MPSoCs are expected to gain in I/O connectivity. Based on this disruptive technology, we can envision the removal of the current limits of power delivery and heat dissipation in MPSoC designs, subsequently avoiding dark silicon and enabling a paradigm shift in future energy-proportional computing architecture designs.

Citations: 11

ALLARM: Optimizing sparse directories for thread-local data

2014 Design, Automation & Test in Europe Conference & Exhibition (DATE)

Pub Date : 2014-03-24 DOI: 10.7873/DATE.2014.091

Amitabha Roy, Timothy M. Jones

Large-scale cache-coherent systems often impose unnecessary overhead on data that is thread-private for the whole of its lifetime. This overhead includes resources devoted to tracking the coherence state of the data, as well as unnecessary coherence messages sent out over the interconnect. In this paper we show how the memory allocation strategy for non-uniform memory access (NUMA) systems can be exploited to remove any coherence-related traffic for thread-local data, as well as removing the need to track those cache lines in sparse directories. Our strategy is to allocate directory state only on a miss from a node in a different affinity domain from the directory. We call this ALLocAte on Remote Miss, or ALLARM. Our solution is entirely backward compatible with existing operating systems and software, and provides a means to scale cache coherence into the many-core era. On a mix of SPLASH2 and Parsec workloads, ALLARM is able to improve performance by 13% on average while reducing dynamic energy consumption by 9% in the on-chip network and 15% in the directory controller. This is achieved through a 46% reduction in the number of sparse directory entries evicted.
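The allocation rule described in this abstract (create directory state only when the miss comes from a node outside the directory's own affinity domain) can be sketched as a toy model. This is an illustration of the stated policy only, not the paper's implementation; the class, its fields, and the single-home-node simplification are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class SparseDirectory:
    """Toy directory for one home node; every name here is hypothetical."""
    home_node: int
    entries: dict[int, set[int]] = field(default_factory=dict)  # line -> sharers

    def on_miss(self, line: int, requester: int) -> bool:
        """Handle a cache miss; return True if the line is tracked afterwards."""
        if requester == self.home_node and line not in self.entries:
            # Local miss on an untracked line: treat the data as thread-local
            # and skip allocation (the allocate-on-remote-miss idea).
            return False
        # Remote miss, or the line is already tracked: record the sharer.
        self.entries.setdefault(line, set()).add(requester)
        return True


directory = SparseDirectory(home_node=0)
directory.on_miss(line=0x40, requester=0)  # local miss: no entry allocated
directory.on_miss(line=0x80, requester=1)  # remote miss: entry allocated
print(sorted(directory.entries))
```

A real protocol would presumably also have to discover any untracked copy held by the home node when the first remote miss arrives; that step is omitted here.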

Citations: 4

A low-power, high-performance approximate multiplier with configurable partial error recovery

2014 Design, Automation & Test in Europe Conference & Exhibition (DATE)

Pub Date : 2014-03-24 DOI: 10.7873/DATE.2014.108

Cong Liu, Jie Han, F. Lombardi

Approximate circuits have been considered for error-tolerant applications that can accept some loss of accuracy in exchange for improved performance and energy efficiency. Multipliers are key arithmetic circuits in many such applications, for example digital signal processing (DSP). In this paper, a novel approximate multiplier with lower power consumption and a shorter critical path than traditional multipliers is proposed for high-performance DSP applications. This multiplier leverages a newly designed approximate adder that limits its carry propagation to the nearest neighbors for fast partial product accumulation. Different levels of accuracy can be achieved through configurable error recovery, using different numbers of most significant bits (MSBs) for error reduction. The approximate multiplier has a low mean error distance, i.e., most of the errors are not significant in magnitude. Compared to the Wallace multiplier, a 16-bit approximate multiplier implemented in a 28nm CMOS process shows a reduction in delay and power of 20% and up to 69%, respectively. It is shown that by utilizing an appropriate error recovery, the proposed approximate multiplier achieves processing accuracy similar to that of traditional exact multipliers but with significant improvements in power and performance.
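The mean error distance mentioned here is the average absolute difference between the approximate and the exact product over the input space. The sketch below measures it exhaustively for small operands. The stand-in multiplier simply truncates low operand bits; it is a deliberately different, simpler technique than the paper's approximate adder and serves only as something to measure.

```python
from itertools import product


def truncating_multiply(a: int, b: int, dropped_bits: int = 2) -> int:
    """Stand-in approximate multiplier: zero the low bits of each operand first.

    This is NOT the paper's design; it only provides something to measure.
    """
    mask = ~((1 << dropped_bits) - 1)
    return (a & mask) * (b & mask)


def mean_error_distance(approx, width: int) -> float:
    """Average |approx(a, b) - a * b| over all unsigned width-bit operand pairs."""
    values = range(1 << width)
    total = sum(abs(approx(a, b) - a * b) for a, b in product(values, repeat=2))
    return total / (1 << (2 * width))


print(mean_error_distance(truncating_multiply, width=4))
```

Exhaustive enumeration is practical only for narrow operands; for 16-bit inputs one would more likely sample random operand pairs.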

Citations: 271

HEROIC: Homomorphically EncRypted One Instruction Computer

2014 Design, Automation & Test in Europe Conference & Exhibition (DATE)

Pub Date : 2014-03-24 DOI: 10.7873/DATE2014.259

N. G. Tsoutsos, M. Maniatakos

As cloud computing becomes mainstream, the need to ensure the privacy of the data entrusted to third parties keeps rising. Cloud providers resort to numerous security controls and encryption to thwart potential attackers. Still, since the actual computation inside cloud microprocessors remains unencrypted, leakage remains theoretically possible. Therefore, in order to address the challenge of protecting the computation inside the microprocessor, we introduce a novel general-purpose architecture for secure data processing, called HEROIC (Homomorphically EncRypted One Instruction Computer). This new design utilizes a single-instruction architecture and provides native processing of encrypted data at the architecture level. The security of the solution is assured by a variant of Paillier's homomorphic encryption scheme, used to encrypt both instructions and data. Experimental results using our hardware-cognizant software simulator indicate an average execution overhead of between 5 and 45 times for the encrypted computation (depending on the security parameter), compared to the unencrypted variant, for a 16-bit single-instruction architecture.
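The property that makes Paillier encryption useful for computing on encrypted data is that multiplying two ciphertexts yields an encryption of the sum of their plaintexts. The sketch below shows the textbook scheme with tiny, insecure primes. The abstract says HEROIC uses a variant of Paillier, so this is not the paper's exact scheme, and the prime and message values are arbitrary illustrative choices.

```python
import math
import random


def keygen(p: int, q: int):
    """Build a textbook Paillier key pair from two primes (toy sizes only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    g = n + 1                     # common simple choice of generator
    mu = pow(lam, -1, n)          # modular inverse; valid because g = n + 1
    return (n, g), (lam, mu)


def encrypt(pub, m: int) -> int:
    """Encrypt m as g^m * r^n mod n^2 for a random r coprime to n."""
    n, g = pub
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n * n) * pow(r, n, n * n)) % (n * n)


def decrypt(pub, priv, c: int) -> int:
    """Recover m using L(x) = (x - 1) / n applied to c^lam mod n^2."""
    n, _ = pub
    lam, mu = priv
    ell = (pow(c, lam, n * n) - 1) // n
    return (ell * mu) % n


pub, priv = keygen(1789, 1861)          # small primes: insecure, illustration only
c1, c2 = encrypt(pub, 1234), encrypt(pub, 4321)
c_sum = (c1 * c2) % (pub[0] ** 2)       # multiply ciphertexts to add plaintexts
print(decrypt(pub, priv, c_sum))
```

The party performing the multiplication never sees 1234, 4321, or their sum, which is the kind of processing on encrypted operands the abstract describes. The modular inverse via `pow(lam, -1, n)` requires Python 3.8 or later.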

Citations: 24
