
2013 Design, Automation & Test in Europe Conference & Exhibition (DATE): Latest Papers

GPU-friendly floating random walk algorithm for capacitance extraction of VLSI interconnects
Pub Date : 2013-03-18 DOI: 10.7873/DATE.2013.336
Kuangya Zhai, Wenjian Yu, H. Zhuang
The floating random walk (FRW) algorithm is an important field-solver algorithm for capacitance extraction, which has several merits compared with other boundary element method (BEM) based algorithms. In this paper, the FRW algorithm is accelerated with the modern graphics processing units (GPUs). We propose an iterative GPU-based FRW algorithm flow and the technique using an inverse cumulative probability array (ICPA), to reduce the divergence among walks and the global-memory accessing. A variant FRW scheme is proposed to utilize the benefit of ICPA, so that it accelerates the extraction of multi-dielectric structures. The technique for extracting multiple nets concurrently is also discussed. Numerical results show that our GPU-based FRW brings over 20X speedup for various test cases with 0.5% convergence criterion over the CPU counterpart. For the extraction of multiple nets, our GPU-based FRW outperforms the CPU counterpart by up to 59X.
Pages: 1661-1666
Citations: 25
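The abstract does not spell out how the ICPA is laid out. As a minimal sketch, assuming an ICPA is a precomputed table that maps a quantized uniform random number directly to a discrete outcome (for example, the index of the transition-domain surface patch a walk jumps to), drawing a sample becomes a single array read with no data-dependent search loop. Removing that loop is the kind of change that keeps GPU threads from diverging. The names `build_icpa` and `sample` and the table size are hypothetical.

```python
import bisect
from itertools import accumulate


def build_icpa(probs, table_size):
    """Build an inverse cumulative probability table (hypothetical layout).

    Entry j holds the outcome whose CDF interval contains the midpoint
    u_j = (j + 0.5) / table_size, so a uniform u in [0, 1) maps to an
    outcome with one array read instead of a search.
    """
    cdf = list(accumulate(probs))
    last = len(probs) - 1
    table = []
    for j in range(table_size):
        u = (j + 0.5) / table_size
        # bisect_right finds the first outcome i with cdf[i] > u;
        # min() guards against round-off when cdf[-1] is slightly below 1.
        table.append(min(bisect.bisect_right(cdf, u), last))
    return table


def sample(table, u):
    """Map a uniform random number u in [0, 1) to an outcome index."""
    j = min(int(u * len(table)), len(table) - 1)
    return table[j]
```

For example, `build_icpa([0.25, 0.5, 0.25], 4)` yields `[0, 1, 1, 2]`. The table only approximates probabilities that are not multiples of `1 / table_size`, so its size trades memory against sampling accuracy.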
Is split manufacturing secure?
Pub Date : 2013-03-18 DOI: 10.7873/DATE.2013.261
Jeyavijayan Rajendran, O. Sinanoglu, R. Karri
Split manufacturing of integrated circuits (IC) is being investigated as a way to simultaneously alleviate the cost of owning a trusted foundry and eliminate the security risks associated with outsourcing IC fabrication. In split manufacturing, a design house (with a low-end, in-house, trusted foundry) fabricates the Front End Of Line (FEOL) layers (transistors and lower metal layers) in advanced technology nodes at an untrusted high-end foundry. The Back End Of Line (BEOL) layers (higher metal layers) are then fabricated at the design house's trusted low-end foundry. Split manufacturing is considered secure (prevents reverse engineering and IC piracy) as it hides the BEOL connections from an attacker in the FEOL foundry. We show that an attacker in the FEOL foundry can exploit the heuristics used in typical floorplanning, placement, and routing tools to bypass the security afforded by straightforward split manufacturing. We developed an attack where an attacker in the FEOL foundry can connect 96% of the missing BEOL connections correctly. To overcome this security vulnerability in split manufacturing, we developed a fault analysis-based defense. This defense improves the security of split manufacturing by deceiving the FEOL attacker into making wrong connections.
Pages: 1259-1264
Citations: 181
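An FEOL attacker can rely on one placement heuristic in particular: connected cells tend to be placed close together. The sketch below is a simplified greedy proximity matcher. It guesses each missing BEOL connection by pairing every dangling driver pin with the nearest sink pin that has not yet been used, measured by Manhattan distance. It illustrates only that single assumption and is not the authors' full attack, which according to the abstract exploits several floorplanning, placement, and routing heuristics. Pin names and coordinates are hypothetical.

```python
def proximity_guess(drivers, sinks):
    """Greedily pair each driver pin with the nearest unused sink pin.

    drivers, sinks: dict mapping pin name -> (x, y) location in the FEOL layout.
    Returns a dict driver name -> guessed sink name.
    """
    unused = dict(sinks)
    guess = {}
    for name in sorted(drivers):  # fixed order makes the result deterministic
        if not unused:
            break
        dx, dy = drivers[name]
        # Nearest sink by Manhattan distance; ties broken by sink name.
        best = min(
            unused,
            key=lambda s: (abs(unused[s][0] - dx) + abs(unused[s][1] - dy), s),
        )
        guess[name] = best
        del unused[best]
    return guess


def fraction_correct(guess, truth):
    """Share of true connections (driver -> sink) that the guess recovers."""
    hits = sum(1 for d, s in truth.items() if guess.get(d) == s)
    return hits / len(truth)
```

A defense of the kind the abstract describes succeeds when it makes such guesses land on wrong sinks. In this model, that corresponds to a lower `fraction_correct`.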
Optical Look Up Table
Pub Date : 2013-03-18 DOI: 10.7873/DATE.2013.184
Zhen Li, S. L. Beux, C. Monat, X. Letartre, I. O’Connor
The computation capacity of conventional FPGAs is directly proportional to the size and expressive power of Look Up Table (LUT) resources. Individual LUT performance is limited by transistor switching time and power dissipation, defined by the CMOS fabrication process. In this paper we propose OLUT, an optical core implementation of LUT, which has the potential for low latency and low power computation. In addition, the use of Wavelength Division Multiplexing (WDM) allows parallel computation, which can further increase computation capacity. Preliminary experimental results demonstrate the potential for optically assisted on-chip computation.
Pages: 873-876
Citations: 10
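A k-input LUT stores a 2^k-entry truth table and returns the entry selected by its inputs, so it can implement any k-input Boolean function. The sketch below models only this electronic LUT behavior and says nothing about the optical OLUT design. The batch helper is a loose software analogy for evaluating several input vectors at once, which WDM allows in hardware. The names `make_lut` and `evaluate_batch` are hypothetical.

```python
def make_lut(truth_table):
    """Return a function modelling a k-input LUT configured with truth_table.

    truth_table must have 2**k entries; entry i is the output when the
    inputs, read as a binary number with the first input as the MSB, equal i.
    """
    n = len(truth_table)
    if n == 0 or n & (n - 1):
        raise ValueError("truth table length must be a power of two")
    k = n.bit_length() - 1

    def lut(*inputs):
        if len(inputs) != k:
            raise ValueError(f"expected {k} inputs, got {len(inputs)}")
        index = 0
        for bit in inputs:
            index = (index << 1) | (bit & 1)
        return truth_table[index]

    return lut


def evaluate_batch(lut, input_vectors):
    """Evaluate the same LUT on several input vectors."""
    return [lut(*vec) for vec in input_vectors]
```

For instance, `make_lut([0, 1, 1, 0])` configures a 2-input LUT as XOR.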
LFSR seed computation and reduction using SMT-based fault-chaining
Pub Date : 2013-03-18 DOI: 10.7873/DATE.2013.226
Dhrumeel Bakshi, M. Hsiao
We propose a new method to derive a small number of LFSR seeds for Logic BIST to cover all detectable faults as a first-order satisfiability problem involving extended theories. We use an SMT (Satisfiability Modulo Theories) formulation to efficiently combine the tasks of test-generation and seed-computation. We make use of this formulation in an iterative seed-reduction flow which enables the “chaining” of hard-to-test faults using very few seeds. Experimental results demonstrate that up to 79% reduction in the number of seeds can be achieved.
Pages: 1071-1076
Citations: 3
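In LFSR-based logic BIST, each seed is loaded into the LFSR, and the register is then clocked to expand the seed into a run of pseudo-random test patterns. Seed computation is the problem of finding seeds whose expansions detect the target faults. The sketch below performs only the forward expansion, concretely. The paper's SMT formulation would reason about this expansion symbolically. Two assumptions are made for illustration: a Fibonacci LFSR defined by the recurrence s[k+n] = XOR of s[k+t] over tap offsets t, and consecutive output bits filling a single scan chain. The function names are also hypothetical.

```python
def lfsr_bits(seed, taps, count):
    """Generate `count` output bits of a Fibonacci LFSR.

    seed: initial bits s[0..n-1]; taps: offsets t with s[k+n] = XOR of s[k+t].
    """
    state = list(seed)
    out = []
    for _ in range(count):
        out.append(state[0])
        feedback = 0
        for t in taps:
            feedback ^= state[t]
        state = state[1:] + [feedback]
    return out


def expand_seed(seed, taps, scan_length, num_patterns):
    """Split the LFSR output stream into scan-chain-sized test patterns."""
    bits = lfsr_bits(seed, taps, scan_length * num_patterns)
    return [bits[i:i + scan_length] for i in range(0, len(bits), scan_length)]
```

With `taps=(0, 3)`, the recurrence is s[k+4] = s[k+3] XOR s[k], whose characteristic polynomial x^4 + x^3 + 1 is primitive. Any nonzero 4-bit seed therefore produces a sequence of period 15.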
Handling discontinuous effects in modeling spatial correlation of wafer-level analog/RF tests
Pub Date : 2013-03-18 DOI: 10.7873/DATE.2013.123
K. Huang, Nathan Kupp, J. Carulli, Y. Makris
In an effort to reduce the cost of specification testing in analog/RF circuits, spatial correlation modeling of wafer-level measurements has recently attracted increased attention. Existing approaches for capturing and leveraging such correlation, however, rely on the assumption that spatial variation is smooth and continuous. This, in turn, limits the effectiveness of these methods on actual production data, which often exhibits localized spatial discontinuous effects. In this work, we propose a novel approach which enables spatial correlation modeling of wafer-level analog/RF tests to handle such effects and, thereby, to drastically reduce prediction error for measurements exhibiting discontinuous spatial patterns. The core of the proposed approach is a k-means algorithm which partitions a wafer into k clusters, as caused by discontinuous effects. Individual correlation models are then constructed within each cluster, revoking the assumption that spatial patterns should be smooth and continuous across the entire wafer. Effectiveness of the proposed approach is evaluated on industrial probe test data from more than 3,400 wafers, revealing significant error reduction over existing approaches.
Pages: 553-558
Citations: 19
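The approach in the abstract uses k-means to split a wafer into regions separated by discontinuities, then fits one correlation model per region. The sketch below is plain Lloyd's k-means in pure Python. Two choices are assumptions made for illustration: the features (for instance die x, y coordinates, possibly augmented with a scaled measurement) and the initialization with the first k points. The per-cluster correlation models are not shown.

```python
def kmeans(points, k, iterations=100):
    """Lloyd's k-means. points: list of equal-length tuples. Returns (labels, centroids).

    Initialises centroids with the first k points (a simplifying assumption).
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [None] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (squared Euclidean).
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids
```

Each resulting label set would then select the dies used to fit that cluster's own spatial model. This sidesteps the assumption that variation is smooth across the whole wafer.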
Roadmap towards ultimately-efficient zeta-scale datacenters
Pub Date : 2013-03-18 DOI: 10.1109/HPCSim.2013.6641408
P. Ruch, T. Brunschwiler, S. Paredes, G. Meijer, B. Michel
Chip microscale liquid-cooling reduces thermal resistance and improves datacenter efficiency with higher coolant temperatures by eliminating chillers and allowing thermal energy re-use in cold climates. Liquid cooling enables an unprecedented density in future computers to a level similar to a human brain. This is mediated by a dense 3D architecture for interconnects, fluid cooling, and power delivery of energetic chemical compounds transported in the same fluid. Vertical integration improves memory proximity and electrochemical power delivery creating valuable space for communication. This strongly improves large system efficiency thereby allowing computers to grow beyond exa-scale.
Pages: 1339-1344
Citations: 10
Using synchronization stalls in power-aware accelerators
Pub Date : 2013-03-18 DOI: 10.7873/DATE.2013.091
A. Jooya, A. Baniasadi
GPUs spend significant time on synchronization stalls. Such stalls provide ample opportunity to save leakage energy in GPU structures left idle during such periods. In this paper we focus on the register file structure of NVIDIA GPUs and introduce sync-aware low leakage solutions to reduce power. Accordingly, we show that applying the power gating technique to the register file during synchronization stalls can improve power efficiency without considerable performance loss. To this end, we equip the register file with two leakage power saving modes with different levels of power saving and wakeup latencies.
Pages: 400-403
Citations: 1
TreeFTL: Efficient RAM management for high performance of NAND flash-based storage systems
Pub Date : 2013-03-18 DOI: 10.7873/DATE.2013.086
Chundong Wang, W. Wong
NAND flash memory is widely used for secondary storage today. The flash translation layer (FTL) is the embedded software that is responsible for managing and operating in flash storage system. One important module of the FTL performs RAM management. It is well-known to have a significant impact on flash storage system's performance. This paper proposes an efficient RAM management scheme called TreeFTL. As the name suggests, TreeFTL organizes address translation pages and data pages in RAM in a tree structure, through which it dynamically adapts to workloads by adjusting the partitions for address mapping and data buffering. TreeFTL also employs a lightweight mechanism to implement the least recently used (LRU) algorithm for RAM cache evictions. Experiments show that compared to the two latest schemes for RAM management in flash storage system, TreeFTL can reduce service time by 46.6% and 49.0% on average, respectively, with a 64MB RAM cache.
Pages: 374-379
Citations: 14
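The abstract says TreeFTL implements LRU eviction for the RAM cache through a lightweight mechanism, but it does not describe that mechanism. For reference, the sketch below shows plain LRU semantics built on Python's `collections.OrderedDict`. It is not TreeFTL's tree-based bookkeeping, and the class and method names are hypothetical. Keys could stand for logical page numbers, and values for cached data or mapping entries.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.entries = OrderedDict()  # ordered from least to most recently used

    def get(self, key):
        """Return the cached value (or None) and mark the key as most recently used."""
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)
        return self.entries[key]

    def put(self, key, value):
        """Insert or update a key; return the evicted (key, value) pair, if any."""
        if key in self.entries:
            self.entries.move_to_end(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            return self.entries.popitem(last=False)  # oldest entry first
        return None
```

`put` returns the evicted pair so that a caller can act on it, for example by writing a dirty evicted page back to flash.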
Quality-aware media scheduling on MPSoC platforms
Pub Date : 2013-03-18 DOI: 10.7873/DATE.2013.204
Deepak Gangadharan, S. Chakraborty, Roger Zimmermann
Applications that stream multiple video/audio or video+audio clips are being implemented in embedded devices. A Picture-in-Picture (PiP) application is one such application scenario, where two videos are played simultaneously. Although the PiP application is very efficiently handled in televisions and personal computers by providing maximum quality of service to the multiple streams, it is a difficult task in devices with resource constraints. In order to efficiently utilize the resources, it is essential to derive the necessary processor cycles for multiple video streams such that they are displayed with some prespecified quality constraint. Therefore, we propose a network calculus based formal framework to help schedule multiple media streams in the presence of buffer constraints. Further, our framework also presents a schedulability analysis condition to check if the multimedia streams can be scheduled such that a prespecified quality constraint is satisfied with the available service. We present this framework in the context of a PiP application, but it is applicable in general for multiple media streams. The results obtained using the formal framework were further verified using experiments involving system simulation.
Pages: 976-981
Citations: 3
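The abstract does not give the framework's curves. The textbook network-calculus case below shows the kind of check involved. Consider a stream bounded by a token-bucket arrival curve alpha(t) = b + r*t, served by a processor that guarantees a rate-latency service curve beta(t) = R*max(0, t - T). Whenever r <= R, its worst-case backlog is b + r*T and its worst-case delay is T + b/R. A schedulability test in the spirit of the abstract then compares the backlog bound with the available buffer. The curve shapes and names are assumptions, not the paper's model.

```python
import math
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Arrival curve alpha(t) = burst + rate * t for t > 0."""
    burst: float
    rate: float


@dataclass
class RateLatency:
    """Service curve beta(t) = rate * max(0, t - latency)."""
    rate: float
    latency: float


def backlog_bound(arrival, service):
    """Maximum vertical distance between alpha and beta."""
    if arrival.rate > service.rate:
        return math.inf  # long-run demand exceeds service: unbounded backlog
    return arrival.burst + arrival.rate * service.latency


def delay_bound(arrival, service):
    """Maximum horizontal distance between alpha and beta."""
    if arrival.rate > service.rate or service.rate <= 0:
        return math.inf
    return service.latency + arrival.burst / service.rate


def fits_buffer(arrival, service, buffer_size):
    """True if the worst-case backlog never exceeds the buffer."""
    return backlog_bound(arrival, service) <= buffer_size
```

For a hypothetical stream with burst 4 and rate 1, served at rate 2 after a latency of 3, the backlog bound is 7 and the delay bound is 5. A buffer of 8 units would pass the check, and a buffer of 6 would not.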
An automatic tool flow for the combined implementation of multi-mode circuits
Pub Date : 2013-03-18 DOI: 10.7873/DATE.2013.174
Brahim Al Farisi, Karel Bruneel, João MP Cardoso, D. Stroobandt
A multi-mode circuit implements the functionality of a limited number of circuits, called modes, of which at any given time only one needs to be realised. Using run-time reconfiguration of an FPGA, all the modes can be implemented on the same reconfigurable region, requiring only an area that can contain the biggest mode. Typically, conventional run-time reconfiguration techniques generate a configuration for every mode separately. To switch between modes, the complete reconfigurable region is rewritten, which often leads to very long reconfiguration times. In this paper we present a novel, fully automated tool flow that exploits similarities between the modes and uses Dynamic Circuit Specialization to drastically reduce reconfiguration time. Experimental results show that the number of bits rewritten in the configuration memory is reduced by a factor of 4.6× to 5.1× without significant performance penalties.
Pages: 821-826
Citations: 11
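Counting configuration bits shows why exploiting similarity between modes helps. Rewriting the whole region costs every bit, while a flow that knows both configurations only needs to rewrite the bits that differ. The sketch counts both quantities for two hypothetical bitstreams given as `bytes`. It is only an analogy for Dynamic Circuit Specialization, which the abstract names but does not detail. It also ignores that FPGAs typically reconfigure in frames rather than individual bits.

```python
import math


def full_rewrite_bits(config_a, config_b):
    """Bits written when the whole reconfigurable region is reloaded."""
    if len(config_a) != len(config_b):
        raise ValueError("configurations must cover the same region")
    return 8 * len(config_b)


def differential_rewrite_bits(config_a, config_b):
    """Bits that actually change when switching from mode A to mode B."""
    if len(config_a) != len(config_b):
        raise ValueError("configurations must cover the same region")
    return sum(bin(x ^ y).count("1") for x, y in zip(config_a, config_b))


def reduction_factor(config_a, config_b):
    """How many times fewer bits a differential rewrite touches."""
    changed = differential_rewrite_bits(config_a, config_b)
    total = full_rewrite_bits(config_a, config_b)
    return math.inf if changed == 0 else total / changed
```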