
2013 Design, Automation & Test in Europe Conference & Exhibition (DATE): Latest Papers

NUMANA: A hybrid numerical and analytical thermal simulator for 3-D ICs
Pub Date : 2013-03-18 DOI: 10.7873/DATE.2013.282
Yu-Min Lee, T. Wu, Pei-Yu Huang, C. Yang
By combining analytical and numerical simulation techniques, this work develops NUMANA, a hybrid thermal simulator that can effectively handle complicated material structures, to estimate the temperature profile of a 3-D IC. Compared with the commercial tool ANSYS, its maximum relative error is only 1.84%. Compared with the well-known linear system solver SuperLU [1], it achieves an orders-of-magnitude speedup.
Pages: 1379-1384
Citations: 6
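To make the purely numerical side of a layered thermal solver concrete, here is a minimal sketch. It is not NUMANA's hybrid method. It assumes a hypothetical one-dimensional stack of layers, each with a thickness, a thermal conductivity, and injected power, an ideal heat sink at ambient temperature below layer 0, and an adiabatic top surface. Layer-center temperatures then satisfy a tridiagonal linear system, solved here with the Thomas algorithm rather than a general sparse solver such as SuperLU. The function name `layer_temperatures` is illustrative.

```python
def layer_temperatures(thickness, conductivity, power, area=1.0, t_amb=0.0):
    """Steady-state temperature at the center of each stacked layer.

    Hypothetical 1-D model: layer 0 sits on an ideal heat sink at t_amb,
    the top surface is adiabatic, and heat flows only vertically.
    """
    n = len(thickness)
    # Thermal resistance from each layer center to its top or bottom face.
    half_r = [t / (2.0 * k * area) for t, k in zip(thickness, conductivity)]
    g_sink = 1.0 / half_r[0]  # conductance: layer 0 center -> heat sink
    g = [1.0 / (half_r[i] + half_r[i + 1]) for i in range(n - 1)]

    # Tridiagonal system: lower a, diagonal b, upper c, right-hand side d.
    a = [0.0] + [-gi for gi in g]
    c = [-gi for gi in g] + [0.0]
    b = [(g[i - 1] if i > 0 else 0.0) + (g[i] if i < n - 1 else 0.0)
         for i in range(n)]
    b[0] += g_sink
    d = list(power)
    d[0] += g_sink * t_amb

    # Thomas algorithm: forward elimination, then back substitution.
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    temps = [0.0] * n
    temps[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        temps[i] = dp[i] - cp[i] * temps[i + 1]
    return temps
```

For example, two identical unit layers with 1 W injected only into the top layer give center temperatures of 0.5 and 1.5 above ambient, because the full 1 W flows through a resistance of 0.5 to the sink and then 1.0 between the layers.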
A satisfiability approach to speed assignment for distributed real-time systems
Pub Date : 2013-03-18 DOI: 10.7873/DATE.2013.160
Pratyush Kumar, Devesh B. Chokshi, L. Thiele
We study the problem of assigning speeds to resources that serve distributed applications under delay, buffer, and energy constraints. We argue that, because these constraints are intricately related, the problem has no straightforward solution. Instead, we propose using Real-Time Calculus (RTC) to analyse the constraints and a satisfiability (SAT) solver to explore the design space efficiently. To this end, we develop an SMT solver using the OpenSMT framework and the Modular Performance Analysis (MPA) toolbox. Two key enablers of this implementation are the analysis of incomplete models and the generation of conflict clauses in RTC. Results on problem instances with very large decision spaces indicate that the proposed SMT solver performs very well in practice.
Pages: 749-754
Citations: 7
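The interplay of delay, buffer, and speed constraints described above can be illustrated with a deliberately tiny stand-in. This minimal sketch assumes a token-bucket input stream (burst `burst`, rate `rate`), a chain of rate-latency resources whose service rate equals the chosen speed, standard hop-by-hop RTC delay and backlog bounds, and a hypothetical energy cost of `sum(s**2)`. It explores the design space by exhaustive enumeration, not with an SMT solver, OpenSMT, or the MPA toolbox, so it only scales to a handful of resources. The names `chain_bounds` and `assign_speeds` are illustrative.

```python
from itertools import product

def chain_bounds(burst, rate, speeds, latencies):
    """Hop-by-hop RTC bounds for a token-bucket stream (burst, rate) crossing
    rate-latency resources (service rate = speed, fixed latency).
    Returns (end-to-end delay bound, worst per-hop backlog bound)."""
    delay, backlog, b = 0.0, 0.0, burst
    for s, t in zip(speeds, latencies):
        if rate > s:  # resource cannot keep up: bounds are infinite
            return float("inf"), float("inf")
        delay += t + b / s                    # per-hop delay bound
        backlog = max(backlog, b + rate * t)  # per-hop buffer bound
        b += rate * t                         # output burst grows by rate * latency
    return delay, backlog

def assign_speeds(burst, rate, latencies, options, max_delay, max_buffer):
    """Enumerate every speed assignment and keep the cheapest feasible one
    under a hypothetical energy cost sum(s**2). Returns (cost, speeds) or None."""
    best = None
    for speeds in product(options, repeat=len(latencies)):
        delay, backlog = chain_bounds(burst, rate, speeds, latencies)
        if delay <= max_delay and backlog <= max_buffer:
            cost = sum(s * s for s in speeds)
            if best is None or cost < best[0]:
                best = (cost, speeds)
    return best
```

With a burst of 2, rate 1, two hops of latency 1, speed options 1, 2, and 4, a delay budget of 5 and a buffer budget of 4, the cheapest feasible choice is speed 2 on both hops (cost 8). Cheaper mixes such as (1, 2) miss the delay budget because the second hop sees the larger burst produced by the first.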
Design and implementation of a group-based RO PUF
Pub Date : 2013-03-18 DOI: 10.7873/DATE.2013.094
C. Yin, G. Qu, Qiang Zhou
Silicon physical unclonable functions (PUFs) exploit uncontrollable variations in the integrated circuit (IC) fabrication process to enable security-related applications such as IC authentication. In this paper, we describe a new framework that generates secure PUF secrets from a ring oscillator (RO) PUF with improved hardware efficiency. Our work builds on the recently proposed group-based RO PUF and introduces the following novel concepts: an entropy distiller to filter out systematic variation; a simplified grouping algorithm to partition the ROs into groups; a new syndrome coding scheme to facilitate error correction; and an entropy packing method to improve coding efficiency and security. Using a publicly available RO PUF dataset, we demonstrate that these concepts produce PUF secrets that pass the NIST randomness and stability tests. Compared with other state-of-the-art RO PUF designs, our approach generates on average 72% more secret bits with the same amount of hardware.
Pages: 416-421
Citations: 53
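Why grouping ROs can yield more secret bits than pairing them can be sketched as follows. This is a minimal illustration, not the authors' simplified grouping algorithm, and it omits the entropy distiller, syndrome coding, and entropy packing. It assumes a hypothetical rule that two ROs may share a group only if their measured frequencies differ by more than a margin `delta`, so their ordering is stable under noise. It fills groups greedily and encodes each group's frequency ordering as a permutation rank (Lehmer code). A group of k ROs has k! possible orderings, so it can carry up to log2(k!) bits, whereas the same ROs compared in disjoint pairs give only k/2 bits. All function names are illustrative.

```python
from math import factorial

def group_ros(freqs, delta):
    """Greedily partition RO indices so that any two ROs in the same group
    differ in frequency by more than delta (hypothetical grouping rule)."""
    groups = []  # each group is a list of RO indices
    for i in sorted(range(len(freqs)), key=lambda j: freqs[j]):
        for g in groups:
            if all(abs(freqs[i] - freqs[j]) > delta for j in g):
                g.append(i)
                break
        else:  # no existing group accepts this RO: open a new one
            groups.append([i])
    return groups

def lehmer_rank(order):
    """Rank a permutation of distinct ints as an integer in [0, len(order)!)."""
    rank, remaining = 0, sorted(order)
    for pos, x in enumerate(order):
        idx = remaining.index(x)
        rank += idx * factorial(len(order) - 1 - pos)
        remaining.pop(idx)
    return rank

def group_secret(freqs, group):
    """Encode a group's fastest-to-slowest ordering as its permutation rank."""
    order = sorted(group, key=lambda j: -freqs[j])
    return lehmer_rank(order)
```

With hypothetical frequencies `[1000, 1015, 1002, 1030, 990]` and `delta = 10`, ROs 4, 2, 1, and 3 form one group and RO 0 is left alone (it is exactly 10 away from RO 4). The four-RO group's ordering encodes one of 24 values.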
Reachability analysis of nonlinear analog circuits through iterative reachable set reduction
Pub Date : 2013-03-18 DOI: 10.7873/DATE.2013.293
S. Ahmadyan, Shobha Vasudevan
We propose a methodology for reachability analysis of nonlinear analog circuits to verify safety properties. Our iterative reachable set reduction algorithm initially treats the entire state space as reachable. It then iteratively determines which regions of the state space are unreachable and removes them from the over-approximated reachable set. We use the State Partitioning Tree (SPT) algorithm to recursively partition the reachable set into convex polytopes. We determine reachability between adjacent polytopes by analyzing the direction of the state-space trajectories on their common faces. We model the direction of the trajectories as a reachability decision function, which we solve using a sound root counting method. The analysis remains faithful to the nonlinearities of the system. We demonstrate the memory efficiency of our algorithm by computing the reachable set of a Van der Pol oscillator circuit.
Pages: 1436-1441
Citations: 3
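The face-direction test described above can be illustrated on the Van der Pol oscillator itself. This sketch is a simplification. It assumes axis-aligned boxes rather than general polytopes, and it replaces the paper's sound root counting with a closed-form bound that only works for this particular field. On a face x = c the normal velocity is y, which is linear. On a face y = c it is a quadratic in x. Checking the interval endpoints plus the quadratic's stationary point therefore decides exactly whether the normal velocity is strictly positive somewhere on the face in the given direction. The names `vdp_field` and `can_cross` are illustrative.

```python
def vdp_field(x, y, mu):
    """Van der Pol vector field: dx/dt = y, dy/dt = mu*(1 - x**2)*y - x."""
    return y, mu * (1.0 - x * x) * y - x

def can_cross(axis, c, lo, hi, direction, mu):
    """True if, somewhere on the face {state[axis] == c} with the other
    coordinate in [lo, hi], the flow points strictly in `direction`
    (+1 toward larger state[axis], -1 toward smaller)."""
    candidates = [lo, hi]
    if axis == 0:
        # Face x = c: normal velocity dx/dt = y is linear in y,
        # so its extremes over [lo, hi] are at the endpoints.
        def normal(t):
            return vdp_field(c, t, mu)[0]
    else:
        # Face y = c: normal velocity dy/dt = -mu*c*x**2 - x + mu*c is at
        # most quadratic in x; add its stationary point if it lies inside.
        if mu * c != 0:
            x_star = -1.0 / (2.0 * mu * c)
            if lo <= x_star <= hi:
                candidates.append(x_star)
        def normal(t):
            return vdp_field(t, c, mu)[1]
    return max(direction * normal(t) for t in candidates) > 0
```

For example, with mu = 1 on the face y = 1 over x in [-3, 1], the normal velocity is negative at both endpoints but equals 1.25 at x = -0.5, so checking the stationary point is what reveals the possible upward crossing.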
Software enabled wear-leveling for hybrid PCM main memory on embedded systems
Pub Date : 2013-03-18 DOI: 10.7873/DATE.2013.131
J. Hu, Qingfeng Zhuge, C. Xue, Wei-Che Tseng, E. Sha
Phase Change Memory (PCM) is a promising DRAM replacement in embedded systems due to its attractive characteristics. However, its relatively low endurance has limited its practical applications. In this paper, in addition to existing hardware-level optimizations, we propose software-enabled wear-leveling techniques to further extend PCM lifetime in embedded systems. We propose a polynomial-time algorithm, the Software Wear-Leveling (SWL) algorithm, that achieves wear-leveling without hardware overhead. Experimental results show that the proposed technique reduces the number of writes to the most-written bits by more than 80% compared with a greedy algorithm, and by around 60% compared with the existing Optimal Data Allocation (ODA) algorithm, while incurring under 6% memory access overhead.
Pages: 599-602
Citations: 67
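A toy model shows why remapping data in software spreads wear. This minimal sketch is not the SWL algorithm. It assumes a hypothetical workload in which each data item receives a fixed number of writes per period. Before every period, it greedily maps the hottest items onto the least-worn physical locations. It ignores the extra writes needed to migrate data between locations, which a real scheme has to account for. The name `simulate_wear` is illustrative.

```python
def simulate_wear(writes_per_period, periods, remap=True):
    """Accumulated writes per physical location after `periods` periods.

    writes_per_period[i] is a hypothetical write count for data item i.
    With remap=True, before each period the hottest items are greedily
    placed on the least-worn locations; otherwise item i stays at location i.
    Migration writes caused by remapping are ignored in this toy model.
    """
    n = len(writes_per_period)
    wear = [0] * n
    mapping = list(range(n))  # mapping[item] = physical location
    hottest_first = sorted(range(n), key=lambda i: -writes_per_period[i])
    for _ in range(periods):
        if remap:
            least_worn_first = sorted(range(n), key=lambda loc: wear[loc])
            for item, loc in zip(hottest_first, least_worn_first):
                mapping[item] = loc
        for item in range(n):
            wear[mapping[item]] += writes_per_period[item]
    return wear
```

Consider one hot item (10 writes per period) and three cold ones (1 write each) over four periods. The fixed mapping leaves one location with 40 writes. The greedy remapping spreads the same 52 writes evenly, 13 per location.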
Co-synthesis of data paths and clock control paths for minimum-period clock gating
Pub Date : 2013-03-18 DOI: 10.7873/DATE.2013.366
Wen-Pin Tu, Shih-Hsu Huang, Chun-Hua Cheng
Although intentional clock skew can be used to reduce the clock period, its application in gated clock designs has not been well studied. A gated clock design includes both data paths and clock control paths, but conventional clock skew scheduling focuses only on data paths. Based on this observation, we propose an approach that co-synthesizes data paths and clock control paths in a nonzero-skew gated clock design. Our objective is to minimize the inserted delay required to operate at the lower bound of the clock period (under the clocking constraints of both data paths and clock control paths). Unlike previous work, our approach guarantees that no clocking constraint is violated in the presence of clock gating. Experimental results show that our approach effectively increases circuit speed with almost no power penalty.
Pages: 1831-1836
Citations: 6
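For background, the conventional data-path-only clock skew scheduling that this abstract contrasts with can be written as a system of difference constraints. The sketch below is that classic formulation, not the paper's co-synthesis with clock control paths. It assumes setup and hold times are folded into each path's maximum and minimum delays, and that all minimum delays are non-negative. For a path from flip-flop i to j, setup (s_i + d_max <= s_j + T) and hold (s_i + d_min >= s_j) become graph edges. Bellman-Ford negative-cycle detection checks feasibility, and a binary search finds the smallest feasible period T. The names `feasible` and `min_period` are illustrative.

```python
def feasible(n, paths, period):
    """True if clock arrival times s[0..n-1] exist such that every path
    (i, j, d_max, d_min) meets setup and hold at the given period."""
    edges = []  # edge (u, v, w) encodes s[v] - s[u] <= w
    for i, j, d_max, d_min in paths:
        edges.append((j, i, period - d_max))  # setup: s_i - s_j <= T - d_max
        edges.append((i, j, d_min))           # hold:  s_j - s_i <= d_min
    dist = [0.0] * n  # implicit virtual source at distance 0 to every node
    for _ in range(n):
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    # If any edge can still be relaxed there is a negative cycle: infeasible.
    return all(dist[u] + w >= dist[v] for u, v, w in edges)

def min_period(n, paths, iters=60):
    """Binary-search the smallest feasible period; zero skew is always
    feasible at T = max d_max when every d_min >= 0."""
    lo, hi = 0.0, max(p[2] for p in paths)
    for _ in range(iters):
        mid = (lo + hi) / 2
        if feasible(n, paths, mid):
            hi = mid
        else:
            lo = mid
    return hi
```

Take two flip-flops in a loop, where path 0 to 1 has delays 10/2 and path 1 to 0 has delays 4/1. Zero skew needs T = 10, while scheduling skew lowers the period to 8. The binding limit is the delay spread d_max - d_min = 8 of the long path.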
Efficient cache architectures for reliable hybrid voltage operation using EDC codes
Pub Date : 2013-03-18 DOI: 10.7873/DATE.2013.193
Bojan Maric, J. Abella, M. Valero
Advances in semiconductor technology enable the design of sensor-based, battery-powered, ultra-low-cost chips (e.g., below 1 €) required for new market segments such as body, urban-life, and environmental monitoring. Caches have been shown to be the largest consumers of energy and area in these chips. This paper proposes a novel hybrid-operation (high Vcc and ultra-low Vcc), single-Vcc-domain cache architecture that replaces energy-hungry bitcells (e.g., 10T) with smaller, more energy-efficient cells (e.g., 8T) enhanced with Error Detection and Correction (EDC) features for high reliability and predictable performance. Our architecture is shown to substantially outperform existing solutions in terms of energy and area.
Pages: 917-920
Citations: 10
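As a generic illustration of what an EDC code adds to stored data, here is the textbook Hamming(7,4) single-error-correcting code. It is a stand-in chosen for brevity, not the code used in the paper. Three parity bits protect four data bits. The syndrome, which is the XOR of the positions of all 1 bits, directly names the position of a single flipped bit, such as one caused by a failing cell at ultra-low Vcc. The function names are illustrative.

```python
DATA_POSITIONS = (3, 5, 6, 7)  # codeword positions holding data bits
PARITY_POSITIONS = (1, 2, 4)   # powers of two hold parity bits

def hamming74_encode(data):
    """Encode 4 data bits (list of 0/1) into a 7-bit codeword (positions 1..7)."""
    code = [0] * 8  # index 0 unused so list index == bit position
    for pos, bit in zip(DATA_POSITIONS, data):
        code[pos] = bit
    for p in PARITY_POSITIONS:
        # Even parity over every other position whose index has bit p set.
        for pos in range(1, 8):
            if pos & p and pos != p:
                code[p] ^= code[pos]
    return code[1:]

def hamming74_decode(word):
    """Correct up to one flipped bit; return (data bits, syndrome).
    A nonzero syndrome is the 1-based position of the corrected bit."""
    code = [0] + list(word)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos
    if syndrome:
        code[syndrome] ^= 1
    return [code[pos] for pos in DATA_POSITIONS], syndrome
```

A code of this kind corrects only one error per word. Designs targeting very low voltages often use stronger codes, trading extra check bits and latency for reliability.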
3D-MMC: A modular 3D multi-core architecture with efficient resource pooling
Pub Date : 2013-03-18 DOI: 10.7873/DATE.2013.257
Tiansheng Zhang, A. Cevrero, Giulia Beanato, Panagiotis Athanasopoulos, A. Coskun, Y. Leblebici
This paper presents, for the first time, a fully functional hardware and software design for a 3D stacked multi-core system. Our system is a low-power 3D Modular Multi-Core (3D-MMC) architecture built by vertically stacking identical layers. Each layer consists of cores, private and shared memory units, and communication infrastructure. The system uses shared-memory communication and Through-Silicon Vias (TSVs) to transfer data across layers. A serialization scheme is employed for inter-layer communication to minimize the overall number of TSVs. The proposed architecture has been implemented in HDL and verified on a test chip targeting an operating frequency of 400 MHz with a vertical bandwidth of 3.2 Gbps. The paper first evaluates the performance, power, and temperature characteristics of the architecture using a set of software applications we designed. We demonstrate quantitatively that the proposed modular 3D design alleviates the cost and performance bottlenecks of traditional 2D multi-core design. In addition, a novel resource pooling approach is introduced to efficiently manage the shared memory of the 3D stacked system. Our approach significantly reduces application execution time compared with 2D and 3D systems that use conventional memory sharing.
Pages: 1241-1246
Citations: 5
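The figures in this abstract correspond to 3.2 Gbps / 400 MHz = 8 bits of vertical transfer per clock cycle. Here is a minimal sketch of the serialization idea. It assumes a hypothetical 32-bit word sent over 8 TSV lanes at one flit per cycle; the paper's actual word width and lane count are not given here. The word is split into 4 flits and reassembled on the receiving layer. In this assumed setting, that uses 24 fewer TSVs than a fully parallel link at the cost of 3 extra cycles per word. The names `serialize` and `deserialize` are illustrative.

```python
def serialize(word, word_bits=32, lanes=8):
    """Split a word into ceil(word_bits / lanes) flits, least significant first."""
    n_flits = -(-word_bits // lanes)  # ceiling division
    mask = (1 << lanes) - 1
    return [(word >> (i * lanes)) & mask for i in range(n_flits)]

def deserialize(flits, lanes=8):
    """Reassemble flits produced by serialize()."""
    word = 0
    for i, flit in enumerate(flits):
        word |= flit << (i * lanes)
    return word

# Vertical bandwidth divided by clock frequency, from the abstract's figures.
BITS_PER_CYCLE = 3.2e9 / 400e6
```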
Multispeculative additive trees in High-Level Synthesis
Pub Date : 2013-03-18 DOI: 10.7873/DATE.2013.052
Alberto A. Del Barrio, R. Hermida, S. Memik, J. Mendias, M. Molina
Multispeculative Functional Units (MSFUs) are arithmetic functional units that operate using several predictors for the carry signal. Carry prediction helps shorten the critical path of the functional unit, and the average performance of these units is determined by the prediction hit rate. Despite the use of more than one predictor, in the majority of cases the correct result is produced with no additional cycle or with only one. In this paper, we present multispeculation as a way to increase the performance of tree structures with a negligible area penalty. By judiciously introducing these units into computation trees, prediction is needed only at certain selected nodes, which minimizes the number of operations that can potentially mispredict. This reduces the average latency and therefore increases performance. Our experiments show that execution time can be improved by 24% and 38% on average when considering logarithmic and linear modules, respectively.
Pages: 188-193
Citations: 9
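A single-predictor toy version shows how carry-prediction hit rate translates into average latency. This minimal sketch is not an MSFU, which uses several carry predictors. It assumes a hypothetical 16-bit adder split into two 8-bit halves. The carry into the upper half is predicted from the top bits of the lower half (a local carry-generate). The add finishes in one cycle when the prediction is right and spends one extra recovery cycle when it is wrong. The names `speculative_add` and `average_cycles` are illustrative.

```python
import random

def speculative_add(a, b, width=16, split=8):
    """Return ((a + b) mod 2**width, cycles) for a split, carry-speculative adder."""
    lo_mask = (1 << split) - 1
    a_lo, b_lo = a & lo_mask, b & lo_mask
    a_hi, b_hi = a >> split, b >> split
    # Hypothetical predictor: guess a carry out of the lower half only if
    # both operands have a 1 in its most significant bit (carry generate).
    predicted = (a_lo >> (split - 1)) & (b_lo >> (split - 1)) & 1
    lo_sum = a_lo + b_lo
    actual = lo_sum >> split            # true carry into the upper half
    hi_sum = a_hi + b_hi + predicted    # speculative upper half
    cycles = 1
    if predicted != actual:             # misprediction: recompute upper half
        hi_sum = a_hi + b_hi + actual
        cycles = 2
    result = ((hi_sum << split) | (lo_sum & lo_mask)) & ((1 << width) - 1)
    return result, cycles

def average_cycles(samples=10000, seed=0):
    """Average latency over uniformly random 16-bit operands."""
    rng = random.Random(seed)
    total = 0
    for _ in range(samples):
        a, b = rng.getrandbits(16), rng.getrandbits(16)
        total += speculative_add(a, b)[1]
    return total / samples
```

The result is always correct. Only the latency depends on the prediction, so a better predictor, or several of them as in MSFUs, lowers the average cycle count without changing the sum.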
From embedded multi-core SoCs to scale-out processors
Pub Date : 2013-03-18 DOI: 10.7873/DATE.2013.199
M. Coppola, B. Falsafi, J. Goodacre, Georgios Kornaros
Information technology is now an indispensable pillar of modern society. However, CMOS technologies, which lay the foundation for all digital platforms, are experiencing a major inflection point due to a slowdown in voltage scaling. As a result, power is emerging as the key design constraint for all platforms, from embedded systems to datacenters. This tutorial presents emerging design paradigms ranging from embedded multicore SoCs to server processors, built on mobile cores, for scale-out datacenters.
Pages: 947-951
Citations: 9
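The standard dynamic-power relation shows why a slowdown in voltage scaling turns power into the key constraint. This is an idealized first-order model that ignores leakage and activity changes. Under ideal (Dennard) scaling by a factor κ > 1, capacitance and voltage shrink by κ, frequency grows by κ, and area per transistor shrinks by κ², so power density stays constant. If voltage stops scaling, power density instead grows by κ² in this model.

```latex
P = \alpha C V^2 f
% Ideal (Dennard) scaling: C \to C/\kappa,\ V \to V/\kappa,\ f \to \kappa f,\ A \to A/\kappa^2
\frac{P'}{A'} = \frac{\alpha (C/\kappa)(V/\kappa)^2 (\kappa f)}{A/\kappa^2} = \frac{\alpha C V^2 f}{A} = \frac{P}{A}
% Voltage no longer scales: V \to V
\frac{P'}{A'} = \frac{\alpha (C/\kappa) V^2 (\kappa f)}{A/\kappa^2} = \kappa^2 \, \frac{P}{A}
```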