
Latest publications from the 2008 45th ACM/IEEE Design Automation Conference

FPGA area reduction by multi-output function based sequential resynthesis
Pub Date : 2008-06-08 DOI: 10.1145/1391469.1391478
Yu Hu, Victor Shih, R. Majumdar, Lei He
We propose a new resynthesis algorithm for FPGA area reduction. In contrast to existing resynthesis techniques, which consider only single-output Boolean functions and the combinational portion of a circuit, we consider multi-output functions and retiming, and develop effective algorithms that incorporate recent improvements to SAT-based Boolean matching. Our experimental results show that with the optimal logic depth, the resynthesis considering multi-output functions reduces area by up to 0.4% compared to the one considering single-output functions, and the sequential resynthesis reduces area by up to 10% compared to combinational resynthesis when both consider multi-output functions. Furthermore, our proposed resynthesis algorithm reduces area by up to 16% compared to the best existing academic technology mapper, Berkeley ABC.
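The SAT-based multi-output matching engine itself is too large for an abstract-sized example, but the sketch below illustrates, under strong simplifying assumptions, the kind of structural pre-check such a flow might apply before invoking a full SAT-based matcher: a group of outputs can only be merged into one multi-output LUT if their combined support fits the LUT's input count. All function and parameter names are hypothetical, and functions are assumed to be given as Python callables over 0/1 tuples.

```python
from itertools import product

def support(f, n):
    """Variables that the n-input Boolean function f (a callable over
    0/1 tuples) actually depends on."""
    dep = set()
    for i in range(n):
        for bits in product([0, 1], repeat=n):
            flipped = list(bits)
            flipped[i] ^= 1
            if f(tuple(bits)) != f(tuple(flipped)):
                dep.add(i)
                break
    return dep

def may_share_lut(funcs, n, k, max_outputs=2):
    """Necessary structural condition for packing several outputs into one
    multi-output k-input LUT: at most max_outputs functions whose joint
    support has at most k variables.  A real flow would follow this with
    SAT-based Boolean matching to decide feasibility exactly."""
    joint = set().union(*(support(f, n) for f in funcs))
    return len(funcs) <= max_outputs and len(joint) <= k

# Toy example: two outputs over the same three inputs fit a 4-input LUT.
f1 = lambda b: b[0] & b[1]
f2 = lambda b: b[0] ^ b[2]
print(may_share_lut([f1, f2], n=3, k=4))   # True
```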
Citations: 16
Automatic synthesis of clock gating logic with controlled netlist perturbation
Pub Date : 2008-06-08 DOI: 10.1145/1391469.1391637
A. Hurst
Clock gating is the insertion of combinational logic along the clock path to prevent the unnecessary switching of registers and reduce dynamic power consumption. The conditions under which the transition of a register may be safely blocked can either be explicitly specified by the designer or detected automatically. We introduce a new method for automatically synthesizing these conditions in a way that minimizes netlist perturbation and is both timing- and physical-aware. Our automatic method is also scalable, utilizing simulation and satisfiability tests and necessitating no symbolic representation. On a set of benchmarks, our technique successfully reduces the dynamic clock power by 14.5% on average. Furthermore, we demonstrate how to apply a straightforward logic simplification to utilize resulting don't cares and reduce the logic by 7.0% on average.
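The paper's flow combines simulation filtering with satisfiability checks; the sketch below covers only the simulation-filtering half, assuming per-cycle traces are available as Python lists (all names hypothetical). A signal survives as a gating-enable candidate only if the register never changes value on a cycle where the signal is low; surviving candidates would still need a formal (SAT) check before the clock is actually gated.

```python
def gating_candidates(reg_trace, signal_traces):
    """Signals g such that, in the observed simulation, the register never
    changes value on a cycle where g is 0 -- i.e. g could act as a clock
    enable (g == 0 => hold).  Simulation can only falsify candidates; the
    survivors still need a formal check before the clock is gated."""
    keep = []
    for name, g in signal_traces.items():
        if all(g[t] == 1 or reg_trace[t + 1] == reg_trace[t]
               for t in range(len(reg_trace) - 1)):
            keep.append(name)
    return keep

# Toy per-cycle traces: the register only changes when 'en' was high.
reg  = [0, 0, 1, 1, 1, 0]
sigs = {"en":  [0, 1, 0, 0, 1, 1],
        "foo": [1, 0, 1, 1, 0, 0]}
print(gating_candidates(reg, sigs))   # ['en']
```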
Citations: 31
A generalized network flow based algorithm for power-aware FPGA memory mapping
Pub Date : 2008-06-08 DOI: 10.1145/1391469.1391479
Tien-Yuan Hsu, Ting-Chi Wang
In this paper, we present a generalized network flow based algorithm for power-aware FPGA memory mapping. Our algorithm not only maps user-defined logical memories to physical embedded memory blocks under the memory resource constraint but also achieves minimum power consumption. The experimental results show that our algorithm was always able to efficiently generate optimal solutions for all test cases while an existing greedy method could do so only for about one third of the test cases.
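The paper formulates mapping as a generalized network flow (flows with gains); the sketch below is a simplified stand-in that uses networkx's ordinary min-cost flow just to illustrate the mapping-as-flow idea. Block names, counts, and power costs are hypothetical. Each logical memory pushes one unit of flow from source to sink, and the weights on the logical-to-physical arcs encode estimated dynamic power, so the minimum-cost flow selects a power-minimal assignment subject to block availability.

```python
import networkx as nx

# Mapping logical memories to physical block types as a flow problem.
# (The paper uses a *generalized* network flow; ordinary min-cost flow here.)
logical  = ["lm0", "lm1", "lm2"]                     # each needs one block
physical = {"blkA": 2, "blkB": 2}                    # available block counts
power    = {("lm0", "blkA"): 3, ("lm0", "blkB"): 5,  # hypothetical power costs
            ("lm1", "blkA"): 2, ("lm1", "blkB"): 4,
            ("lm2", "blkA"): 6, ("lm2", "blkB"): 1}

G = nx.DiGraph()
G.add_node("src", demand=-len(logical))              # supplies one unit per memory
G.add_node("snk", demand=len(logical))
for lm in logical:
    G.add_edge("src", lm, capacity=1, weight=0)
for blk, cap in physical.items():
    G.add_edge(blk, "snk", capacity=cap, weight=0)   # resource constraint
for (lm, blk), w in power.items():
    G.add_edge(lm, blk, capacity=1, weight=w)        # cost = estimated power

flow = nx.min_cost_flow(G)
mapping = {lm: blk for lm in logical for blk in physical if flow[lm].get(blk, 0)}
print(mapping)   # {'lm0': 'blkA', 'lm1': 'blkA', 'lm2': 'blkB'}
```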
Citations: 1
Predictive dynamic thermal management for multicore systems
Pub Date : 2008-06-08 DOI: 10.1145/1391469.1391658
Inchoon Yeo, C. Liu, Eun Jung Kim
Recently, processor power density has been increasing at an alarming rate, resulting in high on-chip temperatures. Higher temperatures increase leakage current and degrade reliability. In this paper, we propose Predictive Dynamic Thermal Management (PDTM) for multicore systems, based on an Application-based Thermal Model (ABTM) and a Core-based Thermal Model (CBTM). ABTM predicts future temperature from application-specific thermal behavior, while CBTM estimates each core's temperature pattern from its steady-state temperature and workload. Our prediction model has an average error of 1.6%, compared to at most 5% error for the model in HybDTM. Based on the temperatures predicted by ABTM and CBTM, the proposed PDTM keeps the system temperature below a desired level by moving the running application from a potentially overheated core to the predicted coolest core (migration) and by reducing processor resources (priority scheduling) within the multicore system. PDTM enables the exploration of the tradeoff between throughput and fairness in temperature-constrained multicore systems. We implement PDTM on Intel's quad-core system with a device driver that accesses the Digital Thermal Sensor (DTS). Compared against the standard Linux scheduler, PDTM decreases average temperature by about 10% and peak temperature by 5 °C, with a negligible performance impact of under 1%, when running a single SPEC2006 benchmark. Moreover, PDTM outperforms HRTM, reducing average temperature by about 7% and peak temperature by about 3 °C with a performance overhead of 0.15% when running a single benchmark.
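A minimal sketch of the migration policy alone, with a toy first-order predictor standing in for the paper's ABTM/CBTM models; all thresholds, temperatures, and workload names are hypothetical.

```python
def predict(temp, steady, alpha=0.3):
    """Toy first-order predictor: temperature moves a fraction alpha of the
    way toward the running workload's steady-state temperature per interval
    (a stand-in for the paper's ABTM/CBTM models)."""
    return temp + alpha * (steady - temp)

def pdtm_step(core_temps, task_of_core, steady_temp, threshold=80.0):
    """If any core is predicted to exceed the threshold, migrate its task to
    the core with the lowest predicted temperature."""
    pred = {c: predict(t, steady_temp[task_of_core[c]])
            for c, t in core_temps.items()}
    hot = max(pred, key=pred.get)
    if pred[hot] > threshold:
        cool = min(pred, key=pred.get)
        task_of_core[hot], task_of_core[cool] = task_of_core[cool], task_of_core[hot]
    return task_of_core

temps  = {0: 78.0, 1: 62.0, 2: 55.0, 3: 60.0}           # current readings (deg C)
tasks  = {0: "bzip2", 1: "mcf", 2: "idle", 3: "idle"}   # hypothetical placement
steady = {"bzip2": 95.0, "mcf": 70.0, "idle": 45.0}     # per-workload steady state
print(pdtm_step(temps, tasks, steady))   # bzip2 migrates from core 0 to the coolest core
```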
Citations: 205
(Bio)-Behavioral CAD
Pub Date : 2008-06-08 DOI: 10.1145/1391469.1391562
M. Potkonjak, F. Koushanfar
We propose the use of functional magnetic resonance imaging (fMRI) systems, techniques, and tools to observe the neuron-level activity of the brains of designers or CAD tool developers. The objective is to enable designers and developers to complete their tasks in a faster and more creative way with a significantly reduced number of logical and design errors. While fMRI techniques are already used in economics, decision science, and several other social sciences, until now their potential for closing the design productivity-silicon productivity (DPSP) gap has not been recognized. By compounding the new approach with techniques for designing integrated circuits and systems within fMRI data collection and analysis, we will establish a positive productivity and creativity feedback loop that may permanently close the DPSP gap. As a preliminary and presently feasible step, we propose the creation of behavioral CAD research and development techniques. The use of judiciously selected verbal and visual information, the reintroduction of successful design paradigms, and exposure to beneficial synthesis templates may help current and future designers learn and design more effectively.
Citations: 1
WavePipe: Parallel transient simulation of analog and digital circuits on multi-core shared-memory machines
Pub Date : 2008-06-08 DOI: 10.1145/1391469.1391531
Wei Dong, Peng Li, Xiaoji Ye
While the emergence of multi-core shared-memory machines offers a promising computing solution to ever more complex chip design problems, new parallel CAD methodologies must be developed to gain the full benefit of these increasingly parallel computing systems. We present a parallel transient simulation methodology and its multi-threaded implementation for general analog and digital ICs. Our new approach, Waveform Pipelining (abbreviated as WavePipe), exploits coarse-grained application-level parallelism by simultaneously computing circuit solutions at multiple adjacent time points in a way resembling hardware pipelining. There are two embodiments in WavePipe: backward and forward pipelining schemes. While the former creates independent computing tasks that contribute to a larger future time step by moving backwards in time, the latter performs predictive computing along the forward direction of the time axis. Unlike existing relaxation methods, WavePipe facilitates parallel circuit simulation without jeopardizing convergence and accuracy. As a coarse-grained parallel approach, WavePipe not only requires low parallel programming effort but, more importantly, creates new avenues to fully utilize increasingly parallel hardware by going beyond conventional finer-grained parallel device model evaluation and matrix solutions.
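As a rough illustration of forward pipelining only (not the authors' implementation), the sketch below runs backward-Euler Newton solves for two adjacent time points concurrently: the second point starts from an extrapolated guess of the first point's solution and is kept only if that guess turns out to be accurate. The equation, step size, and acceptance tolerance are arbitrary toy choices, and Python threads are used purely to show the structure.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy "circuit": backward Euler on dy/dt = -LAM*y, solved at each time point
# by Newton iteration on F(y) = y - y_prev + H*LAM*y = 0.
LAM, H = 2.0, 0.01

def be_solve(y_prev):
    y = y_prev
    for _ in range(5):                      # Newton (linear problem: converges in one step)
        y -= (y - y_prev + H * LAM * y) / (1.0 + H * LAM)
    return y

def forward_pipelined(y0, steps, tol=1e-3):
    """Forward-pipelining sketch: while t_{n+1} is solved from the exact y_n,
    a second worker speculatively solves t_{n+2} from an extrapolated guess
    of y_{n+1}; the speculative result is kept only if the guess was good."""
    ys = [y0]
    with ThreadPoolExecutor(max_workers=2) as pool:
        while len(ys) <= steps:
            guess = ys[-1] * (1.0 - H * LAM)          # explicit-Euler prediction of y_{n+1}
            exact = pool.submit(be_solve, ys[-1])     # step to t_{n+1}
            spec  = pool.submit(be_solve, guess)      # speculative step to t_{n+2}
            y1 = exact.result()
            ys.append(y1)
            if abs(y1 - guess) < tol:                 # prediction accurate: accept speculation
                ys.append(spec.result())
            else:                                     # misprediction: redo t_{n+2} from y1
                ys.append(be_solve(y1))
    return ys[:steps + 1]

print(forward_pipelined(1.0, steps=6))
```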
Citations: 7
Challenges in using system-level models for RTL verification
Pub Date : 2008-06-08 DOI: 10.1145/1391469.1391676
Kelvin Ng
In the modern digital design flow, high-level models written in C and C++ serve multiple purposes, one of which is to aid verification of register-transfer level (RTL) hardware models. These high-level models, also called system-level models (SLMs), act as reference models for hardware designs created at the RTL level. They define the correct behavior for the RTL hardware design under verification. Because they are written in a programming language (or similar) and are therefore executable, they are used extensively in both simulation-based verification and formal equivalence checking. This paper presents how SLMs fit into the different RTL verification schemes and the challenges involved in the various verification flows. Input stimulus generation based on formal verification technology is introduced as a new way to improve simulation coverage. This paper also covers other techniques engineers use to meet various challenges encountered in RTL verification.
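A minimal sketch of the simulation-based use of an SLM: the reference model is replayed on the same stimulus as the RTL, and a scoreboard flags mismatches. The toy ALU model, transaction format, and captured RTL values are all hypothetical.

```python
def slm_alu(op, a, b):
    """Hypothetical system-level reference model of a toy 8-bit ALU."""
    return {"add": (a + b) & 0xFF, "sub": (a - b) & 0xFF, "and": a & b}[op]

def scoreboard(transactions, rtl_outputs):
    """Replay each stimulus through the SLM and compare against the value
    observed at the RTL interface for the same transaction."""
    return [(txn, dut, slm_alu(*txn))
            for txn, dut in zip(transactions, rtl_outputs)
            if slm_alu(*txn) != dut]

stim = [("add", 200, 100), ("sub", 5, 9), ("and", 0xF0, 0x3C)]
rtl  = [44, 252, 0x30]      # outputs captured from an RTL simulation (hypothetical)
print(scoreboard(stim, rtl) or "all transactions match the SLM")
```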
Citations: 2
Formal datapath representation and manipulation for implementing DSP transforms
Pub Date : 2008-06-08 DOI: 10.1145/1391469.1391572
Peter Milder, F. Franchetti, J. Hoe, Markus Püschel
We present a domain-specific approach to representing datapaths for hardware implementations of linear signal transform algorithms. We extend the tensor structure for describing linear transform algorithms, adding the ability to explicitly characterize two important dimensions of datapath architecture. This representation allows both algorithm and datapath to be specified within a single formula and gives the designer the ability to easily consider a wide space of possible datapaths at a high level of abstraction. We have constructed a formula manipulation system based on this representation and have written a compiler that can translate a formula into a hardware implementation. This enables an automatic "push button" compilation flow that produces a register transfer level hardware description from high-level datapath directives and an algorithm (written as a formula). In our experimental results, we demonstrate that this approach yields efficient designs over a large tradeoff space.
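This is not the authors' formula language, but a minimal sketch of the underlying idea: an algorithm is held as a manipulable formula object built from tensor (Kronecker) products and composition, which one interpreter can expand to a matrix while another (not shown) could emit a datapath. The particular formula below, (F_2 ⊗ I_2)(I_2 ⊗ F_2), is the standard factorization of the 4-point Walsh-Hadamard transform into two butterfly stages.

```python
import numpy as np

# Minimal "formula" objects: identity, concrete matrix, tensor (Kronecker)
# product, and composition.  to_matrix() is one interpreter; a hardware
# compiler would be another walk over the same tree.
class Formula:
    def __matmul__(self, other):            # A @ B means "compose A after B"
        return Compose(self, other)

class Mat(Formula):
    def __init__(self, m): self.m = np.asarray(m, dtype=float)
    def to_matrix(self): return self.m

class I(Formula):
    def __init__(self, n): self.n = n
    def to_matrix(self): return np.eye(self.n)

class Tensor(Formula):
    def __init__(self, a, b): self.a, self.b = a, b
    def to_matrix(self): return np.kron(self.a.to_matrix(), self.b.to_matrix())

class Compose(Formula):
    def __init__(self, a, b): self.a, self.b = a, b
    def to_matrix(self): return self.a.to_matrix() @ self.b.to_matrix()

F2 = Mat([[1, 1], [1, -1]])                  # 2-point butterfly
wht4 = Tensor(F2, I(2)) @ Tensor(I(2), F2)   # 4-point Walsh-Hadamard transform
print(wht4.to_matrix())
```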
Citations: 54
DeMOR: Decentralized model order reduction of linear networks with massive ports
Pub Date : 2008-06-08 DOI: 10.1145/1391469.1391577
Boyuan Yan, Lingfei Zhou, S. Tan, Jie Chen, B. McGaughy
Model order reduction is an efficient technique to reduce system complexity while producing a good approximation of the input-output behavior. However, the efficiency of reduction degrades as the number of ports increases, which remains a long-standing problem. The reason for the degradation is that existing approaches are based on a centralized framework, where each input-output pair is implicitly assumed to interact equally and the matrix-valued transfer function has to be assumed to be fully populated. In this paper, a decentralized model order reduction scheme is proposed, where a multi-input multi-output (MIMO) system is decoupled into a number of subsystems and each subsystem corresponds to one output and several dominant inputs. The decoupling process is based on the relative gain array (RGA), which measures the degree of interaction of each input-output pair. Our experimental results on a number of interconnect circuits show that most of the input-output interactions are usually insignificant, which can lead to extremely compact models even for systems with massive ports. The reduction scheme is very amenable to parallel computing, as each decoupled subsystem can be reduced independently.
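For a square gain matrix the RGA has the closed form Λ = G ∘ (G⁻¹)ᵀ (element-wise product); the numpy sketch below uses it to pick, for each output, the inputs whose interaction measure exceeds a threshold. The gain matrix and the 0.1 threshold are hypothetical.

```python
import numpy as np

def rga(G):
    """Relative gain array: element-wise product of the gain matrix with the
    transpose of its (pseudo-)inverse."""
    return G * np.linalg.pinv(G).T

# Hypothetical DC gain matrix of a 4-input, 4-output interconnect model.
G = np.array([[5.0, 0.2, 0.1, 0.0],
              [0.3, 4.0, 0.0, 0.1],
              [0.1, 0.0, 6.0, 0.4],
              [0.0, 0.2, 0.3, 3.0]])

L = rga(G)
dominant = {out: list(np.where(np.abs(L[out]) > 0.1)[0]) for out in range(G.shape[0])}
print(np.round(L, 3))
print(dominant)   # dominant inputs kept for each single-output subsystem
```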
Citations: 11
Modeling of failure probability and statistical design of Spin-Torque Transfer Magnetic Random Access Memory (STT MRAM) array for yield enhancement
Pub Date : 2008-06-08 DOI: 10.1145/1391469.1391540
Jing Li, C. Augustine, S. Salahuddin, K. Roy
Spin-torque transfer magnetic RAM (STT MRAM) is a promising candidate for future universal memory. It combines the desirable attributes of current memory technologies such as SRAM, DRAM, and flash memories. It also addresses the key drawbacks of conventional MRAM technology: poor scalability and high write current. In this paper, we analyzed and modeled the failure probabilities of STT MRAM cells due to parameter variations. Based on the model, we developed an efficient simulation tool to capture the coupled electro/magnetic dynamics of the spintronic device, leading to effective prediction of memory yield. We also developed a statistical optimization methodology to minimize the memory failure probability. The proposed methodology can be used at an early stage of the design cycle to enhance memory yield.
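A minimal Monte Carlo sketch of the yield-analysis idea with a deliberately crude write-failure criterion: a cell fails if the write current its access device can deliver falls below the MTJ's critical switching current under variation. The distributions and numbers are hypothetical, not the paper's physics-based model, and plain Monte Carlo as shown cannot resolve the very rare failure rates a real design would target.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1_000_000                                 # simulated cells (Monte Carlo samples)

# Hypothetical parameter variations, in microamps.
i_crit  = rng.normal(loc=60.0, scale=5.0, size=N)   # MTJ critical switching current
i_write = rng.normal(loc=90.0, scale=6.0, size=N)   # current the access transistor delivers

fails = i_write < i_crit                      # crude write-failure criterion
p_cell = fails.mean()
expected_bad = p_cell * 2**20                 # expected failing cells per 1-Mbit array

print(f"estimated cell write-failure probability ~ {p_cell:.1e}")
print(f"~{expected_bad:.0f} failing cells expected per 1-Mbit array (no redundancy/ECC)")
```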
Citations: 88