
Latest publications: 2008 45th ACM/IEEE Design Automation Conference

WavePipe: Parallel transient simulation of analog and digital circuits on multi-core shared-memory machines
Pub Date : 2008-06-08 DOI: 10.1145/1391469.1391531
Wei Dong, Peng Li, Xiaoji Ye
While the emergence of multi-core shared-memory machines offers a promising computing solution to ever more complex chip design problems, new parallel CAD methodologies must be developed to gain the full benefit of these increasingly parallel computing systems. We present a parallel transient simulation methodology and its multi-threaded implementation for general analog and digital ICs. Our new approach, Waveform Pipelining (abbreviated as WavePipe), exploits coarse-grained application-level parallelism by simultaneously computing circuit solutions at multiple adjacent time points in a way resembling hardware pipelining. There are two embodiments in WavePipe: backward and forward pipelining schemes. While the former creates independent computing tasks that contribute to a larger future time step by moving backwards in time, the latter performs predictive computing along the forward direction of the time axis. Unlike existing relaxation methods, WavePipe facilitates parallel circuit simulation without jeopardizing convergence and accuracy. As a coarse-grained parallel approach, WavePipe not only requires low parallel programming effort but, more importantly, creates new avenues to fully utilize increasingly parallel hardware by going beyond conventional finer-grained parallel device model evaluation and matrix solutions.
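The forward-pipelining idea, a speculative thread working on t+2h while the lead thread finishes t+h, with the speculative result kept only when its predictor is validated, can be sketched on a toy scalar ODE (dx/dt = -x with backward Euler). The two-thread pool, the predictor choice, and the validation test below are illustrative assumptions, not the paper's actual algorithm:

```python
from concurrent.futures import ThreadPoolExecutor

def be_step(x, h):
    # Backward-Euler step for dx/dt = -x has the closed form x / (1 + h).
    return x / (1.0 + h)

def simulate_pipelined(x0, h, n):
    """Toy 'forward pipelining': thread A computes the solution at t+h
    from the last accepted value while thread B speculatively computes
    t+2h starting from a predictor; B's result is accepted only when A's
    exact result confirms the predictor, so accuracy is never compromised."""
    xs = [x0]
    with ThreadPoolExecutor(max_workers=2) as pool:
        while len(xs) <= n:
            x_prev = xs[-1]
            fut_a = pool.submit(be_step, x_prev, h)      # exact solution at t+h
            predictor = be_step(x_prev, h)               # cheap predictor for t+h
            fut_b = pool.submit(be_step, predictor, h)   # speculative solve for t+2h
            x1 = fut_a.result()
            xs.append(x1)
            if len(xs) <= n and x1 == predictor:         # predictor validated: keep t+2h
                xs.append(fut_b.result())
    return xs
```

In a real simulator the predictor would be cheaper than the full Newton solve; here both are the same closed-form step, so the speculation always validates.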
Citations: 7
Formal datapath representation and manipulation for implementing DSP transforms
Pub Date : 2008-06-08 DOI: 10.1145/1391469.1391572
Peter Milder, F. Franchetti, J. Hoe, Markus Püschel
We present a domain-specific approach to representing datapaths for hardware implementations of linear signal transform algorithms. We extend the tensor structure for describing linear transform algorithms, adding the ability to explicitly characterize two important dimensions of datapath architecture. This representation allows both algorithm and datapath to be specified within a single formula and gives the designer the ability to easily consider a wide space of possible datapaths at a high level of abstraction. We have constructed a formula manipulation system based on this representation and have written a compiler that can translate a formula into a hardware implementation. This enables an automatic "push button" compilation flow that produces a register transfer level hardware description from high-level datapath directives and an algorithm (written as a formula). In our experimental results, we demonstrate that this approach yields efficient designs over a large tradeoff space.
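The idea of a single formula specifying both algorithm and datapath can be illustrated with Kronecker-product (tensor) formulas interpreted directly in Python. The 4-point Walsh-Hadamard factorization below is a standard textbook example, not a formula from the paper:

```python
def kron(a, b):
    # Kronecker (tensor) product of two matrices given as nested lists.
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def matvec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

F2 = [[1, 1], [1, -1]]   # 2-point transform "butterfly"
I2 = [[1, 0], [0, 1]]

# The formula (F2 ⊗ I2)(I2 ⊗ F2) computes the 4-point Walsh-Hadamard
# transform; the size of the identity factor encodes a datapath dimension,
# namely how many butterfly units operate in parallel in each stage.
def wht4(v):
    return matvec(kron(F2, I2), matvec(kron(I2, F2), v))
```

A compiler in the spirit of the paper would translate each factor of the formula into a hardware stage rather than interpret it, but the formula object is the same.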
Citations: 54
Towards acceleration of fault simulation using Graphics Processing Units
Pub Date : 2008-06-08 DOI: 10.1145/1391469.1391679
Kanupriya Gulati, S. Khatri
In this paper, we explore the implementation of fault simulation on a graphics processing unit (GPU). In particular, we implement a fault simulator that exploits thread-level parallelism. Fault simulation is inherently parallelizable, and the large number of threads that can be computed in parallel on a GPU makes it a natural fit for the problem of fault simulation. Our implementation fault-simulates all the gates in a particular level of a circuit, including good and faulty circuit simulations, for all patterns, in parallel. Since GPUs have an extremely large memory bandwidth, we implement each of our fault simulation threads (which execute in parallel with no data dependencies) using memory lookup. Fault injection is also done along with gate evaluation, with each thread using a different fault injection mask. All threads compute identical instructions, but on different data, as required by the Single Instruction Multiple Data (SIMD) programming semantics of the GPU. Our results, implemented on an NVIDIA GeForce GTX 8800 GPU card, indicate that our approach is on average 35x faster than a commercial fault simulation engine. With the recently announced Tesla GPU servers housing up to eight GPUs, our approach would be potentially 238 times faster. The correctness of the GPU-based fault simulator has been verified by comparing its results with those of a CPU-based fault simulator.
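The pattern-parallel good/faulty evaluation that the paper maps onto GPU threads can be mimicked on a CPU with bit-parallel words: each bit position of an integer carries one test pattern, and the good circuit and each stuck-at faulty copy are evaluated and XORed to find the detecting patterns. The toy NAND circuit, fault site, and 8-bit word width are illustrative:

```python
MASK = 0xFF  # 8 test patterns packed per word, one per bit position

def simulate(a, b, stuck=None):
    """Tiny 2-gate circuit: n1 = a AND b, out = NOT n1 (a NAND).
    `stuck` optionally forces internal node n1 to 0 or 1 across all
    patterns, modeling a stuck-at fault injected via a mask."""
    n1 = a & b
    if stuck == ('n1', 0):
        n1 = 0
    elif stuck == ('n1', 1):
        n1 = MASK
    return ~n1 & MASK

a, b = 0b10101010, 0b11001100                        # 8 input patterns
good = simulate(a, b)                                # fault-free reference
detect_sa0 = good ^ simulate(a, b, stuck=('n1', 0))  # bits = patterns detecting s-a-0
detect_sa1 = good ^ simulate(a, b, stuck=('n1', 1))  # bits = patterns detecting s-a-1
```

On the GPU each (pattern, fault) pair gets its own thread; here the word-level AND/NOT plays the same SIMD role.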
Citations: 106
Predictive dynamic thermal management for multicore systems
Pub Date : 2008-06-08 DOI: 10.1145/1391469.1391658
Inchoon Yeo, C. Liu, Eun Jung Kim
Recently, processor power density has been increasing at an alarming rate, resulting in high on-chip temperatures. Higher temperature increases current leakage and degrades reliability. In this paper, we propose Predictive Dynamic Thermal Management (PDTM) for multicore systems, based on an Application-based Thermal Model (ABTM) and a Core-based Thermal Model (CBTM). ABTM predicts future temperature based on application-specific thermal behavior, while CBTM estimates the core temperature pattern from steady-state temperature and workload. Our prediction model has an average error of 1.6%, compared with at most 5% for the model in HybDTM. Based on the temperatures predicted by ABTM and CBTM, the proposed PDTM can maintain the system temperature below a desired level by moving the running application from a potentially overheated core to the predicted coolest core (migration) and by reducing processor resources (priority scheduling) within multicore systems. PDTM enables the exploration of the tradeoff between throughput and fairness in temperature-constrained multicore systems. We implement PDTM on Intel's Quad-Core system with a specific device driver to access the Digital Thermal Sensor (DTS). Compared against the standard Linux scheduler, PDTM decreases average temperature by about 10% and peak temperature by 5°C, with a negligible performance impact of under 1%, while running a single SPEC2006 benchmark. Moreover, our PDTM outperforms HRTM, reducing average temperature by about 7% and peak temperature by about 3°C with a performance overhead of 0.15% when running a single benchmark.
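A minimal sketch of the predictive-migration policy, using a made-up per-core temperature delta in place of the actual ABTM/CBTM predictors (the function name, threshold, and data layout are all illustrative assumptions):

```python
def pdtm_schedule(core_temps, predicted_delta, running_core, limit=70.0):
    """Toy predictive migration: predict each core's next temperature as
    current temperature plus an application-specific delta (the role the
    ABTM/CBTM models play in the paper), and migrate the running task to
    the predicted-coolest core if its current core would exceed `limit`."""
    predicted = {c: t + predicted_delta.get(c, 0.0) for c, t in core_temps.items()}
    if predicted[running_core] > limit:
        return min(predicted, key=predicted.get)   # migrate to predicted-coolest core
    return running_core                            # prediction is safe: stay put
```

The point of predicting rather than reacting is that migration happens before the thermal limit is actually crossed.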
Citations: 205
DeMOR: Decentralized model order reduction of linear networks with massive ports
Pub Date : 2008-06-08 DOI: 10.1145/1391469.1391577
Boyuan Yan, Lingfei Zhou, S. Tan, Jie Chen, B. McGaughy
Model order reduction is an efficient technique for reducing system complexity while producing a good approximation of the input-output behavior. However, the efficiency of reduction degrades as the number of ports increases, which remains a long-standing problem. The reason for the degradation is that existing approaches are based on a centralized framework, where every input-output pair is implicitly assumed to interact equally and the matrix-valued transfer function has to be assumed to be fully populated. In this paper, a decentralized model order reduction scheme is proposed, in which a multi-input multi-output (MIMO) system is decoupled into a number of subsystems, each corresponding to one output and several dominant inputs. The decoupling process is based on the relative gain array (RGA), which measures the degree of interaction of each input-output pair. Our experimental results on a number of interconnect circuits show that most of the input-output interactions are usually insignificant, which can lead to extremely compact models even for systems with massive ports. The reduction scheme is also very amenable to parallel computing, as each decoupled subsystem can be reduced independently.
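The RGA-based decoupling step can be sketched for a 2x2 steady-state gain matrix: the relative gain array is Λ = G ∘ (G⁻¹)ᵀ (elementwise product), and entries near zero mark input-output pairs whose interaction may be dropped. The example matrix in the test is illustrative:

```python
def rga_2x2(g):
    """Relative gain array Λ = G ∘ (G⁻¹)ᵀ for a 2x2 steady-state gain
    matrix G, using the closed-form 2x2 inverse. Λ[i][j] near zero marks
    an input-output pair whose interaction is insignificant, so output i
    can be grouped with only its dominant inputs when decoupling the
    MIMO system into subsystems."""
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    # (G⁻¹)ᵀ written out explicitly for the 2x2 case.
    inv_t = [[ g[1][1] / det, -g[1][0] / det],
             [-g[0][1] / det,  g[0][0] / det]]
    return [[g[i][j] * inv_t[i][j] for j in range(2)] for i in range(2)]
```

Each row (and column) of an RGA sums to one, which gives a quick sanity check on the computation.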
Citations: 11
Modeling of failure probability and statistical design of Spin-Torque Transfer Magnetic Random Access Memory (STT MRAM) array for yield enhancement
Pub Date : 2008-06-08 DOI: 10.1145/1391469.1391540
Jing Li, C. Augustine, S. Salahuddin, K. Roy
Spin-torque transfer magnetic RAM (STT MRAM) is a promising candidate for future universal memory. It combines the desirable attributes of current memory technologies such as SRAM, DRAM, and flash. It also addresses the key drawbacks of conventional MRAM technology: poor scalability and high write current. In this paper, we analyzed and modeled the failure probabilities of STT MRAM cells due to parameter variations. Based on the model, we developed an efficient simulation tool to capture the coupled electro/magnetic dynamics of spintronic devices, leading to effective prediction of memory yield. We also developed a statistical optimization methodology to minimize the memory failure probability. The proposed methodology can be used at an early stage of the design cycle to enhance memory yield.
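A hedged sketch of the failure-probability idea, using a toy Monte Carlo model in which a write fails when a Gaussian-distributed write current falls below a Gaussian-distributed critical switching current. The distributions and numbers are illustrative, not the paper's device model:

```python
import random

def write_failure_prob(n_trials=100_000, seed=1):
    """Toy Monte Carlo estimate of write-failure probability under
    parameter variations: both the applied write current and the cell's
    critical switching current vary from trial to trial, and a write
    fails when the applied current does not exceed the critical one.
    All means/sigmas below are made-up illustrative values."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(n_trials):
        i_write = rng.gauss(150e-6, 10e-6)   # applied write current (A)
        i_crit = rng.gauss(100e-6, 15e-6)    # critical switching current (A)
        if i_write <= i_crit:
            failures += 1
    return failures / n_trials
```

A statistical design loop in the paper's spirit would adjust design parameters (e.g. the nominal write current) to push this probability below a yield target.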
Citations: 88
IntellBatt: Towards smarter battery design
Pub Date : 2008-06-08 DOI: 10.1145/1391469.1391690
Sumana Mandal, Praveen Bhojwani, S. Mohanty, R. Mahapatra
Battery lifetime and safety are primary concerns in the design of battery-operated systems. Lifetime management is typically supervised by the system via battery-aware task scheduling, while safety is managed on the battery side via features deployed in smart batteries. This research proposes IntellBatt, a novel multi-cell battery design based on an intelligent battery cell array that offloads battery lifetime management onto the battery itself. By deploying a battery cell array management unit, IntellBatt exploits battery characteristics such as the charge recovery effect to enhance battery lifetime and ensure safe operation. This is achieved by using real-time cell status information to select cells to deliver the required load current, without involving a complex task scheduler on the host system. The proposed design was evaluated via simulation using accurate cell models and real experimental traces from a portable DVD player. The multi-cell design enhanced battery lifetime by 22% in terms of battery discharge time. Besides standalone deployment, IntellBatt can also be combined with existing battery-aware task scheduling approaches to further enhance battery lifetime.
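The cell-selection step of the cell array management unit might be sketched as follows, assuming a simple state-of-charge ranking. The field names and the greedy policy are illustrative, not the paper's actual algorithm:

```python
def select_cells(cells, k):
    """Toy cell-array manager: pick the k cells with the highest state of
    charge to deliver the load, letting the remaining cells rest so they
    can benefit from the charge recovery effect the paper exploits."""
    ranked = sorted(cells, key=lambda c: c['soc'], reverse=True)
    return [c['id'] for c in ranked[:k]]
```

Re-running the selection as cell status updates in real time rotates the load across the array, which is what evens out discharge and extends lifetime.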
Citations: 39
Transistor level gate modeling for accurate and fast timing, noise, and power analysis
Pub Date : 2008-06-08 DOI: 10.1145/1391469.1391588
S. Raja, F. Varadi, M. Becer, J. Geada
Current-source-based cell models are becoming a necessity for accurate timing and noise analysis at 65 nm and below. Voltage waveform shapes are increasingly difficult to represent as simple ramps due to highly resistive interconnects and Miller capacitance effects at receiver gates. Propagation of complex voltage waveforms, and accurate modeling of nonlinear driver and receiver effects in crosstalk noise analysis, require accurate cell models. A good cell model should be independent of input waveform and output load, should be easy to characterize, and should not increase the complexity of a cell library with high-dimensional look-up tables. At the same time, it should provide high accuracy compared to SPICE for all analysis scenarios, including multiple-input switching, and for all cell types and cell arcs, including those with high stacks. It should also be easily extendable for use in statistical STA and noise analysis, and one should be able to simulate it fast enough for practical use in multi-million-gate designs. In this paper, we present a gate model built from fast transistor models (FXM) that has all the desired properties. Along with this model, we also present a multithreaded timing traversal approach that allows one to take advantage of the high accuracy provided by the FXM at traditional STA speeds. Results are presented using a fully extracted 65 nm TSMC technology.
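A current-source model evaluates the gate's output current from a pre-characterized I(Vin, Vout) table rather than from fixed delay/slew entries, so it stays valid for arbitrary input waveforms and loads. A minimal sketch of such a table evaluation with bilinear interpolation follows; the grid and table values are illustrative, not the paper's FXM:

```python
from bisect import bisect_right

def bilinear(grid, table, x, y):
    """Evaluate a pre-characterized 2-D table at (x, y) by bilinear
    interpolation. `grid` holds the characterization voltages for both
    axes; `table[i][j]` is the current at (grid[i], grid[j]). Queries
    are clamped to the outermost grid cells."""
    i = min(max(bisect_right(grid, x) - 1, 0), len(grid) - 2)
    j = min(max(bisect_right(grid, y) - 1, 0), len(grid) - 2)
    tx = (x - grid[i]) / (grid[i + 1] - grid[i])
    ty = (y - grid[j]) / (grid[j + 1] - grid[j])
    return ((1 - tx) * (1 - ty) * table[i][j]
            + tx * (1 - ty) * table[i + 1][j]
            + (1 - tx) * ty * table[i][j + 1]
            + tx * ty * table[i + 1][j + 1])

# Illustrative characterization data: I(Vin, Vout) = Vin + Vout.
vgrid = [0.0, 0.5, 1.0]
itable = [[vin + vout for vout in vgrid] for vin in vgrid]
```

A timing engine would call such an evaluation inside its waveform integration loop, one lookup per time step per gate.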
Citations: 40
Efficient algorithm for the computation of on-chip capacitance sensitivities with respect to a large set of parameters
Pub Date : 2008-06-08 DOI: 10.1145/1391469.1391699
T. El-Moselhy, I. Elfadel, D. Widiger
Recent CAD methodologies for design-for-manufacturability (DFM) have naturally led to a significant increase in the number of process and layout parameters that must be taken into account in design-rule checking. Methodological consistency requires that a similar number of parameters be taken into account during layout parasitic extraction. Because of the inherent variability of these parameters, the issue of efficiently extracting deterministic parasitic sensitivities with respect to such a large number of parameters must be addressed. In this paper, we tackle this issue in the context of capacitance sensitivity extraction. In particular, we show how the adjoint sensitivity method can be efficiently integrated within a finite-difference (FD) scheme to compute the sensitivity of the capacitance with respect to a large set of BEOL parameters. If np is the number of parameters, the speedup of the adjoint method is shown to be a factor of np/2 with respect to direct FD sensitivity techniques. The proposed method has been implemented and verified on a 65 nm BEOL cross section having 10 metal layers and a total of 59 parameters. Because of its speed, the method can be advantageously used to prune from the CAD flow those BEOL parameters that yield a capacitance sensitivity less than a given threshold.
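The adjoint trick behind an np-independent cost can be shown on a generic 2x2 linear system (a stand-in for the paper's discretized field problem, not its actual solver): for f = c·x with A x = b, one extra solve of the adjoint system Aᵀλ = c yields df/dp_k = -λᵀ(∂A/∂p_k)x for every parameter, instead of one extra system solve per parameter as in direct finite differencing:

```python
def solve2(a, b):
    # Closed-form solve of a 2x2 linear system a @ x = b (Cramer's rule).
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(a[1][1] * b[0] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - a[1][0] * b[0]) / det]

def adjoint_sensitivities(a, b, c, dA_list):
    """For output f = c·x with A x = b, one forward solve plus one
    adjoint solve (Aᵀ λ = c) gives df/dp_k = -λᵀ (dA/dp_k) x for every
    parameter p_k, at the cost of one matrix-vector product each."""
    x = solve2(a, b)                                     # forward solve
    at = [[a[0][0], a[1][0]], [a[0][1], a[1][1]]]        # Aᵀ
    lam = solve2(at, c)                                  # single adjoint solve
    sens = []
    for dA in dA_list:                                   # cheap loop over parameters
        dAx = [dA[0][0] * x[0] + dA[0][1] * x[1],
               dA[1][0] * x[0] + dA[1][1] * x[1]]
        sens.append(-(lam[0] * dAx[0] + lam[1] * dAx[1]))
    return sens
```

With A = [[2, 1], [1, 3]], b = [1, 2], c = [1, 0] and the parameter entering only through A[0][0], f = 1/(3·A[0][0] - 1), so the exact sensitivity is -3/25 = -0.12, matching the adjoint result.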
Citations: 9
On reliable modular testing with vulnerable test access mechanisms
Pub Date : 2008-06-08 DOI: 10.1145/1391469.1391681
Lin Huang, F. Yuan, Q. Xu
In modular testing of system-on-a-chip (SoC), test access mechanisms (TAMs) are used to transport test data between the input/output pins of the SoC and the cores under test. Prior work assumes TAMs to be error-free during test data transfer. The validity of this assumption, however, is questionable with the ever-decreasing feature size of today's VLSI technology and the ever-increasing circuit operational frequency. In particular, when functional interconnects such as network-on-chip (NoC) are reused as TAMs, even if they have passed manufacturing test beforehand, failures caused by electrical noise such as crosstalk and transient errors may happen during test data transfer and make good chips appear to be defective, thus leading to undesired test yield loss. To address the above problem, in this paper, we propose novel solutions that are able to achieve reliable modular testing even if test data may sometimes get corrupted during transmission with vulnerable TAMs, by designing a new "jitter-aware" test wrapper and a new "jitter-transparent" ATE interface. Experimental results on an industrial circuit demonstrate the effectiveness of the proposed technique.
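The jitter-aware wrapper itself is a hardware design; purely as a software analogy (not the authors' mechanism), the underlying idea — detect corruption on the vulnerable TAM rather than let it masquerade as a core failure — can be sketched as an error-detecting retransmit loop. Every name here is hypothetical: `ber` models the channel's bit-error rate, and the receiver-side CRC stands in for the ATE's knowledge of the expected test data:

```python
import random
import zlib

random.seed(1)

def noisy_channel(payload: bytes, ber: float) -> bytes:
    """Flip each bit independently with probability ber (a vulnerable TAM)."""
    out = bytearray(payload)
    for i in range(len(out)):
        for bit in range(8):
            if random.random() < ber:
                out[i] ^= 1 << bit
    return bytes(out)

def send_with_check(payload: bytes, ber: float, max_tries: int = 20) -> int:
    """Transmit until the CRC of the received data verifies.

    Returns the attempt number on success, 0 if every try was corrupted.
    The golden CRC is held at the receiver, modeling an ATE that already
    knows the expected test data."""
    crc = zlib.crc32(payload)
    for attempt in range(1, max_tries + 1):
        received = noisy_channel(payload, ber)
        if zlib.crc32(received) == crc:   # corrupted transfer is rejected,
            return attempt                # not misread as a failing core
    return 0

tries = send_with_check(b"scan-chain test vector 0xDEADBEEF", ber=0.002)
print(tries)
```

Without such a check, any bit flip on the TAM would be scored as a failing test response and a good chip discarded; with it, corrupted transfers are simply retried, avoiding the transport-induced yield loss the abstract describes.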
Citations: 2
Journal
2008 45th ACM/IEEE Design Automation Conference