International Conference on Hardware/Software Codesign and System Synthesis, 2004. CODES + ISSS 2004. Latest papers

Facilitating reuse in hardware models with enhanced type inference
Manish Vachharajani, Neil Vachharajani, S. Malik, David I. August
High-level hardware modeling is an essential, yet time-consuming, part of system design. However, effective component-based reuse in hardware modeling languages can reduce model construction time and enable the exploration of more design alternatives, leading to better designs. While component overloading and parametric polymorphism are critical for effective component-based reuse, no existing modeling language supports both. The lack of these features creates overhead for designers that discourages reuse and negates its benefits. This work presents a type system that supports both component overloading and parametric polymorphism. It proves that performing type inference for any such system is NP-complete and presents a heuristic that works efficiently in practice. The result is a type system and type inference algorithm that can encourage reuse, reduce design specification time, and lead to better designs.
DOI: https://doi.org/10.1145/1016720.1016744 | Published: 2004-09-08
Citations: 0
A novel deadlock avoidance algorithm and its hardware implementation
J. Lee, V. Mooney
This work proposes a deadlock avoidance algorithm (DAA) and its hardware implementation, the deadlock avoidance unit (DAU), as an intellectual property (IP) core that provides a mechanism for very fast and automatic deadlock avoidance in multiprocessor system-on-a-chip (MP-SoC) with multiple (e.g., 10) processing elements and multiple (e.g., 40) resources. The DAU avoids deadlock by not allowing any grant or request that leads to a deadlock. In case of livelock, the DAU asks one of the processes involved in the livelock to release resource(s) so that the livelock can also be resolved. We simulated two realistic examples that can benefit from the DAU, and demonstrated that the DAU not only avoids deadlock in a few clock cycles but also achieves a 37% speed-up of application execution time over avoiding deadlock in software. Finally, the SoC area overhead due to the DAU is small, under 0.01% in our example.
DOI: https://doi.org/10.1145/1016720.1016769 | Published: 2004-09-08
Citations: 12
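The rule stated in the abstract above, refusing any request or grant that would lead to deadlock, can be illustrated with a minimal software sketch. This is not the paper's DAA or the DAU hardware. It assumes single-instance resources and simply refuses a request that would close a cycle in the wait-for relation; the livelock handling mentioned in the abstract is not modelled. The class name `AvoidanceSketch` and its methods are hypothetical.

```python
from collections import defaultdict


class AvoidanceSketch:
    """Hypothetical illustration: refuse requests that would create a wait-for cycle."""

    def __init__(self):
        self.owner = {}                   # resource -> process currently holding it
        self.waiting = defaultdict(set)   # process -> resources it is waiting for

    def _would_cycle(self, process, resource):
        """Return True if `process` waiting on `resource` would close a cycle."""
        seen = set()
        stack = [resource]
        while stack:
            res = stack.pop()
            holder = self.owner.get(res)
            if holder is None or holder in seen:
                continue
            if holder == process:
                return True
            seen.add(holder)
            # Follow the resources that the current holder is itself waiting for.
            stack.extend(self.waiting[holder])
        return False

    def request(self, process, resource):
        """Return 'granted', 'waiting', or 'denied' for a resource request."""
        if resource not in self.owner or self.owner[resource] == process:
            self.owner[resource] = process
            return "granted"
        if self._would_cycle(process, resource):
            return "denied"
        self.waiting[process].add(resource)
        return "waiting"

    def release(self, process, resource):
        """Free a resource held by `process`; waiters must request it again."""
        if self.owner.get(resource) == process:
            del self.owner[resource]
```

In this sketch, if `p1` holds `r1` and waits for `r2` while `p2` holds `r2`, a request by `p2` for `r1` is denied because it would complete the cycle.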
Current flattening in software and hardware for security applications
R. Muresan, C. Gebotys
This work presents a new current flattening technique applicable in software and hardware. This technique is important in embedded cryptosystems, since power analysis attacks (which exploit the dependence of current variation on data and program) compromise the security of the system. The technique flattens the current internally by exploiting current consumption differences at the instruction level. Code transformations that reduce the current variation due to program dependencies are presented. In addition, a real-time hardware architecture capable of reducing the dependence of the current on data and program is proposed. Measured and simulated current waveforms of cryptographic software are presented in support of these techniques.
DOI: https://doi.org/10.1145/1016720.1016773 | Published: 2004-09-08
Citations: 29
Reducing power and latency in 2-D mesh NoCs using globally pseudochronous locally synchronous clocking
E. Nilsson, Johnny Öberg
One of the main problems when designing large ASICs today is distributing a low-power synchronous clock over the whole chip, and many remedies to this problem have been proposed over the years. For networks-on-chip (NoC), where computational resources are organised in a 2-D mesh connected through switches in an on-chip interconnection network, another possibility exists: globally pseudochronous locally synchronous clock distribution. We present a clocking scheme for NoCs that we call globally pseudochronous locally synchronous, in which we distribute a clock with a constant phase difference between the switches. As a consequence of the phase difference, some paths along the NoC switch network become faster than the others. We call these paths data motorways. By adapting the switching policy in the switches so that data prefer the motorways, we show that the latency within the network is reduced by up to 40% compared to a synchronous reference case. The phase difference between the resources also makes the circuit more tolerant to clock skew. It also distributes the current peaks more evenly across the clock period, which reduces peak power and in turn further reduces the clock skew and the jitter in the clock network.
DOI: https://doi.org/10.1145/1016720.1016764 | Published: 2004-09-08
Citations: 47
Multi-objective mapping for mesh-based NoC architectures
G. Ascia, V. Catania, M. Palesi
We present an approach to multi-objective exploration of the mapping space of a mesh-based network-on-chip architecture. Based on evolutionary computing techniques, the approach is an efficient and accurate way to obtain the Pareto mappings that optimize performance and power consumption. Integration of the approach in an exploration framework with a kernel based on an event-driven trace-based simulator makes it possible to take account of important dynamic effects that have a great impact on mapping. Validation on both synthesized traffic and real applications (an MPEG-2 encoder/decoder system) confirms the efficiency, accuracy and scalability of the approach.
DOI: https://doi.org/10.1145/1016720.1016765 | Published: 2004-09-08
Citations: 178
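The notion of Pareto mappings used in the abstract above can be made concrete with a small sketch. It only shows how non-dominated candidates are selected once each mapping has a cost tuple (for example delay and power, lower being better). The evolutionary search and the trace-based simulator from the paper are not modelled, and the function names `dominates` and `pareto_front` are hypothetical.

```python
def dominates(a, b):
    """True if cost tuple a is no worse than b in every objective and better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates):
    """Return the sorted names of candidates that no other candidate dominates.

    `candidates` maps a mapping name to its cost tuple; lower values are better.
    """
    return sorted(
        name
        for name, cost in candidates.items()
        if not any(dominates(other, cost) for other in candidates.values())
    )
```

With made-up illustrative costs `{"A": (10, 5.0), "B": (8, 7.0), "C": (12, 6.0)}`, which are not results from the paper, `C` is dominated by `A`, so the front is `["A", "B"]`.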
Fast exploration of bus-based on-chip communication architectures
S. Pasricha, N. Dutt, M. Ben-Romdhane
As a result of improvements in process technology, more and more components are being integrated into a single system-on-chip (SoC) design. Communication between these components is increasingly dominating critical system paths and frequently becomes the source of performance bottlenecks. It therefore becomes extremely important for designers to explore the communication space early in the design flow. Traditionally, pin-accurate bus cycle accurate (PA-BCA) models were used for exploring the communication space. To speed up simulation, transaction-based bus cycle accurate (T-BCA) models have been proposed, which borrow concepts from the transaction level modeling (TLM) domain. The cycle count accurate at transaction boundaries (CCATB) modeling abstraction was introduced for fast communication space exploration. In this work, we describe the mechanisms that produce the speedup in CCATB models and demonstrate the effectiveness of the CCATB exploration approach with the aid of a case study involving an AMBA 2.0 based SoC subsystem used in the multimedia application domain. We also analyze how the achieved simulation speedup scales with design complexity and show that SoC designs modeled at the CCATB level simulate 120% faster than PA-BCA and 67% faster than T-BCA models on average.
DOI: https://doi.org/10.1145/1016720.1016778 | Published: 2004-09-08
Citations: 69
Analytical models for leakage power estimation of memory array structures
M. Mamidipaka, K. Khouri, N. Dutt, M. Abadir
There is a growing need for accurate power models at the system level. Memory structures such as caches, branch target buffers (BTBs), and register files occupy significant area in contemporary SoC designs and are the main contributors to system leakage power dissipation. Existing models for leakage power estimation in array structures typically use coefficients derived from elaborate SPICE simulations. However, these methodologies are not applicable to array designs in a newer technology that require power estimates early in the design cycle. In this paper, we propose analytical models for array structures that are based only on high-level design parameters. Assuming typical circuit implementation styles, we identify the transistors that contribute to the leakage power in each array sub-circuit and develop models as a function of the operation (read/write/idle) on the array and the organizational parameters of the array. The developed models are validated by comparing their estimates against the leakage power measured using SPICE simulations on industrial array designs belonging to the e500 processor core. The comparison shows that the models are accurate to within an error margin of 21.5% and thus can be used in high-level power-performance exploration. Interestingly, in array designs with dual threshold voltage technology, we observed that, contrary to the general expectation, the array memory core contributes just 9% and the address decoder contributes as much as 62% of the total leakage power.
DOI: https://doi.org/10.1145/1016720.1016757 | Published: 2004-09-08
Citations: 42
A timing-accurate HW/SW cosimulation of an ISS with SystemC
L. Formaggio, F. Fummi, G. Pravadelli
The paper presents a system level co-simulation methodology for modeling, validating, and analyzing the performance of embedded systems. The proposed solution relies on the integration between an instruction set simulator (ISS) and the SystemC simulation kernel. In this way, the ISS is used to abstract the model of the real programmable device where the SW should run, while SystemC is used to model HW components that interact with the SW. A correct validation of such an architecture is infeasible without taking care of timing information. Thus, the paper proposes an effective timing synchronization mechanism, which uses timing information of an ISS (or a board) to synchronize the SystemC simulation.
DOI: https://doi.org/10.1145/1016720.1016759 | Published: 2004-09-08
Citations: 18
Compiler-directed code restructuring for reducing data TLB energy
M. Kandemir, I. Kadayif, G. Chen
Prior work on TLB power optimization considered circuit and architectural techniques. A recent software-based technique for data TLBs has considered the possibility of storing the frequently used virtual-to-physical address translations in a set of translation registers (TRs), and using them when necessary instead of going to the data TLB. This work presents a compiler-based strategy for increasing the effectiveness of TRs. The idea is to restructure the application code in such a fashion that once a TR is loaded, its contents are reused as much as possible. Our experimental evaluation with six array-based benchmarks from the Spec2000 suite indicates that the proposed TR reuse strategy brings significant reductions in data TLB energy over an alternate strategy that employs TRs but does not restructure the code for TR reuse.
DOI: https://doi.org/10.1145/1016720.1016747 | Published: 2004-09-08
Citations: 23
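The idea in the abstract above, restructuring code so that a loaded translation register is reused as much as possible, can be illustrated by counting register reloads for two traversal orders of the same array. This is a simplified sketch, not the paper's compiler algorithm: it assumes a single translation register, a 4096-byte page, and a row-major array of 8-byte elements. The function names are hypothetical.

```python
def tr_reloads(addresses, page_size=4096):
    """Count how often a single translation register must be reloaded."""
    reloads = 0
    current_page = None
    for address in addresses:
        page = address // page_size
        if page != current_page:
            # The access falls on a different page, so the register is reloaded.
            reloads += 1
            current_page = page
    return reloads


def row_major(rows, cols, elem_size=8):
    """Byte addresses visited when the inner loop walks along a row."""
    return ((i * cols + j) * elem_size for i in range(rows) for j in range(cols))


def column_major(rows, cols, elem_size=8):
    """Byte addresses visited when the inner loop walks down a column."""
    return ((i * cols + j) * elem_size for j in range(cols) for i in range(rows))
```

For 64 rows of 512 elements, each row fills exactly one page. The row-major order then needs 64 reloads, while the column-major order changes page on every access and needs 32768, which is the kind of gap that loop restructuring aims to close.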
Efficient exploration of on-chip bus architectures and memory allocation
Sungchan Kim, Chaeseok Im, S. Ha
Separation between computation and communication in system design allows the system designer to explore the communication architecture independently of component selection and mapping. We present an iterative two-step exploration methodology for bus-based on-chip communication architecture and memory allocation, assuming that memory traces from the processing elements are given by the mapping stage. The proposed method uses a static performance estimation technique to reduce the large design space drastically and quickly, and applies a trace-driven simulation technique to the reduced set of design candidates for accurate performance estimation. Since local memory traffic as well as shared memory traffic is involved in bus contention, memory allocation is considered an important axis of the design space in our technique. The viability and efficiency of the proposed methodology are validated by two real-life examples: a 4-channel digital video recorder (DVR) and an equalizer for an OFDM DVB-T receiver.
DOI: https://doi.org/10.1145/1016720.1016779 | Published: 2004-09-08
Citations: 21