International Conference on Hardware/Software Codesign and System Synthesis: Latest Publications


Applying network calculus for performance analysis of self-similar traffic in on-chip networks

International Conference on Hardware/Software Codesign and System Synthesis

Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629497

Yue Qian, Zhonghai Lu, Wenhua Dou

On-chip traffic of many applications exhibits self-similar characteristics. In this paper, we apply network calculus to analyze the delay and backlog bounds for self-similar traffic in networks on chips. We first prove that self-similar traffic cannot be constrained by any deterministic arrival curve. Then we prove that self-similar traffic can be constrained by deterministic linear arrival curves α_{r,b}(t) = rt + b (r: rate, b: burstiness) if an additional parameter, the excess probability ε, is used to capture its burstiness exceeding the arrival envelope. This three-parameter model, ε-α_{r,b}(t) = rt + b(ε), enables us to apply and extend the results of network calculus to analyze the performance and buffering cost of networks delivering self-similar traffic flows. Assuming the latency-rate server model for the network elements, we give closed-form equations to compute the delay and backlog bounds for self-similar traffic traversing a series of network elements. Furthermore, we describe a performance analysis flow with self-similar traffic as input. Our experimental results using real on-chip multimedia traffic traces validate our model and approach.
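
To illustrate the kind of closed-form bound involved, the sketch below computes the standard deterministic network-calculus bounds for a flow with linear arrival curve rt + b crossing latency-rate servers with service curve R(t - T)+. The class and function names are hypothetical, and these are the textbook deterministic bounds, not the paper's probabilistic equations. Under the paper's model one would presumably pass b(ε) in place of b, which is an assumption here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatencyRateServer:
    """Latency-rate server with service curve R * max(t - T, 0)."""
    rate: float     # guaranteed service rate R
    latency: float  # latency T


def concatenate(servers):
    """Collapse servers in series into one equivalent latency-rate server.

    Standard result: the equivalent rate is the minimum of the rates and
    the equivalent latency is the sum of the latencies.
    """
    servers = list(servers)
    if not servers:
        raise ValueError("at least one server is required")
    return LatencyRateServer(
        rate=min(s.rate for s in servers),
        latency=sum(s.latency for s in servers),
    )


def bounds(r, b, server):
    """Return (delay_bound, backlog_bound) for arrival curve r*t + b.

    Textbook deterministic bounds, valid only when r <= R:
        delay   <= T + b / R
        backlog <= b + r * T
    """
    if r > server.rate:
        raise ValueError("arrival rate exceeds service rate: no finite bound")
    delay = server.latency + b / server.rate
    backlog = b + r * server.latency
    return delay, backlog
```

For a path of several routers, `bounds(r, b, concatenate(path))` gives the end-to-end figures, which are typically tighter than summing per-hop bounds.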

Citations: 17

FlexRay schedule optimization of the static segment

International Conference on Hardware/Software Codesign and System Synthesis

Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629485

M. Lukasiewycz, M. Glaß, J. Teich, Paul Milbredt

The FlexRay bus is the prospective automotive standard communication system. For the sake of high flexibility, the protocol includes a static time-triggered and a dynamic event-triggered segment. This paper is dedicated to the scheduling of the static segment in compliance with the automotive-specific AUTOSAR standard. For the determination of an optimal schedule in terms of the number of used slots, a fast greedy heuristic as well as a complete approach based on Integer Linear Programming are presented. For this purpose, a scheme for the transformation of the scheduling problem into a bin packing problem is proposed. Moreover, a metric and optimization method for the extensibility of partially used slots is introduced. Finally, the provided experimental results give evidence of the benefits of the proposed methods. On a realistic case study, the proposed methods are capable of obtaining better results in a significantly smaller amount of time compared to a commercial tool. Additionally, the experimental results provide a case study on incremental scheduling, a scalability analysis, an exploration use case, and an additional test case to emphasize the robustness and flexibility of the proposed methods.
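
As a rough illustration of a greedy bin-packing heuristic, the sketch below implements the classic first-fit-decreasing rule. Treating frames as items and slots as bins of a fixed capacity is a simplifying assumption made here; the abstract does not describe the paper's actual transformation or its heuristic, and the function name is hypothetical.

```python
def first_fit_decreasing(sizes, capacity):
    """Pack items into as few bins as a greedy pass finds.

    Items are placed largest first, each into the first bin that still has
    room. Returns a list of bins, each bin being a list of item sizes.
    This is a heuristic: the result is not guaranteed to be optimal.
    """
    bins = []
    for size in sorted(sizes, reverse=True):
        if size > capacity:
            raise ValueError(f"item of size {size} exceeds bin capacity {capacity}")
        for contents in bins:
            if sum(contents) + size <= capacity:
                contents.append(size)
                break
        else:
            # No existing bin had room, so open a new one.
            bins.append([size])
    return bins
```

An exact method such as Integer Linear Programming can then be used to confirm or improve on the bin count that the greedy pass produces.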

Citations: 121

FRA: a flash-aware redundancy array of flash storage devices

International Conference on Hardware/Software Codesign and System Synthesis

Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629459

Yangsup Lee, Sanghyuk Jung, Y. Song

Since flash memory has many attractive characteristics such as high performance, non-volatility, low power consumption and shock resistance, it has been widely used as storage media in embedded and computer system environments. In terms of reliability, however, flash memory has notable shortcomings: potentially high I/O latency due to erase-before-write and poor durability due to limited erase cycles. To overcome these problems, a RAID technique borrowed from storage technology based on hard disks is employed. With RAID, multi-bit burst failures in a page, block or device are easily detected and corrected, so that reliability can be significantly enhanced. However, the existing RAID-5 scheme for flash-based storage suffers from delayed response times caused by parity updates. To overcome this problem, we propose a novel approach using a RAID technique in flash storage, called Flash-aware Redundancy Array. In this approach, parity updates are postponed so that they are not included in the critical path of read and write operations. Instead, they are scheduled for when the device becomes idle. For example, the proposed scheme shows a 19% improvement in the average write response time, compared to other approaches.
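
The idea of moving parity updates off the write path can be sketched as follows. This is a minimal in-memory model under stated assumptions: a single stripe, XOR parity as in RAID-5, and an explicit `on_idle` hook. The class, method names and the stale-parity flag are hypothetical and are not taken from the paper, which may handle the unprotected window differently.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


class DeferredParityStripe:
    """One stripe of data blocks plus a lazily updated XOR parity block."""

    def __init__(self, n_data, block_size):
        self.block_size = block_size
        self.data = [bytes(block_size) for _ in range(n_data)]
        self.parity = bytes(block_size)  # parity of all-zero data is zero
        self.parity_stale = False

    def write(self, index, block):
        """Critical path: store the data block only and mark parity stale."""
        if len(block) != self.block_size:
            raise ValueError("block has the wrong size")
        self.data[index] = bytes(block)
        self.parity_stale = True

    def on_idle(self):
        """Called when the device is idle: bring the parity up to date."""
        if self.parity_stale:
            self.parity = xor_blocks(self.data)
            self.parity_stale = False

    def reconstruct(self, lost_index):
        """Rebuild one lost data block from the others and the parity."""
        if self.parity_stale:
            raise RuntimeError("parity is stale; stripe is not protected yet")
        survivors = [d for i, d in enumerate(self.data) if i != lost_index]
        return xor_blocks(survivors + [self.parity])
```

The trade-off is visible in `reconstruct`: between a write and the next idle period the stripe cannot be recovered from parity alone, so a real design needs some way to bound or cover that window.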

Citations: 76

Using binary translation in event driven simulation for fast and flexible MPSoC simulation

International Conference on Hardware/Software Codesign and System Synthesis

Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629446

M. Gligor, Nicolas Fournel, F. Pétrot

In this paper, we investigate the use of instruction set simulators (ISS) based on binary translation to accelerate full timed multiprocessor system simulation at transaction level. To obtain accurate timing behavior, we had to first solve timing issues in processor modeling, second define fast and precise cache models, and third solve the synchronization issues due to the different models of computation used in the ISSes and in the rest of the system. We present an integration solution that covers these issues and detail its implementation. We have evaluated our proposal using processor models provided by the QEMU framework to replace the existing ISSes, with SystemC TLM as the simulation environment for the whole platform. This approach offers a range of solutions trading off simulation speed against accuracy. The experiments show that even for the most precise configuration, the simulation speedup is still significant.

Citations: 73

Scalable and retargetable simulation techniques for multiprocessor systems

International Conference on Hardware/Software Codesign and System Synthesis

Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629448

Heekyung Kim, Dukyoung Yun, S. Ha

For design space exploration of embedded systems, a virtual prototyping system is commonly used to verify the expected performance as well as functionality before a hardware prototype is built. For accurate performance estimation, a virtual prototyping system is constructed by replacing real processing components with component simulators running concurrently. In such a distributed simulation system, the overhead of communication and synchronization between the component simulators increases in proportion to the number of simulators when lock-step synchronization is used. As a result, the simulation performance degrades significantly as the number of processors integrated in a chip increases. To overcome this problem, we propose a scalable and retargetable simulation technique that boosts the simulation performance significantly by attaching a simulator wrapper to each component simulator. The simulator wrapper performs synchronization between the simulators and the simulation backplane on behalf of the associated simulator. Use of the simulator wrapper also makes the proposed simulation platform retargetable, since a third-party simulator like ARMulator can be integrated into the simulation environment through a wrapper without modification. In addition, it enables parallel simulation that achieves almost linear speed-up as the number of processor cores in the simulation host increases. Experiments with a multimedia CODEC application and other applications, varying the number of processor simulators from 1 to 16, show that the simulation performance remains constant. Scalable performance from parallel simulation is also confirmed by experiments.

Citations: 12

Supporting RTL flow compatibility in a microarchitecture-level design framework

International Conference on Hardware/Software Codesign and System Synthesis

Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629482

Daniel Schwartz-Narbonne, C. Chan, Yogesh S. Mahajan, S. Malik

Current RTL-based design methodologies face significant scaling challenges related to the difficulty of designing, modifying, and verifying RTL. RTL contains primarily low-level structural information about the design. In contrast, the microarchitecture level is much closer to the specification level, making it an effective entry point for hardware design. The explicit description of the high-level units of work is also beneficial for verification. Currently used models for high-level design have very complex semantics. In this paper, we present a microarchitectural modeling language with simpler semantics. We demonstrate that it results in a significantly simpler synthesis to Verilog, providing for integration with existing RTL flows. Moreover, the simple semantics of the model enable the generation of PSL assertions for functionally verifying correctness of the synthesis. We demonstrate the efficacy of this approach through two case studies: a router switch and a processor design. We synthesized both designs, and formally verified the synthesis using the generated assertions.

Citations: 3

Automatic customization of device drivers for IP-cores used with assorted CPU organizations

International Conference on Hardware/Software Codesign and System Synthesis

Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629460

A. Acquaviva, N. Bombieri, F. Fummi, S. Vinco

Plugging an IP core into an embedded platform implies the generation of a device driver complying with the IP communication protocol on one side and with the CPU organization (i.e., single processor, SMP, AMP) on the other. Reusing an existing driver developed for a different CPU organization requires time-consuming and error-prone manual customization, which discourages the evaluation of alternative target platform organizations. In this context, the paper first proposes extracting the formal model of the IP communication protocol from the RTL testbench provided with the IP core. Then a taxonomy of device drivers is presented based on the CPU organization of the platform. This taxonomy makes it possible to select the correct template for automatically generating a device driver that is compliant with the CPU organization, the use in a simulated or a real platform, the interrupt support, the operating system, the I/O architecture and the possible parallel execution. The proposed methodology has been successfully tested on a family of embedded platforms with different CPU organizations.

Citations: 4

Bottom-up performance analysis considering time slice based software scheduling at system level

International Conference on Hardware/Software Codesign and System Synthesis

Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629493

A. Viehl, M. Pressler, O. Bringmann

In this paper, a novel approach for integrating time-slice-based resource access into the global performance analysis of distributed real-time critical embedded systems is presented. The performance analysis approach itself is based on a bottom-up analysis of communicating processes that takes into account synchronization by inter-process communication and the complex internal control flows of the processes. This general analysis methodology is extended to cover concurrent occupation of shared resources using time-slice-based access methods. The defined extensions are parameterizable for describing arbitrary communication media access schedules and software schedules on shared computation resources, although the explicit focus in this paper is on software scheduling. The applicability of the analysis extensions is demonstrated by a case study of a multimedia subsystem implemented in SystemC.
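
To make the effect of time-slice-based access concrete, the sketch below computes the classical worst-case completion time of a task that owns one fixed slice per scheduling cycle. This is a textbook bound under simplifying assumptions (a single slice per cycle, no blocking on communication, integer time units) and is not the analysis developed in the paper. The function and parameter names are hypothetical.

```python
def worst_case_response(exec_time, slice_len, cycle_len):
    """Worst-case completion time under one fixed time slice per cycle.

    In the worst case the task becomes ready just as its slice ends, so
    every slice it needs is preceded by a wait of (cycle_len - slice_len):

        response = ceil(exec_time / slice_len) * (cycle_len - slice_len)
                   + exec_time

    All arguments are non-negative integers in the same time unit.
    """
    if slice_len <= 0 or slice_len > cycle_len:
        raise ValueError("require 0 < slice_len <= cycle_len")
    if exec_time < 0:
        raise ValueError("exec_time must be non-negative")
    slices_needed = -(-exec_time // slice_len)  # integer ceiling division
    return slices_needed * (cycle_len - slice_len) + exec_time
```

When the slice fills the whole cycle the waiting term vanishes and the response time equals the execution time, which is a quick sanity check on the formula.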

Citations: 4

ESL power analysis of embedded processors for temperature and reliability estimations

International Conference on Hardware/Software Codesign and System Synthesis

Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629469

Björn Sander, Jürgen Schnerr, O. Bringmann

The ongoing scaling of CMOS technology facilitates the design of systems with continuously increasing functionality, but also raises the susceptibility of these systems to reliability issues caused by high power densities and temperatures. For reasons of complexity, the Electronic System Level (ESL) is gaining importance as the starting point of design. Design alternatives are evaluated at ESL with respect to several design objectives, lately also including temperature. But temperatures are dominated by local power effects, a fact that has not been sufficiently reflected at ESL until now. There is a lack of appropriate models, which we call the ESL Power Density Gap. The contributions of this paper are twofold. First, we describe why the ESL Power Density Gap should be closed. In doing so, we want to stimulate a discussion. After that, we introduce a new ESL methodology for the power analysis of embedded processors, which can be considered a first step towards solving the aforementioned problem. It allows the generation of executable system models from a platform description, combining a functionality representation and component characterizations. Using an example application, it is shown that high power densities, usually invisible at ESL, can be uncovered by applying the proposed approach.

Citations: 16

SuSeSim: a fast simulation strategy to find optimal L1 cache configuration for embedded systems

International Conference on Hardware/Software Codesign and System Synthesis

Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629476

M. S. Haque, Andhi Janapsatya, S. Parameswaran

Simulation of an application is a popular and reliable approach to finding the optimal configuration of level one cache memory for an application-specific embedded system processor. However, long simulation time is one of the main disadvantages of simulation-based approaches. In this paper, we propose a new and fast simulation method, Super Set Simulator (SuSeSim). While previous methods use a top-down search strategy, SuSeSim utilizes a bottom-up search strategy along with a new elaborate data structure to reduce the search space needed to determine a cache hit or miss. SuSeSim can simulate hundreds of cache configurations simultaneously by reading an application's memory request trace just once. The total numbers of cache hits and misses are accurately recorded. Depending on different cache block sizes and benchmark applications, SuSeSim can reduce the number of tags to be checked by up to 43% compared to the existing fastest simulation approach (the CRCB algorithm). With the help of a faster search and an easy-to-maintain data structure, SuSeSim can be up to 94% faster in simulating memory requests compared to the CRCB algorithm.
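
For context, the sketch below shows the straightforward baseline that trace-driven cache studies start from: simulating one set-associative cache configuration with LRU replacement and counting hits and misses. It is not SuSeSim or the CRCB algorithm, and evaluating many configurations this way requires one pass over the trace per configuration, which is the cost the paper aims to avoid. The function name and the LRU policy are assumptions made for illustration.

```python
from collections import OrderedDict


def simulate_cache(trace, num_sets, associativity, block_size):
    """Count (hits, misses) for one cache configuration over an address trace.

    trace is an iterable of integer byte addresses. Each set is modeled as
    an OrderedDict of tags kept in least-recently-used-first order.
    """
    sets = [OrderedDict() for _ in range(num_sets)]
    hits = misses = 0
    for address in trace:
        block = address // block_size
        index = block % num_sets
        tag = block // num_sets
        lines = sets[index]
        if tag in lines:
            hits += 1
            lines.move_to_end(tag)  # mark as most recently used
        else:
            misses += 1
            if len(lines) >= associativity:
                lines.popitem(last=False)  # evict the least recently used tag
            lines[tag] = True
    return hits, misses
```

Calling this once per candidate configuration gives exact counts and makes a convenient reference against which a faster single-pass simulator can be checked.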



Citations: 25
