First IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and Systems Synthesis (IEEE Cat. No.03TH8721): Latest Articles

Architectural analysis and instruction-set optimization design of network protocol processors

Pub Date : 2003-10-01 DOI: 10.1145/944645.944703

Haiyong Xie, Li Zhao, L. Bhuyan

TCP/IP protocol processing latency has been an important issue in high-speed networks. In this paper, we present an architectural study of the TCP/IP protocol. We port the TCP/IP protocol stack from 4.4 FreeBSD to the SimpleScalar simulation environment. Architectural characteristics, such as instruction-level parallelism and cache behavior, are studied through simulation. We also compare the characteristics of the TCP/IP protocol with those of the SPECint benchmark programs. The former turn out to be quite different from the latter because of the protocol's unique processing structure. Furthermore, to improve the effectiveness of the instruction cache, frequent instruction pairs are analyzed and corresponding optimizations are made to the instruction set architecture. The performance is evaluated in the simulator. We find that a 23% improvement can be achieved by taking advantage of the optimization. The instruction set optimizations proposed in this paper will be helpful for the design of new programmable protocol processors in the future.
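
The frequent-instruction-pair analysis mentioned in the abstract can be illustrated with a minimal sketch. It assumes the analysis amounts to counting adjacent opcode pairs in a dynamic instruction trace; the abstract does not spell out the authors' actual procedure, and the function name `frequent_pairs` and the opcode trace below are hypothetical, invented only for illustration.

```python
from collections import Counter


def frequent_pairs(trace, top=3):
    """Count adjacent opcode pairs in a trace and return the most common ones."""
    # zip(trace, trace[1:]) yields every pair of consecutive opcodes.
    pairs = Counter(zip(trace, trace[1:]))
    return pairs.most_common(top)


# Hypothetical opcode trace (not taken from the paper).
trace = ["lw", "addu", "lw", "addu", "sw", "lw", "addu", "bne"]
print(frequent_pairs(trace, top=1))
```

Pairs that dominate such a count would be natural candidates for fusion into a single instruction, which is one plausible reading of the optimization the abstract describes.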

Citations: 13

A fast parallel Reed-Solomon decoder on a reconfigurable architecture

Pub Date : 2003-10-01 DOI: 10.1145/944645.944660

A. Koohi, N. Bagherzadeh, Chengzhi Pan

This paper presents a software implementation of a very fast parallel Reed-Solomon decoder on the second generation of the MorphoSys reconfigurable computation platform, which targets streaming applications such as multimedia and DSP. Numerous modifications to the first-generation architecture have produced a scalable, computation- and communication-intensive architecture capable of extracting fine-grained parallelism at the instruction level. Many algorithms, as well as a complete digital video broadcasting base-band receiver, have been mapped onto the second-generation architecture with impressive performance. The mapping of the Reed-Solomon decoder proposed in the paper highly parallelizes all of its sub-algorithms, including syndrome computation, the Berlekamp algorithm, Chien search, and error value computation, in a SIMD fashion. The mapping is tested on a cycle-accurate simulator, "Mulate", and the performance is encouragingly better than that of other architectures. The decoding speed of the RS (255,239,16) decoder using two different methods of GF multiplication can be 1.319 Gbps and 2.534 Gbps, respectively. Furthermore, since no functionality is specifically tailored to the Reed-Solomon decoder, the result demonstrates the capability of the MorphoSys architecture to extract instruction-level parallelism from streaming applications.
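
As a rough illustration of the first decoder stage named in the abstract, the sketch below computes the syndromes of a received RS(255,239) word over GF(2^8). Several choices here are assumptions rather than details from the paper: the field polynomial `0x11D`, generator roots alpha^1 through alpha^16, and log/antilog-table multiplication (which need not be either of the two GF multiplication methods the authors compare). Each syndrome is an independent Horner evaluation, which is why this stage lends itself to SIMD-style parallelization.

```python
PRIM_POLY = 0x11D  # assumed field polynomial x^8 + x^4 + x^3 + x^2 + 1

# Build antilog (EXP) and log (LOG) tables for GF(2^8) with alpha = 2.
EXP = [0] * 510
LOG = [0] * 256
x = 1
for i in range(255):
    EXP[i] = x
    LOG[x] = i
    x <<= 1
    if x & 0x100:
        x ^= PRIM_POLY
for i in range(255, 510):
    EXP[i] = EXP[i - 255]  # duplicate so LOG[a] + LOG[b] never overflows


def gf_mul(a, b):
    """Multiply two GF(2^8) elements using the log/antilog tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def syndromes(received, n_parity=16):
    """Return S_i = r(alpha^i) for i = 1..n_parity.

    received[0] is the coefficient of the highest-degree term.
    """
    result = []
    for i in range(1, n_parity + 1):
        root = EXP[i]
        s = 0
        for symbol in received:  # Horner's rule
            s = gf_mul(s, root) ^ symbol
        result.append(s)
    return result


# The all-zero word is a valid codeword, so every syndrome is zero.
print(syndromes([0] * 255))
```

A nonzero syndrome vector signals that the received word contains errors, which the later stages (Berlekamp algorithm, Chien search, error value computation) would then locate and correct.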

Citations: 18

A modular simulation framework for architectural exploration of on-chip interconnection networks

Pub Date : 2003-10-01 DOI: 10.1145/944645.944648

Tim Kogel, Malte Doerper, Andreas Wieferink, R. Leupers, G. Ascheid, H. Meyr, S. Goossens

The ever-increasing complexity and heterogeneity of SoC platforms require on-chip communication schemes beyond the currently omnipresent shared bus architectures. To prevent time-consuming design changes late in the design flow, we propose early exploration of the on-chip communication architecture to meet performance and cost requirements. Based on SystemC 2.0.1, we have defined a modular exploration framework that captures the performance impact of different on-chip networks, such as dedicated point-to-point, shared bus, and crossbar topologies. Monitoring of performance parameters such as utilization, latency, and throughput drives the mapping of the intermodule traffic to an efficient communication architecture. The effectiveness of our approach is demonstrated by the exemplary design of a high-performance Network Processing Unit (NPU), which is compared against a commercial NPU device.

Citations: 75

Accurate estimation of cache-related preemption delay

Pub Date : 2003-10-01 DOI: 10.1145/944645.944698

H. S. Negi, T. Mitra, Abhik Roychoudhury

Multitasked real-time systems often employ caches to boost performance. However, the unpredictable dynamic behavior of caches makes schedulability analysis of such systems difficult. In particular, the effect of caches needs to be considered when estimating inter-task interference. As the memory blocks of different tasks can map to the same cache blocks, preemption of a task may introduce additional cache misses. The time penalty introduced by these misses is called the cache-related preemption delay (CRPD). In this paper, we provide a program path analysis technique to estimate CRPD. Our technique performs path analysis of both the preempted and the preempting tasks. Furthermore, we improve the accuracy of the analysis by estimating the possible states of the entire cache at each possible preemption point rather than estimating the states of each cache block independently. To avoid incurring high space requirements, the cache states can be maintained symbolically as a binary decision diagram. Experimental results indicate that we obtain tight CRPD estimates for realistic benchmarks.
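
The basic notion of CRPD can be sketched with a deliberately simplified model. This is not the paper's path analysis or its BDD-based cache-state representation: it assumes a direct-mapped cache and treats each cache block independently, which is the coarser view the abstract contrasts its technique with. The function names, block addresses, and miss penalty are all hypothetical.

```python
def cache_set(block, num_sets):
    """Map a memory block number to its set in a direct-mapped cache."""
    return block % num_sets


def crpd_bound(useful_blocks, preempting_blocks, num_sets, miss_penalty):
    """Bound the preemption delay under a simple per-block model.

    useful_blocks: blocks of the preempted task that are cached at the
        preemption point and will be reused afterwards (assumed to occupy
        distinct cache sets).
    preempting_blocks: blocks the preempting task may access.
    """
    evicting_sets = {cache_set(b, num_sets) for b in preempting_blocks}
    reloads = sum(
        1 for b in useful_blocks if cache_set(b, num_sets) in evicting_sets
    )
    return reloads * miss_penalty


# Hypothetical numbers: 8 cache sets, 10-cycle miss penalty.
print(crpd_bound({0, 1, 2, 11}, {8, 9, 20}, num_sets=8, miss_penalty=10))
```

In this example the preempting task touches sets 0, 1, and 4, so the two useful blocks in sets 0 and 1 must be reloaded, giving a bound of 20 cycles.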

Citations: 113

Hardware support for real-time operating systems

Pub Date : 2003-10-01 DOI: 10.1145/944645.944656

P. Kohout, B. Ganesh, B. Jacob

The growing complexity of embedded applications and time-to-market pressure have resulted in the increasing use of embedded real-time operating systems (RTOSes). Unfortunately, RTOSes can introduce significant performance degradation. The paper presents the Real-Time Task Manager (RTM) - a processor extension that minimizes the performance drawbacks associated with RTOSes. The RTM accomplishes this by supporting, in hardware, a few of the common RTOS operations that are performance bottlenecks: task scheduling, time management, and event management. By exploiting the inherent parallelism of these operations, the RTM completes them in constant time, thereby significantly reducing RTOS overhead. It decreases both the processor time used by the RTOS and the maximum response time by an order of magnitude.

Citations: 117

Early estimation of the size of VHDL projects

Pub Date : 2003-10-01 DOI: 10.1145/944645.944699

W. Fornaciari, F. Salice, D. Scarpazza

Analyzing the amount of human resources required to complete a project is regarded as a critical issue by any company in the electronics industry. In particular, early estimation of the effort involved in a development process is a key requirement for any cost-driven system-level design decision. In this paper, we present a methodology to predict the final size of a VHDL project on the basis of a high-level description, obtaining a significant indication of the development effort. The methodology is composed of a number of specialized models, each tailored to estimate the size of a specific component type. Models were trained and tested on two large, disjoint sets of real VHDL projects. Quality-of-result indicators show that the methodology is both accurate and robust.

Citations: 6

Extending the SystemC synthesis subset by object-oriented features

Pub Date : 2003-10-01 DOI: 10.1145/944645.944652

E. Grimpe, F. Oppenheimer

In this article, we present an approach to object-oriented hardware design and synthesis based on SystemC. We introduce the extended SystemC synthesis subset that we propose and, in particular, its object-oriented features. We also briefly outline our basic synthesis concepts for object-oriented hardware specifications. Finally, we present some examples of applying the extended synthesis subset; these can be processed directly by a first synthesis tool prototype that we developed for this purpose.

Citations: 41

Design optimization of mixed time/event-triggered distributed embedded systems

Pub Date : 2003-10-01 DOI: 10.1145/944645.944672

T. Pop, P. Eles, Zebo Peng

Distributed embedded systems implemented with mixed event-triggered and time-triggered task sets, which communicate over bus protocols consisting of both static and dynamic phases, are emerging as the new standard in application areas such as automotive electronics. In a previous paper, we developed a holistic timing analysis and scheduling approach for this category of systems. Building on that result, we solve new design problems that we identified as characteristic of such hybrid systems: partitioning the system functionality into time-triggered and event-triggered domains, and optimizing the parameters of the communication protocol. We address both problems in the context of a heuristic that performs mapping and scheduling of the system functionality. We demonstrate the efficiency of the proposed technique with extensive experiments.

Citations: 40

Security wrappers and power analysis for SoC technology

Pub Date : 2003-10-01 DOI: 10.1145/944645.944689

C. Gebotys, Y. Zhang

Future wireless Internet-enabled services will be increasingly powerful, supporting many more applications, including one of the most crucial: security. Although SoCs offer more resistance to bus probing attacks, power/EM attacks on cores and network snooping attacks by malicious code remain relevant. This paper proposes a methodology for security on NoC at both the network level (or transport layer) and the core level (or application layer). For the first time, a low-cost security wrapper design is presented, which prevents unencrypted keys from leaving the cores and the NoC. This is crucial to prevent untrusted software on or off the NoC from gaining access to keys. At the core level (application layer), power analysis attacks are examined for the first time for parallel and adiabatic architectural cores. With the emergence of secure IP cores in the market, a security methodology for designing NoCs is crucial for supporting future wireless Internet-enabled devices.

Citations: 32

Verification of design decisions in ForSyDe

Pub Date : 2003-10-01 DOI: 10.1145/944645.944692

Tarvo Raudvere, I. Sander, A. Singh, A. Jantsch

The ForSyDe methodology has been developed for system-level design. Starting with a formal specification model that captures the functionality of the system at a high abstraction level, it provides formal design transformation methods for a transparent refinement of the specification model into an implementation model that is optimized for synthesis. A transformation may be semantics-preserving or a design decision. The latter modifies the semantics of the system-level description and changes the meaning of the model. The main contribution of this paper is the incorporation of model checking to verify that refined system blocks satisfy the design specification. We illustrate the translation of ForSyDe code to the SMV language and the verification of local design decisions with a case study of a ForSyDe equalizer model.

Citations: 12
