2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools (latest papers)

A Multicore SDR Architecture for Reconfigurable WiMAX Downlink
Pedro Suárez-Casal, Ángel Carro-Lagoa, J. García-Naya, L. Castedo
This paper describes a multicore Software Defined Radio (SDR) architecture devised to implement a fully reconfigurable downlink for WiMAX transceivers. The proposed architecture is built from Commercial-Off-The-Shelf (COTS) modules and includes a DSP, three different models of FPGAs, DACs and ADCs. We show that the architecture is capable of supporting all the functionalities of the downlink subframe of the Orthogonal Frequency Division Multiple Access (OFDMA) WiMAX physical layer, including the Partial Usage of Subcarriers (PUSC) symbol structure and Forward Error Correction (FEC). The primary advantage of the design is its full reconfigurability at different levels (bandwidth, FFT size, modulation, code rate, etc.) without modifying or restarting the system. We show that the five downlink profiles defined by the WiMAX Forum can be successfully implemented with the proposed architecture.
DOI: https://doi.org/10.1109/DSD.2010.108 | Published: 2010-11-01
Citations: 13
Adaptive Beamforming Using the Reconfigurable MONTIUM TP
M. V. D. Burgwal, K. Rovers, Koen C. H. Blom, A. Kokkeler, G. Smit
Until a decade ago, phased array beamforming was mainly implemented with mechanical or analog solutions. Today, digital hardware has become powerful enough to perform the massive number of operations required for real-time digital beamforming. As more and more applications use beamforming to improve communication channel utilization in both space and frequency, many dedicated digital architectures have been proposed for the processing. By using a reconfigurable architecture, the same hardware platform can be reused for different applications with different processing needs. In this paper, we present a reconfigurable Multi-processor System-on-Chip based solution for phased array processing that supports advanced tracking mechanisms to continuously receive signals with a mobile receiver. An adaptive beamformer for DVB-S satellite reception is presented that uses a Constant Modulus Algorithm to track satellites. The processing of a receiver with 64 antennas and 3 beams is mapped onto a reconfigurable processor named Montium TP. The total implementation of such a receiver requires about 570 clock cycles on a single Montium TP, but can also be partitioned over multiple Montium TPs to support larger phased arrays.
DOI: https://doi.org/10.1109/DSD.2010.13 | Published: 2010-09-01
Citations: 6
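As a rough illustration of the Constant Modulus Algorithm named in the abstract above, the following Python sketch performs one stochastic-gradient weight update for a single beam. It is not the Montium TP mapping described in the paper; the function name `cma_step`, the step size `mu`, and the unit target modulus are assumptions made for this example.

```python
def cma_step(weights, snapshot, mu=0.01, modulus=1.0):
    """One stochastic-gradient Constant Modulus (CMA 2-2) update for one beam.

    weights  -- list of complex beamformer weights, one per antenna
    snapshot -- list of complex antenna samples taken at the same instant
    Returns the updated weights and the beamformer output for this snapshot.
    """
    # Beamformer output y = w^H x (conjugated weights times samples).
    y = sum(w.conjugate() * x for w, x in zip(weights, snapshot))
    # The error grows with the deviation of |y|^2 from the target modulus.
    err = (abs(y) ** 2 - modulus) * y
    # Gradient descent on (|y|^2 - modulus)^2: w <- w - mu * x * conj(err).
    updated = [w - mu * x * err.conjugate() for w, x in zip(weights, snapshot)]
    return updated, y
```

Calling `cma_step` once per snapshot drives the output magnitude toward the target modulus without needing a reference signal, which is why this family of algorithms suits tracking of constant-envelope transmissions.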
Behavioural Modelling of DLLs for Fast Simulation and Optimisation of Jitter and Power Consumption
E. Barajas, D. Mateo, J. González
This paper presents a behavioural model for fast DLL simulations. The behavioural model includes the various noise sources in the DLL that produce output jitter. The model is used to obtain the dependence of the output jitter on the power consumption. The model exploits open-loop DLL analysis to reduce simulation time compared to typical DLL evaluation.
DOI: https://doi.org/10.1109/DSD.2010.86 | Published: 2010-09-01
Citations: 3
Evaluation of RTD-CMOS Logic Gates
J. Núñez, M. Avedillo, J. Quintana
The incorporation of Resonant Tunnel Diodes (RTDs) into III/V transistor technologies has been shown to improve circuit performance: higher circuit speed, reduced component count, and/or lower power consumption. Currently, the incorporation of these devices into CMOS technologies (RTD-CMOS) is an area of active research. Although some works have focused on evaluating the advantages of this incorporation, additional work in this direction is required. This paper compares RTD-CMOS and pure CMOS realizations of a set of logic gates that can be operated in a gate-level nanopipelined fashion, which allows the operating frequency of logic networks to be estimated. Lower power-delay products are obtained for the RTD-CMOS implementations.
DOI: https://doi.org/10.1109/DSD.2010.17 | Published: 2010-09-01
Citations: 2
H.264 Color Components Video Decoding Parallelization on Multi-core Processors
Elias Baaklini, Hassan Sbeity, S. Niar, N. Amaneddine
Multiprocessor systems-on-chip (MPSoCs) will be the dominant architecture in embedded systems, as they improve performance by increasing concurrency rather than by raising the clock speed, which increases power consumption. However, concurrency needs to be exploited in order to improve system performance in different application environments. The emerging H.264/AVC coding standard is designed to cover a wide range of applications (real-time conversational services such as videoconferencing, video phone, etc.). It has many new features that require complex computations compared to previous video coding standards. This coding standard will be a challenging workload for future MPSoC embedded systems. The different levels of parallelism in video codec applications can be exploited at the data level, the functional level, or both simultaneously. Our intention in this paper is to explore the parallelism that naturally exists in the H.264 decoder software [2] itself, without any modification to the encoder phase, rather than forcing parallelization techniques. Our novel idea is based on the fact that the H.264 decoder decodes the luminance and chrominance signals separately, but is implemented in a way that decodes them in series. Our approach is to execute the different decoding phases of the luminance signals in parallel with those of the chrominance signals. Using two cores to decode the luma and chroma signals in parallel gives a gain of 15-20% in decoding processing time, and when this is combined with a functional pipelined implementation over four or more cores, the gain can reach 60% compared to the current sequential execution.
DOI: https://doi.org/10.1109/DSD.2010.76 | Published: 2010-09-01
Citations: 7
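The task structure described in the abstract above, with luma and chroma decoded concurrently instead of in series, might be sketched as follows. This is only a structural illustration: `decode_plane` is a hypothetical stand-in (a simple clamp to the 8-bit range) for the real per-plane decoding stages, and the names are not taken from any H.264 decoder. Note also that CPython threads do not run CPU-bound work on several cores at once, so the sketch shows how the work is split, not the speedup reported in the paper.

```python
from concurrent.futures import ThreadPoolExecutor


def decode_plane(coeffs):
    """Hypothetical stand-in for per-plane decoding: clamp samples to 0..255."""
    return [max(0, min(255, c)) for c in coeffs]


def decode_chroma(cb, cr):
    """Decode both chroma planes as a single task."""
    return decode_plane(cb), decode_plane(cr)


def decode_frame_parallel(luma, chroma_cb, chroma_cr):
    """Decode luma on one worker and both chroma planes on another."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        luma_job = pool.submit(decode_plane, luma)
        chroma_job = pool.submit(decode_chroma, chroma_cb, chroma_cr)
        # Both jobs are in flight here; collect the results in a fixed order.
        y = luma_job.result()
        cb, cr = chroma_job.result()
    return y, cb, cr
```

The split is uneven by construction when the chroma planes carry fewer samples than the luma plane, so the luma task bounds the achievable gain of a two-way split.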
An FPGA-Based Accelerator for Analog VLSI Artificial Neural Network Emulation
B. V. Liempd, Daniel Herrera, M. Figueroa
Analog VLSI circuits are being used successfully to implement Artificial Neural Networks (ANNs). These analog circuits exhibit nonlinear transfer function characteristics and suffer from device mismatches, which degrade network performance. Because of the high cost of analog VLSI production, it is beneficial to predict implementation performance during design. We present an FPGA-based accelerator for the emulation of large (500+ synapses, 10k+ test samples) single-neuron ANNs implemented in analog VLSI. We used hardware time-multiplexing to scale network size and maximize hardware usage. An on-chip CPU controls the data flow through various memory systems to allow for large test sequences. We show that Block-RAM availability is the main implementation bottleneck and that a trade-off arises between emulation speed and hardware resources. Nevertheless, we can emulate large numbers of synapses on an FPGA with limited resources. We have obtained a speedup of 30.5 times with respect to an optimized software implementation on a desktop computer.
DOI: https://doi.org/10.1109/DSD.2010.20 | Published: 2010-09-01
Citations: 3
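The hardware time-multiplexing mentioned in the abstract above can be pictured with a small software model: a fixed number of multiply-accumulate units is reused over several passes until every synapse has been processed. The sketch below is an assumption-laden illustration, not the paper's design; the per-synapse `gains` list (standing in for device mismatch), the `tanh` output nonlinearity, and all names are hypothetical.

```python
import math


def emulate_neuron(inputs, weights, mac_units, gains=None):
    """Emulate one neuron by reusing `mac_units` multipliers over several passes.

    gains -- optional per-synapse multiplicative error (assumed mismatch model)
    Returns the neuron output and the number of passes that were needed.
    """
    if gains is None:
        gains = [1.0] * len(inputs)
    acc = 0.0
    passes = 0
    # Each pass handles at most `mac_units` synapses, as one hardware cycle would.
    for start in range(0, len(inputs), mac_units):
        stop = min(start + mac_units, len(inputs))
        acc += sum(gains[i] * weights[i] * inputs[i] for i in range(start, stop))
        passes += 1
    # Assumed saturating transfer function applied to the accumulated sum.
    return math.tanh(acc), passes
```

With 500 synapses and 16 units the model needs 32 passes, which makes the trade-off between emulation speed and hardware resources stated in the abstract concrete: halving the units roughly doubles the passes.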
LEON3 ViP: A Virtual Platform with Fault Injection Capabilities
Antônio da Silva, Sebastián Sánchez
In addition to functional simulation for validating hardware/software designs, robustness requirements call for advanced simulation techniques and tools to analyze system behavior in the presence of faults. In this paper, we present the design of a fault injection framework for LEON3, a 32-bit SPARC CPU based system used by the European Space Agency, described at Transaction Level using SystemC. First, a previous XML formalization of basic binary faults, such as memory and CPU register corruption, is extended to support corruption of TLM 2.0 transaction parameters. Next, a novel Dynamic Binary Instrumentation (DBI) technique for C++ binaries is used to insert fault injection wrappers in the SystemC transaction path. For binary faults in model components, the use of the TLM 2.0 “transport_dbg” interface is proposed. This way, each component with fault injection capabilities exposes a standard interface that allows internal component inspection and modification.
DOI: https://doi.org/10.1109/DSD.2010.34 | Published: 2010-09-01
Citations: 18
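The two injection routes described in the abstract above (a wrapper in the transaction path, and a debug-style backdoor into a component) can be mimicked in a few lines of Python. This is a conceptual stand-in only: it does not use SystemC or the TLM 2.0 API, and the class names, the `transport` and `debug_access` methods, and the bit-flip fault model are all assumptions made for illustration.

```python
class Memory:
    """Toy target component with a normal access path and a debug backdoor."""

    def __init__(self, size):
        self.cells = [0] * size

    def transport(self, command, address, data=None):
        # Normal transaction path: "write" stores data, anything else reads.
        if command == "write":
            self.cells[address] = data
            return None
        return self.cells[address]

    def debug_access(self, address, flip_mask):
        # Backdoor path (assumed analogue of a debug transport): flip stored bits.
        self.cells[address] ^= flip_mask


class FaultInjector:
    """Wrapper placed in the transaction path that corrupts selected writes."""

    def __init__(self, target, fault_address, flip_mask):
        self.target = target
        self.fault_address = fault_address
        self.flip_mask = flip_mask

    def transport(self, command, address, data=None):
        # Corrupt the data parameter of matching write transactions only.
        if command == "write" and address == self.fault_address:
            data ^= self.flip_mask
        return self.target.transport(command, address, data)
```

Because the wrapper exposes the same `transport` method as the component it wraps, the initiator needs no changes, which is the property that makes wrapper insertion attractive for fault campaigns.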
Designing Efficient Source Routing for Mesh Topology Network on Chip Platforms
S. Mubeen, Shashi Kumar
Efficient on-chip communication is very important for exploiting the enormous computing power available on a multi-core chip. Network on Chip (NoC) has emerged as a competitive candidate for implementing on-chip communication. Routing algorithms significantly affect the performance of a NoC. Most existing NoC architectural proposals advocate distributed routing algorithms for building NoC platforms. Although source routing offers many advantages, researchers have avoided it because of its apparent disadvantage: it requires a larger header, which lowers bandwidth utilization. In this paper we make a strong case for the use of source routing in NoCs, especially for platforms with small sizes and regular topologies. We present a methodology to compute application-specific efficient paths for communication among cores with a high degree of load balancing. The methodology first selects the most appropriate deadlock-free routing algorithm from a set of routing algorithms, based on the application's traffic patterns. The selected (possibly adaptive) routing algorithm is then used to compute efficient static paths with the goal of link load balancing. We demonstrate through simulation-based evaluation that source routing has the potential to achieve higher performance than distributed routing, for example up to 28% lower latency even at medium load. A simple scheme is proposed for encoding router ports to reduce the header overhead. A generic simulator was developed for evaluation and performance comparison between source routing and distributed routing. We also designed a router to support source routing for mesh topology NoC platforms.
DOI: https://doi.org/10.1109/DSD.2010.57 | Published: 2010-09-01
Citations: 42
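To make the header-size concern in the abstract above concrete, here is a minimal sketch of source routing on a 2D mesh in which the whole path is computed at the source and packed into the header at two bits per hop. The abstract does not detail the paper's encoding scheme, so the port codes, the use of plain XY routing to pick the path, and all function names are assumptions for illustration.

```python
# Assumed 2-bit output-port codes for a mesh router.
PORT_CODES = {"N": 0, "E": 1, "S": 2, "W": 3}
CODE_PORTS = {code: port for port, code in PORT_CODES.items()}


def xy_route(src, dst):
    """Return the hop list from src to dst, moving along X first and then Y."""
    (sx, sy), (dx, dy) = src, dst
    hops = ["E" if dx > sx else "W"] * abs(dx - sx)
    hops += ["N" if dy > sy else "S"] * abs(dy - sy)
    return hops


def encode_route(hops):
    """Pack the hops into an integer header, first hop in the lowest bits."""
    header = 0
    for i, port in enumerate(hops):
        header |= PORT_CODES[port] << (2 * i)
    # The hop count travels with the header because code 0 is a valid port.
    return header, len(hops)


def next_hop(header):
    """Router side: read the next output port and shift the header."""
    return CODE_PORTS[header & 0b11], header >> 2
```

Under this assumed encoding the longest minimal path in a 4x4 mesh has six hops, so the route field needs 12 bits, which suggests why the overhead stays modest on small, regular platforms.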
Faults Coverage Improvement Based on Fault Simulation and Partial Duplication
Jaroslav Borecký, Martin Kohlík, H. Kubátová, P. Kubalík
A method for improving the coverage of single faults in combinational circuits is proposed. The method is based on Concurrent Error Detection, but uses fault simulation to find critical points, the places where faults are difficult to detect. Partial duplication of the design with regard to these critical points increases the fault coverage with a low area overhead. The higher fault coverage in turn improves the dependability parameters. The proposed modification is tested on railway station safety device designs implemented in an FPGA.
DOI: https://doi.org/10.1109/DSD.2010.112 | Published: 2010-09-01
Citations: 4
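The idea of using fault simulation to locate critical points, as described in the abstract above, can be illustrated on a toy circuit. The sketch below injects single stuck-at faults at two internal nodes, simulates every input vector, and counts the erroneous outputs that an output-parity check would catch or miss. The circuit, the choice of parity prediction as the Concurrent Error Detection scheme, and all names are assumptions; the paper's circuits and checker may differ.

```python
from itertools import product

NODES = ("n1", "n2")  # internal nodes where stuck-at faults are injected


def circuit(a, b, c, fault=None):
    """Toy three-output circuit; `fault` is (node_name, stuck_value) or None."""

    def node(name, value):
        # Override the node value when it is the faulty one.
        if fault is not None and fault[0] == name:
            return fault[1]
        return value

    n1 = node("n1", a & b)
    n2 = node("n2", b | c)
    return (n1 ^ c, n1 | c, n2)


def classify_faults():
    """Map each stuck-at fault to (detected, undetected) erroneous vectors."""
    report = {}
    for name, stuck in product(NODES, (0, 1)):
        detected = undetected = 0
        for a, b, c in product((0, 1), repeat=3):
            good = circuit(a, b, c)
            bad = circuit(a, b, c, fault=(name, stuck))
            if bad == good:
                continue  # this vector does not expose the fault
            # A parity checker only notices an odd number of flipped outputs.
            if sum(bad) % 2 != sum(good) % 2:
                detected += 1
            else:
                undetected += 1
        report[(name, stuck)] = (detected, undetected)
    return report
```

In this toy case only node `n1`, which fans out to two outputs, yields undetected errors, so it is the natural candidate for duplication while `n2` can be left alone.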
Modeling Reconfigurable Systems-on-Chips with UML MARTE Profile: An Exploratory Analysis
Sana Cherif, I. Quadri, S. Meftali, J. Dekeyser
Reconfigurable FPGA-based Systems-on-Chip (SoC) architectures are increasingly becoming the preferred solution for implementing modern embedded systems, due to their flexible nature. However, because of the tremendous amount of hardware resources available in these systems, new design methodologies and tools are required to reduce their design complexity. In this paper we present an exploratory analysis for the specification of these systems using the UML MARTE (Modeling and Analysis of Real-time and Embedded Systems) profile. Our contributions permit us to model fine-grain reconfigurable FPGA-based SoC architectures while extending the profile to integrate new features, such as Partial Dynamic Reconfiguration, supported by these modern systems. Finally, we present the current limitations of the MARTE profile and raise some open questions regarding how these high-level models can be effectively used as input for commercial FPGA simulation and synthesis tools. Solutions to these questions can help in creating a design flow from high-level models to synthesis, placement and execution of these reconfigurable SoCs.
DOI: https://doi.org/10.1109/DSD.2010.58 (published 2010-09-01)
Citations: 17