2013 International Symposium on System on Chip (SoC): Latest Papers

Study of adaptive detection for MIMO-OFDM systems
Pub Date: 2013-12-02 DOI: 10.1109/ISSoC.2013.6675276
Essi Suikkanen, Janne Janhunen, S. Shahabuddin, M. Juntti
Requirements for higher data rates and lower power consumption pose new challenges for the implementation of multiple-input multiple-output orthogonal frequency division multiplexing (MIMO-OFDM) receivers. Simple detectors have the advantage of low complexity and low power consumption, but they cannot match the performance of more complex detectors. It would therefore be beneficial to adapt the detection algorithm to the channel conditions, minimizing the receiver's processing power consumption while satisfying the quality-of-service requirements. At low signal-to-noise ratio (SNR) and/or over a low-rank channel, more power and computational resources could be spent on detection to guarantee reliable communication, while in good conditions a simple, less power-hungry detector could be used. In this paper, we compare the performance of different detection algorithms. The performance results are based on simulations of a long term evolution (LTE) system. The effect of precoding and hybrid automatic repeat request (HARQ) on performance is shown. Implementation results from the existing literature are included in the comparison. We discuss when it would be beneficial to use a complex detector and when a simple one would suffice. The criterion for switching between detectors is also discussed.
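The abstract leaves the switching criterion open. A minimal Python sketch of one plausible rule follows, assuming a 2×2 MIMO channel: use a simple linear detector only when the SNR is high and the channel is well conditioned, and otherwise fall back to a more complex detector. The function names, the 15 dB SNR threshold and the condition-number limit of 10 are hypothetical, not values from the paper.

```python
import math


def condition_number_2x2(h):
    """Ratio of largest to smallest singular value of a 2x2 complex matrix h (nested lists)."""
    # Gram matrix G = H^H H: real diagonal entries a, d and complex off-diagonal b.
    a = abs(h[0][0]) ** 2 + abs(h[1][0]) ** 2
    d = abs(h[0][1]) ** 2 + abs(h[1][1]) ** 2
    b = h[0][0].conjugate() * h[0][1] + h[1][0].conjugate() * h[1][1]
    # Closed-form eigenvalues of a 2x2 Hermitian matrix.
    mean = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + abs(b) ** 2)
    lam_max, lam_min = mean + radius, mean - radius
    if lam_min <= 1e-12 * lam_max:
        return math.inf  # (numerically) rank-deficient channel
    # Singular values of H are the square roots of the eigenvalues of G.
    return math.sqrt(lam_max / lam_min)


def select_detector(snr_db, h, snr_threshold_db=15.0, max_condition=10.0):
    """Hypothetical switching rule: use the cheap detector only in good conditions."""
    if snr_db >= snr_threshold_db and condition_number_2x2(h) <= max_condition:
        return "linear"       # e.g. ZF or MMSE: low complexity and power
    return "tree-search"      # e.g. K-best sphere detection: more robust, more power
```

The condition number is used here as a soft stand-in for "low rank": an exactly rank-deficient channel gives an infinite value, and a nearly singular one gives a large value, both of which route detection to the more robust detector.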
Citations: 13
Partitioning constraints and signal routing approach for multi-FPGA prototyping platform
Pub Date: 2013-12-02 DOI: 10.1109/ISSoC.2013.6675273
M. Turki, H. Mehrez, Z. Marrakchi, M. Abid
With the global trend towards digital systems, the designer's goal is to manage system-on-chip complexity within time-to-market constraints. Multi-FPGA hardware prototyping is an important means of validating a design before it reaches the fabrication phase. However, once the design is partitioned across a multi-FPGA platform, the system frequency of the prototype drops dramatically because of inter-FPGA communication. In fact, the way the design is partitioned affects both the number of inter-FPGA signals and the critical path delay. In this paper, we propose a prototyping environment for multi-FPGA platforms. The partitioning tool is constrained so that it seeks the best trade-off among the criteria that affect the system frequency. The resulting inter-FPGA signals are routed using an iterative routing algorithm. If the number of these signals exceeds the number of available traces between FPGAs, multiplexing IPs are inserted in the sending and receiving FPGAs so that several signals can be transmitted over the same physical wire.
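The multiplexing step can be illustrated with a small Python sketch. It assumes a simple time-division model, which is not necessarily the paper's: when an FPGA pair needs more inter-FPGA signals than it has physical traces, each trace carries ceil(signals / traces) signals per system clock cycle, so the achievable system frequency is roughly the link frequency divided by the worst multiplexing ratio. The function names and figures are illustrative only.

```python
import math


def mux_ratios(signals, traces):
    """Multiplexing ratio per FPGA pair.

    signals, traces: dicts keyed by an FPGA pair such as (0, 1), giving the number
    of inter-FPGA signals after partitioning and the number of physical traces.
    """
    ratios = {}
    for pair, n_signals in signals.items():
        n_traces = traces.get(pair, 0)
        if n_signals == 0:
            continue  # no cut signals between this pair
        if n_traces == 0:
            raise ValueError(f"signals cross pair {pair} but no traces are available")
        # Ratio 1 means one signal per wire; above 1 a multiplexing IP is needed.
        ratios[pair] = math.ceil(n_signals / n_traces)
    return ratios


def estimated_system_freq_mhz(link_freq_mhz, ratios):
    """Rough upper bound: every multiplexed signal must cross within one system cycle."""
    worst = max(ratios.values(), default=1)
    return link_freq_mhz / worst
```

Under this model, reducing the number of cut signals on the most congested FPGA pair directly raises the estimated system frequency, which is one way to see why the partitioning constraints matter.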
Citations: 4
Framework for industrial embedded system product development and management
Pub Date: 2013-12-02 DOI: 10.1109/ISSoC.2013.6675265
Arttu Leppakoski, E. Salminen, T. Hämäläinen
Industrial machines such as cranes have very long lifetimes and strict safety and reliability requirements. Embedded systems, comprising HW boards and SW stacks, are used to control these machines. The challenge is how to manage several different HW/SW combinations and versions over the life cycle. This paper presents recent developments in the workflow and tools for our Programmable Control and Communication Platform (PCCP). The cornerstone is the Yocto Project, which handles building the Linux-based control SW and includes a new layer for PCCP. Setting up the new workflow took 180 person-hours, excluding initial study and training. After deployment, typical changes in HW and SW configurations take only hours, and the quality of the process is significantly improved. Based on this work, PCCP can be upgraded in a controlled way throughout its life cycle, which extends to 2026.
Citations: 5
Optimizing the overhead for network-on-chip routing reconfiguration in parallel multi-core platforms
Pub Date: 2013-12-02 DOI: 10.1109/ISSoC.2013.6675258
Marco Balboni, F. Triviño, J. Flich, D. Bertozzi
To cope with increased resource contention and dynamic application behaviour, runtime reconfiguration of the routing function of an on-chip interconnection network is a desirable feature for multi-core hardware platforms in the embedded computing domain. The most intuitive approach is to drain the network of in-flight packets before reconfiguring its routing tables, which rules out deadlock by construction. However, its impact on application performance is unacceptable. On the other hand, truly dynamic approaches incur too much overhead in an on-chip setting. Recently, the overlapped static reconfiguration (OSR) method was shown to be capable of routing reconfiguration in the presence of background traffic with only a mild impact on the resource budget. This work finds that the method still falls far short of its potential in terms of reconfiguration performance, both in its impact on background traffic, which persists to some extent, and in the duration of the reconfiguration transient. It therefore proposes a set of optimization methods for OSR that span the trade-off between performance improvement and implementation cost. In the limit, fully transparent reconfiguration is achieved.
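To make the baseline concrete, the following Python sketch models the drain-based approach the abstract rejects, not OSR itself: injection stops, in-flight packets are delivered under the old routing function, and only then is the routing function swapped, here from XY to YX dimension-ordered routing on a 2D mesh. Mixing XY and YX routes in the same network can create cyclic channel dependencies, which is why the naive approach waits for the network to empty. Contention is ignored, so the drain time is just the longest remaining path; the function names are hypothetical.

```python
def xy_route(cur, dst):
    """Next hop under XY routing on a 2D mesh: correct x first, then y."""
    (cx, cy), (dx, dy) = cur, dst
    if cx != dx:
        return (cx + (1 if dx > cx else -1), cy)
    if cy != dy:
        return (cx, cy + (1 if dy > cy else -1))
    return cur


def yx_route(cur, dst):
    """Next hop under YX routing: correct y first, then x."""
    (cx, cy), (dx, dy) = cur, dst
    if cy != dy:
        return (cx, cy + (1 if dy > cy else -1))
    if cx != dx:
        return (cx + (1 if dx > cx else -1), cy)
    return cur


def drain_then_reconfigure(packets, old_route, new_route):
    """Deliver all in-flight packets with old_route, then switch to new_route.

    packets: list of (current_router, destination_router) tuples.
    Returns (cycles during which the network was unavailable, routing function now active).
    """
    in_flight = [[cur, dst] for cur, dst in packets if cur != dst]
    cycles = 0
    while in_flight:
        for packet in in_flight:
            packet[0] = old_route(packet[0], packet[1])  # one hop per cycle, no contention
        in_flight = [p for p in in_flight if p[0] != p[1]]
        cycles += 1
    return cycles, new_route
```

Even in this contention-free model, the network is unavailable for as many cycles as the longest in-flight path, which illustrates the performance penalty that motivates overlapped approaches.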
Citations: 6
Achieving QoS in NoC-based MPSoCs through Dynamic Frequency Scaling
Pub Date: 2013-12-02 DOI: 10.1109/ISSoC.2013.6675275
G. Guindani, F. Moraes
The management of Quality-of-Service (QoS) constraints in NoC-based MPSoCs, with dozens of tasks running simultaneously, is still a challenge. Techniques applied at design time or run time to address this issue adopt different QoS metrics. Designers include monitoring techniques in their systems, adapting the QoS parameters at run time to meet the required constraints. In other words, MPSoCs are able to self-adapt while executing a given set of applications. Self-adaptation is a key capability for meeting application requirements in dynamic systems. Dynamic Voltage and Frequency Scaling (DVFS) is an adaptation technique frequently used to reduce overall energy consumption, without being coupled to QoS constraints such as throughput or latency. Another adaptation technique is task migration, which focuses on throughput or latency optimization. The self-adaptation technique proposed in this paper uses Dynamic Frequency Scaling (DFS) to trade off power consumption against QoS constraints. Initially, each processor running application tasks reaches a steady state in which each task runs at the frequency level that optimizes communication with its neighbor tasks. The goal of this initial state is a trade-off between power consumption and communication throughput. Next, application performance is monitored, and the frequency level of each task is adjusted according to the QoS parameters. Results show that the proposed self-adaptive scheme can meet the required QoS constraints by changing the frequency of the processing elements (PEs) running the application tasks.
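The monitoring-and-adjustment step can be sketched as a simple per-task frequency controller in Python. This is an illustrative assumption, not the paper's algorithm: the frequency levels, the 10 % hysteresis margin and the use of throughput as the monitored QoS metric are all hypothetical.

```python
# Hypothetical discrete frequency levels available to each processing element (PE).
FREQ_LEVELS_MHZ = [50, 100, 200, 400]


def next_freq_level(level, measured, target, margin=0.10):
    """Return the new frequency-level index for one task.

    measured, target: monitored and required throughput (any consistent unit).
    The margin adds hysteresis so the controller does not oscillate.
    """
    if measured < target and level < len(FREQ_LEVELS_MHZ) - 1:
        return level + 1  # QoS violated: speed the PE up
    if measured > target * (1 + margin) and level > 0:
        return level - 1  # comfortably above target: save power
    return level          # within the band, or already at a limit
```

Stepping down only when throughput is well above target reflects the power/QoS trade-off described in the abstract, while the hysteresis band keeps a task from toggling between two levels every monitoring period.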
Citations: 6
SW and HW speculative Nelder-Mead execution for high performance unconstrained optimization
Pub Date: 2013-12-02 DOI: 10.1109/ISSoC.2013.6675279
Artur Mariano, Paulo Garcia, T. Gomes
This paper addresses the performance assessment of a new Nelder-Mead variant that speculatively executes the simplex operations. The variant was implemented as parallel and sequential x86 CPU versions, as well as handwritten and automatically generated (C-to-RTL) FPGA designs. Since the execution flow is the same in every version, the efficiency of synchronization in software and in hardware is also assessed. Performance trials of these versions were performed using (i) a latest-generation FPGA, and a latest-generation multi-core CPU to run the software versions, and (ii) relatively simple objective functions in ℝ². Results show that the performance of the handwritten hardware design is roughly equivalent to that of the sequential software version of the algorithm, even though it runs at a much lower clock frequency (an average of 1.9 MHz vs. 3.4 GHz). They also suggest that the synchronization methods used to control the speculative execution are too expensive when managed in software, but efficient when managed in hardware.
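One plausible reading of "speculatively executes the simplex operations" is sketched below in Python: in every iteration, the reflection, expansion, outside-contraction and inside-contraction points are all evaluated up front (they do not depend on each other, so threads or hardware units could compute them in parallel), and the standard Nelder-Mead rules then decide which result to keep. This is an illustrative assumption, not the authors' implementation; the coefficients are the textbook values (1, 2, 0.5, 0.5).

```python
def speculative_nelder_mead(f, x0, step=0.5, tol=1e-10, max_iter=500):
    """Minimize f: list[float] -> float, evaluating all candidate moves speculatively."""
    n = len(x0)
    # Initial simplex: x0 plus one point offset along each coordinate axis.
    simplex = [list(x0)] + [[x + (step if j == i else 0.0) for j, x in enumerate(x0)]
                            for i in range(n)]
    values = [f(p) for p in simplex]
    for _ in range(max_iter):
        order = sorted(range(n + 1), key=lambda k: values[k])
        simplex = [simplex[k] for k in order]
        values = [values[k] for k in order]
        if values[-1] - values[0] < tol:
            break
        best, worst = simplex[0], simplex[-1]
        centroid = [sum(p[j] for p in simplex[:-1]) / n for j in range(n)]

        def along(t):
            # Point centroid + t * (centroid - worst).
            return [c + t * (c - w) for c, w in zip(centroid, worst)]

        # Speculative phase: the four candidates are independent of one another.
        cand = {"reflect": along(1.0), "expand": along(2.0),
                "outside": along(0.5), "inside": along(-0.5)}
        fc = {name: f(p) for name, p in cand.items()}

        # Selection phase: textbook Nelder-Mead decision rules.
        fr = fc["reflect"]
        if fr < values[0]:
            choice = "expand" if fc["expand"] < fr else "reflect"
        elif fr < values[-2]:
            choice = "reflect"
        elif fr < values[-1]:
            choice = "outside" if fc["outside"] <= fr else None
        else:
            choice = "inside" if fc["inside"] < values[-1] else None

        if choice is not None:
            simplex[-1], values[-1] = cand[choice], fc[choice]
        else:
            # Shrink every vertex halfway towards the best one.
            simplex = [best] + [[b + 0.5 * (x - b) for b, x in zip(best, p)]
                                for p in simplex[1:]]
            values = [values[0]] + [f(p) for p in simplex[1:]]
    k = min(range(n + 1), key=lambda i: values[i])
    return simplex[k], values[k]
```

The trade-off is visible in the code: each iteration always pays for four objective evaluations, some of which are discarded, in exchange for removing the sequential dependency between the reflection test and the follow-up evaluation.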
Citations: 4
A cycle accurate simulation framework for asynchronous NoC design
Pub Date: 2013-12-02 DOI: 10.1109/ISSoC.2013.6675263
F. Terraneo, Davide Zoni, W. Fornaciari
Network-on-Chip (NoC) is a flexible and scalable interconnection candidate for current and future multi-cores. In such a scenario, power is a major design obstacle, requiring accurate early-stage estimation for both cores and NoCs. From this perspective, Dynamic Frequency Scaling (DFS) techniques have been proposed as a flexible and scalable way to optimize the power-performance trade-off. However, there is a lack of tools that allow early-stage evaluation of different DFS solutions as well as of asynchronous NoCs. This work proposes a new cycle-accurate simulation framework supporting asynchronous NoC design, which also allows heterogeneous and dynamic frequency schemes for NoC routers to be assessed.
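A simulator that supports heterogeneous router frequencies cannot advance every router on one global clock. A common approach, sketched here in Python as an assumption about how such a framework might schedule work (the paper's internals are not described), is to keep a priority queue of the next rising edge of each router's clock and always step the earliest one. Exact rational clock periods avoid floating-point drift between clock domains; the router names and frequencies are illustrative.

```python
import heapq
from fractions import Fraction


def clock_edges(freqs_mhz, until_ns):
    """Return (time_ns, router) for every rising clock edge up to until_ns, in time order.

    freqs_mhz: dict mapping a router name to its clock frequency in MHz.
    Simultaneous edges are ordered by router name.
    """
    periods = {r: Fraction(1000) / Fraction(f) for r, f in freqs_mhz.items()}  # ns
    heap = [(Fraction(0), r) for r in freqs_mhz]
    heapq.heapify(heap)
    edges = []
    while heap:
        t, router = heapq.heappop(heap)
        if t > until_ns:
            break  # the heap is time-ordered, so all remaining edges are later
        edges.append((t, router))  # a real simulator would tick this router here
        heapq.heappush(heap, (t + periods[router], router))
    return edges
```

For example, with one router at 1000 MHz and another at 500 MHz, the slower router is stepped on every second edge of the faster one, and both are stepped together at 0 ns and 2 ns.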
Citations: 10
ViSA: A highly efficient slot architecture enabling multi-objective ASIP cores
Pub Date: 2013-12-02 DOI: 10.1109/ISSoC.2013.6675270
P. Figuli, Carsten Tradowsky, Nadine Gaertner, J. Becker
Field Programmable Gate Arrays (FPGAs) are widely used to accelerate parallel applications with specialized hardware. Especially for dataflow-intensive applications, FPGAs are very well suited to designing application-specific data paths with a certain degree of parallelism. Since most applications also need control flow, the most common method is to design complex state machines realized in hardware. However, this often leads to very high and inefficient resource utilization on the target architecture for design parts that are neither performance-critical nor relevant to more efficient realizations. In this paper, we propose a generic VLIW-inspired Slot Architecture (ViSA), which combines two objectives: the performance of parallel hardware and the low area utilization of custom processors. Furthermore, we introduce a methodology for mapping and debugging applications on the ViSA architecture. We present experimental results for two corner-case applications showing that our approach is suitable for ultra-low-power as well as high-performance computing. Using the presented co-design methodology, we conclude that ViSA enables the realization of multi-objective design spaces for various target domains. ViSA achieves very high throughput at low operating frequencies, leading to significant power and energy savings over state-of-the-art architectures.
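The abstract does not detail ViSA's slot format, so the Python sketch below only illustrates the general VLIW idea it builds on: each instruction word is a bundle with one operation per slot, all slots read their operands in the same cycle, and results are written back together. The four-field tuple encoding and the operation set are hypothetical.

```python
import operator

# Hypothetical operation set; each slot holds (op, dst, src_a, src_b).
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


def run_vliw(program, regs):
    """Execute a list of bundles on a register file (list of ints), in place."""
    for bundle in program:
        # Read phase: every slot sees the register values from before this bundle.
        results = [(dst, OPS[op](regs[a], regs[b]))
                   for op, dst, a, b in bundle if op != "nop"]
        # Write phase: commit all results at once; two writes to one register are illegal.
        targets = [dst for dst, _ in results]
        if len(targets) != len(set(targets)):
            raise ValueError("two slots write the same register in one bundle")
        for dst, value in results:
            regs[dst] = value
    return regs
```

For example, with registers [0, 3, 4, 0, 0], the bundle [("add", 3, 1, 2), ("mul", 4, 1, 2)] computes 3 + 4 and 3 × 4 in the same step, leaving [0, 3, 4, 7, 12].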
Citations: 3
Adaptive QoS techniques for NoC-based MPSoCs
Pub Date: 2013-12-02 DOI: 10.1109/ISSoC.2013.6675274
Marcelo Ruaro, E. Carara, F. Moraes
With the significant increase in the number of processing elements in NoC-based MPSoCs, communication is increasingly a critical resource for performance gains and QoS guarantees. The main gap observed in the literature on NoC-based MPSoCs is the lack of runtime adaptive techniques to meet QoS requirements. In the absence of such techniques, the system user must statically define the resource distribution for each real-time task. The goal of this research is to investigate runtime adaptation of the NoC resources according to the QoS requirements of each application running on the MPSoC. The adaptive techniques presented in this work focus on adaptive routing, flow priorities, and switching mode. Monitoring and adaptation management are performed at the operating system level, ensuring QoS for the monitored applications. Monitoring and QoS adaptation were implemented in software. In the experiments, applications with latency and throughput deadlines run concurrently with best-effort applications. With real applications, the number of latency violations was reduced by 60% on average, ensuring smaller jitter and higher throughput. Applying the proposed QoS adaptation methods does not penalize the execution time of applications.
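The OS-level monitoring loop can be sketched in Python as follows. The escalation order (raise flow priority, then switch to adaptive routing, then to circuit switching) and the per-window violation threshold are hypothetical choices for illustration; the abstract names these three techniques but does not say how they are sequenced.

```python
# Hypothetical escalation order over the three techniques named in the abstract.
ADAPTATIONS = ["raise_priority", "adaptive_routing", "circuit_switching"]


class QoSMonitor:
    """Per-application monitor running at the operating-system level."""

    def __init__(self, latency_deadline, max_violations=1):
        self.latency_deadline = latency_deadline
        self.max_violations = max_violations  # tolerated violations per window
        self.applied = 0                      # number of adaptations already applied

    def observe(self, latencies):
        """Check one monitoring window; return the adaptation to apply, or None."""
        violations = sum(1 for lat in latencies if lat > self.latency_deadline)
        if violations > self.max_violations and self.applied < len(ADAPTATIONS):
            action = ADAPTATIONS[self.applied]
            self.applied += 1
            return action
        return None
```

Escalating from the cheapest adjustment to the most intrusive one keeps best-effort traffic undisturbed until the monitored application actually needs stronger guarantees.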
Citations: 3
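The abstract above describes OS-level monitoring that triggers runtime adaptation of routing, flow priority, and switching mode, but it does not state the decision rule. The following Python sketch is a hypothetical illustration only, not the authors' method: the evaluation window, the violation threshold, and the escalation order (adaptive routing, then high priority, then circuit switching) are all assumptions, and the names `FlowMonitor` and `Adaptation` are invented for this example.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Adaptation(IntEnum):
    """Hypothetical escalation ladder; this ordering is an assumption."""
    NONE = 0
    ADAPTIVE_ROUTING = 1
    HIGH_PRIORITY = 2
    CIRCUIT_SWITCHING = 3


@dataclass
class FlowMonitor:
    """Per-flow latency monitor, as an OS-level QoS manager might keep it."""
    latency_deadline: int      # latency deadline per packet, in cycles
    window: int = 8            # samples per evaluation window (assumed)
    max_violations: int = 2    # violations tolerated per window (assumed)
    level: Adaptation = Adaptation.NONE
    samples: list = field(default_factory=list)

    def record(self, latency: int) -> Adaptation:
        """Record one packet latency; escalate one step after a bad window."""
        self.samples.append(latency)
        if len(self.samples) == self.window:
            violations = sum(1 for s in self.samples if s > self.latency_deadline)
            if (violations > self.max_violations
                    and self.level < Adaptation.CIRCUIT_SWITCHING):
                self.level = Adaptation(self.level + 1)
            self.samples.clear()
        return self.level


monitor = FlowMonitor(latency_deadline=100)
for lat in [90, 120, 130, 140, 95, 80, 85, 99]:  # 3 of 8 samples miss the deadline
    monitor.record(lat)
print(monitor.level.name)  # ADAPTIVE_ROUTING
```

Escalating one step per bad window keeps the cheaper mechanisms in use until they prove insufficient; a real manager would likely also de-escalate once deadlines are met again, which this sketch omits.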
Efficient on-chip vector processing for multicore processors
Pub Date : 2013-12-02 DOI: 10.1109/ISSoC.2013.6675260
S. F. Beldianu, Sotirios G. Ziavras
Per-core vector support in multicores is inefficient because applications rarely sustain high data-level parallelism (DLP). We present two Power Gating (PG) schemes to dynamically control Vector co-Processors (VPs) shared by cores. ASIC and FPGA modeling show that PG can reduce energy by 33% while maintaining high performance.
Citations: 0
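The abstract above does not detail the two PG schemes, so the following Python sketch shows a generic timeout-based power-gating policy, not the authors' method: a shared VP is gated once it has been idle for `idle_threshold` consecutive cycles, and waking it stalls the requesting core for `wakeup_cycles`. The function name, the integer power units, and the example trace are hypothetical.

```python
def simulate_power_gating(busy_trace, idle_threshold, wakeup_cycles,
                          p_active=10, p_idle=4, p_gated=0):
    """Return (energy, stall_cycles) for one shared vector co-processor (VP).

    busy_trace: one boolean per cycle, True when some core issues vector work.
    Power values are arbitrary integer units (assumed, not from the paper).
    """
    energy = 0
    stalls = 0
    idle_run = 0        # consecutive idle cycles seen so far
    gated = False
    for busy in busy_trace:
        if busy:
            if gated:
                # Waking the VP stalls the requester for wakeup_cycles,
                # charged here at active power.
                stalls += wakeup_cycles
                energy += wakeup_cycles * p_active
                gated = False
            energy += p_active
            idle_run = 0
        else:
            # Gate once the VP has already sat idle for idle_threshold cycles.
            if not gated and idle_run >= idle_threshold:
                gated = True
            energy += p_gated if gated else p_idle
            idle_run += 1
    return energy, stalls


trace = [True] * 10 + [False] * 50 + [True] * 10 + [False] * 30
print(simulate_power_gating(trace, idle_threshold=float("inf"), wakeup_cycles=3))  # (520, 0)
print(simulate_power_gating(trace, idle_threshold=5, wakeup_cycles=3))             # (270, 3)
```

The threshold sets the trade-off: a smaller value saves more idle energy but pays the wake-up stall more often when vector work arrives in short, frequent bursts.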