Study of adaptive detection for MIMO-OFDM systems
Essi Suikkanen, Janne Janhunen, S. Shahabuddin, M. Juntti
2013 International Symposium on System on Chip (SoC). Pub Date: 2013-12-02. DOI: 10.1109/ISSoC.2013.6675276

Requirements for higher data rates and lower power consumption set new challenges for the implementation of multiple-input multiple-output orthogonal frequency division multiplexing (MIMO-OFDM) receivers. Simple detectors have the advantage of low complexity and power consumption, but they cannot offer as good performance as more complex detectors. It would therefore be beneficial to adapt the detector algorithm to the channel conditions, minimizing the receiver's processing power consumption while satisfying the quality-of-service requirements. At low signal-to-noise ratio (SNR) and/or in a low-rank channel, more power and computation resources could be devoted to detection in order to guarantee reliable communication, while in good conditions a simpler, less power-consuming detector could be used. In this paper, we compare the performance of different detection algorithms. The performance results are based on simulations in a long term evolution (LTE) system. The effect of precoding and hybrid automatic repeat request (HARQ) on the performance is shown. Implementation results from the existing literature are included in the comparison. We discuss when it would be beneficial to use a complex detector and when a simple one would be sufficient. The switching criterion is also discussed.
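A minimal sketch of the adaptive idea this abstract describes: choose a detector from measured channel conditions, falling back to a cheap linear detector when the channel is good. The detector names, thresholds, and function signature are illustrative assumptions, not taken from the paper.

```python
# Hypothetical detector-switching criterion: heavier detection in poor
# conditions, a cheap linear detector when the channel is good.
# Thresholds and detector labels are invented for illustration.

def select_detector(snr_db, rank, snr_threshold=15.0, full_rank=4):
    """Return a detector label for the current channel conditions."""
    if snr_db < snr_threshold or rank < full_rank:
        return "k-best"   # near-ML tree search: better BLER, more power
    return "lmmse"        # linear MMSE: low complexity and power

assert select_detector(8.0, 4) == "k-best"   # low SNR -> complex detector
assert select_detector(20.0, 4) == "lmmse"   # good channel -> simple one
assert select_detector(20.0, 2) == "k-best"  # low-rank channel -> complex
```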
Partitioning constraints and signal routing approach for multi-FPGA prototyping platform
M. Turki, H. Mehrez, Z. Marrakchi, M. Abid
2013 International Symposium on System on Chip (SoC). Pub Date: 2013-12-02. DOI: 10.1109/ISSoC.2013.6675273

With the global trend towards digital systems, the designer's goal is to manage system-on-chip complexity under the time-to-market constraint. Multi-FPGA hardware prototyping is an important means of validating a design before the fabrication phase. However, once the design is partitioned across a multi-FPGA platform, the system frequency of the prototyped design drops dramatically because of the inter-FPGA communications. The way in which the design is partitioned affects both the number of inter-FPGA signals and the critical path delay. In this paper, we propose a prototyping environment for multi-FPGA platforms. The partitioner tool is constrained so that it seeks the best trade-off between the criteria that affect the system frequency. The resulting inter-FPGA signals are routed using an iterative routing algorithm. If the number of these signals exceeds the number of available traces between FPGAs, multiplexing IPs are inserted into the sending and receiving FPGAs to transmit several signals through the same physical wire.
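The multiplexing-IP idea above can be sketched as simple round-robin time multiplexing: when there are more logical signals than physical traces, the sender serializes them over several transfer cycles. This is a toy model of the general technique, not the paper's IP.

```python
# Toy model of time-multiplexing logical inter-FPGA signals over a
# limited number of physical traces. The scheduling is plain round-robin;
# the real multiplexing IPs are more elaborate.

def mux_cycles(num_signals, wires):
    """Transfer cycles needed to send num_signals over `wires` traces."""
    return -(-num_signals // wires)  # ceiling division

def serialize(values, wires):
    """Schedule: one group of at most `wires` values per transfer cycle."""
    return [values[i:i + wires] for i in range(0, len(values), wires)]

schedule = serialize([1, 0, 1, 1, 0], wires=2)
assert mux_cycles(5, 2) == 3                    # 5 signals, 2 wires -> 3 cycles
assert schedule == [[1, 0], [1, 1], [0]]        # per-cycle groups, in order
```

The cycle count is also the factor by which the multiplexed link's effective per-signal rate drops, which is one reason inter-FPGA communication dominates the prototype's system frequency.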
Framework for industrial embedded system product development and management
Arttu Leppakoski, E. Salminen, T. Hämäläinen
2013 International Symposium on System on Chip (SoC). Pub Date: 2013-12-02. DOI: 10.1109/ISSoC.2013.6675265

Industrial machines such as cranes have very long lifetimes and high safety and reliability requirements. Embedded systems comprising HW boards and SW stacks are used to control the machines. The challenge is how to manage several different HW/SW combinations and versions over the life cycle. This paper presents recent development of the workflow and tools for our Programmable Control and Communication Platform (PCCP). The cornerstone is the Yocto Project, which handles building the Linux-based control SW and includes a new layer for PCCP. Setting up the new workflow took 180 person-hours, excluding initial studying and training. After deployment, typical changes in HW and SW configurations take only hours, and the quality of the process is significantly improved. Based on this work, PCCP can be upgraded in a controlled way during its life cycle by 2026.
Optimizing the overhead for network-on-chip routing reconfiguration in parallel multi-core platforms
Marco Balboni, F. Triviño, J. Flich, D. Bertozzi
2013 International Symposium on System on Chip (SoC). Pub Date: 2013-12-02. DOI: 10.1109/ISSoC.2013.6675258

In order to cope with increased resource contention and dynamic application behaviour, runtime reconfiguration of the routing function of an on-chip interconnection network is a desirable feature for multi-core hardware platforms in the embedded computing domain. The most intuitive approach consists of draining ongoing packets from the network before reconfiguring its routing tables, thus preventing deadlock from the ground up. The impact on application performance is, however, unacceptable. On the other hand, truly dynamic approaches impose too much overhead for an on-chip setting. Recently, the overlapped static reconfiguration (OSR) method was proven capable of routing reconfiguration in the presence of background traffic with only a mild impact on the resource budget. This work finds that the method is still far from realizing its potential in terms of reconfiguration performance: it retains some impact on background traffic, and the reconfiguration transient is long. It therefore proposes a set of optimization methods for OSR spanning the trade-off between performance improvement and implementation cost. At the limit, fully transparent reconfiguration is delivered.
Achieving QoS in NoC-based MPSoCs through Dynamic Frequency Scaling
G. Guindani, F. Moraes
2013 International Symposium on System on Chip (SoC). Pub Date: 2013-12-02. DOI: 10.1109/ISSoC.2013.6675275

The management of Quality-of-Service (QoS) constraints in NoC-based MPSoCs, with dozens of tasks running simultaneously, is still a challenge. Techniques applied at design time or run time to address this issue adopt different QoS metrics. Designers include monitoring techniques in their systems, adapting the QoS parameters at run time to cope with the required constraints. In other words, MPSoCs are able to self-adapt while executing a given set of applications. Self-adaptation is a key feature for meeting applications' requirements in dynamic systems. Dynamic Voltage and Frequency Scaling (DVFS) is an adaptation technique frequently used to reduce overall energy consumption, but it is not coupled to QoS constraints such as throughput or latency. Another example of an adaptation technique is task migration, which focuses on throughput or latency optimization. The self-adaptation technique proposed in this paper adopts Dynamic Frequency Scaling (DFS), trading off power consumption against QoS constraints. Each processor running the applications' tasks initially reaches a steady state that leads each task to a frequency level optimizing the communication with neighboring tasks. The goal of this initial state is to reach a trade-off between power consumption and communication throughput. Next, the application performance is monitored to adjust the frequency level of each task according to the QoS parameters. Results show that the proposed self-adaptation scheme can meet the required QoS constraints by changing the frequency of the PEs running the application tasks.
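The monitor-and-adjust loop in the abstract can be sketched as a simple hysteresis controller: raise a task's frequency level when measured throughput misses its QoS target, lower it when there is ample slack. The frequency levels, step policy, and slack band below are invented for illustration, not the paper's controller.

```python
# Illustrative DFS adjustment step, assuming discrete frequency levels
# and a per-task throughput target. All numbers are hypothetical.

LEVELS = [100, 200, 400, 800]  # available clock frequencies in MHz

def adjust_level(level, measured, target, slack=0.2):
    """Return a new index into LEVELS given measured vs. target throughput."""
    if measured < target and level < len(LEVELS) - 1:
        return level + 1                      # QoS violated: speed up
    if measured > target * (1 + slack) and level > 0:
        return level - 1                      # ample slack: save power
    return level                              # within band: hold

assert adjust_level(1, measured=80, target=100) == 2   # miss -> step up
assert adjust_level(2, measured=140, target=100) == 1  # slack -> step down
assert adjust_level(1, measured=105, target=100) == 1  # within band -> hold
```

The hold band between `target` and `target * (1 + slack)` keeps the controller from oscillating between adjacent levels, which matters when frequency changes themselves cost time and energy.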
SW and HW speculative Nelder-Mead execution for high performance unconstrained optimization
Artur Mariano, Paulo Garcia, T. Gomes
2013 International Symposium on System on Chip (SoC). Pub Date: 2013-12-02. DOI: 10.1109/ISSoC.2013.6675279

This paper addresses the performance assessment of a new Nelder-Mead variant that speculatively executes the simplex operations. The new variant was implemented as x86 parallel and sequential CPU versions as well as in handwritten and automatic C-to-RTL FPGA designs. As the execution flow is the same in every version, the efficiency of synchronization by software and by hardware is also assessed. Performance trials of these versions were performed using (i) a last-generation FPGA and a last-generation multi-core CPU to run the software versions and (ii) relatively simple objective functions in ℝ². Results show that the performance of the handwritten hardware design is roughly equivalent to that of the sequential software version of the algorithm, even though it runs at a much lower clock frequency (1.9 MHz on average vs. 3.4 GHz). They also suggest that the synchronization methods employed to control the speculative execution are too expensive when managed by software, but efficient when managed by hardware.
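To make the speculation idea concrete, here is a hedged one-dimensional sketch: the reflection and expansion points of a Nelder-Mead step are evaluated concurrently, and the standard rules then commit one of them. The coefficients are the textbook ones; the parallel split only mirrors the general idea, not the paper's implementation.

```python
# Speculative evaluation of one Nelder-Mead step (1-D for clarity):
# compute the reflection and expansion candidates in parallel, then
# commit whichever the usual acceptance rules select.
from concurrent.futures import ThreadPoolExecutor

def speculative_step(f, best, worst, centroid, alpha=1.0, gamma=2.0):
    """Return the accepted (point, value) pair for this simplex step."""
    xr = centroid + alpha * (centroid - worst)   # reflection point
    xe = centroid + gamma * (centroid - worst)   # expansion point
    with ThreadPoolExecutor(max_workers=2) as pool:
        fr, fe = pool.map(f, [xr, xe])           # speculative evaluations
    if fr < f(best):                             # reflection beat the best:
        return (xe, fe) if fe < fr else (xr, fr) # keep expansion if better
    return (xr, fr)                              # otherwise keep reflection

x, fx = speculative_step(lambda v: v * v, best=0.5, worst=2.0, centroid=1.0)
assert (x, fx) == (0.0, 0.0)  # reflecting worst=2 about centroid=1 gives 0
```

The expansion evaluation is wasted work whenever the reflection is rejected, which is exactly why the paper's question of cheap synchronization (hardware vs. software) decides whether the speculation pays off.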
A cycle accurate simulation framework for asynchronous NoC design
F. Terraneo, Davide Zoni, W. Fornaciari
2013 International Symposium on System on Chip (SoC). Pub Date: 2013-12-02. DOI: 10.1109/ISSoC.2013.6675263

The Network-on-Chip (NoC) is a flexible and scalable interconnection candidate for current and future multi-cores. In this scenario, power is a major design obstacle, requiring accurate early-stage estimation for both cores and NoCs. From this perspective, Dynamic Frequency Scaling (DFS) techniques have been proposed as a flexible and scalable way to optimize the power-performance trade-off. However, there is a lack of tools that allow early-stage evaluation of different DFS solutions and of asynchronous NoCs. This work proposes a new cycle-accurate simulation framework supporting asynchronous NoC design, which also allows heterogeneous and dynamic frequency schemes for NoC routers to be assessed.
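A toy sketch of the core scheduling problem such a simulator must solve: routers in different clock domains tick with different periods, so clock edges must be processed in global-time order for the simulation to stay cycle-accurate across domains. The event-queue approach below is a generic technique, assumed rather than taken from the paper.

```python
# Event-driven scheduling of heterogeneous clock domains: each router i
# ticks every periods[i] time units; edges are replayed in global order.
import heapq

def clock_edges(periods, until):
    """Return the ordered list of (time, router) clock edges up to `until`."""
    heap = [(p, i) for i, p in enumerate(periods)]
    heapq.heapify(heap)
    order = []
    while heap and heap[0][0] <= until:
        t, i = heapq.heappop(heap)
        order.append((t, i))                    # process router i's edge at t
        heapq.heappush(heap, (t + periods[i], i))
    return order

edges = clock_edges([2, 3], until=6)            # two domains: 2- and 3-unit periods
assert edges == [(2, 0), (3, 1), (4, 0), (6, 0), (6, 1)]
```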
ViSA: A highly efficient slot architecture enabling multi-objective ASIP cores
P. Figuli, Carsten Tradowsky, Nadine Gaertner, J. Becker
2013 International Symposium on System on Chip (SoC). Pub Date: 2013-12-02. DOI: 10.1109/ISSoC.2013.6675270

Field Programmable Gate Arrays (FPGAs) are widely used to accelerate parallel applications with specialized hardware. FPGAs are especially well suited to implementing application-specific data paths with a certain degree of parallelism for data-flow-intensive applications. Since most applications also need control flow, the most common method is to design complex state machines realized in hardware. However, this often leads to very high and inefficient resource utilization on the target architecture for design parts that are neither performance-critical nor relevant for more efficient realizations. In this paper, we propose a generic VLIW-inspired Slot Architecture (ViSA), which combines two efficiency objectives: the performance of parallel hardware and the low area utilization of custom processors. Furthermore, we introduce a methodology for mapping and debugging applications on the ViSA architecture. We present experimental results for two corner-case applications showing that our approach is suitable for ultra-low-power as well as high-performance computing. Using the presented co-design methodology, we conclude that ViSA enables the realization of multi-objective design spaces for various target domains. ViSA delivers high throughput at low operating frequencies, leading to significant power and energy savings over state-of-the-art architectures.
Adaptive QoS techniques for NoC-based MPSoCs
Marcelo Ruaro, E. Carara, F. Moraes
2013 International Symposium on System on Chip (SoC). Pub Date: 2013-12-02. DOI: 10.1109/ISSoC.2013.6675274

With the significant increase in the number of processing elements in NoC-based MPSoCs, communication increasingly becomes a critical resource for performance gains and QoS guarantees. The main gap observed in the NoC-based MPSoC literature is the lack of runtime adaptive techniques to meet QoS. In the absence of such techniques, the system user must statically define the resource distribution for each real-time task. The goal of this research is to investigate the runtime adaptation of NoC resources according to the QoS requirements of each application running in the MPSoC. The adaptive techniques presented in this work focus on adaptive routing, flow priorities, and switching mode. Monitoring and adaptation management are performed at the operating-system level, ensuring QoS for the monitored applications; both monitoring and QoS adaptation were implemented in software. In the experiments, applications with latency and throughput deadlines run concurrently with best-effort applications. Results with real applications show a 60% average reduction in the number of latency violations, ensuring smaller jitter and higher throughput. The execution time of the applications is not penalized by applying the proposed QoS adaptation methods.
Efficient on-chip vector processing for multicore processors
S. F. Beldianu, Sotirios G. Ziavras
2013 International Symposium on System on Chip (SoC). Pub Date: 2013-12-02. DOI: 10.1109/ISSoC.2013.6675260

Per-core vector support in multicores is not efficient, since applications rarely sustain high data-level parallelism (DLP). We present two Power Gating (PG) schemes to dynamically control Vector co-Processors (VPs) shared by cores. ASIC and FPGA modeling shows that PG can reduce energy by 33% while maintaining high performance.