Pub Date: 2013-12-16 DOI: 10.1109/RSP.2013.6683954
C. Marcon, Alexandre M. Amory, F. T. Bortolon, T. Webber, Thomas Volpato, Jader Munareto
Advances in design integration have enabled large Multiprocessor Systems-on-Chip (MPSoCs). Such systems are well suited to executing complex applications, provided the communication infrastructure offers a high degree of parallelism. Network-on-Chip (NoC) has emerged as a new communication paradigm for large MPSoCs, with advantages such as more reliable interaction between components. However, device integration may bring a few shortcomings during MPSoC manufacturing and operation, for instance vulnerability to faults. This paper describes Phoenix, a direct mesh NoC with a fault detection scheme. The proposed architecture explores a fault-tolerant mechanism that is implemented in a distributed manner as fault monitors on processors and routers. Results demonstrate that Phoenix scales with respect to the stabilization time after faults occur, allowing MPSoC operation even when a large number of faults occur.
Title: An implementation of a distributed fault-tolerant mechanism for 2D mesh NoCs
Venue: 2013 International Symposium on Rapid System Prototyping (RSP)
Pub Date: 2013-12-16 DOI: 10.1109/RSP.2013.6683957
Kazem Cheshmi, M. Soltaniyeh, S. Mohammadi, Jelena Trajkovic
Network-on-Chip (NoC) is a new communication paradigm for emerging multi- and many-core architectures. Despite major benefits such as scalability and power efficiency, it lacks guaranteed bounded latency. Many contemporary applications, such as multimedia and real-time applications, require such a guarantee. The growth of these applications in embedded systems emphasizes the need for guaranteed services in NoCs. Additionally, the increasing number of cores in NoCs highlights the clock distribution issue. Globally asynchronous locally synchronous (GALS) NoC architectures address this issue by using asynchronous routers to connect synchronous blocks. This paper presents a novel approach to guaranteed service in a GALS NoC that uses a router with set port quotas. We propose a novel router architecture that facilitates guaranteed latency for accessing shared media. Our simulations show up to 39% improvement in latency, with a negligible (up to 5%) power overhead.
Title: Quota setting router architecture for quality of service in GALS NoC
Pub Date: 2013-12-16 DOI: 10.1109/RSP.2013.6683962
N. Druml, M. Menghin, Daniel Kroisleitner, C. Steger, R. Weiss, H. Bock, J. Haid
Design exploration and evaluation are essential tasks during a product's development cycle. Simulation and hardware emulation are common techniques to explore and evaluate the functionality of hardware/software designs. However, when it comes to distributed secure applications, like contactless reader/smart card systems, non-functional design properties and system aspects (e.g., contactless power transfer, power consumption) have to be considered too. State-of-the-art simulation-based and emulation-based design exploration tools cover these design issues and system aspects only to some extent. Here we present a design exploration framework for complete reader/smart card systems using state-of-the-art model-based emulation and estimation techniques. This novel system-based approach is of high importance because of the wide availability of battery-powered mobile readers (i.e., smartphones) and novel mobile application fields. Contactless power transfer and power consumption analyses of reader and smart cards can be performed for each clock cycle and in real time. Thus, novel system-level power and security optimization techniques can be evaluated considering the reader/smart card system as a whole. We demonstrate the application of our exploration framework by means of a typical Diffie-Hellman key exchange between reader and smart card and highlight power optimization possibilities.
Title: Emulation-based design evaluation of reader/smart card systems
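The Diffie-Hellman key exchange that the abstract above uses as its demonstration workload can be illustrated with a minimal sketch. This is textbook finite-field Diffie-Hellman, not the implementation evaluated in the paper; the function names `dh_keypair` and `dh_shared` and the toy parameters `P = 23`, `G = 5` are assumptions chosen only for readability. Real reader/smart card systems would use standardized large groups or elliptic curves.

```python
import secrets

# Toy group parameters for illustration only (assumption, far too small for real use).
P = 23  # prime modulus
G = 5   # generator


def dh_keypair(p, g):
    """Return (private, public) where public = g**private mod p."""
    private = secrets.randbelow(p - 3) + 2  # uniform in [2, p - 2]
    public = pow(g, private, p)
    return private, public


def dh_shared(their_public, my_private, p):
    """Derive the shared secret from the peer's public value."""
    return pow(their_public, my_private, p)


# Reader and card each generate a key pair and exchange only the public parts.
reader_private, reader_public = dh_keypair(P, G)
card_private, card_public = dh_keypair(P, G)

reader_secret = dh_shared(card_public, reader_private, P)
card_secret = dh_shared(reader_public, card_private, P)
print("secrets match:", reader_secret == card_secret)
```

Each side performs two modular exponentiations, which is the kind of compute-heavy step whose power cost on the card would be of interest in an exploration like the one described.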
Pub Date: 2013-12-16 DOI: 10.1109/RSP.2013.6683968
S. Lovergine, Antonino Tumeo, Oreste Villa, Fabrizio Ferrandi
Modern embedded systems include hundreds of cores. Because of the difficulty in providing a fast, coherent memory architecture, these systems usually rely on non-coherent, non-uniform memory architectures with private memories for each core. However, programming these systems poses significant challenges. The developer must extract large amounts of parallelism while orchestrating communication among cores to optimize application performance. These issues become even more significant with irregular applications, which present data sets that are difficult to partition, unpredictable memory accesses, unbalanced control flow and fine-grained communication. Hand-optimizing every single aspect is hard and time-consuming, and it often does not lead to the expected performance. There is a growing gap between such complex and highly parallel architectures and the high-level languages used to describe the specification, which were designed for simpler systems and do not consider these new issues. In this paper we introduce YAPPA (Yet Another Parallel Programming Approach), an LLVM-based compilation framework for the automatic parallelization of irregular applications on modern MPSoCs. We start by considering an efficient parallel programming approach for irregular applications on distributed memory systems. We then propose a set of transformations that can reduce the development and optimization effort. The results of our initial prototype confirm the correctness of the proposed approach.
Title: YAPPA: A compiler-based parallelization framework for irregular applications on MPSoCs
Pub Date: 2013-12-16 DOI: 10.1109/RSP.2013.6683960
N. Adler, S. Otten, Markus Mohrhard, K. Müller-Glaser
The international standard ISO 26262 “Road vehicles - Functional safety” requires qualitative and quantitative analysis of hardware designs at the appropriate level of abstraction. For large-scale hardware designs, these evaluations have to be initiated early in development, at the level of hardware architectural design, and not delayed until hardware detailed design at the level of electronic schematics. Therefore, we describe a structural modeling approach and the annotation of failure data for hardware architectural designs. Based on a top-down qualitative fault tree analysis, the classification of hardware failure modes in the context of system behavior can be determined according to ISO 26262. Using these classifications and assumed failure rates, we facilitate a rapid quantitative safety analysis covering the evaluation of the hardware architectural metrics and the evaluation of safety goal violations.
Title: Rapid safety evaluation of hardware architectural designs compliant with ISO 26262
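The "hardware architectural metrics" named in the abstract above are usually understood as the single-point fault metric (SPFM) and the latent fault metric (LFM) of ISO 26262 Part 5. The sketch below computes both from classified failure rates, assuming the common formulation SPFM = 1 - (single-point + residual rate) / (total rate) and LFM = 1 - (latent multiple-point rate) / (total rate - single-point - residual rate). The `Element` class, the element names and every rate are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Element:
    name: str
    total_fit: float         # total failure rate of the element, in FIT
    single_point_fit: float  # single-point plus residual fault rate, in FIT
    latent_fit: float        # latent multiple-point fault rate, in FIT


def single_point_fault_metric(elements):
    """SPFM: share of the total failure rate not left as single-point/residual faults."""
    total = sum(e.total_fit for e in elements)
    uncovered = sum(e.single_point_fit for e in elements)
    return 1.0 - uncovered / total


def latent_fault_metric(elements):
    """LFM: share of the remaining failure rate that does not stay latent."""
    remaining = sum(e.total_fit - e.single_point_fit for e in elements)
    latent = sum(e.latent_fit for e in elements)
    return 1.0 - latent / remaining


# Hypothetical architecture with made-up, assumed failure rates.
design = [
    Element("microcontroller", 100.0, 1.0, 2.0),
    Element("sensor", 50.0, 0.5, 1.5),
]
spfm = single_point_fault_metric(design)
lfm = latent_fault_metric(design)
print(f"SPFM = {spfm:.4f}, LFM = {lfm:.4f}")
```

Because the inputs are only per-element rates and classifications, a calculation of this shape can be run on an architectural model long before schematics exist, which matches the early-evaluation motivation of the abstract.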
Pub Date: 2013-12-16 DOI: 10.1109/RSP.2013.6683963
Purushotham Murugappa, Vianney Lapôtre, A. Baghdadi, M. Jézéquel
Many modern and emerging designs require efficient, dynamically reconfigurable and reprogrammable processors. However, when the implemented design needs an upgrade, newly added features have to be quickly supported and validated. This is clearly seen in modern receivers for recent wireless communication standards, in which the frame lengths and code rates of the channel decoder vary continually. This paper explores, with an example, the possibility of realizing a flexible channel decoder to implement and validate new or incremental algorithm changes with fast design turnaround time. An application-specific instruction-set processor (ASIP) is proposed as a flexible core that can decode low-density parity-check (LDPC) codes with the various block sizes and code rates specified in the WiFi and WiMAX standards. Furthermore, the proposed architecture enables quick support of other Quasi-Cyclic LDPC (QC-LDPC) codes, e.g. DVB-S2, with simple incremental hardware changes at design time.
Title: Rapid design and prototyping of a reconfigurable decoder architecture for QC-LDPC codes
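The reason one decoder can serve several block sizes, as the abstract above describes, is the quasi-cyclic structure itself: a QC-LDPC parity-check matrix is a small base matrix whose entries expand into Z x Z circulant blocks. The sketch below shows that expansion under the convention commonly used for such codes, where an entry of -1 denotes an all-zero block and a non-negative entry s denotes the identity matrix cyclically shifted right by s. The base matrix and the function names `circulant` and `expand` are illustrative assumptions and are not taken from any standard or from the paper.

```python
def circulant(z, shift):
    """Z x Z identity cyclically shifted right by `shift`; all-zero block if shift < 0."""
    if shift < 0:
        return [[0] * z for _ in range(z)]
    return [[1 if (r + shift) % z == c else 0 for c in range(z)] for r in range(z)]


def expand(base, z):
    """Expand a base matrix of shift values into the full binary parity-check matrix."""
    rows = []
    for base_row in base:
        blocks = [circulant(z, s) for s in base_row]
        for r in range(z):
            # Concatenate row r of every block in this block-row.
            rows.append([bit for block in blocks for bit in block[r]])
    return rows


# Hypothetical 2 x 2 base matrix expanded with lifting size Z = 3.
H = expand([[0, 1], [-1, 2]], 3)
for row in H:
    print(row)
```

Changing the block size then amounts to changing Z and the shift values rather than the matrix structure, which is what makes a programmable decoder core attractive for this code family.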
Pub Date: 2013-12-16 DOI: 10.1109/RSP.2013.6683970
Shakith Fernando, Firew Siyoum, Yifan He, Akash Kumar, H. Corporaal
Heterogeneous Multiprocessor Systems-on-Chip (HMPSoCs) are becoming popular as a means of meeting the energy efficiency requirements of modern embedded systems. However, as these HMPSoCs also run multimedia applications, they need to meet real-time requirements as well. Designing such predictable HMPSoCs is a key challenge, as the current design methods for these platforms are either semi-automated, non-predictable, or limited in heterogeneity. In this paper, we propose a design framework to generate and program HMPSoC designs in a rapid and predictable manner. It takes the application specifications and the architecture model as input and generates the entire HMPSoC, for FPGA prototyping, that meets the throughput constraints. The experimental results show that our framework can provide a conservative bound on the worst-case throughput of the FPGA implementation. We also present results of a case study that computes the area-power trade-offs of an industrial vision application. The entire design space exploration of all configurations was completed in 8 hours. A tool-chain targeting the Xilinx Zynq FPGA is also presented.
Title: MAMPSx: A design framework for rapid synthesis of predictable heterogeneous MPSoCs
Pub Date: 2013-12-16 DOI: 10.1109/RSP.2013.6683958
O. A. D. L. Junior, V. Fresse, F. Rousseau
Networks-on-Chip (NoCs) are currently the most appropriate communication structure for many-core embedded systems. These networks support many real-time data flows, and their performance depends directly on the routing strategy. In this paper, we present a new congestion-aware routing algorithm (FlexOE) based on a simple and flexible scheme of prioritized sets of rules. These sets of rules are based on the Odd-Even turn model, minimal path checking, congestion information from adjacent routers and the availability of the output path. The FlexOE algorithm is integrated into a Hermes NoC and then implemented on an FPGA. The evaluation results show that FlexOE outperforms the reference algorithms in some test scenarios and performs similarly in the other test scenarios.
Title: FlexOE: A congestion-aware routing algorithm for NoCs
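To make the ingredients listed in the abstract above concrete, the following sketch implements the classic minimal routing function of the Odd-Even turn model and then picks the least congested of the permitted outputs. It is not FlexOE's prioritized rule set, which the abstract does not spell out. Coordinates are assumed to be (column, row) with north as increasing row, and `pick_output` with its congestion dictionary is a hypothetical stand-in for the congestion information received from adjacent routers.

```python
def odd_even_candidates(src, cur, dst):
    """Output directions permitted by minimal Odd-Even routing at router `cur`."""
    s0 = src[0]
    c0, c1 = cur
    d0, d1 = dst
    e0, e1 = d0 - c0, d1 - c1
    if e0 == 0 and e1 == 0:
        return []  # packet has arrived; deliver locally
    vertical = "N" if e1 > 0 else "S"
    if e0 == 0:
        return [vertical]  # already in the destination column
    if e0 > 0:  # eastbound packet
        if e1 == 0:
            return ["E"]
        out = []
        # East-to-north/south turns are forbidden in even columns.
        if c0 % 2 == 1 or c0 == s0:
            out.append(vertical)
        # Do not enter an even destination column while a vertical offset remains.
        if d0 % 2 == 1 or e0 != 1:
            out.append("E")
        return out
    out = ["W"]  # westbound packet
    # North/south-to-west turns are forbidden in odd columns.
    if c0 % 2 == 0 and e1 != 0:
        out.append(vertical)
    return out


def pick_output(candidates, congestion):
    """Choose the permitted output with the lowest reported congestion (assumed metric)."""
    return min(candidates, key=lambda d: congestion.get(d, 0))


options = odd_even_candidates(src=(0, 0), cur=(0, 0), dst=(2, 2))
print(options, "->", pick_output(options, {"N": 3, "E": 1}))
```

The turn restrictions keep the routing deadlock-free without virtual channels, while the choice among the remaining candidates is where a congestion-aware scheme has room to adapt.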
Pub Date: 2013-12-16 DOI: 10.1109/RSP.2013.6683955
Takashi Nakada, Shinobu Miwa, K. Yano, Hiroshi Nakamura
Network-on-Chip (NoC) based multiprocessors have become popular as a scalable alternative to classical bus architectures. The performance evaluation of NoC-based multiprocessors is largely based on simulation. However, precise simulation is extremely slow, and many design parameters affect the total performance. Therefore, it is practically impossible to use precise simulation for design space exploration. To alleviate this problem, prototyping NoC systems and estimating their performance are critically important. In this paper, we present a novel generalized performance model that is combined with simulation for designing NoC-based multiprocessors. We show that the performance impact of cache and network latencies is dominant. Moreover, network congestion rarely happens under near-appropriate configurations. Thus, the performance model is mainly constructed from the hardware parameters and from statistics obtained from a simple cache simulation that is separated from the network behavior. The proposed performance model is used not only to obtain fast and accurate performance estimates, but also to guide the design space exploration of NoC-based multiprocessors. The accuracy of our approach and its practical use are illustrated through simulation. The results show that the proposed model can estimate performance with only 3.4% error on average and 21% at worst. We also confirmed that our evaluation framework can produce estimates 360 times faster than brute-force full-system simulation.
Title: Performance modeling for designing NoC-based multiprocessors
Pub Date: 2013-12-16 DOI: 10.1109/RSP.2013.6683964
Vinicius Bohrer, Ramon Fernandes, C. Marcon, T. Webber, L. Poehls, R. Czekster, Fabiano Hessel
The emergence of wireless networks has contributed to a growing number of studies and protocols regarding their performance and reliability requirements, among others. Several issues have to be considered when deploying wireless devices under harsh environmental conditions. These issues often force the designer to make decisions that are usually difficult to verify in real-world settings. To mitigate such problems, an alternative is to use simulation models for both homogeneous and heterogeneous devices. This paper describes an event-based Wireless Network Simulator (WiNeS) for devices operating in several network topologies and configurations. WiNeS is a Java-based framework specially built to support customized network options, and it offers hybrid simulation of virtual and physical nodes in the same environment. WiNeS' features include the computation of maximum communication distances among devices in 2D and 3D spatial node distributions, as well as pairing rules to evaluate node connectivity.
Title: A flexible framework for modeling and simulation of multipurpose wireless networks
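The distance-based connectivity check mentioned in the abstract above can be sketched as follows. This is not WiNeS code (the framework itself is Java-based and its pairing rules are not given in the abstract). The pairing rule used here, that two nodes are linked when their Euclidean distance does not exceed the shorter of their two ranges, is an assumption, as are the function name `links`, the node names, positions and ranges.

```python
import math
from itertools import combinations


def links(nodes):
    """Return the set of node pairs that can communicate.

    `nodes` maps a node name to (position, max_range); positions may be 2D or 3D
    tuples, as long as all nodes use the same dimension.
    """
    connected = set()
    for (a, (pos_a, range_a)), (b, (pos_b, range_b)) in combinations(nodes.items(), 2):
        # Assumed pairing rule: both radios must be able to reach each other.
        if math.dist(pos_a, pos_b) <= min(range_a, range_b):
            connected.add((a, b))
    return connected


# Hypothetical 3D deployment: positions in metres, ranges in metres.
deployment = {
    "a": ((0, 0, 0), 10),
    "b": ((3, 4, 0), 5),
    "c": ((0, 0, 20), 30),
}
print(links(deployment))
```

Using the shorter range makes every link bidirectional. An asymmetric rule, where a link exists as soon as one side can reach the other, would only require replacing `min` with `max`.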