Pedro Suárez-Casal, Ángel Carro-Lagoa, J. García-Naya, L. Castedo
This paper describes a multicore Software Defined Radio (SDR) architecture devised to implement a fully reconfigurable downlink for WiMAX transceivers. The proposed architecture is made up of Commercial-Off-The-Shelf (COTS) modules available in the market and includes a DSP, three different models of FPGAs, DACs and ADCs. We show that the architecture is capable of supporting all the functionalities of the downlink sub frame of the Orthogonal Frequency Division Multiple Access (OFDMA) WiMAX physical layer, including Partial Usage of Sub carriers (PUSC) symbol structure and Forward Error Correction (FEC). The primary advantage of the design is the full reconfigurability at different levels: bandwidth, size of the FFT, modulation, code rate, etc. without modifying or restarting the system. We show that the five downlink profiles defined by the WiMAX Forum can be successfully implemented with the proposed achitecture.
{"title":"A Multicore SDR Architecture for Reconfigurable WiMAX Downlink","authors":"Pedro Suárez-Casal, Ángel Carro-Lagoa, J. García-Naya, L. Castedo","doi":"10.1109/DSD.2010.108","DOIUrl":"https://doi.org/10.1109/DSD.2010.108","url":null,"abstract":"This paper describes a multicore Software Defined Radio (SDR) architecture devised to implement a fully reconfigurable downlink for WiMAX transceivers. The proposed architecture is made up of Commercial-Off-The-Shelf (COTS) modules available in the market and includes a DSP, three different models of FPGAs, DACs and ADCs. We show that the architecture is capable of supporting all the functionalities of the downlink sub frame of the Orthogonal Frequency Division Multiple Access (OFDMA) WiMAX physical layer, including Partial Usage of Sub carriers (PUSC) symbol structure and Forward Error Correction (FEC). The primary advantage of the design is the full reconfigurability at different levels: bandwidth, size of the FFT, modulation, code rate, etc. without modifying or restarting the system. We show that the five downlink profiles defined by the WiMAX Forum can be successfully implemented with the proposed achitecture.","PeriodicalId":356885,"journal":{"name":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115232324","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. V. D. Burgwal, K. Rovers, Koen C. H. Blom, A. Kokkeler, G. Smit
Until a decade ago, the concept of phased array beam forming was mainly implemented with mechanical or analog solutions. Today, digital hardware has become powerful enough to perform the massive number of operations required for real-time digital beam forming. While more and more applications are using beam forming to improve the communication channel utilization both in space and frequency, many dedicated digital architectures are proposed for the processing. By using a reconfigurable architecture, the same hardware platform can be reused for different applications with different processing needs. In this paper, we present a reconfigurable Multi-processor System-on-Chip based solution for phased array processing that supports advanced tracking mechanisms to continuously receive signals with a mobile receiver. An adaptive beam former for DVB-S satellite reception is presented, that uses a Constant Modulus Algorithm to track satellites. The processing of a receiver with 64 antennas and 3 beams is mapped on a reconfigurable processor named Montium TP. The total implementation of such a receiver requires about 570 clock cycles on a single Montium TP, but can also be partitioned over multiple Montium TPs to support larger phased arrays.
{"title":"Adaptive Beamforming Using the Reconfigurable MONTIUM TP","authors":"M. V. D. Burgwal, K. Rovers, Koen C. H. Blom, A. Kokkeler, G. Smit","doi":"10.1109/DSD.2010.13","DOIUrl":"https://doi.org/10.1109/DSD.2010.13","url":null,"abstract":"Until a decade ago, the concept of phased array beam forming was mainly implemented with mechanical or analog solutions. Today, digital hardware has become powerful enough to perform the massive number of operations required for real-time digital beam forming. While more and more applications are using beam forming to improve the communication channel utilization both in space and frequency, many dedicated digital architectures are proposed for the processing. By using a reconfigurable architecture, the same hardware platform can be reused for different applications with different processing needs. In this paper, we present a reconfigurable Multi-processor System-on-Chip based solution for phased array processing that supports advanced tracking mechanisms to continuously receive signals with a mobile receiver. An adaptive beam former for DVB-S satellite reception is presented, that uses a Constant Modulus Algorithm to track satellites. The processing of a receiver with 64 antennas and 3 beams is mapped on a reconfigurable processor named Montium TP. The total implementation of such a receiver requires about 570 clock cycles on a single Montium TP, but can also be partitioned over multiple Montium TPs to support larger phased arrays.","PeriodicalId":356885,"journal":{"name":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools","volume":"86 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121208933","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents a behavioural model for fast DLL simulations. The behavioural model includes a modelling of the various noise sources in the DLL that produce output jitter. The model is used to obtain the dependence of the output jitter versus the power consumption. The model exploits the open-loop DLL analysis to reduce simulation time when compared to typical DLL evaluation.
{"title":"Behavioural Modelling of DLLs for Fast Simulation and Optimisation of Jitter and Power Consumption","authors":"E. Barajas, D. Mateo, J. González","doi":"10.1109/DSD.2010.86","DOIUrl":"https://doi.org/10.1109/DSD.2010.86","url":null,"abstract":"This paper presents a behavioural model for fast DLL simulations. The behavioural model includes a modelling of the various noise sources in the DLL that produce output jitter. The model is used to obtain the dependence of the output jitter versus the power consumption. The model exploits the open-loop DLL analysis to reduce simulation time when compared to typical DLL evaluation.","PeriodicalId":356885,"journal":{"name":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125069874","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The incorporation of Resonant Tunnel Diodes (RTDs) into III/V transistor technologies has shown an improved circuit performance: higher circuit speed, reduced component count, and/or lowered power consumption. Currently, the incorporation of these devices into CMOS technologies (RTD-CMOS) is an area of active research. Although some works have focused the evaluation of the advantages of this incorporation, additional work in this direction is required. This paper compares RTD-CMOS and pure CMOS realizations of a set of logic gates which can be operated in a gate-level nanopipelined fashion, thus allows estimating logic networks operating frequency. Lower power-delay products are obtained for RTD/CMOS implementations.
{"title":"Evaluation of RTD-CMOS Logic Gates","authors":"J. Núñez, M. Avedillo, J. Quintana","doi":"10.1109/DSD.2010.17","DOIUrl":"https://doi.org/10.1109/DSD.2010.17","url":null,"abstract":"The incorporation of Resonant Tunnel Diodes (RTDs) into III/V transistor technologies has shown an improved circuit performance: higher circuit speed, reduced component count, and/or lowered power consumption. Currently, the incorporation of these devices into CMOS technologies (RTD-CMOS) is an area of active research. Although some works have focused the evaluation of the advantages of this incorporation, additional work in this direction is required. This paper compares RTD-CMOS and pure CMOS realizations of a set of logic gates which can be operated in a gate-level nanopipelined fashion, thus allows estimating logic networks operating frequency. Lower power-delay products are obtained for RTD/CMOS implementations.","PeriodicalId":356885,"journal":{"name":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116386578","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Elias Baaklini, Hassan Sbeity, S. Niar, N. Amaneddine
Multiprocessor-system-on-a-chip will be the dominating architecture in embedded systems as it provides an increase in concurrency improving the performance of the system rather than increasing the clock speed which affects the power consumption of the system. However, concurrency needs to be exploited in order to improve the system performance in the different applications’environments. The new emerging H.264/AVC coding standard is designed to cover a wide range of applications (real-time conversational services such as videoconferencing, video phone, etc.). It has many new features that require complex computations compared to previous video coding standards. This coding standard will be a challenging workload for future MPSoC embedded systems. Exploiting the different levels of parallelism for video codec applications can be done at the data level, the functional level, or both simultaneously. Our intention in this paper is to explore the natural existent parallelism in the H.264 decoder software [2] itself without any modification to the encoder phase, rather than forcing parallelization techniques. Our novel idea is based on the fact that the H.264 decoder decodes the luminance and chrominance signals separately, but the decoder is implemented in a way to decode them in series. Our approach is to execute the different decoding phases of the luminance signals in parallel to the chrominance signals. Using two cores to decode the luma and the chroma signals in parallel gives a gain of 15-20% of the decoding processing time and combining them the functional pipelined implementation over four cores or more, the gain can reach 60% compared to the current sequential execution.
{"title":"H.264 Color Components Video Decoding Parallelization on Multi-core Processors","authors":"Elias Baaklini, Hassan Sbeity, S. Niar, N. Amaneddine","doi":"10.1109/DSD.2010.76","DOIUrl":"https://doi.org/10.1109/DSD.2010.76","url":null,"abstract":"Multiprocessor-system-on-a-chip will be the dominating architecture in embedded systems as it provides an increase in concurrency improving the performance of the system rather than increasing the clock speed which affects the power consumption of the system. However, concurrency needs to be exploited in order to improve the system performance in the different applications’environments. The new emerging H.264/AVC coding standard is designed to cover a wide range of applications (real-time conversational services such as videoconferencing, video phone, etc.). It has many new features that require complex computations compared to previous video coding standards. This coding standard will be a challenging workload for future MPSoC embedded systems. Exploiting the different levels of parallelism for video codec applications can be done at the data level, the functional level, or both simultaneously. Our intention in this paper is to explore the natural existent parallelism in the H.264 decoder software [2] itself without any modification to the encoder phase, rather than forcing parallelization techniques. Our novel idea is based on the fact that the H.264 decoder decodes the luminance and chrominance signals separately, but the decoder is implemented in a way to decode them in series. Our approach is to execute the different decoding phases of the luminance signals in parallel to the chrominance signals. Using two cores to decode the luma and the chroma signals in parallel gives a gain of 15-20% of the decoding processing time and combining them the functional pipelined implementation over four cores or more, the gain can reach 60% compared to the current sequential execution.","PeriodicalId":356885,"journal":{"name":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129261742","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Analog VLSI circuits are being used successfully to implement Artificial Neural Networks (ANNs). These analog circuits exhibit nonlinear transfer function characteristics and suffer from device mismatches, degrading network performance. Because of the high cost involved with analog VLSI production, it is beneficial to predict implementation performance during design. We present an FPGA-based accelerator for the emulation of large (500+ synapses, 10k+ test samples) single-neuron ANNs implemented in analog VLSI. We used hardware time-multiplexing to scale network size and maximize hardware usage. An on-chip CPU controls the data flow through various memory systems to allow for large test sequences. We show that Block-RAM availability is the main implementation bottleneck and that a trade-off arises between emulation speed and hardware resources. However, we can emulate large amounts of synapses on an FPGA with limited resources. We have obtained a speedup of 30.5 times with respect to an optimized software implementation on a desktop computer.
{"title":"An FPGA-Based Accelerator for Analog VLSI Artificial Neural Network Emulation","authors":"B. V. Liempd, Daniel Herrera, M. Figueroa","doi":"10.1109/DSD.2010.20","DOIUrl":"https://doi.org/10.1109/DSD.2010.20","url":null,"abstract":"Analog VLSI circuits are being used successfully to implement Artificial Neural Networks (ANNs). These analog circuits exhibit nonlinear transfer function characteristics and suffer from device mismatches, degrading network performance. Because of the high cost involved with analog VLSI production, it is beneficial to predict implementation performance during design. We present an FPGA-based accelerator for the emulation of large (500+ synapses, 10k+ test samples) single-neuron ANNs implemented in analog VLSI. We used hardware time-multiplexing to scale network size and maximize hardware usage. An on-chip CPU controls the data flow through various memory systems to allow for large test sequences. We show that Block-RAM availability is the main implementation bottleneck and that a trade-off arises between emulation speed and hardware resources. However, we can emulate large amounts of synapses on an FPGA with limited resources. We have obtained a speedup of 30.5 times with respect to an optimized software implementation on a desktop computer.","PeriodicalId":356885,"journal":{"name":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129341323","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In addition to functional simulation for validation of hardware/software designs, there are additional robustness requirements that need advanced simulation techniques and tools to analyze the system behavior in the presence of faults. In this paper, we present the design of a fault injection framework for LEON3, a 32bit SPARC CPU based system used by the European Space Agency, described at Transaction Level using System C. First of all an extension of a previous XML formalization of basic binary faults, like memory and CPU registers corruption, is done in order to support TLM2.0transaction’s parameters corruptions. Next a novel Dynamic Binary Instrumentation (DBI) technique for C++ binaries is used to insert fault injection wrappers in SystemC transaction path. For binary faults in model components the use ofTLM2.0 “transport_dbg” is proposed. This way each component with fault injection capabilities exposes a standard interface to allow internal component inspection and modification.
{"title":"LEON3 ViP: A Virtual Platform with Fault Injection Capabilities","authors":"Antônio da Silva, Sebastián Sánchez","doi":"10.1109/DSD.2010.34","DOIUrl":"https://doi.org/10.1109/DSD.2010.34","url":null,"abstract":"In addition to functional simulation for validation of hardware/software designs, there are additional robustness requirements that need advanced simulation techniques and tools to analyze the system behavior in the presence of faults. In this paper, we present the design of a fault injection framework for LEON3, a 32bit SPARC CPU based system used by the European Space Agency, described at Transaction Level using System C. First of all an extension of a previous XML formalization of basic binary faults, like memory and CPU registers corruption, is done in order to support TLM2.0transaction’s parameters corruptions. Next a novel Dynamic Binary Instrumentation (DBI) technique for C++ binaries is used to insert fault injection wrappers in SystemC transaction path. For binary faults in model components the use ofTLM2.0 “transport_dbg” is proposed. This way each component with fault injection capabilities exposes a standard interface to allow internal component inspection and modification.","PeriodicalId":356885,"journal":{"name":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126263925","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Efficient on-chip communication is very important for exploiting enormous computing power available on a multi-core chip. Network on Chip (NoC) has emerged as a competitive candidate for implementing on-chip communication. Routing algorithms significantly affect the performance of a NoC. Most of the existing NoC architectural proposals advocate distributed routing algorithms for building NoC platforms. Although source routing offers many advantages, researchers avoided it due to its apparent disadvantage of larger header size requirement that results in lower bandwidth utilization. In this paper we make a strong case for the use of source routing for NoCs, especially for platforms with small sizes and regular topologies. We present a methodology to compute application specific efficient paths for communication among cores with a high degree of load balancing. The methodology first selects the most appropriate deadlock free routing algorithm, from a set of routing algorithms, based on the application’s traffic patterns. Then the selected (possibly adaptive) routing algorithm is used to compute efficient static paths with the goal of link load balancing. We demonstrate through simulation based evaluation that source routing has a potential of achieving higher performance, for example up to 28% lower latency even at medium load, as compared to distributed routing. A simple scheme is proposed for encoding of router ports to reduce the header overhead. A generic simulator was developed for evaluation and performance comparison between source routing and distributed routing. We also designed a router to support source routing for mesh topology NoC platforms.
{"title":"Designing Efficient Source Routing for Mesh Topology Network on Chip Platforms","authors":"S. Mubeen, Shashi Kumar","doi":"10.1109/DSD.2010.57","DOIUrl":"https://doi.org/10.1109/DSD.2010.57","url":null,"abstract":"Efficient on-chip communication is very important for exploiting enormous computing power available on a multi-core chip. Network on Chip (NoC) has emerged as a competitive candidate for implementing on-chip communication. Routing algorithms significantly affect the performance of a NoC. Most of the existing NoC architectural proposals advocate distributed routing algorithms for building NoC platforms. Although source routing offers many advantages, researchers avoided it due to its apparent disadvantage of larger header size requirement that results in lower bandwidth utilization. In this paper we make a strong case for the use of source routing for NoCs, especially for platforms with small sizes and regular topologies. We present a methodology to compute application specific efficient paths for communication among cores with a high degree of load balancing. The methodology first selects the most appropriate deadlock free routing algorithm, from a set of routing algorithms, based on the application’s traffic patterns. Then the selected (possibly adaptive) routing algorithm is used to compute efficient static paths with the goal of link load balancing. We demonstrate through simulation based evaluation that source routing has a potential of achieving higher performance, for example up to 28% lower latency even at medium load, as compared to distributed routing. A simple scheme is proposed for encoding of router ports to reduce the header overhead. A generic simulator was developed for evaluation and performance comparison between source routing and distributed routing. We also designed a router to support source routing for mesh topology NoC platforms.","PeriodicalId":356885,"journal":{"name":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125582916","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jaroslav Borecký, Martin Kohlík, H. Kubátová, P. Kubalík
A method how to improve the coverage of single faults in combinational circuits is proposed. The method is based on Concurrent Error Detection, but uses a fault simulation to find Critical points – the places, where faults are difficult to detect. The partial duplication of the design with regard to these critical points is able to increase the faults coverage with a low area overhead cost. Due to higher fault coverage we can increase the dependability parameters. The proposed modification is tested on the railway station safety devices designs implemented in the FPGA.
{"title":"Faults Coverage Improvement Based on Fault Simulation and Partial Duplication","authors":"Jaroslav Borecký, Martin Kohlík, H. Kubátová, P. Kubalík","doi":"10.1109/DSD.2010.112","DOIUrl":"https://doi.org/10.1109/DSD.2010.112","url":null,"abstract":"A method how to improve the coverage of single faults in combinational circuits is proposed. The method is based on Concurrent Error Detection, but uses a fault simulation to find Critical points – the places, where faults are difficult to detect. The partial duplication of the design with regard to these critical points is able to increase the faults coverage with a low area overhead cost. Due to higher fault coverage we can increase the dependability parameters. The proposed modification is tested on the railway station safety devices designs implemented in the FPGA.","PeriodicalId":356885,"journal":{"name":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools","volume":"131 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115878620","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Reconfigurable FPGA based Systems-on-Chip (SoC)architectures are increasingly becoming the preferred solution for implementing modern embedded systems, due to their flexible nature. However due to the tremendous amount of hardware resources available in these systems, new design methodologies and tools are required to reduce their design complexity. In this paper we present an exploratory analysis for specification of these systems, while utilizing the UML MARTE (Modeling and Analysis of Real-time and Embedded Systems) profile. Our contributions permit us to model fine grain reconfigurable FPGA based SoC architectures while extending the profile to integrate new features such as Partial Dynamic Reconfiguration supported by these modern systems. Finally we present the current limitations of the MARTE profile and ask some open questions regarding how these high level models can be effectively used as input for commercial FPGA simulation and synthesis tools. Solutions to these questions can help in creating a design flow from high level models to synthesis, placement and execution of these reconfigurable SoCs.
{"title":"Modeling Reconfigurable Systems-on-Chips with UML MARTE Profile: An Exploratory Analysis","authors":"Sana Cherif, I. Quadri, S. Meftali, J. Dekeyser","doi":"10.1109/DSD.2010.58","DOIUrl":"https://doi.org/10.1109/DSD.2010.58","url":null,"abstract":"Reconfigurable FPGA based Systems-on-Chip (SoC)architectures are increasingly becoming the preferred solution for implementing modern embedded systems, due to their flexible nature. However due to the tremendous amount of hardware resources available in these systems, new design methodologies and tools are required to reduce their design complexity. In this paper we present an exploratory analysis for specification of these systems, while utilizing the UML MARTE (Modeling and Analysis of Real-time and Embedded Systems) profile. Our contributions permit us to model fine grain reconfigurable FPGA based SoC architectures while extending the profile to integrate new features such as Partial Dynamic Reconfiguration supported by these modern systems. Finally we present the current limitations of the MARTE profile and ask some open questions regarding how these high level models can be effectively used as input for commercial FPGA simulation and synthesis tools. Solutions to these questions can help in creating a design flow from high level models to synthesis, placement and execution of these reconfigurable SoCs.","PeriodicalId":356885,"journal":{"name":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133629385","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}