A Side-channel Attack Resistant Programmable PKC Coprocessor for Embedded Applications
Published in: 2007 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation
Pub Date: 2007-07-16 DOI: 10.1109/ICSAMOS.2007.4285751
N. Mentens, K. Sakiyama, L. Batina, B. Preneel, I. Verbauwhede
This paper describes the design of a programmable coprocessor for public key cryptography (PKC) on an FPGA. The implementation provides a very broad range of functions together with countermeasures against side-channel analysis (SCA) attacks. The functions are implemented in a hierarchical manner, and all levels of the hierarchy are accessible to the user. This makes the coprocessor very flexible and particularly suitable for embedded environments, where the border between hardware and software needs to be decided depending on the application. For RSA in particular, the resulting implementation on an XC3S5000 FPGA, from Xilinx's low-cost Spartan series, achieves performance comparable to state-of-the-art PKC coprocessors.
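The hierarchical organisation described in this abstract can be illustrated with a small software sketch. It is not the coprocessor's design: the abstract does not say which levels exist or which SCA countermeasures are used. The three levels below (`mod_add`, `mod_mul`, `mod_exp_ladder`) and the choice of the Montgomery powering ladder as the countermeasure are assumptions made for illustration, and the function names are hypothetical.

```python
def mod_add(a, b, n):
    """Lowest level (assumed): modular addition."""
    return (a + b) % n


def mod_mul(a, b, n):
    """Middle level (assumed): shift-and-add multiplication built on mod_add."""
    result = 0
    a %= n
    while b > 0:
        if b & 1:
            result = mod_add(result, a, n)
        a = mod_add(a, a, n)  # double the running multiple of a
        b >>= 1
    return result


def mod_exp_ladder(base, exponent, n):
    """Top level (assumed): Montgomery powering ladder.

    Every exponent bit triggers exactly one multiplication and one squaring,
    so the sequence of operations does not depend on the bit values. Python
    itself is not constant-time, so this only shows the structure.
    """
    r0, r1 = 1 % n, base % n
    for i in reversed(range(exponent.bit_length())):
        if (exponent >> i) & 1:
            r0 = mod_mul(r0, r1, n)
            r1 = mod_mul(r1, r1, n)
        else:
            r1 = mod_mul(r0, r1, n)
            r0 = mod_mul(r0, r0, n)
    return r0


# Toy RSA round trip with a textbook-sized key (far too small for real use).
n, e, d = 3233, 17, 2753
ciphertext = mod_exp_ladder(65, e, n)
plaintext = mod_exp_ladder(ciphertext, d, n)
```

Because each level is an ordinary callable, a caller can enter the hierarchy at any level, which mirrors the flexibility the abstract attributes to exposing all levels to the user.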
Design Space Exploration of Media Processors: A Parameterized Scheduler
Pub Date: 2007-07-16 DOI: 10.1109/ICSAMOS.2007.4285732
G. P. Vayá, J. Martín-Langerwerf, Piriya Taptimthong, P. Pirsch
This paper describes an enhanced list scheduling algorithm used in a parameterized assembler. The assembler, which is configurable in terms of architectural parameters, is part of a new environment for exploring and optimizing VLIW architectures for multimedia applications. A generic VLIW architecture with a novel register file structure is used as the base architecture. The proposed scheduling algorithm includes two main features: a backtracking technique that allows inappropriate scheduling decisions to be undone, and an advanced resource conflict function that allows the scheduler to work with different VLIW architecture configurations. Moreover, local register allocation is performed in conjunction with instruction scheduling to obtain better code compaction. Two different multimedia tasks are implemented to check the correctness of the generated code for different architecture configurations. When these applications are scheduled for VLIW configurations with a partitioned register file and a limited number of functional units, the code compaction efficiency reaches up to 94% of that obtained for the same configuration with an unconstrained register file and an unlimited number of functional units.
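As a point of reference for this abstract, the following is a minimal sketch of plain list scheduling under a functional-unit limit. It is not the paper's enhanced algorithm: there is no backtracking, no register allocation, and no partitioned register file. The function name `list_schedule`, the unit-latency assumption, and the critical-path priority are all assumptions made for illustration, and the dependency graph must be acyclic.

```python
def list_schedule(deps, num_units):
    """Greedy list scheduling with unit-latency operations.

    deps maps each operation to the list of operations it depends on.
    Returns one list of operations per cycle, at most num_units per cycle.
    """
    # Build successor lists so each operation's height can be computed.
    succs = {op: [] for op in deps}
    for op, preds in deps.items():
        for p in preds:
            succs[p].append(op)

    height = {}

    def compute_height(op):
        # Length of the longest dependency chain starting at op.
        if op not in height:
            height[op] = 1 + max((compute_height(s) for s in succs[op]), default=0)
        return height[op]

    for op in deps:
        compute_height(op)

    cycle_of = {}
    schedule = []
    while len(cycle_of) < len(deps):
        cycle = len(schedule)
        # An operation is ready once all predecessors ran in earlier cycles.
        ready = [
            op for op in deps
            if op not in cycle_of
            and all(p in cycle_of and cycle_of[p] < cycle for p in deps[op])
        ]
        # Highest priority first: longest remaining chain, then name.
        ready.sort(key=lambda op: (-height[op], op))
        issued = ready[:num_units]
        for op in issued:
            cycle_of[op] = cycle
        schedule.append(issued)
    return schedule


# Hypothetical dataflow: four loads, two adds, one final multiply.
deps = {"a": [], "b": [], "c": [], "d": [],
        "e": ["a", "b"], "f": ["c", "d"], "g": ["e", "f"]}
constrained = list_schedule(deps, num_units=2)
unconstrained = list_schedule(deps, num_units=len(deps))
# One possible reading of "compaction efficiency": ideal length / actual length.
efficiency = len(unconstrained) / len(constrained)
```

The ratio computed at the end is only one plausible interpretation of the efficiency figure quoted in the abstract, which does not define the metric.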
COSMOS: A System-Level Modelling and Simulation Framework for Coprocessor-Coupled Reconfigurable Systems
Pub Date: 2007-07-16 DOI: 10.1109/ICSAMOS.2007.4285743
Kehuai Wu, J. Madsen
Dynamically reconfigurable systems demand complicated run-time management. Because of resource constraints and reconfiguration latencies, efficient reconfiguration strategies that reduce the overhead of dynamic reconfiguration need to be studied. In this paper, we i) propose a reconfigurable task model which extends the classical real-time task model to support the additional states and latencies needed to capture dynamically reconfigurable behavior, ii) propose a coprocessor-coupled reconfigurable architecture which has hardware run-time support for task execution, task reallocation and resource management, and iii) present COSMOS, a SystemC-based framework to model and simulate coprocessor-coupled reconfigurable systems. We illustrate how COSMOS may be used to capture the dynamic behavior of such systems and emphasize the need to capture their system-level aspects in order to deal with future design challenges of dynamically reconfigurable systems.
The ARISE Reconfigurable Instruction Set Extensions Framework
Pub Date: 2007-07-16 DOI: 10.1109/ICSAMOS.2007.4285746
N. Vassiliadis, G. Theodoridis, S. Nikolaidis
In this paper, we introduce the ARISE framework for the systematic extension of typical processors with the infrastructure needed to support an arbitrary number and type of reconfigurable hardware units. ARISE extends the micro-architecture of the processor with an interface that allows the hardware units to be coupled to it. Furthermore, the instruction set of the processor is extended with instructions that expose full control of the interface to the programmer/compiler. This control includes the configuration of operations on the hardware units, the execution of these operations, and the communication of data between the processor and the units. The new instructions are incorporated without the need to redesign the processor's instruction set architecture. To evaluate our proposal, a model of an ARISE-extended MIPS processor has been designed, and the model has been simulated using a turbo decoder algorithm as the benchmark application. Performance results show application speedups of up to 7.5×.
Automatic Bus Matrix Synthesis based on Hardware Interface Selection for Fast Communication Design Space Exploration
Pub Date: 2007-07-16 DOI: 10.1109/ICSAMOS.2007.4285733
Ganghee Lee, Seokhyun Lee, Yongjin Ahn, Kiyoung Choi
In this paper, we present an automated bus matrix synthesis flow for efficient system-on-chip communication design space exploration at the transaction level. In particular, we consider hardware interface design, since it affects overall system cost and performance. Depending on its bus interface, a hardware block can be a master or a slave. We propose a method that solves this hardware interface selection problem by analyzing communication behavior statically. In addition, to explore the communication design space quickly, we automatically generate transaction-level models for the hardware blocks according to the selected hardware interfaces. The synthesis result is verified by transaction-level simulation with a commercial tool. We give experimental results with a JPEG encoder and an H.264 encoder to demonstrate the efficiency of the proposed method. The results show that with our automated synthesis flow, the designer can easily and quickly obtain better communication designs through fast design space exploration. More specifically, our hardware interface selection technique reduces the area of the bus matrix by 41.43% on average, with an average performance overhead of 0.58%, compared to the maximum-performance case.
On the Problem of Minimizing Workload Execution Time in SMT Processors
Pub Date: 2007-07-16 DOI: 10.1109/ICSAMOS.2007.4285735
F. Cazorla, E. Fernández, P. Knijnenburg, Alex Ramírez, R. Sakellariou, M. Valero
Most research on simultaneous multithreading (SMT) processors focuses on improving throughput and/or fairness, or on prioritizing some threads over others in a workload. In this paper, we discuss a new problem not previously addressed in the SMT literature, which we call workload execution time (WET) minimization. It consists of reducing the total execution time of all threads in a workload. This problem arises in parallel applications, where it is common for a single master thread to spawn several child jobs. The master job cannot continue until all child jobs have finished, so reducing the overall execution time is important to speed up the application. This paper is a first step in analyzing this problem. First, we analyze the WET provided by the best fetch policies tuned for throughput/fairness. We demonstrate that these policies achieve less than optimal performance. We show that, on average, for the workloads evaluated in this paper, there is room for improvement of up to 18 percentage points. It follows that novel mechanisms aimed at reducing WET are required to speed up parallel applications.
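The difference between optimizing throughput and optimizing WET can be seen in a small worked example. The sketch below takes WET to be the time at which the last thread of the workload finishes, which matches the master/child scenario in the abstract. The two policies and all numbers are hypothetical, and the model ignores the fact that the remaining threads usually run faster once a co-runner has finished.

```python
def finish_times(instructions, ipc):
    """Cycles each thread needs, given its instruction count and sustained IPC."""
    return [n / rate for n, rate in zip(instructions, ipc)]


def workload_execution_time(instructions, ipc):
    """WET: the workload is done only when its slowest thread is done."""
    return max(finish_times(instructions, ipc))


def throughput(ipc):
    """Aggregate instructions per cycle while all threads are running."""
    return sum(ipc)


# Two child threads of equal length (hypothetical numbers).
instructions = [1000, 1000]

# Policy A favours the thread that uses the pipeline best.
ipc_policy_a = [2.0, 1.0]
# Policy B balances progress so that both threads finish together.
ipc_policy_b = [1.25, 1.25]

wet_a = workload_execution_time(instructions, ipc_policy_a)  # 1000 cycles
wet_b = workload_execution_time(instructions, ipc_policy_b)  # 800 cycles
```

Under these assumed numbers, policy A has the higher throughput (3.0 against 2.5 IPC) yet the longer WET, because the master thread still has to wait for the slower child.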
An Interrupt Controller for FPGA-based Multiprocessors
Pub Date: 2007-07-16 DOI: 10.1109/ICSAMOS.2007.4285737
Antonino Tumeo, Marco Branca, L. Camerini, M. Monchiero, G. Palermo, Fabrizio Ferrandi, D. Sciuto
Interrupt-based programming is widely used for interfacing a processor with peripherals and for allowing software threads to interact. Many hardware/software architectures have been proposed in the past to support this programming practice. In the context of FPGA-based multiprocessors, however, this topic has not yet been thoroughly addressed. This paper presents the architecture of an interrupt controller for an FPGA-based multiprocessor composed of standard off-the-shelf soft cores. The main function of this device is to distribute multiple interrupts across the cores of a multiprocessor. In addition, our architecture supports several advanced features such as booking, broadcasting and inter-processor interrupts. On top of this hardware layer, we provide a software library to exploit this mechanism effectively. We built a prototype of this system. Our experiments show that our interrupt controller efficiently distributes multiple interrupts across the system.
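A behavioural sketch can make the idea of distributing interrupts across cores more concrete. The model below is not the paper's architecture: the class `InterruptController` and its methods are hypothetical, and "booking" is interpreted here as reserving an interrupt source for one specific core, which the abstract does not confirm. Broadcasting and inter-processor interrupts are left out.

```python
from collections import deque


class InterruptController:
    """Toy model: deliver each pending interrupt to a free core."""

    def __init__(self, num_cores):
        self.busy = [False] * num_cores
        self.booked = {}                                  # irq -> reserved core (assumed meaning of "booking")
        self.pending = deque()
        self.delivered = [[] for _ in range(num_cores)]   # per-core delivery log

    def book(self, irq, core):
        """Reserve an interrupt source for one core."""
        self.booked[irq] = core

    def raise_irq(self, irq):
        """A peripheral raises an interrupt."""
        self.pending.append(irq)
        self._dispatch()

    def acknowledge(self, core):
        """The core finished its handler and can take another interrupt."""
        self.busy[core] = False
        self._dispatch()

    def _pick_core(self, irq):
        if irq in self.booked:
            core = self.booked[irq]
            return None if self.busy[core] else core
        for core, is_busy in enumerate(self.busy):
            if not is_busy:
                return core
        return None

    def _dispatch(self):
        # Try every pending interrupt once, keeping the ones that must wait.
        still_pending = deque()
        while self.pending:
            irq = self.pending.popleft()
            core = self._pick_core(irq)
            if core is None:
                still_pending.append(irq)
            else:
                self.busy[core] = True
                self.delivered[core].append(irq)
        self.pending = still_pending


ic = InterruptController(num_cores=2)
ic.book("timer", core=1)
ic.raise_irq("uart")    # goes to core 0, the first free core
ic.raise_irq("timer")   # goes to core 1 because it is booked there
ic.raise_irq("dma")     # both cores busy, stays pending
ic.acknowledge(0)       # core 0 is free again and receives "dma"
```

A real controller would also handle priorities and masking; this sketch only shows the free-core selection step.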
A Hardware/Software Architecture for Tool Path Computation. An Application to Turning Lathe Machining
Pub Date: 2007-07-16 DOI: 10.1109/ICSAMOS.2007.4285739
S. Cuenca, A. Martínez-Álvarez, A. Jimeno-Morenilla, J. L. Sánchez
Tool path generation is one of the most complex problems in computer-aided manufacturing. Although some efficient strategies have been developed, most of them are only useful for standard machining. The algorithm called virtual digitizing avoids this problem by definition, but its high computing cost makes it difficult to integrate into standard machining in order to adopt the new ISO 14649 standard. This paper presents a virtual digitizing hardware/software architecture that takes advantage of field-programmable gate arrays (FPGAs) to improve the efficiency of the algorithm while meeting the current restrictions of traditional computer numerical control (CNC) systems. To evaluate the architecture, a prototype was implemented using a commercial reconfigurable platform integrated within a CNC lathe for shoe last machining. The performance of the system for tool path generation was measured for different trajectory and surface precisions using a database of real shoe models. The experiments show a significant speedup in all cases while keeping the error of the results below the maximum allowed.
Systematic Data Structure Exploration of Multimedia and Network Applications realized Embedded Systems
Pub Date: 2007-07-16 DOI: 10.1109/ICSAMOS.2007.4285734
Lazaros Papadopoulos, Christos Baloukas, N. Zompakis, D. Soudris
In recent years, there has been a trend towards implementing network and multimedia applications on portable devices. These applications usually contain complex dynamic data structures. The choice of the dynamic data type (DDT) combination for an application affects the performance and the energy consumption of the whole system. A DDT exploration methodology is therefore used to trade off design factors such as performance and energy consumption. In this paper we provide a new approach to the DDT exploration procedure, based on a new library of DDTs which remedies the limitations of an existing one and allows DDT optimization across a wide range of application domains. Using the new library, we performed DDT exploration on network and multimedia benchmarks and achieved performance and energy consumption improvements of up to 85% and 43%, respectively.
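The kind of exploration this abstract describes can be sketched as follows: evaluate each candidate data type on the same access trace, then keep the candidates that are not dominated in any design factor. Everything in the sketch is an assumption made for illustration, including the two candidates, the cost models (memory accesses as a stand-in for time and energy, bytes per element as footprint), and the function names. It does not reflect the paper's DDT library.

```python
# Hypothetical cost models: memory accesses per operation and bytes per element.
COST_MODELS = {
    "array": {
        "get": lambda index: 1,              # direct indexing
        "push_front": lambda size: size + 1, # shift every element, then write
        "bytes_per_element": 4,
    },
    "linked_list": {
        "get": lambda index: index + 1,      # walk the list from the head
        "push_front": lambda size: 1,        # relink the head only
        "bytes_per_element": 8,              # payload plus next pointer
    },
}


def evaluate(ddt, trace):
    """Replay a trace and return (memory accesses, peak footprint in bytes)."""
    model = COST_MODELS[ddt]
    size = accesses = peak_bytes = 0
    for op in trace:
        if op[0] == "push_front":
            accesses += model["push_front"](size)
            size += 1
        elif op[0] == "get":
            accesses += model["get"](op[1])
        peak_bytes = max(peak_bytes, size * model["bytes_per_element"])
    return accesses, peak_bytes


def pareto_front(results):
    """Names of candidates that no other candidate beats or ties on both metrics."""
    front = []
    for name, (acc, mem) in results.items():
        dominated = any(
            o_acc <= acc and o_mem <= mem and (o_acc, o_mem) != (acc, mem)
            for other, (o_acc, o_mem) in results.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return sorted(front)


def explore(trace):
    return pareto_front({ddt: evaluate(ddt, trace) for ddt in COST_MODELS})


# Insert-heavy trace: four insertions at the front, one read of the head.
insert_heavy = [("push_front",)] * 4 + [("get", 0)]
# Read-heavy trace: the same insertions followed by a full scan by index.
read_heavy = [("push_front",)] * 4 + [("get", i) for i in range(4)]
```

With these assumed costs, the insert-heavy trace leaves both candidates on the Pareto front (a genuine access/footprint trade-off), while the read-heavy trace leaves only the array.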
Design Space Exploration of Configuration Manager for Network Processing Applications
Pub Date: 2007-07-16 DOI: 10.1109/ICSAMOS.2007.4285731
C. Kachris, S. Vassiliadis
Current FPGAs provide a powerful platform for network processing applications. The main challenge is to exploit reconfiguration to increase the performance of the system. In this paper, a design space exploration framework is presented for designing a reconfigurable platform for multi-service network processing applications. An integrated design flow is presented, from system-level analytical design down to the implementation level. Furthermore, the design of an efficient configuration manager is presented, in which the platform is adapted for optimum speedup with minimum overhead, taking into account the reconfiguration overhead and the network characteristics (packet type distribution, network stability). Finally, a case study is presented in which the platform is used to process three network flows with different processing requirements.
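The trade-off a configuration manager has to weigh, speedup against reconfiguration overhead under a given packet mix, can be written as a simple break-even check. The sketch below is an assumed analytical model, not the paper's manager: the function names, the configuration names, and all cycle counts are hypothetical, and network stability is reduced to a single number, the count of packets for which the current mix is expected to hold.

```python
def expected_cycles_per_packet(packet_mix, cycles_by_type):
    """Average processing cost of one packet under a given configuration."""
    return sum(share * cycles_by_type[ptype] for ptype, share in packet_mix.items())


def should_reconfigure(packet_mix, current_cfg, candidate_cfg,
                       reconfig_cycles, stable_packets):
    """Reconfigure only if the expected saving outweighs the overhead.

    stable_packets is a stand-in for network stability: the number of
    packets for which packet_mix is expected to remain valid.
    """
    saving_per_packet = (
        expected_cycles_per_packet(packet_mix, current_cfg)
        - expected_cycles_per_packet(packet_mix, candidate_cfg)
    )
    return saving_per_packet * stable_packets > reconfig_cycles


# Hypothetical per-packet costs (cycles) for two platform configurations.
ipsec_config = {"ipsec": 100, "nat": 400}
nat_config = {"ipsec": 500, "nat": 50}

# Observed packet type distribution: mostly NAT traffic.
mix = {"ipsec": 0.25, "nat": 0.75}

stable = should_reconfigure(mix, ipsec_config, nat_config,
                            reconfig_cycles=100_000, stable_packets=1000)
volatile = should_reconfigure(mix, ipsec_config, nat_config,
                              reconfig_cycles=100_000, stable_packets=500)
```

With these numbers the saving is 162.5 cycles per packet, so switching pays off when the mix holds for 1000 packets but not when it holds for only 500.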