Application-dependent FPGA testing can reduce time and memory requirements compared with tests that exercise the complete FPGA structure. This paper describes an FPGA testing methodology that does not require reconfiguration of the tested hardware and thus preserves the conditions that caused erroneous behavior of the FPGA during operation. We show that the tested part of the FPGA can be efficiently tested by deterministic test patterns even if no precise information about the internal FPGA structure is available. Storing uncompressed deterministic test patterns on the FPGA consumes too much hardware. For this reason, we propose to compress the deterministic test patterns with COMPAS, a compression system that uses scan chains for pattern decompression. COMPAS is well suited to current FPGAs because they can store the scan chain content in LUT-based shift registers. The COMPAS test compression system is based on test pattern overlapping, and we propose an improved version of it. Applying overlapped test patterns requires additional shift registers that save the test patterns while test responses are recorded into the internal scan chains. The neighborhood of the tested part of the FPGA can be dynamically reconfigured into shift registers and an output response analyzer (ORA). The shift registers contain the compressed test sequence and allow fast test pattern decompression. Experimental results given in the paper demonstrate the efficiency of the proposed FPGA testing method.
{"title":"Application Dependent FPGA Testing Method","authors":"M. Rozkovec, Jiri Jenícek, O. Novák","doi":"10.1109/DSD.2010.65","DOIUrl":"https://doi.org/10.1109/DSD.2010.65","url":null,"abstract":"Application-dependent FPGA testing can reduce time and memory requirements compared with tests that exercise the complete FPGA structure. This paper describes an FPGA testing methodology that does not require reconfiguration of the tested hardware and thus preserves the conditions that caused erroneous behavior of the FPGA during operation. We show that the tested part of the FPGA can be efficiently tested by deterministic test patterns even if no precise information about the internal FPGA structure is available. Storing uncompressed deterministic test patterns on the FPGA consumes too much hardware. For this reason, we propose to compress the deterministic test patterns with COMPAS, a compression system that uses scan chains for pattern decompression. COMPAS is well suited to current FPGAs because they can store the scan chain content in LUT-based shift registers. The COMPAS test compression system is based on test pattern overlapping, and we propose an improved version of it. Applying overlapped test patterns requires additional shift registers that save the test patterns while test responses are recorded into the internal scan chains. The neighborhood of the tested part of the FPGA can be dynamically reconfigured into shift registers and an output response analyzer (ORA). The shift registers contain the compressed test sequence and allow fast test pattern decompression. Experimental results given in the paper demonstrate the efficiency of the proposed FPGA testing method.","PeriodicalId":356885,"journal":{"name":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128060111","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
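The idea of test pattern overlapping that COMPAS builds on can be illustrated with a small greedy sketch. It assumes that each deterministic pattern is a string over {0, 1, X} (X = don't care), that the scan chain is as long as a pattern, and that a pattern is applied whenever it occupies the last chain-length bits of the serial sequence. The function names and the greedy "largest overlap first" rule are hypothetical; this is not the paper's improved COMPAS algorithm.

```python
def compatible(a: str, b: str) -> bool:
    """True if two equal-length {0,1,X} strings agree wherever both bits are specified."""
    return all(x == y or x == "X" or y == "X" for x, y in zip(a, b))


def merge(a: str, b: str) -> str:
    """Combine two compatible strings; a specified bit always wins over an X."""
    return "".join(y if x == "X" else x for x, y in zip(a, b))


def overlap_patterns(patterns: list[str]) -> str:
    """Greedy overlapping: append each pattern so that it reuses as many trailing
    bits of the current serial sequence as possible. Every pattern then appears
    as a window of scan-chain length inside the returned sequence."""
    seq = patterns[0]
    n = len(seq)  # scan chain length == pattern length (assumption)
    for p in patterns[1:]:
        for k in range(n, -1, -1):  # try the largest overlap first; k = 0 always fits
            tail = seq[len(seq) - k:]
            if compatible(tail, p[:k]):
                seq = seq[:len(seq) - k] + merge(tail, p[:k]) + p[k:]
                break
    return seq


patterns = ["10X1", "X110", "0011"]
stream = overlap_patterns(patterns)
print(stream, len(stream), "bits instead of", sum(map(len, patterns)))
# 10110011 8 bits instead of 12
```

The sketch ignores the capture cycle between patterns, which would overwrite the scan chain; preserving the pattern during response capture is exactly why the abstract mentions additional shift registers.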
Chip Multiprocessor (CMP) systems have become the reference architecture for designing microprocessors, thanks to advances in semiconductor nanotechnology that have continuously provided a growing number of faster and smaller transistors per chip. Interest in CMPs grew because classical techniques for boosting performance, e.g., increasing the clock frequency and the amount of work performed in each clock cycle, can no longer deliver significant improvements due to energy constraints and wire-delay effects. CMP systems generally adopt a large last-level cache (LLC), typically L2 or L3, shared among all cores, and private L1 caches. Because the miss resolution time of the private caches depends on the response time of the LLC, which is dominated by wire delay, performance is affected by wire delay. NUCA caches have been proposed for single-core and multi-core systems as a mechanism for tolerating the effects of wire delay on overall performance. In this paper, we introduce a novel NUCA architecture, called Re-NUCA, specifically suited for (but not limited to) CMPs in which cores are placed on different sides of the shared cache. The idea is to allow shared blocks to be replicated inside the shared cache, in order to avoid the limits on performance improvement that arise in classical D-NUCA caches due to the conflict hit problem. Our results show that Re-NUCA outperforms D-NUCA by more than 5% on average, and for applications that strongly suffer from the conflict hit problem we observe performance improvements of up to 15%.
{"title":"Re-NUCA: Boosting CMP Performance Through Block Replication","authors":"P. Foglia, C. Prete, M. Solinas, Giovanna Monni","doi":"10.1109/DSD.2010.41","DOIUrl":"https://doi.org/10.1109/DSD.2010.41","url":null,"abstract":"Chip Multiprocessor (CMP) systems have become the reference architecture for designing microprocessors, thanks to advances in semiconductor nanotechnology that have continuously provided a growing number of faster and smaller transistors per chip. Interest in CMPs grew because classical techniques for boosting performance, e.g., increasing the clock frequency and the amount of work performed in each clock cycle, can no longer deliver significant improvements due to energy constraints and wire-delay effects. CMP systems generally adopt a large last-level cache (LLC), typically L2 or L3, shared among all cores, and private L1 caches. Because the miss resolution time of the private caches depends on the response time of the LLC, which is dominated by wire delay, performance is affected by wire delay. NUCA caches have been proposed for single-core and multi-core systems as a mechanism for tolerating the effects of wire delay on overall performance. In this paper, we introduce a novel NUCA architecture, called Re-NUCA, specifically suited for (but not limited to) CMPs in which cores are placed on different sides of the shared cache. The idea is to allow shared blocks to be replicated inside the shared cache, in order to avoid the limits on performance improvement that arise in classical D-NUCA caches due to the conflict hit problem. Our results show that Re-NUCA outperforms D-NUCA by more than 5% on average, and for applications that strongly suffer from the conflict hit problem we observe performance improvements of up to 15%.","PeriodicalId":356885,"journal":{"name":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127292852","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
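The conflict hit problem can be made concrete with a deliberately crude toy model, which is not the Re-NUCA architecture or its replication policy. It assumes a single row of banks with core 0 next to bank 0 and core 1 next to the last bank, a hit latency of (distance + 1) units, and D-NUCA-style gradual migration of one bank toward the requester on every hit. With one copy, a block read alternately by both cores keeps bouncing in the middle; giving each side its own replica lets each copy settle next to its reader. Writes and coherence are ignored, and all names are hypothetical.

```python
def _step(pos: int, target: int) -> int:
    """Gradual migration: move a block one bank toward the requesting core."""
    return pos if pos == target else pos + (1 if target > pos else -1)


def dnuca_latency(accesses: list[int], banks: int, start: int) -> int:
    """One copy of a shared block; every hit migrates it toward the requester."""
    pos, total = start, 0
    for core in accesses:
        target = 0 if core == 0 else banks - 1  # core 0 by bank 0, core 1 by the last bank
        total += abs(pos - target) + 1           # toy latency grows with wire distance
        pos = _step(pos, target)
    return total


def renuca_latency(accesses: list[int], banks: int, start: int) -> int:
    """Same model, but each core's side keeps its own replica (read sharing only)."""
    pos, total = {0: start, 1: start}, 0
    for core in accesses:
        target = 0 if core == 0 else banks - 1
        total += abs(pos[core] - target) + 1
        pos[core] = _step(pos[core], target)
    return total


reads = [0, 1] * 4  # two cores alternately reading the same shared block
print(dnuca_latency(reads, banks=8, start=4), renuca_latency(reads, banks=8, start=4))
# 40 24
```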
This paper presents low-power techniques for an FPGA implementation of the Luffa hash function, which is under consideration for adoption as a standard. Two major gate-level techniques are introduced to reduce power consumption: pipelining (in several variants) and the use of embedded RAM blocks instead of general-purpose logic elements. Compared with an implementation that uses no low-power techniques, the proposed techniques reduce power consumption by a factor of 1.2 to 8.7.
{"title":"Low Power FPGA Implementations of 256-bit Luffa Hash Function","authors":"P. Kitsos, N. Sklavos, A. Skodras","doi":"10.1109/DSD.2010.19","DOIUrl":"https://doi.org/10.1109/DSD.2010.19","url":null,"abstract":"This paper presents low-power techniques for an FPGA implementation of the Luffa hash function, which is under consideration for adoption as a standard. Two major gate-level techniques are introduced to reduce power consumption: pipelining (in several variants) and the use of embedded RAM blocks instead of general-purpose logic elements. Compared with an implementation that uses no low-power techniques, the proposed techniques reduce power consumption by a factor of 1.2 to 8.7.","PeriodicalId":356885,"journal":{"name":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121624968","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
F. Lavratti, A. R. Pinto, L. Bolzani, Fabian Vargas, C. Montez, F. Hernandez, E. Gatti, C. Silva
Wireless Sensor Networks (WSNs) can be used to monitor hazardous and inaccessible areas. A WSN is composed of several nodes, each with its own power supply, e.g., a battery. Because the nodes work in places that are hard to access, it is preferable to use the minimum transmission power in order to prolong the WSN's lifetime as much as possible. However, the reliability of the transmitted data remains a crucial requirement. Therefore, power optimization and reliability have become the most important concerns in modern WSN-based systems. In this context, we evaluate the effectiveness of a Transmission Power Self-Optimization (TPSO) technique for WSNs in an Electromagnetic Interference (EMI) environment. The TPSO technique consists of an algorithm that guarantees a consistently high Quality of Service (QoS), focusing on the WSN's Efficiency (Ef), while optimizing the transmission power needed for data communication. Thus, the main idea behind our approach is to reach a trade-off between Ef and energy consumption in an environment with inherent noise.
{"title":"Evaluating a Transmission Power Self-Optimization Technique for WSN in EMI Environments","authors":"F. Lavratti, A. R. Pinto, L. Bolzani, Fabian Vargas, C. Montez, F. Hernandez, E. Gatti, C. Silva","doi":"10.1109/DSD.2010.116","DOIUrl":"https://doi.org/10.1109/DSD.2010.116","url":null,"abstract":"Wireless Sensor Networks (WSNs) can be used to monitor hazardous and inaccessible areas. A WSN is composed of several nodes, each with its own power supply, e.g., a battery. Because the nodes work in places that are hard to access, it is preferable to use the minimum transmission power in order to prolong the WSN's lifetime as much as possible. However, the reliability of the transmitted data remains a crucial requirement. Therefore, power optimization and reliability have become the most important concerns in modern WSN-based systems. In this context, we evaluate the effectiveness of a Transmission Power Self-Optimization (TPSO) technique for WSNs in an Electromagnetic Interference (EMI) environment. The TPSO technique consists of an algorithm that guarantees a consistently high Quality of Service (QoS), focusing on the WSN's Efficiency (Ef), while optimizing the transmission power needed for data communication. Thus, the main idea behind our approach is to reach a trade-off between Ef and energy consumption in an environment with inherent noise.","PeriodicalId":356885,"journal":{"name":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools","volume":"4 6","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120891562","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
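The abstract does not spell out the TPSO algorithm, so the following is only a hypothetical sketch of the trade-off it describes: a per-node controller that lowers its transmission power level while the measured efficiency stays comfortably above a target, and raises it when interference pushes efficiency below that target. Efficiency is assumed here to be delivered/sent packets per observation window; the class name, the abstract power levels, the target, and the hysteresis margin are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class PowerController:
    """Hypothetical per-node controller; levels are abstract radio power steps."""
    level: int = 7           # start at maximum power (assumption)
    min_level: int = 0
    max_level: int = 7
    target_ef: float = 0.95  # required efficiency (assumed QoS target)
    margin: float = 0.03     # hysteresis band that avoids oscillating every window

    def update(self, delivered: int, sent: int) -> int:
        """Adjust the power level from the efficiency observed in the last window."""
        ef = delivered / sent if sent else 1.0
        if ef < self.target_ef:
            # Reliability first: raise power when efficiency drops (e.g. an EMI burst).
            self.level = min(self.level + 1, self.max_level)
        elif ef >= self.target_ef + self.margin:
            # Comfortably above target: save energy by lowering power.
            self.level = max(self.level - 1, self.min_level)
        return self.level


ctrl = PowerController()
print([ctrl.update(delivered, 100) for delivered in (100, 99, 96, 90)])
# [6, 5, 5, 6]
```

The hysteresis band is a common design choice for such loops: without it, a node sitting right at the target would step its power down and up on alternate windows.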
J. Rutgers, P. T. Wolkotte, P. Hölzenspies, J. Kuper, G. Smit
This paper presents an approximate Maximum Common Subgraph (MCS) algorithm specifically for directed, cyclic graphs representing digital circuits. Because of the application domain, the graphs have favorable properties: they are very sparse, have many different labels, and most vertices have only one predecessor. The algorithm iterates over all vertices once and uses heuristics to find the MCS. Its computational complexity is linear in the size of the graph. Experiments show that very large common subgraphs were found in graphs of up to 200,000 vertices within a few minutes when a quarter or less of the graphs differ. The variation in runtime and in the quality of the result is low.
{"title":"An Approximate Maximum Common Subgraph Algorithm for Large Digital Circuits","authors":"J. Rutgers, P. T. Wolkotte, P. Hölzenspies, J. Kuper, G. Smit","doi":"10.1109/DSD.2010.29","DOIUrl":"https://doi.org/10.1109/DSD.2010.29","url":null,"abstract":"This paper presents an approximate Maximum Common Subgraph (MCS) algorithm specifically for directed, cyclic graphs representing digital circuits. Because of the application domain, the graphs have favorable properties: they are very sparse, have many different labels, and most vertices have only one predecessor. The algorithm iterates over all vertices once and uses heuristics to find the MCS. Its computational complexity is linear in the size of the graph. Experiments show that very large common subgraphs were found in graphs of up to 200,000 vertices within a few minutes when a quarter or less of the graphs differ. The variation in runtime and in the quality of the result is low.","PeriodicalId":356885,"journal":{"name":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121316926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
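The abstract gives only the outline of the heuristic (one pass over the vertices, exploiting sparsity, rich labels, and mostly single predecessors), so the sketch below is a hypothetical simplification rather than the authors' algorithm. It seeds the mapping with vertices whose label is unique in both graphs, then grows it along successor and predecessor edges wherever a label identifies exactly one unmatched neighbour on each side. Every vertex is queued at most once, so on sparse graphs the work stays roughly linear. The function name and the graph encoding (a label dict plus successor lists) are assumptions.

```python
from collections import Counter, defaultdict, deque


def _predecessors(succ):
    """Invert successor lists into predecessor lists."""
    pred = defaultdict(list)
    for u, vs in succ.items():
        for v in vs:
            pred[v].append(u)
    return pred


def approx_mcs(labels1, succ1, labels2, succ2):
    """Greedy vertex mapping G1 -> G2 (hypothetical heuristic, not the paper's).
    labels*: vertex -> label (e.g. cell type); succ*: vertex -> list of successors."""
    pred1, pred2 = _predecessors(succ1), _predecessors(succ2)
    count1, count2 = Counter(labels1.values()), Counter(labels2.values())
    unique2 = {lab: v for v, lab in labels2.items() if count2[lab] == 1}

    mapping, used, queue = {}, set(), deque()
    for u, lab in labels1.items():            # seeds: labels unique in both graphs
        if count1[lab] == 1 and lab in unique2:
            mapping[u] = unique2[lab]
            used.add(unique2[lab])
            queue.append(u)

    while queue:                               # grow along edges in both directions
        u = queue.popleft()
        v = mapping[u]
        for n1, n2 in ((succ1.get(u, []), succ2.get(v, [])), (pred1[u], pred2[v])):
            free1, free2 = defaultdict(list), defaultdict(list)
            for x in n1:
                if x not in mapping:
                    free1[labels1[x]].append(x)
            for y in n2:
                if y not in used:
                    free2[labels2[y]].append(y)
            for lab, xs in free1.items():
                ys = free2.get(lab, [])
                if len(xs) == 1 and len(ys) == 1:  # unambiguous match only
                    mapping[xs[0]] = ys[0]
                    used.add(ys[0])
                    queue.append(xs[0])
    return mapping


# IN -> AND -> AND -> OUT; the second circuit has an extra NOT gate.
g1_labels = {1: "IN", 2: "AND", 3: "AND", 4: "OUT"}
g1_succ = {1: [2], 2: [3], 3: [4]}
g2_labels = {"a": "IN", "b": "AND", "c": "AND", "d": "OUT", "e": "NOT"}
g2_succ = {"a": ["b"], "b": ["c"], "c": ["d", "e"]}
print(approx_mcs(g1_labels, g1_succ, g2_labels, g2_succ))  # maps 1->a, 2->b, 3->c, 4->d
```

The two AND gates are ambiguous by label alone; they are resolved only by propagation from the uniquely labelled IN and OUT vertices, which is the kind of structure the abstract says circuit graphs provide.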
Sameer D. Sahasrabuddhe, S. Subramanian, Kunal P. Ghosh, K. Arya, M. Desai
We present a high-level synthesis flow for mapping an algorithm description (in C) to a provably equivalent register transfer level (RTL) description of hardware. This flow uses an intermediate representation that orthogonally factorizes the program behavior into control, data and memory aspects, and is suitable for describing large systems. We show that optimizations such as arbiter-less resource sharing can be efficiently computed on this representation. We apply the flow to a wide range of examples, from stream ciphers to database and linear algebra applications. The resulting RTL is then put through a standard ASIC tool chain (synthesis followed by automatic place-and-route), and the performance and power dissipation of the resulting layout are computed. We observe that the energy consumption (per completed task) of each resulting circuit is considerably lower than that of an equivalent executable running on a low-power processor, indicating that this C-to-RTL flow offers an energy-efficient alternative to embedded processors for mapping algorithms to digital VLSI systems.
{"title":"A C-to-RTL Flow as an Energy Efficient Alternative to Embedded Processors in Digital Systems","authors":"Sameer D. Sahasrabuddhe, S. Subramanian, Kunal P. Ghosh, K. Arya, M. Desai","doi":"10.1109/DSD.2010.52","DOIUrl":"https://doi.org/10.1109/DSD.2010.52","url":null,"abstract":"We present a high-level synthesis flow for mapping an algorithm description (in C) to a provably equivalent register transfer level (RTL) description of hardware. This flow uses an intermediate representation that orthogonally factorizes the program behavior into control, data and memory aspects, and is suitable for describing large systems. We show that optimizations such as arbiter-less resource sharing can be efficiently computed on this representation. We apply the flow to a wide range of examples, from stream ciphers to database and linear algebra applications. The resulting RTL is then put through a standard ASIC tool chain (synthesis followed by automatic place-and-route), and the performance and power dissipation of the resulting layout are computed. We observe that the energy consumption (per completed task) of each resulting circuit is considerably lower than that of an equivalent executable running on a low-power processor, indicating that this C-to-RTL flow offers an energy-efficient alternative to embedded processors for mapping algorithms to digital VLSI systems.","PeriodicalId":356885,"journal":{"name":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools","volume":"90 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127047891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The ever-increasing density of integration makes the NoC a relevant communication design paradigm even for FPGAs. However, NoCs, like buses and crossbars, are typically designed without considering applications and programming models, and dealing with parallelism is still challenging. One approach is to take the parallel programming model and the application field into account when designing the NoC, in order to reduce the semantic gap between application and implementation. In this paper, we present a NoC and a design flow that target the implementation of streaming applications, e.g., image and video processing. The NoC topology is described as a (possibly sparse) matrix of routers mapped onto a matrix of FPGAs for prototyping, which introduces a hierarchical dimension. In addition, the NoC has been developed in conjunction with a streaming programming model expressed in a subset of the SystemC language. This allows the NoC to be optimized by implementing the communication and synchronization mechanisms of the programming model directly in hardware: such a router connected to 4 processing elements occupies about 2000 CLBs on a Xilinx FPGA, which is comparable to the size of a single processor. The design flow automates the implementation of an application expressed in the SystemC subset on a NoC-based architecture.
{"title":"A Programming Model and a NoC-Based Architecture for Streaming Applications","authors":"Yun Wu, D. Houzet, Sylvain Huet","doi":"10.1109/DSD.2010.66","DOIUrl":"https://doi.org/10.1109/DSD.2010.66","url":null,"abstract":"The ever-increasing density of integration makes the NoC a relevant communication design paradigm even for FPGAs. However, NoCs, like buses and crossbars, are typically designed without considering applications and programming models, and dealing with parallelism is still challenging. One approach is to take the parallel programming model and the application field into account when designing the NoC, in order to reduce the semantic gap between application and implementation. In this paper, we present a NoC and a design flow that target the implementation of streaming applications, e.g., image and video processing. The NoC topology is described as a (possibly sparse) matrix of routers mapped onto a matrix of FPGAs for prototyping, which introduces a hierarchical dimension. In addition, the NoC has been developed in conjunction with a streaming programming model expressed in a subset of the SystemC language. This allows the NoC to be optimized by implementing the communication and synchronization mechanisms of the programming model directly in hardware: such a router connected to 4 processing elements occupies about 2000 CLBs on a Xilinx FPGA, which is comparable to the size of a single processor. The design flow automates the implementation of an application expressed in the SystemC subset on a NoC-based architecture.","PeriodicalId":356885,"journal":{"name":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools","volume":"21 6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125685240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
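The hierarchical dimension mentioned above (a matrix of routers spread over a matrix of FPGAs) can be sketched as follows. This is a hypothetical illustration, not the authors' NoC: it assumes every FPGA hosts an equally sized rectangular block of routers, that packets use dimension-ordered XY routing, and it ignores the sparse case in which some router positions are empty. All function names and sizes are invented for the example.

```python
def fpga_of(router: tuple[int, int], tile_w: int, tile_h: int) -> tuple[int, int]:
    """FPGA (column, row) hosting a router, assuming each FPGA holds a tile_w x tile_h block."""
    x, y = router
    return (x // tile_w, y // tile_h)


def xy_route(src: tuple[int, int], dst: tuple[int, int]) -> list[tuple[int, int]]:
    """Dimension-ordered routing (assumed): travel along X first, then along Y."""
    (x, y), (dx, dy) = src, dst
    path = [(x, y)]
    while x != dx:
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:
        y += 1 if dy > y else -1
        path.append((x, y))
    return path


def inter_fpga_links(src, dst, tile_w: int, tile_h: int) -> int:
    """Number of hops on the route that leave one FPGA for another (the costly level)."""
    path = xy_route(src, dst)
    return sum(fpga_of(a, tile_w, tile_h) != fpga_of(b, tile_w, tile_h)
               for a, b in zip(path, path[1:]))


# 4 x 4 router matrix prototyped on a 2 x 2 matrix of FPGAs (hypothetical sizes).
route = xy_route((0, 0), (3, 3))
print(len(route) - 1, "hops,", inter_fpga_links((0, 0), (3, 3), 2, 2), "between FPGAs")
# 6 hops, 2 between FPGAs
```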
This paper presents a unified digit-serial systolic multiplication architecture for all-one polynomials (AOPs) and trinomials over GF(2^m) for an efficient implementation of the Montgomery Multiplication (MM) algorithm suitable for cryptosystems. This is the first reported unified digit-serial systolic, digit-level pipelined MM architecture for AOPs and trinomials over GF(2^m). Analysis shows that the latency and circuit complexity of the proposed architecture are significantly lower than those of earlier designs for the same class of polynomials. The proposed multiplier has a latency of 2N clock cycles, where N = ⌈m/L⌉, m is the word size and L is the digit size.
{"title":"Unified Digit Serial Systolic Montgomery Multiplication Architecture for Special Classes of Polynomials over GF(2^m)","authors":"S. Talapatra, H. Rahaman, Samir K. Saha","doi":"10.1109/DSD.2010.59","DOIUrl":"https://doi.org/10.1109/DSD.2010.59","url":null,"abstract":"This paper presents a unified digit-serial systolic multiplication architecture for all-one polynomials (AOPs) and trinomials over GF(2^m) for an efficient implementation of the Montgomery Multiplication (MM) algorithm suitable for cryptosystems. This is the first reported unified digit-serial systolic, digit-level pipelined MM architecture for AOPs and trinomials over GF(2^m). Analysis shows that the latency and circuit complexity of the proposed architecture are significantly lower than those of earlier designs for the same class of polynomials. The proposed multiplier has a latency of 2N clock cycles, where N = ⌈m/L⌉, m is the word size and L is the digit size.","PeriodicalId":356885,"journal":{"name":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132773426","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
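For reference, the operation being accelerated is the Montgomery product C = A·B·x^(-m) mod P(x) over GF(2^m). The sketch below is the textbook bit-serial version (digit size L = 1), not the paper's digit-serial systolic architecture; a digit-serial design processes L bits of A per step, which is where N = ⌈m/L⌉ comes from. Polynomials are encoded as Python ints (bit i is the coefficient of x^i), and the example trinomial x^5 + x^2 + 1 and all function names are illustrative choices. The check relies on the identity Mont(A·x^m mod P, B) = A·B mod P.

```python
def montgomery_mul(a: int, b: int, p: int, m: int) -> int:
    """Bit-serial Montgomery product a * b * x^(-m) mod p over GF(2^m).
    p has degree m and constant term 1 (true for trinomials and all-one polynomials)."""
    c = 0
    for i in range(m):          # one iteration per bit of a (digit size L = 1)
        if (a >> i) & 1:
            c ^= b              # c += a_i * b   (addition in GF(2) is XOR)
        if c & 1:
            c ^= p              # add p so that c becomes divisible by x
        c >>= 1                 # exact division by x
    return c                    # degree < m, so already fully reduced


def gf_mulmod(a: int, b: int, p: int, m: int) -> int:
    """Reference: carry-less schoolbook product followed by reduction mod p."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a, b = a << 1, b >> 1
    for i in range(r.bit_length() - 1, m - 1, -1):
        if (r >> i) & 1:
            r ^= p << (i - m)   # cancel the x^i term
    return r


def latency_cycles(m: int, L: int) -> int:
    """Latency formula quoted in the abstract: 2N cycles with N = ceil(m / L)."""
    return 2 * (-(-m // L))


P, M = 0b100101, 5  # trinomial x^5 + x^2 + 1
a_mont = gf_mulmod(0b10011, 1 << M, P, M)   # map a into the Montgomery domain: a * x^m mod p
assert montgomery_mul(a_mont, 0b01101, P, M) == gf_mulmod(0b10011, 0b01101, P, M)
print(latency_cycles(163, 8))  # 42
```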
StarSS is a parallel programming model that eases the task of the programmer, who has to identify the tasks that can potentially be executed in parallel and the inputs and outputs of these tasks, while the runtime system takes care of the difficult issues of determining inter-task dependencies, synchronization, load balancing, scheduling to optimize data locality, etc. Because it handles all these issues, however, the runtime system might become a bottleneck that limits the scalability of the system. The contribution of this paper is twofold. First, we analyze the scalability of the current software runtime system for several synthetic benchmarks with different dependency patterns and task sizes. We show that for fine-grained tasks the system does not scale beyond five cores. Furthermore, we identify the main scalability bottlenecks of the runtime system. Second, we present the design of Nexus, a hardware support system for StarSS applications that greatly reduces the task management overhead.
{"title":"A Case for Hardware Task Management Support for the StarSS Programming Model","authors":"C. Meenderinck, B. Juurlink","doi":"10.1109/DSD.2010.63","DOIUrl":"https://doi.org/10.1109/DSD.2010.63","url":null,"abstract":"StarSS is a parallel programming model that eases the task of the programmer, who has to identify the tasks that can potentially be executed in parallel and the inputs and outputs of these tasks, while the runtime system takes care of the difficult issues of determining inter-task dependencies, synchronization, load balancing, scheduling to optimize data locality, etc. Because it handles all these issues, however, the runtime system might become a bottleneck that limits the scalability of the system. The contribution of this paper is twofold. First, we analyze the scalability of the current software runtime system for several synthetic benchmarks with different dependency patterns and task sizes. We show that for fine-grained tasks the system does not scale beyond five cores. Furthermore, we identify the main scalability bottlenecks of the runtime system. Second, we present the design of Nexus, a hardware support system for StarSS applications that greatly reduces the task management overhead.","PeriodicalId":356885,"journal":{"name":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools","volume":"41 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116554513","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
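To make the runtime's job concrete, here is a hypothetical sketch of how inter-task dependencies can be derived from the declared inputs and outputs of each task: read-after-write, write-after-write and write-after-read edges are tracked per memory address. It is neither the StarSS runtime nor the Nexus hardware; the class and method names are invented for illustration, and a real runtime may eliminate false (WAW/WAR) dependencies by renaming instead of tracking them.

```python
from collections import defaultdict


class DependencyTracker:
    """Toy task-graph builder: one submit() call per spawned task (hypothetical API)."""

    def __init__(self) -> None:
        self.last_writer = {}              # address -> last task that writes it
        self.readers = defaultdict(list)   # address -> tasks reading it since that write
        self.deps = {}                     # task -> set of tasks it must wait for

    def submit(self, task: str, inputs=(), outputs=()) -> set:
        d = set()
        for addr in inputs:                # read-after-write (true dependency)
            if addr in self.last_writer:
                d.add(self.last_writer[addr])
        for addr in outputs:
            if addr in self.last_writer:   # write-after-write
                d.add(self.last_writer[addr])
            d.update(self.readers[addr])   # write-after-read
        for addr in inputs:
            self.readers[addr].append(task)
        for addr in outputs:               # this task becomes the new producer
            self.last_writer[addr] = task
            self.readers[addr] = []
        self.deps[task] = d
        return d

    def ready(self, done: set) -> list:
        """Tasks not yet finished whose dependencies have all finished."""
        return sorted(t for t, d in self.deps.items() if t not in done and d <= done)


rt = DependencyTracker()
rt.submit("t1", outputs=["A"])
rt.submit("t2", inputs=["A"], outputs=["B"])
rt.submit("t3", inputs=["A"], outputs=["C"])
rt.submit("t4", inputs=["B", "C"], outputs=["A"])  # also waits for t2/t3 to finish reading A
print(rt.ready(set()), rt.ready({"t1"}), rt.ready({"t1", "t2", "t3"}))
# ['t1'] ['t2', 't3'] ['t4']
```

Every submit() walks the declared addresses and updates shared tables, which hints at why this bookkeeping becomes a bottleneck for fine-grained tasks, the overhead the abstract proposes to move into hardware.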
This paper addresses hardware/software system co-design from two points of view. First, it provides a system-theoretical framework for the problem of designing hardware/software systems. This framework enables the designer to proceed systematically in optimizing the trade-off between the desired functionality, available resources and operating conditions. Second, the paper describes an application of some of these theoretical principles to the design of an embedded automotive system built on a low-cost FPGA.
{"title":"A Design Process for Hardware/Software System Co-design and its Application to Designing a Reconfigurable FPGA","authors":"F. Moreno, I. López, R. Sanz","doi":"10.1109/DSD.2010.43","DOIUrl":"https://doi.org/10.1109/DSD.2010.43","url":null,"abstract":"This paper addresses hardware/software system co-design from two points of view. First, it provides a system-theoretical framework for the problem of designing hardware/software systems. This framework enables the designer to proceed systematically in optimizing the trade-off between the desired functionality, available resources and operating conditions. Second, the paper describes an application of some of these theoretical principles to the design of an embedded automotive system built on a low-cost FPGA.","PeriodicalId":356885,"journal":{"name":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools","volume":"63 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128447527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}