Pub Date: 2002-09-04 DOI: 10.1109/DSD.2002.1115370
P. Marwedel
This paper stresses the importance of designing efficient embedded software and it provides a global view of some of the techniques that have been developed to meet this goal. These techniques include high-level transformations, compiler optimizations reducing the energy consumption of embedded programs and optimizations exploiting architectural features of embedded processors. Such optimizations lead to significant reductions of the execution time, the required energy and the memory size of embedded applications. Despite this, they can hardly be found in any available compiler.
Title: Embedded software: how to make it efficient? Published in: Proceedings Euromicro Symposium on Digital System Design. Architectures, Methods and Tools.
Pub Date: 2002-09-04 DOI: 10.1109/DSD.2002.1115360
Ilia Oussorov, W. Raab, J. Hachmann, A. Kravtsov
This paper discusses the integration of instruction set simulators (ISS) for processor cores into high-level system models. It addresses the approaches to data communication between high-level modules and the ISS, as well as the synchronization between these parts.
Title: Integration of instruction set simulators into SystemC high level models
Pub Date: 2002-09-04 DOI: 10.1109/DSD.2002.1115352
J. Hidalgo, J. Lanchares, Aitor Ibarra, R. Hermida
Genetic algorithms (GAs) are stochastic optimization heuristics in which searches in solution space are carried out by imitating the population genetics stated in Darwin's theory of evolution. The compact genetic algorithm (cGA) does not manage a population of solutions but only mimics its existence. The combination of genetic and local search heuristics has been shown to be an effective approach to solving some optimization problems more efficiently than with a single GA or a cGA. The design flow for multi-FPGA systems has three major tasks: partitioning, placement and routing. In this paper we present a new hybrid algorithm that exploits a cGA in order to generate high-quality partitioning and placement solutions and, by means of a local search heuristic, improves the solutions obtained using a cGA or a GA.
Title: A hybrid evolutionary algorithm for Multi-FPGA systems design
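The idea that a cGA "mimics" a population can be illustrated with a minimal sketch of the generic compact GA: instead of storing individuals, it keeps one probability per bit and nudges it toward the winner of each two-way tournament. This is not the hybrid algorithm of the paper (there is no local search and no FPGA partitioning model here); the function name `compact_ga`, its parameters, and the toy fitness (`sum`, i.e. counting ones) are assumptions made for illustration.

```python
import random


def compact_ga(fitness, n_bits, pop_size=50, max_iters=20000, seed=0):
    """Generic compact GA sketch: evolve a probability vector, not a population.

    `pop_size` is the size of the *simulated* population; it only sets the
    update step 1/pop_size. All names here are illustrative assumptions.
    """
    rng = random.Random(seed)
    p = [0.5] * n_bits            # p[i] = probability that bit i is 1
    step = 1.0 / pop_size

    def sample():
        return [1 if rng.random() < pi else 0 for pi in p]

    for _ in range(max_iters):
        a, b = sample(), sample()
        winner, loser = (a, b) if fitness(a) >= fitness(b) else (b, a)
        for i in range(n_bits):
            if winner[i] != loser[i]:
                # Shift the probability toward the winner's bit value.
                p[i] += step if winner[i] == 1 else -step
                p[i] = min(1.0, max(0.0, p[i]))
        if all(pi in (0.0, 1.0) for pi in p):
            break                 # every bit has converged
    return [1 if pi >= 0.5 else 0 for pi in p]


# Toy usage: maximize the number of ones in a 16-bit string.
best = compact_ga(sum, 16)
```

A hybrid scheme in the spirit of the abstract would additionally apply a local search step to the sampled candidates; how the paper does that is not stated here, so it is left out of the sketch.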
Pub Date: 2002-09-04 DOI: 10.1109/DSD.2002.1115396
M. Molina, J. Mendias, R. Hermida
This paper proposes an allocation algorithm able to perform the combined resource selection and operation binding of multiple-precision specifications that maximizes the bit-level reuse of hardware resources. Additionally, it presents an analytic method to estimate the amount of area that our approach could save in comparison with traditional allocation algorithms. In order to minimize the cost of the implementations obtained, the proposed algorithm produces circuits only influenced by the maximum number of bits calculated per cycle. This approach contrasts with the cost of implementations designed by traditional algorithms, which also depends on the number and widths of the operations executed in every cycle.
Title: Bit-level allocation of multiple-precision specifications
Pub Date: 2002-09-04 DOI: 10.1109/DSD.2002.1115345
D. Hormdee, J. Garside, S. Furber
Memory bandwidth is a limiting factor with many modern microprocessors and it is usual to include a cache to reduce the amount of memory traffic. Of the two commonly used cache write-policies, the copy-back approach is better than the write-through approach in this respect. The performance of both approaches can be further aided by the inclusion of a small buffer in the path of outgoing writes to the main memory, especially if this buffer is capable of forwarding its contents back into the main cache if they are needed again before they are emptied from the buffer. This is what is known as a victim cache. For an asynchronous microprocessor it is logical that the cache system should be asynchronous as well, since a large degree of the flexibility of an asynchronous microprocessor would be lost if it were to use a standard synchronous memory interface. However, implementing a forwarding mechanism in an asynchronous system is more difficult because the data to be forwarded is flowing in a manner unsynchronised to the process which requires it. This paper presents an architecture for a victim cache to resolve forwarding in a totally asynchronous environment. The resultant structure forms a key part of an asynchronous copy-back cache system for the Amulet3, a third generation asynchronous implementation of the ARM processor.
Title: An asynchronous victim cache
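The forwarding behaviour of a victim cache, as described in the abstract above, can be sketched with a small behavioural model. This is a purely sequential software model that tracks block addresses only; it says nothing about the asynchronous circuit design that is the actual subject of the paper. The class name `VictimCacheModel`, the direct-mapped organisation and the FIFO replacement of the victim buffer are all assumptions chosen for illustration.

```python
from collections import OrderedDict


class VictimCacheModel:
    """Direct-mapped cache plus a small FIFO victim buffer (addresses only)."""

    def __init__(self, n_lines, n_victims):
        self.n_lines = n_lines
        self.n_victims = n_victims
        self.lines = [None] * n_lines   # block address held by each cache line
        self.victims = OrderedDict()    # evicted blocks, oldest first

    def access(self, block):
        """Return 'hit', 'victim-hit' (forwarded from the buffer) or 'miss'."""
        idx = block % self.n_lines
        if self.lines[idx] == block:
            return "hit"
        evicted = self.lines[idx]
        if block in self.victims:
            # The block is still waiting in the buffer: forward it back.
            del self.victims[block]
            result = "victim-hit"
        else:
            result = "miss"             # would have to go to main memory
        self.lines[idx] = block
        if evicted is not None:
            self.victims[evicted] = True
            if len(self.victims) > self.n_victims:
                self.victims.popitem(last=False)  # drain the oldest entry
        return result


# Blocks 0 and 4 conflict in a 4-line direct-mapped cache.
cache = VictimCacheModel(n_lines=4, n_victims=2)
trace = [cache.access(b) for b in (0, 4, 0, 0)]
# trace == ['miss', 'miss', 'victim-hit', 'hit']
```

The third access shows the point of the buffer: block 0 was evicted by block 4 but had not yet left the buffer, so it is recovered without a main-memory access.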
Pub Date: 2002-09-04 DOI: 10.1109/DSD.2002.1115395
R. Katti
Two new signed binary representations are presented that simplify the hardware or software necessary in elliptic curve cryptosystems. Simplified algorithms are presented for computing the new binary representations. This speeds up elliptic cryptosystems. The algorithms are useful for smart card and digital signature verification applications. The first algorithm computes a new representation for an integer d and speeds up the computation of d×P, where P is a point on an elliptic curve. The second algorithm computes a new representation for two integers g and h and speeds up the computation of (g×P)+(h×Q), where P and Q are points on an elliptic curve.
Title: Speeding up elliptic cryptosystems using a new signed binary representation for integers
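To illustrate why a signed binary representation can speed up d×P, the sketch below uses the well-known non-adjacent form (NAF), which is not the new representation proposed in the paper. Digits are drawn from {-1, 0, 1}, no two adjacent digits are nonzero, and each nonzero digit costs one point addition or subtraction in a double-and-add loop. The group operations are passed in as functions (`add`, `neg`, `zero` are assumed names), and plain integers under addition stand in for curve points so the result can be checked.

```python
def naf(d):
    """Non-adjacent form of d >= 0, least significant digit first."""
    digits = []
    while d > 0:
        if d & 1:
            digit = 2 - (d % 4)   # +1 if d % 4 == 1, -1 if d % 4 == 3
            d -= digit
        else:
            digit = 0
        digits.append(digit)
        d >>= 1
    return digits


def scalar_mult(d, P, add, neg, zero):
    """Compute d*P by double-and-add/subtract over the NAF digits of d."""
    result = zero
    for digit in reversed(naf(d)):        # most significant digit first
        result = add(result, result)      # doubling
        if digit == 1:
            result = add(result, P)
        elif digit == -1:
            result = add(result, neg(P))  # subtraction is cheap on curves
    return result


# 7 = 8 - 1, so only two nonzero digits instead of three in plain binary.
print(naf(7))  # [-1, 0, 0, 1]
# Integers under addition stand in for elliptic curve points here.
print(scalar_mult(7, 5, lambda a, b: a + b, lambda a: -a, 0))  # 35
```

Signed digits pay off on elliptic curves because negating a point is almost free, so fewer nonzero digits translate directly into fewer point additions.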
Pub Date: 2002-09-04 DOI: 10.1109/DSD.2002.1115373
N. Nedjah, L. M. Mourelle
Modular exponentiation and modular multiplication are the cornerstone computations performed in public-key cryptography systems such as the RSA cryptosystem. The operations are time consuming for large operands. Much research effort is directed towards an efficient hardware implementation of both operations. This paper describes the characteristics of two architectures: the first implements modular multiplication using a systolic version of the fast Montgomery algorithm and the second implements the parallel binary exponentiation algorithm. The latter uses two Montgomery modular multipliers. Results in terms of space and time requirements for an FPGA prototype are given.
Title: Reconfigurable hardware implementation of Montgomery modular multiplication and parallel binary exponentiation
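The two operations named in the abstract can be sketched in software as follows. This is a plain word-level Montgomery multiplication and a right-to-left binary exponentiation, not the systolic FPGA architecture of the paper. The right-to-left form is used because its two products per iteration are independent of each other, which is one plausible reading of how two multipliers could work in parallel; that mapping is an assumption, as are all function names. `pow(n, -1, r)` requires Python 3.8 or later.

```python
def montgomery_params(n):
    """Precompute (k, n_prime, r2) for an odd modulus n >= 3, with R = 2**k > n."""
    if n < 3 or n % 2 == 0:
        raise ValueError("modulus must be odd and at least 3")
    k = n.bit_length()
    r = 1 << k
    n_prime = (-pow(n, -1, r)) % r   # n * n_prime == -1 (mod R)
    r2 = (r * r) % n                 # used to convert into Montgomery form
    return k, n_prime, r2


def mont_mul(a, b, n, k, n_prime):
    """Return a * b * R^-1 mod n for 0 <= a, b < n, without dividing by n."""
    mask = (1 << k) - 1
    t = a * b
    m = ((t & mask) * n_prime) & mask
    u = (t + m * n) >> k             # exact: t + m*n is a multiple of R
    return u - n if u >= n else u


def mod_exp(base, exp, n):
    """Right-to-left binary exponentiation in the Montgomery domain."""
    k, n_prime, r2 = montgomery_params(n)
    x = mont_mul(base % n, r2, n, k, n_prime)   # base in Montgomery form
    acc = mont_mul(1, r2, n, k, n_prime)        # 1 in Montgomery form
    while exp > 0:
        # The two products below both read the old x and are independent,
        # so hardware could compute them on two multipliers at once.
        if exp & 1:
            acc = mont_mul(acc, x, n, k, n_prime)
        x = mont_mul(x, x, n, k, n_prime)
        exp >>= 1
    return mont_mul(acc, 1, n, k, n_prime)      # leave the Montgomery domain


print(mod_exp(4, 13, 497) == pow(4, 13, 497))  # True
```

The appeal of Montgomery's method for hardware is visible in `mont_mul`: the reduction uses only multiplications, a mask and a shift by k bits, plus one conditional subtraction.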
Pub Date: 2002-09-04 DOI: 10.1109/DSD.2002.1115388
Toshinori Sato, I. Arita
Modern microprocessors schedule instructions dynamically in order to exploit instruction-level parallelism. It is necessary to increase instruction window size for improving instruction scheduling capability. However, it is difficult to increase the size without any serious impact on processor performance, since the instruction window is one of the dominant determiners of processor cycle time. The instruction window is critical because it is realized using content addressable memory (CAM). In general, RAMs are faster in access time and lower in power dissipation than CAMs. Therefore, it is desirable that the CAM instruction window is replaced by the RAM instruction window. This paper proposes such an instruction window, named the explicit data forwarding instruction window. The principle behind our proposal is to make result forwarding explicit. It is possible to dynamically construct explicit relationships between instructions, since it is expected that each execution result is forwarded to a limited number of dependent instructions. Simulation results show that the explicit data forwarding instruction window achieves a level of performance comparable to that of the conventional instruction window, while also providing benefit in terms of shorter cycle time.
Title: Simplifying instruction issue logic in superscalar processors
Pub Date: 2002-09-04 DOI: 10.1109/DSD.2002.1115387
R. Drechsler, Daniel Große
With ever increasing design sizes, verification becomes the bottleneck in modern design flows. Up to 80% of the overall costs are due to the verification task. Formal methods have been proposed to overcome the limitations of simulation approaches, but these techniques have mainly been applied to lower levels of abstraction. As design complexity grows, the need for hardware description languages with a high level of abstraction becomes obvious. We present a formal verification approach for circuits described in SystemC, an extension of C that allows the modeling of hardware. An algorithm for reachability analysis is proposed and a case study of a scalable bus arbiter cell is given.
Title: Reachability analysis for formal verification of SystemC
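Reachability analysis, in its simplest form, computes every state a design can enter from its initial state and then checks a property on that set. The sketch below does this with an explicit breadth-first search over a toy two-client round-robin arbiter. Both the explicit-state search and the arbiter model are assumptions made for illustration: the abstract does not say how the paper represents state sets, and the arbiter here is not the scalable cell from the case study.

```python
from collections import deque


def reachable_states(initial, successors):
    """Breadth-first fixed point: all states reachable from `initial`."""
    seen = set(initial)
    frontier = deque(initial)
    while frontier:
        state = frontier.popleft()
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen


def arbiter_successors(state):
    """Toy round-robin arbiter; state = (grant0, grant1, priority).

    Every combination of the two request inputs is tried, so the search
    covers all input sequences.
    """
    _, _, prio = state
    for req0 in (0, 1):
        for req1 in (0, 1):
            reqs = (req0, req1)
            first, second = prio, 1 - prio
            if reqs[first]:
                winner = first
            elif reqs[second]:
                winner = second
            else:
                winner = None
            if winner is None:
                yield (0, 0, prio)               # no request, no grant
            else:
                grants = [0, 0]
                grants[winner] = 1
                # The client that was not served gets priority next time.
                yield (grants[0], grants[1], 1 - winner)


states = reachable_states([(0, 0, 0)], arbiter_successors)
# Safety property: the two grants are never active at the same time.
mutual_exclusion = all(not (g0 and g1) for g0, g1, _ in states)
print(len(states), mutual_exclusion)  # 4 True
```

Explicit enumeration like this only scales to small designs; larger ones need a compact encoding of state sets, which is where formal verification tools do most of their work.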
Pub Date: 2002-09-04 DOI: 10.1109/DSD.2002.1115382
P. Green, M. Vakondios, M. Edwards
In a previous paper we presented the concept, design and implementation of an FPGA run-time support system (FSS) for a dynamically reconfigurable FPGA. In this paper we discuss our experiences in running applications on the system. Problems with tool support meant that the full capability of the device could not be exploited; nevertheless, a significant application was executed under FSS control on the system. We discuss how both the application itself and the FSS were tuned to improve overall performance. The paper concludes by considering how our experience impacts upon the development of FSS-like software for the latest generation of reconfigurable devices.
Title: An evaluation of an FPGA run-time support system