A correlation matrix memory (CMM) is a form of binary neural network that can be used for high-speed approximate search and match operations on large unstructured datasets. Typically, the processing requirements of a CMM do not map efficiently onto a modern processor-based system, so an application-specific co-processor is normally used to improve performance. This paper outlines two possible FPGA-based co-processors for executing core CMM operations based on a compact bit vector (CBV) data format. This representation significantly increases a system's storage capacity but reduces processing performance.
M. Freeman and J. Austin, "Designing a binary neural network co-processor," 8th Euromicro Conference on Digital System Design (DSD'05), 2005. doi:10.1109/DSD.2005.34
This paper presents a novel method for automatic functional vector generation from RT-level HDL descriptions based on path coverage and constraint solving. Compared with existing methods, this method has three advantages: 1) it avoids generating redundant constraints, which accelerates the test generation process; 2) it solves the problem of propagating internal values to the primary inputs with decision models; 3) it can handle various HDL description styles and various styles of designs. Experimental results on several practical designs show that our method can efficiently improve the functional vector generation process. The prototype system has been applied to verify the RTL description of a real 32-bit microprocessor core, and complex bugs hidden in the RTL descriptions were detected.
Tun Li, Yang Guo, GongJie Liu, and Sikun Li, "Functional vectors generation for RT-level Verilog descriptions based on path enumeration and constraint logic programming," 8th Euromicro Conference on Digital System Design (DSD'05), 2005. doi:10.1109/DSD.2005.43
Providing an efficient healthcare service is a challenge for countries of continental dimensions. Due to the increasing costs of healthcare systems, mechanisms for more efficient and better attendance of patients are necessary. This work proposes a system for monitoring vital signs (including ECG) through PDAs. It makes possible the local attendance of patients by medical practitioners (here called health agents) with the support of specialist physicians through a second-opinion system. The proposed approach supports recording and visualization of ECG waveforms, and patient information can be transmitted to and from a remote healthcare server. To make the system easier for doctors and health agents to use, a user-friendly graphical interface has been developed. Methods for efficient data access have also been developed to cope with the storage constraints of PDAs.
Danielly Cruz and E. Barros, "Vital signs remote management system for PDAs," 8th Euromicro Conference on Digital System Design (DSD'05), 2005. doi:10.1109/DSD.2005.76
Several methods for improving the fault coverage of mixed-mode BIST are presented in this paper. The test is divided into two phases: pseudo-random and deterministic. As many faults as possible should be detected in the pseudo-random phase, to reduce the number of faults to be covered in the deterministic one. We study the properties of different pseudo-random pattern generators; their success in covering faults strictly depends on the circuit under test. We examine the properties of LFSRs and cellular automata. Four methods for enhancing the pseudo-random fault coverage are proposed, followed by a universal method for efficiently computing test weights. The observations are documented on several standard ISCAS benchmarks, and the final BIST circuitry is synthesized using the column-matching method.
Peter Filter and H. Kubátová, "Improvement of the fault coverage of the pseudo-random phase in column-matching BIST," 8th Euromicro Conference on Digital System Design (DSD'05), 2005. doi:10.1109/DSD.2005.51
The opportunities created by modern microelectronic technology cannot be exploited effectively because of weaknesses in the traditional circuit synthesis methods used in today's CAD tools. In this paper, a new information-driven circuit synthesis method is discussed that targets combinational circuits implemented with gates. The method is based on our original information-driven approach to circuit synthesis, bottom-up general functional decomposition, and the theory of information relationship measures, and it differs considerably from all other known methods. The discussion focuses on the various sub-function construction methods used during synthesis. Experimental results from the automatic circuit synthesis tool that implements the method show that the specific sub-function construction methods we developed for gate-based circuits deliver much better circuits than the other methods, and demonstrate that information-driven general decomposition produces very fast and compact gate-based circuits.
L. Józwiak and S. Bieganski, "High-quality sub-function construction in the information-driven circuit synthesis with gates," 8th Euromicro Conference on Digital System Design (DSD'05), 2005. doi:10.1109/DSD.2005.48
This paper contributes to a perceptron-based dynamic branch predictor algorithm in two directions. First, a new block form of computation is introduced that theoretically halves the combinational critical path for computing a prediction. Second, an FPGA hardware implementation is fully developed for quantitative comparison. FPGA circuits for a one-cycle block predictor achieve clock rates 1.7 times faster than a direct implementation of the original perceptron predictor. This faster clock allows predictions with longer history lengths for the same hardware budget.
O. Cadenas, G. Megson, and Daniel Jones, "Implementation of a block based neural branch predictor," 8th Euromicro Conference on Digital System Design (DSD'05), 2005. doi:10.1109/DSD.2005.49
This paper describes the advanced real-time processor architecture (ARPA) system-on-chip. The goal of this work is to create a technology-independent and synthesizable system-on-chip (SoC) model for real-time applications. The main component of the SoC is a MIPS32-based RISC processor, implemented with a pipelined simultaneous multithreading structure that supports the execution of more than one thread or task at a time. The synergy between pipelining and simultaneous multithreading combines the exploitation of instruction-level and task-level parallelism, hides the context-switching time, and reduces the need for complex speculative execution techniques to improve the performance of the pipelined processor. A fundamental component of the ARPA SoC is the operating system coprocessor, which implements some operating system functions in hardware, such as task scheduling, switching, communication, and timing. The proposed architecture allows building flexible, high-performance, time-predictable, and power-efficient processors optimized for embedded real-time systems.
Arnaldo S. R. Oliveira, V. Sklyarov, and A. Ferrari, "ARPA - a technology independent and synthetizable system-on-chip model for real-time applications," 8th Euromicro Conference on Digital System Design (DSD'05), 2005. doi:10.1109/DSD.2005.19
This paper presents new alternatives for accelerating fault simulation of sequential circuits by hardware emulation on FPGAs. Fault simulation is an important subtask of test pattern generation and is used frequently throughout the test generation process. The problems associated with fault emulation for sequential circuits are explained, and alternative implementations are discussed. An environment for hardware emulation of fault simulation is presented; it incorporates hardware support for fault dropping. The proposed approach achieves simulation speed-ups of 40 to 500 times compared to the state of the art in fault simulation. The average speedup provided by the method is 250, which is about an order of magnitude higher than previously reported in the literature. Based on the experiments, we conclude that it is beneficial to use emulation when large numbers of test vectors are required.
J. Raik, P. Ellervee, Valentin Tihhomirov, and R. Ubar, "Improved fault emulation for synchronous sequential circuits," 8th Euromicro Conference on Digital System Design (DSD'05), 2005. doi:10.1109/DSD.2005.50
Due to the power limitations of wireless sensor nodes, special attention is required in optimizing the power consumption of the electronics on a node. This can be done at different levels of abstraction. Architectural-level optimization brings a major power reduction, because changes made at this level of abstraction are reflected in the lower levels; nevertheless, all other levels must also be considered in an overall power reduction strategy. This paper discusses different possibilities for power reduction at the system, architectural, and circuit levels of the node's electronics. It also addresses different communication protocols and their effect on the power consumption of a wireless sensor node.
S. Jayapal, S. Ramachandran, R. Bhutada, and Y. Manoli, "Optimization of electronic power consumption in wireless sensor nodes," 8th Euromicro Conference on Digital System Design (DSD'05), 2005. doi:10.1109/DSD.2005.60
Current microprocessors are optimized for average use. Nevertheless, different applications are known to impose different demands on the system. This work focuses on the reconfiguration of the first-level caches. To achieve good performance, the first-level cache is physically split into two parts, one for instructions and one for data. This separation has the benefit of avoiding interference between instructions and data; nevertheless, it is strict and determined at design time. In this work we show a cache design that can change the split dynamically at runtime. The proposed design was tested by simulating a variety of benchmark applications from the MiBench suite on two baseline architectures: the embedded XScale and the high-end PowerPC. The results show that, while the average miss-rate reduction may seem small, certain applications show a benefit larger than 90%. For miss-rate reduction, the dynamic split cache seems to be more relevant for the cache with the smaller associativity (PowerPC). Lastly, the dynamic split cache was also used to reduce energy consumption without loss of performance. This feature resulted in a significant energy reduction, with a bigger impact for caches with larger associativity (42% energy reduction for the XScale and 28% for the PowerPC for a large data set size).
P. Trancoso, "Dynamic split: flexible border between instruction and data cache," 8th Euromicro Conference on Digital System Design (DSD'05), 2005. doi:10.1109/DSD.2005.35