Pub Date: 2010-11-09 DOI: 10.1109/ISSOC.2010.5625560
U. Mehta, N. Devashrayee, K. Dasgupta
This paper presents a method for compressing partially specified test data for a given SoC in Automatic Test Equipment (ATE). The method, “Hamming Distance Based 2-Dimensional Reordering with Power Efficient Don't Care Bit Filling”, applies two-dimensional (row-wise and column-wise) test vector reordering together with power-optimized don't-care bit filling. The approach achieves good compression with very low test power and no added area overhead. These advantages are demonstrated by experimental results on ISCAS benchmark circuits.
Title: Hamming Distance Based 2-D Reordering with Power Efficient Don't Care Bit Filling: Optimizing the test data compression method. Venue: 2010 International Symposium on System on Chip.
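The abstract for this record names two ingredients, Hamming-distance-based reordering and low-power don't-care filling, without giving the exact procedure. The following is a minimal sketch of the general idea only, not the authors' algorithm. It assumes test vectors are strings over `0`, `1` and `X`, uses a greedy nearest-neighbour row ordering, and swaps in a simple repeat-last-bit fill as the power-aware fill rule. All function names are hypothetical.

```python
from typing import List


def hamming(a: str, b: str) -> int:
    """Count positions where both vectors are specified ('0'/'1') and differ.

    Don't-care positions ('X') are treated as compatible with anything.
    """
    return sum(1 for x, y in zip(a, b) if x != "X" and y != "X" and x != y)


def reorder_rows(vectors: List[str]) -> List[str]:
    """Greedy ordering (an assumption, not the paper's exact rule):
    keep the first vector, then repeatedly append the remaining vector
    with the smallest Hamming distance to the last one placed."""
    if not vectors:
        return []
    remaining = list(vectors)
    ordered = [remaining.pop(0)]
    while remaining:
        last = ordered[-1]
        idx = min(range(len(remaining)), key=lambda i: hamming(last, remaining[i]))
        ordered.append(remaining.pop(idx))
    return ordered


def fill_dont_cares(vector: str, default: str = "0") -> str:
    """Repeat-last-bit fill: each X copies the nearest specified bit to its
    left; leading X's copy the first specified bit (or `default` if the
    vector has none). This keeps bit transitions within the vector low."""
    prev = next((b for b in vector if b != "X"), default)
    out = []
    for b in vector:
        if b != "X":
            prev = b
        out.append(prev)
    return "".join(out)


def transitions(vector: str) -> int:
    """Number of adjacent bit changes, a common proxy for scan shift power."""
    return sum(1 for a, b in zip(vector, vector[1:]) if a != b)


if __name__ == "__main__":
    cubes = ["1X00", "0X11", "1100", "0011"]
    ordered = reorder_rows(cubes)
    filled = [fill_dont_cares(v) for v in ordered]
    print(ordered)
    print(filled, [transitions(v) for v in filled])
```

Column-wise reordering could be sketched the same way by transposing the vector set and applying `reorder_rows` to the columns; whether that matches the paper's second dimension is an assumption.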
Pub Date: 2010-11-09 DOI: 10.1109/ISSOC.2010.5625549
Thomas Coenen, J. Schleifer, O. Weiss, T. Noll
Embedded Field Programmable Gate Arrays (eFPGAs) offer an attractive way to integrate configurable hardware accelerators for signal processing tasks into systems on chips. To achieve maximum efficiency, it is furthermore advisable to adapt a parametrizable eFPGA architecture to a specific class of applications. Conventional mapping tools, however, accommodate only a single architecture, which creates the need for a portable mapping tool. In this paper we propose a method that employs standard VLSI routing tools to solve the eFPGA routing problem for parametrizable architectures.
Title: Interconnect routing of embedded FPGAs using standard VLSI routing tools. Venue: 2010 International Symposium on System on Chip.
Pub Date: 2010-11-09 DOI: 10.1109/ISSOC.2010.5625556
Diandian Zhang, Han Zhang, J. Castrillón, T. Kempf, G. Ascheid, R. Leupers, B. Vanthournout
With the increasing complexity of MPSoCs, efficient runtime management of system resources becomes vitally important for improving system performance and energy efficiency. OSIP [1], an operating system application-specific instruction-set processor, provides a promising solution. It delivers high computational performance for dynamic task scheduling and mapping while remaining programmable. However, distributing computation among the different processing elements adds complexity to the communication architecture, which tends to become the bottleneck of such systems. In this work, we present a detailed analysis and optimization of the communication architecture of OSIP-based MPSoCs. In particular, the joint effects of OSIP and the communication architecture are investigated from the system point of view.
Title: Optimized communication architecture of MPSoCs with a hardware scheduler: A system view. Venue: 2010 International Symposium on System on Chip.
Pub Date: 2010-11-09 DOI: 10.1109/ISSOC.2010.5625543
Zulfiqar Ali, A. Arshad, U. Razzaq, S. Sana, Abdul Haseeb Ahmed, Abdullah M. Harris
A novel rank order statistic calculation algorithm for OS CFAR is presented. OS CFAR gives improved performance in a multitarget environment compared with CA CFAR. However, the computational requirements of sorting data arrays complicate its implementation. We present an algorithm that overcomes this challenge by combining a rank order statistic finding algorithm with the parallelism offered by FPGAs. In this technique, previously computed results are used to successively divide the data array in order to find the new rank order value. The design is tested on MTI-processed data from a TA-10K air traffic control radar and is part of a single-chip FPGA-based radar signal processor. It is implemented on a Virtex-4SX35 FPGA using the Xilinx XtremeDSP kit.
Title: Design and implementation of an OS-CFAR processor based on a new rank order filtering algorithm. Venue: 2010 International Symposium on System on Chip.
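For orientation, the textbook sort-based form of OS-CFAR that this record's abstract takes as its starting point can be sketched as below. This is the baseline whose sorting cost the paper sets out to avoid, not the paper's rank order algorithm. The function name, the parameter names (`num_ref`, `num_guard`, `k`, `alpha`) and the sample data are assumptions chosen for illustration.

```python
from typing import List


def os_cfar(power: List[float], num_ref: int, num_guard: int,
            k: int, alpha: float) -> List[bool]:
    """Sort-based OS-CFAR over a 1-D list of cell powers.

    num_ref   -- reference cells on each side of the cell under test
    num_guard -- guard cells on each side (excluded from the window)
    k         -- 1-based rank of the order statistic, 1 <= k <= 2 * num_ref
    alpha     -- threshold scaling factor
    Cells too close to the edges for a full window are left as False.
    """
    n = len(power)
    detections = [False] * n
    for i in range(n):
        lead_start = i - num_guard - num_ref
        lag_end = i + num_guard + num_ref
        if lead_start < 0 or lag_end >= n:
            continue  # incomplete reference window
        window = (power[lead_start:i - num_guard]
                  + power[i + num_guard + 1:lag_end + 1])
        kth = sorted(window)[k - 1]  # the k-th smallest reference cell
        detections[i] = power[i] > alpha * kth
    return detections


if __name__ == "__main__":
    cells = [1.0] * 20
    cells[8] = 20.0   # first target
    cells[11] = 25.0  # second target, inside the first one's window
    hits = os_cfar(cells, num_ref=4, num_guard=1, k=6, alpha=5.0)
    print([i for i, hit in enumerate(hits) if hit])
```

Because the threshold comes from a ranked sample rather than the window mean, a strong neighbouring target in the reference window does not raise the threshold, which is the multitarget advantage the abstract refers to. The `sorted` call per cell is the cost a hardware rank order filter would replace.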
Pub Date: 2010-11-09 DOI: 10.1109/ISSOC.2010.5625531
A. Ghofrani, F. Javaheri, S. Safari, Z. Navabi
The ever-increasing size of digital circuits makes testing such designs more complex every day. This complexity leads to more complicated logic cones, which make nodes in digital circuits harder to control and observe. Reduced controllability and observability lower a circuit's fault coverage, making circuits harder to test.
Title: Automatic selection of efficient observability points in combinational gate level circuits using particle swarm optimization. Venue: 2010 International Symposium on System on Chip.
Pub Date: 2010-11-09 DOI: 10.1109/ISSOC.2010.5625547
I. Rust, T. Noll
A common and very efficient approach to division and square root is the subtractive SRT algorithm combined with a redundant partial remainder representation such as carry-save. A recently proposed modification of the SRT algorithm for division reduces the number of comparators inside the Quotient Digit Selection Function (QDSF) to the number necessary in a non-redundant implementation and derives partial remainders directly from comparison results calculated inside the QDSF. In this paper it is shown that this modified approach can also be applied efficiently to square root operations. A combined radix-8 division and square root kernel for double-precision floating point was synthesized using a 40-nm general-purpose cell library. The implementation has a critical path of only 20.8 fanout-4 inverter delays under worst-case conditions, which is comparable to the 20.0 inverter delays published for a high-speed radix-4 SRT implementation. Furthermore, the proposed algorithm reduces the total area compared to equivalent SRT-based implementations.
Title: A digit-set-interleaved radix-8 division/square root kernel for double-precision floating point. Venue: 2010 International Symposium on System on Chip.
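As background for the SRT recurrence this record's abstract builds on, here is a minimal radix-2 sketch with the signed digit set {-1, 0, 1}. It is deliberately much simpler than the paper's design: it is radix-2 rather than radix-8, it uses exact rational arithmetic instead of a carry-save remainder, and it selects digits by exact comparison rather than by a truncated estimate. It also does not implement the modified QDSF. The function name and the normalization assumptions (divisor in [1/2, 1), |x| <= d) are choices made for illustration.

```python
from fractions import Fraction
from typing import List, Tuple


def srt_radix2_divide(x: Fraction, d: Fraction,
                      steps: int) -> Tuple[List[int], Fraction, Fraction]:
    """Radix-2 SRT division sketch.

    Recurrence: w[j] = 2 * w[j-1] - q[j] * d, with w[0] = x and q[j] chosen
    from {-1, 0, 1} by comparing the shifted remainder against +/- 1/2.
    Returns (digits, quotient, final_remainder), where
    x == quotient * d + final_remainder / 2**steps holds exactly.
    """
    if not (Fraction(1, 2) <= d < 1):
        raise ValueError("divisor must be normalized to [1/2, 1)")
    if abs(x) > d:
        raise ValueError("dividend must satisfy |x| <= d")
    w = x
    digits: List[int] = []
    quotient = Fraction(0)
    for j in range(1, steps + 1):
        shifted = 2 * w
        # Quotient digit selection: the overlap between the digit ranges is
        # what lets real hardware use a coarse estimate of `shifted`.
        if shifted >= Fraction(1, 2):
            q = 1
        elif shifted < Fraction(-1, 2):
            q = -1
        else:
            q = 0
        w = shifted - q * d
        digits.append(q)
        quotient += Fraction(q, 2 ** j)
    return digits, quotient, w


if __name__ == "__main__":
    x, d = Fraction(1, 3), Fraction(5, 8)
    digits, q, w = srt_radix2_divide(x, d, steps=12)
    print(digits, float(q), float(x / d))
```

With these selection constants the remainder stays bounded by |w| <= d, so after n steps the quotient differs from x/d by at most 2^-n.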
Pub Date: 2010-11-09 DOI: 10.1109/ISSOC.2010.5625551
Rainer Findenig, W. Ecker
Most of today's designs use a top-down design flow in which hardware is first implemented at transaction level and, as soon as its functionality is verified, refined to a register transfer model, which is conceptually a cycle-true and cycle-callable model. Traditionally, both the refinement and its validation are done by hand. We propose a design pattern for both the transaction-level and the cycle-callable model that eases both steps: the refinement process is made more intuitive, and verifying the cycle-callable model is greatly simplified by automatically synchronizing the transaction-level model with the refined model.
Title: State chart refinement validation from approximately timed to cycle callable models. Venue: 2010 International Symposium on System on Chip.
Pub Date: 2010-11-09 DOI: 10.1109/ISSOC.2010.5625553
Piia Saastamoinen, J. Nurmi
Rapidly evolving markets and application demands of digital consumer electronics are pushing more functionality into software, which also increases the memory capacity requirements of systems. There have been efforts to reduce the need for memory by, for example, compressing the program code and thus the program memory footprint. We have previously introduced an effective code compression scheme, and in this paper we present parameterized and flexible decompression hardware for decoding the compressed code. On-chip hardware and decoding tables, which are used to store compressed code sequences, bring only about 3% reduction to the compression ratio, which is 53% at best.
Title: Parameterized decompression hardware for a program memory compression system. Venue: 2010 International Symposium on System on Chip.
Pub Date: 2010-11-09 DOI: 10.1109/ISSOC.2010.5625566
S. Pande, F. Morgan, Seamus Cawley, Brian McGinley, Snaider Carrillo, J. Harkin, L. McDaid
This paper presents EMBRACE-SysC, a simulation-based design exploration framework for the EMBRACE mixed signal Network on Chip (NoC)-based hardware Spiking Neural Network (SNN) architecture. EMBRACE-SysC incorporates Genetic Algorithm-based training of SNN applications. Results illustrate the application of EMBRACE-SysC for performance analysis of a NoC-based SNN architecture. The development of EMBRACE-SysC introduces a powerful design exploration framework for EMBRACE architecture development.
Title: EMBRACE-SysC for analysis of NoC-based Spiking Neural Network architectures. Venue: 2010 International Symposium on System on Chip.
Pub Date: 2010-11-09 DOI: 10.1109/ISSOC.2010.5625563
Subayal Khan, E. Ovaska, Kari Tiensyrjä, J. Nurmi
Performance simulation techniques play a key role in the architectural exploration phase of embedded systems design. Modern mobile devices support diverse applications that are enabled by the rapid increase in the computational power of mobile platforms. A brisk performance evaluation phase is required after application modeling to evaluate the feasibility of new applications on a platform. To reduce the modeling effort in performance simulation and to shorten time to market, the application modeling and performance simulation phases must be seamlessly integrated. The landmark techniques in this area are built around some key concepts, which we explain first. We then examine each landmark contribution and describe how it addresses, extends and/or employs these key concepts. After reviewing the related work in this area, we describe the methodology and tools that could serve as a potential solution for achieving seamless integration of application design and performance simulation.
Title: From Y-chart to seamless integration of application design and performance simulation. Venue: 2010 International Symposium on System on Chip.