Performance model of the Argonne Voyager multimedia server
Pub Date: 1997-07-14 | DOI: 10.1109/ASAP.1997.606837
T. Disz, R. Olson, R. Stevens
The Argonne Voyager Multimedia Server is being developed in the Futures Lab of the Mathematics and Computer Science Division at Argonne National Laboratory. As a network-based service for recording and playing multimedia streams, the Voyager system must sustain certain minimal levels of performance in order to be viable. In this article, we examine the performance characteristics of the server: as we walk through the system architecture, we determine where bottlenecks lie, compare actual with potential performance, and recommend areas for improvement through custom architectures and system tuning.
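The abstract does not reproduce the performance model itself. As a rough, hypothetical illustration of the kind of bottleneck reasoning it refers to, the sketch below estimates how many concurrent streams a server can sustain as the minimum over its resource stages; the stage names, bandwidths, and the 1.5 Mbps stream rate are made-up numbers, not figures from the paper.

```python
# Hypothetical back-of-the-envelope bottleneck model (not the paper's): the number of
# concurrent streams a server can sustain is limited by its slowest resource stage.
def max_streams(stream_mbps, stage_capacity_mbps):
    """stream_mbps: bandwidth of one stream; stage_capacity_mbps: stage name -> capacity."""
    per_stage = {name: cap / stream_mbps for name, cap in stage_capacity_mbps.items()}
    bottleneck = min(per_stage, key=per_stage.get)
    return per_stage[bottleneck], bottleneck

# Made-up numbers: 1.5 Mbps streams through disk, I/O bus, and network stages.
streams, stage = max_streams(1.5, {"disk": 40.0, "io_bus": 132.0, "network": 100.0})
print(f"~{int(streams)} concurrent streams, limited by the {stage} stage")
```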
{"title":"Performance model of the Argonne Voyager multimedia server","authors":"T. Disz, R. Olson, R. Stevens","doi":"10.1109/ASAP.1997.606837","DOIUrl":"https://doi.org/10.1109/ASAP.1997.606837","url":null,"abstract":"The Argonne Voyager Multimedia Server is being developed in the Futures Lab of the Mathematics and Computer Science Division at Argonne National Laboratory. As a network based service for recording and playing multimedia streams, it is important that the Voyager system be capable of sustaining certain minimal levels of performance in order for it to be a viable system. In this article, we examine the performance characteristics of the server. As we examine the architecture of the system, we try to determine where bottlenecks lie, show actual vs potential performance, and recommend areas for improvement through custom architectures and system tuning.","PeriodicalId":368315,"journal":{"name":"Proceedings IEEE International Conference on Application-Specific Systems, Architectures and Processors","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132185376","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Scheduling in co-partitioned array architectures
Pub Date: 1997-07-14 | DOI: 10.1109/ASAP.1997.606828
U. Eckhardt, R. Merker
We consider a balanced combination of the known LPGS and LSGP partitioning techniques, which we call co-partitioning. This approach allows a structural adjustment of the array design as well as a balancing of the local memory size and the I/O demand between the processing elements of the co-partitioned array. We determine the size of the LSGP partitions such that there exists a sequential scheduling within the LSGP partitions that is free of wait states. We prove the existence of such a scheduling and give explicit formulas for the lower and upper bounds of the loops of a for-loop program representing one of the possible sequential schedulings.
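The paper's loop-bound formulas are not reproduced in the abstract. The sketch below only illustrates, for an assumed one-dimensional iteration space and arbitrary tile sizes, the loop nesting that combining LPGS and LSGP implies: an outer sequential walk over LPGS blocks, a (conceptually parallel) loop over the processing elements, and an inner sequential walk over each PE's LSGP tile. The tile shapes and sizes are hypothetical, not the partition sizes derived in the paper.

```python
# Illustrative traversal of a co-partitioned 1-D iteration space (hypothetical sizes).
N = 24             # iteration-space size
LSGP = 3           # points processed sequentially by one PE (bounds its local memory)
PES = 2            # processing elements working in parallel (array size)
LPGS = LSGP * PES  # points covered by one pass of the whole array

for block in range(N // LPGS):       # LPGS level: blocks visited sequentially in time
    for pe in range(PES):            # array level: PEs run concurrently in hardware
        for k in range(LSGP):        # LSGP level: each PE scans its own tile sequentially
            i = block * LPGS + pe * LSGP + k
            print(f"time-block {block}, PE {pe}, local step {k} -> iteration {i}")
```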
{"title":"Scheduling in co-partitioned array architectures","authors":"U. Eckhardt, R. Merker","doi":"10.1109/ASAP.1997.606828","DOIUrl":"https://doi.org/10.1109/ASAP.1997.606828","url":null,"abstract":"We consider a balanced combined application of the known LPGS- and LSGP-partitioning which we call co-partitioning. This approach allows a structural adjustment of the array design as well as a balancing of the size of the local memory and the IO-demand between the processing elements of the co-partitioned array. We determine the size of the LSGP-partitions such that there exists a sequential scheduling within the LSGP-partitions which is free of wait states. We give the proof for the existence of such a scheduling, and we give explicit formulas for the lower and upper bounds of the loops of a for-loop program which represents one of the possible sequential schedulings.","PeriodicalId":368315,"journal":{"name":"Proceedings IEEE International Conference on Application-Specific Systems, Architectures and Processors","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115104869","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Optimized software synthesis for synchronous dataflow
Pub Date: 1997-07-14 | DOI: 10.1109/ASAP.1997.606831
S. Bhattacharyya, P. Murthy, Edward A. Lee
This paper reviews a set of techniques for compiling dataflow-based, graphical programs for digital signal processing (DSP) applications into efficient implementations on programmable digital signal processors. This is a critical problem because programmable digital signal processors have very limited amounts of on-chip memory, and the speed, power, and financial cost penalties for using off-chip memory are often prohibitively high for the types of applications, typically embedded systems, in which these processors are used. The compilation techniques described in this paper are developed for the synchronous dataflow model of computation, a model that has found widespread use for specifying and prototyping DSP systems.
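As background (this is standard synchronous dataflow theory, not the specific synthesis techniques surveyed in the paper): each SDF actor produces and consumes fixed token counts per firing, so a periodic schedule must satisfy the balance equation prod(A) * q(A) = cons(B) * q(B) on every edge A -> B, where q is the repetitions vector. The two-actor graph below is a made-up example.

```python
from math import gcd

# Toy SDF graph: actor A produces 3 tokens per firing on its output edge,
# actor B consumes 2 tokens per firing from that edge.
prod_A, cons_B = 3, 2

# Balance equation: prod_A * q_A == cons_B * q_B. The smallest positive integer
# solution is the repetitions vector of the graph.
g = gcd(prod_A, cons_B)
q_A, q_B = cons_B // g, prod_A // g
assert prod_A * q_A == cons_B * q_B
print(f"repetitions vector: q(A) = {q_A}, q(B) = {q_B}")   # q(A) = 2, q(B) = 3

# One valid single-appearance looped schedule for this graph is (2 A)(3 B):
# fire A twice (producing 6 tokens), then fire B three times (consuming 6), per period.
```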
{"title":"Optimized software synthesis for synchronous dataflow","authors":"S. Bhattacharyya, P. Murthy, Edward A. Lee","doi":"10.1109/ASAP.1997.606831","DOIUrl":"https://doi.org/10.1109/ASAP.1997.606831","url":null,"abstract":"This paper reviews a set of techniques for compiling dataflow-based, graphical programs for digital signal processing (DSP) applications into efficient implementations on programmable digital signal processors. This is a critical problem because programmable digital signal processors have very limited amounts of on-chip memory and the speed power, and financial cost penalties for using off-chip memory are often prohibitively high for the types of applications, typically embedded systems, in which these processors are used. The compilation techniques described in this paper are developed for the synchronous dataflow model of computation, a model that has found widespread use for specifying and prototyping DSP systems.","PeriodicalId":368315,"journal":{"name":"Proceedings IEEE International Conference on Application-Specific Systems, Architectures and Processors","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132799774","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A massively parallel implementation of the watershed based on cellular automata
Pub Date: 1997-07-14 | DOI: 10.1109/ASAP.1997.606811
Dominique Soguet
The watershed transform is a very powerful segmentation tool that comes directly from the idea of a watershed line in geohydrology. It has proved its efficiency in many computer vision application fields. This paper presents a new implementation of the watershed that is optimal with respect to computation time. The flooding algorithm is first recalled. Then, a massively parallel cellular automaton is proposed to propagate data using this approach. We discuss the pros and cons of a hardware implementation and give an example of application. A comparison between the results obtained and theoretical limit cases is also presented.
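The paper's automaton is not reproduced in the abstract. As a much-simplified illustration of watershed-style flooding by synchronous local updates, the toy sketch below lets every unlabelled pixel copy the label of its lowest already-labelled neighbour and iterates until nothing changes; a real watershed implementation must also handle plateaus and ties, which this version ignores.

```python
# Toy cellular-automaton flooding on a 1-D "image": labels spread outward from the
# regional minima, each cell copying the label of its lowest labelled neighbour.
height = [3, 2, 1, 2, 4, 2, 1, 2, 3]        # hypothetical grey levels
label = [0] * len(height)                    # 0 = unlabelled
label[2], label[6] = 1, 2                    # seed the two regional minima

changed = True
while changed:                               # synchronous CA iterations
    changed, new = False, label[:]
    for i, l in enumerate(label):
        if l:                                # already labelled
            continue
        nbrs = [j for j in (i - 1, i + 1) if 0 <= j < len(height) and label[j]]
        if nbrs:
            new[i] = label[min(nbrs, key=lambda j: height[j])]
            changed = True
    label = new

print(label)   # [1, 1, 1, 1, 1, 2, 2, 2, 2]: the crest cell falls to basin 1 by tie-breaking
```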
{"title":"A massively parallel implementation of the watershed based on cellular automata","authors":"Dominique Soguet","doi":"10.1109/ASAP.1997.606811","DOIUrl":"https://doi.org/10.1109/ASAP.1997.606811","url":null,"abstract":"The watershed transform is a very powerful segmentation tool which comes directly from the idea of watershed line in geohydrology. It has proved its efficiency in many computer vision application fields. This paper presents a new implementation of the watershed which is optimal according to computation time. The flooding algorithm is reminded. Then, a massively parallel cellular automaton is proposed to propagate data using this approach. We discuss the pros and cons of a hardware implementation and give an example of application. A comparison between the results obtained and theoretical limit cases is also presented.","PeriodicalId":368315,"journal":{"name":"Proceedings IEEE International Conference on Application-Specific Systems, Architectures and Processors","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133528018","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A FPGA-based implementation of an intravenous infusion controller system
Pub Date: 1997-07-14 | DOI: 10.1109/ASAP.1997.606845
C. Araujo, M. V. Santos, E. Barros
In this paper we present the development and implementation of an intravenous infusion controller system based on FPGAs. The system receives information from an infusion drop sensor and controls the drop flow by setting the direction and number of steps of a stepper motor, which compresses the drip-feed hose. The system is a mixed implementation of software and hardware: the software was implemented in C++ and the hardware was implemented using FPGAs.
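The abstract does not give the control law. The sketch below is only a hypothetical illustration of the kind of loop such a controller runs: compare the measured drop rate against the prescribed one and command the stepper motor a number of steps in the opening or closing direction. The function name, the proportional law, and all gains are assumptions, not the paper's design.

```python
# Hypothetical proportional control step for an infusion controller:
# drop-sensor reading in, stepper-motor direction and step count out.
def control_step(measured_drops_per_min, target_drops_per_min,
                 steps_per_drop_per_min=2, max_steps=50):
    error = target_drops_per_min - measured_drops_per_min
    steps = min(abs(round(error * steps_per_drop_per_min)), max_steps)
    direction = "open" if error > 0 else "close"   # "open" = relax the hose clamp
    return direction, steps

print(control_step(measured_drops_per_min=18, target_drops_per_min=30))
# ('open', 24): flow is too slow, so relax the clamp by 24 motor steps
```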
{"title":"A FPGA-based implementation of an intravenous infusion controller system","authors":"C. Araujo, M. V. Santos, E. Barros","doi":"10.1109/ASAP.1997.606845","DOIUrl":"https://doi.org/10.1109/ASAP.1997.606845","url":null,"abstract":"In this paper we present the development and implementation of an intravenous infusion controller system based on FPGA's. The system receives information of an infusion drop sensor and controls the drop flow by giving the direction and number of steps of a stepper motor, which compress the drip-feed hose. The system consists of a mixed implementation of software and hardware. The software was implemented in C++ and the hardware was implemented by using FPGA's.","PeriodicalId":368315,"journal":{"name":"Proceedings IEEE International Conference on Application-Specific Systems, Architectures and Processors","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123505272","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On core and more: a design perspective for systems-on-a-chip
Pub Date: 1997-07-14 | DOI: 10.1109/ASAP.1997.606850
S. Pees, M. Vaupel, V. Zivojnovic, H. Meyr
This survey identifies key drivers in design methodology that enable successful design of systems-on-a-chip for the highly competitive telecommunications market. It describes the main components of a design environment that fulfill the requirements of today's system design: efficient verification by means of fast simulation, integration of intellectual property, support of HW/SW co-design by means of a generic machine description language, generation of dedicated hardware blocks for high-speed applications, and the link from system-level performance evaluation to implementations in hardware and software.
{"title":"On core and more: a design perspective for systems-on-a-chip","authors":"S. Pees, M. Vaupel, V. Zivojnovic, H. Meyr","doi":"10.1109/ASAP.1997.606850","DOIUrl":"https://doi.org/10.1109/ASAP.1997.606850","url":null,"abstract":"In this survey, key drivers in design methodology are provided that enable successful design of systems-on-a-chip for the highly competitive telecommunications market. Main components of a design environment are described that fulfill the requirements of today's system design: efficient verification by means of fast simulation, integration of intellectual property, support of HW/SW co-design by means of a generic machine description language, generation of dedicated hardware blocks for high speed applications, and the link from system level performance evaluation to implementations in hardware and software.","PeriodicalId":368315,"journal":{"name":"Proceedings IEEE International Conference on Application-Specific Systems, Architectures and Processors","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127959542","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An efficient video decoder design for MPEG-2 MP@ML
Pub Date: 1997-07-14 | DOI: 10.1109/ASAP.1997.606856
Jui-Hua Li, N. Ling
In this paper, we present an efficient MPEG-2 video decoder architecture design that meets the MP@ML real-time decoding requirement. We discuss the overall architecture as well as the design of the major function-specific processing blocks, such as the variable-length decoder, the inverse 2-D discrete cosine transform unit, and the motion compensation unit. A hierarchical and distributed controller approach is used, a bus-monitoring model for different bus arbitration schemes to control external DRAM accesses is developed, and the system is simulated. Practical issues and buffer sizes are addressed. With a 27 MHz clock, our architecture uses far fewer than the 667 cycles (the upper bound for the MP@ML decoding requirement) to decode each macroblock with a single external bus and DRAM.
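The 667-cycle figure follows from the MP@ML worst-case macroblock rate: MP@ML allows at most 40500 macroblocks per second (e.g. 720x576 at 25 fps, or equivalently 720x480 at 30 fps). The short calculation below reconstructs the per-macroblock cycle budget at a 27 MHz clock.

```python
# Cycle budget per macroblock for real-time MP@ML decoding at a 27 MHz clock.
clock_hz = 27_000_000
mb_per_frame = (720 // 16) * (576 // 16)   # 45 x 36 = 1620 macroblocks per frame
mb_per_second = mb_per_frame * 25          # 40500 MB/s (same as 720x480 @ 30 fps)
print(clock_hz / mb_per_second)            # ~666.7 -> the 667-cycle upper bound
```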
{"title":"An efficient video decoder design for MPEG-2 MP@ML","authors":"Jui-Hua Li, N. Ling","doi":"10.1109/ASAP.1997.606856","DOIUrl":"https://doi.org/10.1109/ASAP.1997.606856","url":null,"abstract":"In this paper, we present an efficient MPEG-2 video decoder architecture design to meet MP@ML real-time decoding requirement. The overall architecture, as well as the design of the major function-specific processing blocks, such as the variable-length decoder, the inverse 2-D discrete cosine transform unit, and the motion compensation unit, are discussed. A hierarchical and distributed controller approach is used and a bus-monitoring model for different bus arbitration schemes to control external DRAM accesses is developed and the system is simulated. Practical issues and buffer sizes are addressed. With a 27 MHz clock, our architecture uses much fewer than the 667 cycles, upper bond for the MP@ML decoding requirement, to decode each macroblock with a single external bus and DRAM.","PeriodicalId":368315,"journal":{"name":"Proceedings IEEE International Conference on Application-Specific Systems, Architectures and Processors","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115762062","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PART: a partitioning tool for efficient use of distributed systems
Pub Date: 1997-07-14 | DOI: 10.1109/ASAP.1997.606838
Jing Chen, V. Taylor
The interconnection of geographically distributed supercomputers via high-speed networks allows users to access the needed compute power for large-scale, complex applications. For efficient use of such systems, the variance in processor performance and network (i.e., interconnection network versus wide area network) performance must be considered. In this paper, we present a decomposition tool, called PART, for distributed systems. PART takes into consideration the variance in performance of the networks and processors as well as the computational complexity of the application. This is achieved via the parameters used in the objective function of simulated annealing. The initial version of PART focuses on finite element based problems. The results of using PART demonstrate a 30% reduction in execution time as compared to using conventional schemes that partition the problem domain into equal-sized subdomains.
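PART's actual objective function is not given in the abstract. As an illustration of the general idea of weighting each partition's load by its processor's speed and each cut edge by the link it crosses, here is a minimal simulated-annealing sketch; the cost terms, weights, cooling schedule, and toy inputs are assumptions, not PART's.

```python
import math
import random

# Minimal heterogeneity-aware mesh partitioning by simulated annealing (illustrative only).
# nodes: compute weight per element; edges: (u, v, comm_weight); speed: relative speed of the
# processor owning each partition; link[a][b]: bandwidth factor between partitions a < b.
def anneal(nodes, edges, speed, link, steps=20000, t0=1.0, cooling=0.9995, seed=0):
    rng = random.Random(seed)
    part = [rng.randrange(len(speed)) for _ in nodes]

    def cost(p):
        load = [0.0] * len(speed)                    # per-processor compute time estimate
        for n, w in enumerate(nodes):
            load[p[n]] += w / speed[p[n]]
        comm = sum(c / link[min(p[u], p[v])][max(p[u], p[v])]   # cut edges, scaled by link speed
                   for u, v, c in edges if p[u] != p[v])
        return max(load) + comm

    temp, cur = t0, cost(part)
    for _ in range(steps):
        n = rng.randrange(len(nodes))
        old = part[n]
        part[n] = rng.randrange(len(speed))          # propose moving one element
        new = cost(part)
        if new <= cur or rng.random() < math.exp(-(new - cur) / temp):
            cur = new                                # accept the move
        else:
            part[n] = old                            # reject and restore
        temp *= cooling
    return part, cur

# Toy run: a 6-element chain, two fast local processors and one slower remote processor
# reachable only over a slow wide-area link.
nodes = [1.0] * 6
edges = [(i, i + 1, 1.0) for i in range(5)]
speed = [1.0, 1.0, 0.5]
link = {0: {1: 10.0, 2: 1.0}, 1: {2: 1.0}}
print(anneal(nodes, edges, speed, link))
```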
{"title":"PART: a partitioning tool for efficient use of distributed systems","authors":"Jing Chen, V. Taylor","doi":"10.1109/ASAP.1997.606838","DOIUrl":"https://doi.org/10.1109/ASAP.1997.606838","url":null,"abstract":"The interconnection of geographically distributed supercomputers via high-speed networks allows users to access the needed compute power for large-scale, complex applications. For efficient use of such systems, the variance in processor performance and network (i.e., interconnection network versus wide area network) performance must be considered. In this paper, we present a decomposition tool, called PART, for distributed systems. PART takes into consideration the variance in performance of the networks and processors as well as the computational complexity of the application. This is achieved via the parameters used in the objective function of simulated annealing. The initial version of PART focuses on finite element based problems. The results of using PART demonstrate a 30% reduction in execution time as compared to using conventional schemes that partition the problem domain into equal-sized subdomains.","PeriodicalId":368315,"journal":{"name":"Proceedings IEEE International Conference on Application-Specific Systems, Architectures and Processors","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114469698","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Implementation of orthogonal wavelet transforms and their applications
Pub Date: 1997-07-14 | DOI: 10.1109/ASAP.1997.606854
P. Rieder, J. Nossek
In this paper, the efficient implementation of different types of orthogonal wavelet transforms with respect to practical applications is discussed. Orthogonal single-wavelet transforms, based on one scaling function and one wavelet function, are used for denoising signals. Orthogonal multiwavelets are based on several scaling functions and several wavelets. Since they allow properties such as regularity, orthogonality, and symmetry that cannot be combined in the single-wavelet case, multiwavelets are well-suited bases for image compression applications. For an efficient implementation of these orthogonal wavelet transforms, approximating the exact rotation angles of the corresponding orthogonal wavelet lattice filters by very few CORDIC-based elementary rotations significantly reduces the number of shift-and-add operations. The performance of the resulting, computationally cheap, approximated wavelet transforms in practical applications is discussed.
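The specific angle approximations used in the paper are not given in the abstract. The sketch below only illustrates the standard CORDIC idea: a target rotation angle is greedily decomposed into a few elementary angles arctan(2^-i), each of which can then be applied with shifts and adds only. The choice of three micro-rotations and the target angle are arbitrary.

```python
import math

# Greedy approximation of a rotation angle by a few CORDIC elementary angles arctan(2^-i).
def cordic_angles(target, num_rotations=3, max_shift=8):
    chosen, residual = [], target
    for _ in range(num_rotations):
        # pick the signed elementary angle that best reduces the remaining residual
        s, i = min(((s, i) for s in (+1, -1) for i in range(max_shift + 1)),
                   key=lambda si: abs(residual - si[0] * math.atan(2.0 ** -si[1])))
        chosen.append((s, i))
        residual -= s * math.atan(2.0 ** -i)
    return chosen, residual

# Example: approximate 58.28 degrees (a hypothetical lattice-filter angle) with 3 micro-rotations.
rotations, err = cordic_angles(math.radians(58.28))
print(rotations, math.degrees(err))
# Each (sign, i) pair corresponds to the shift-and-add update
#   x' = x - sign * (y >> i),   y' = y + sign * (x >> i)
# (up to the known CORDIC scale factor), so the rotation costs only a few adds and shifts.
```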
{"title":"Implementation of orthogonal wavelet transforms and their applications","authors":"P. Rieder, J. Nossek","doi":"10.1109/ASAP.1997.606854","DOIUrl":"https://doi.org/10.1109/ASAP.1997.606854","url":null,"abstract":"In this paper the efficient implementation of different types of orthogonal wavelet transforms with respect to practical applications is discussed. Orthogonal single-wavelet transforms being based on one scaling function and one wavelet function are used for denosing of signals. Orthogonal multiwavelets are based on several scaling functions and several wavelets. Since they allow properties like regularity, orthogonality and symmetry being impossible in the single-wavelet case, multiwavelets are well suited bases for image compression applications. With respect to an efficient implementation of these orthogonal wavelet transforms approximating the exact rotation angles of the corresponding orthogonal wavelet lattice filters by using very few CORDIC-based elementary rotations reduces the number of shift and add operations significantly. The performance of the resulting, computationally cheap, approximated wavelet transforms with respect to practical applications is discussed in this paper.","PeriodicalId":368315,"journal":{"name":"Proceedings IEEE International Conference on Application-Specific Systems, Architectures and Processors","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115861214","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
On computing with locally-interconnected architectures in atomic/nanoelectronic systems
Pub Date: 1997-07-14 | DOI: 10.1109/ASAP.1997.606808
V. Roychowdhury, M. Anantram
The past decade has seen tremendous experimental and theoretical progress in the field of mesoscopic devices and molecular self-assembly techniques, leading to laboratory demonstrations of many new device concepts. While these studies have been important from a fundamental physics perspective, many have recognized that they may offer new insights into building a future generation of computing machines. This has recently led to a number of proposals for computing machines that use these novel device concepts. In this paper, we explain the physical principles behind the operation of one of these proposals, namely the ground-state computing model. These computational models share some of the characteristics of the well-known systolic-type processor arrays, namely spatial locality and functional uniformity. In particular, we study the effect of metastable states on the relaxation process (and hence information propagation) in locally coupled and boundary-driven structures. We first give a general argument to show that metastable states are inevitable even in the simplest of structures, a wire. At finite temperatures, the relaxation mechanism is a thermally assisted random walk. The time required to reach the ground state and its lifetime are determined by the coupling parameters. These time scales are studied in a model based on an array of quantum dots.
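The quantum-dot array model itself is not reproduced in the abstract. As a generic illustration of thermally assisted escape from a metastable state, the toy sketch below runs a Metropolis random walk on a one-dimensional energy landscape with a local (metastable) minimum separated from the ground state by a barrier, and shows how the relaxation time grows as the temperature drops. The landscape and all parameters are arbitrary, not the paper's model.

```python
import math
import random

# Toy thermally assisted relaxation: a Metropolis random walk on a 1-D energy landscape
# with a shallow local minimum at x = 2 (the metastable state) and the global minimum
# (ground state) at x = 8, separated by a barrier around x = 4.
energy = [3.0, 1.0, 0.5, 1.5, 2.5, 1.8, 1.0, 0.4, 0.0, 0.5]   # arbitrary landscape

def escape_time(temperature, start=2, ground=8, seed=0, max_steps=1_000_000):
    rng = random.Random(seed)
    x = start
    for step in range(1, max_steps + 1):
        nxt = x + rng.choice((-1, 1))
        if 0 <= nxt < len(energy):
            dE = energy[nxt] - energy[x]
            if dE <= 0 or rng.random() < math.exp(-dE / temperature):
                x = nxt                      # thermally assisted (uphill) moves are rare at low T
        if x == ground:
            return step
    return max_steps

for T in (1.0, 0.5, 0.25):
    print(f"T = {T}: reached the ground state after ~{escape_time(T)} steps")
# Escape from the metastable minimum requires climbing the barrier, so the relaxation
# time grows rapidly (roughly Arrhenius-like) as the temperature is lowered.
```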
{"title":"On computing with locally-interconnected architectures in atomic/nanoelectronic systems","authors":"V. Roychowdhury, M. Anantram","doi":"10.1109/ASAP.1997.606808","DOIUrl":"https://doi.org/10.1109/ASAP.1997.606808","url":null,"abstract":"The past decade has seen tremendous experimental and theoretical progress in the field of mesoscopic devices and molecular self assembly techniques, leading to laboratory demonstration of many new device concepts. While these studies have been important from a fundamental physics perspective, it has been recognized by many that they may offer new insights into building a future generation of computing machines. This has recently led to a number of proposals for computing machines which use these new and novel device concepts. In this paper, we explain the physical principles behind the operation of one of these proposals, namely the ground state computing model. These computational models share some of the characteristics of the well-known systolic type processor arrays, namely spatial locality, and functional uniformity. In particular, we study the effect of metastable states on the relaxation process (and hence information propagation) in locally coupled and boundary-driven structures. We first give a general argument to show that metastable states are inevitable even in the simplest of structures, a wire. At finite temperatures, the relaxation mechanism is a thermally assisted random walk. The time required to reach the ground state and its life time are determined by the coupling parameters. These time scales are studied in a model based on an array of quantum dots.","PeriodicalId":368315,"journal":{"name":"Proceedings IEEE International Conference on Application-Specific Systems, Architectures and Processors","volume":"106 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124922891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}