Pub Date: 2003-05-12 DOI: 10.1109/CAMP.2003.1598156
J. Kim, D. S. Wills
Vector quantization (VQ) is widely used for color image and video compression. However, its high computational overhead rules out many applications in real-time systems. This paper presents a novel method to accelerate the full-search VQ algorithm by adding a quantized color pack extension (QCPX) to the instruction set architecture (ISA). QCPX not only supports a packed 16-bit YCbCr data format but also improves performance and code density by operating on three-color pixels in parallel within a 16-bit width. To measure its execution performance, QCPX is evaluated on a SIMD pixel array platform developed at Georgia Tech. In addition, by varying the grain size (pixels per processing element, PPE), this study measures the impact of QCPX across different levels of data parallelism. Simulation results indicate that the QCPX version achieves speedups of 27% to 297% over the non-QCPX version, with the largest improvements (>200%) occurring above the communication-bound granularity of 16 PPE. QCPX also reduces average PE idle cycles by 45%. QCPX can be incorporated in a range of architectures, from current ILP processors to future massively data-parallel machines.
Title: Evaluating color instruction set extension for real-time vector quantization
Venue: 2003 IEEE International Workshop on Computer Architectures for Machine Perception
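For readers unfamiliar with the term, full-search VQ as named in the abstract above compares every input vector against every codeword and keeps the nearest one. The sketch below is a minimal Python illustration of that baseline only. It does not model the QCPX instructions or the SIMD pixel array, and the function name `full_search_vq`, the squared Euclidean distance, and the sample YCbCr triples are assumptions made for illustration, not details taken from the paper.

```python
def full_search_vq(pixels, codebook):
    """Return, for each input vector, the index of its nearest codeword.

    Hypothetical helper: exhaustive (full) search with squared Euclidean
    distance, which is an assumed distortion measure.
    """
    indices = []
    for vec in pixels:
        best_index, best_dist = 0, float("inf")
        for i, code in enumerate(codebook):
            # Distance accumulated over the three color components.
            dist = sum((a - b) ** 2 for a, b in zip(vec, code))
            if dist < best_dist:
                best_index, best_dist = i, dist
        indices.append(best_index)
    return indices


# Made-up (Y, Cb, Cr) codewords and pixels, for illustration only.
codebook = [(16, 128, 128), (235, 128, 128), (81, 90, 240)]
pixels = [(20, 130, 125), (200, 120, 130), (90, 100, 220)]
indices = full_search_vq(pixels, codebook)
print(indices)  # [0, 1, 2]
```

The nested loop makes the cost proportional to the number of pixels times the codebook size, which is the kind of overhead the abstract refers to.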
Pub Date: 2003-05-12 DOI: 10.1109/CAMP.2003.1598144
A. A. Cohen
This paper is concerned with novel approaches to solving performance and addressing problems inherent in computer systems. The core of the neuron-like processing machine (NLPM) is a unique type of computer architecture for constructing massively parallel processing machines, with a unique pattern recognition system and addressing mechanisms that enhance performance in identifying and retrieving known data patterns. The architecture of the NLPM has neuronal nodes bundled into fully interconnected groups and structured similarly to the human brain; hence the name neuron-like processing machine. The paper describes parts of this novel architecture, which consists of hierarchical structures of processing nodes called super neurons (SN), positioned on geographical maps in which selected nodes are grouped together to form dedicated pattern units (PU) for solving generic pattern recognition problems. The paper gives some description of the pattern unit and the NLPM architecture. The NLPM makes connections between pattern unit processing nodes to solve any type of pattern recognition and identification task.
Title: Brain-like computer architecture
Venue: 2003 IEEE International Workshop on Computer Architectures for Machine Perception
Pub Date: 2003-05-12 DOI: 10.1109/CAMP.2003.1598163
S. Poussier, H. Rabah, S. Weber
This paper presents a new design and implementation of a system on a programmable chip (SOPC) for a smart strain gage conditioner. The system is designed to provide the flexibility and handle the complex computations required by thermal compensation algorithms for strain gages. To satisfy real-time processing constraints on the one hand and parameterization on the other, parts of the algorithms are implemented in hardware and others in software. These architectures are implemented on a field programmable gate array (FPGA) that includes a core processor. Five methodologies are developed for the thermal compensation. The first is the commonly used classical technique. The second is based on Lagrange interpolation. The third is based on the Newton iteration algorithm. The fourth is based on the Neville-Aitken recurrence algorithm. The last is based on the spline interpolation algorithm. Implementation techniques and experimental results are given.
Title: Design and implementation of a smart strain gage conditioner
Venue: 2003 IEEE International Workshop on Computer Architectures for Machine Perception
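Of the interpolation schemes listed in the abstract above, the Neville-Aitken recurrence is compact enough to sketch. The Python below is a generic textbook form of that recurrence, shown as one plausible way to evaluate a compensation curve between calibration points. The function name `neville`, the calibration temperatures, and the offset values are all hypothetical; the abstract does not give the authors' data, number format, or hardware/software partitioning.

```python
def neville(xs, ys, x):
    """Evaluate at x the polynomial that interpolates the points (xs, ys).

    Uses the Neville-Aitken recurrence, updating the table in place:
    after pass `level`, p[i] interpolates xs[i] .. xs[i + level].
    """
    p = list(ys)
    n = len(xs)
    for level in range(1, n):
        for i in range(n - level):
            j = i + level
            p[i] = ((x - xs[j]) * p[i] + (xs[i] - x) * p[i + 1]) / (xs[i] - xs[j])
    return p[0]


# Hypothetical calibration table: temperature (deg C) -> offset correction.
temperatures = [0.0, 25.0, 50.0]
offsets = [0.0, 1.0, 4.0]
correction = neville(temperatures, offsets, 37.5)
print(correction)  # 2.25
```

The recurrence needs only one division per table entry and no stored polynomial coefficients, which is one reason such schemes are attractive when the calibration points are reparameterized at run time.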
Pub Date: 2003-05-12 DOI: 10.1109/CAMP.2003.1598177
P. Chalimbaud, F. Berry, P. Martinet
In this paper, we present a "visual task" which can be considered part of an active vision sensor. This task consists of tracking gray-level windows of interest. Our approach is based on an efficient matching between hardware architecture and software algorithm. The notion of an active detector is introduced in order to take into account the adaptive and local aspects of the processing. To validate our approach, a high-speed tracking method based on a CMOS sensor and an FPGA is presented. Depending on the size of the window, the acquisition rate varies from 200 to 1000 frames/s.
Title: The task "template tracking" in a sensor dedicated to active vision
Venue: 2003 IEEE International Workshop on Computer Architectures for Machine Perception
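The abstract above does not state which matching criterion is used to track the gray-level window. As a rough illustration of window tracking in general, the Python sketch below assumes a sum-of-absolute-differences (SAD) score and a small search region around the previous window position. The names `sad` and `track`, the search radius, and the test frame are hypothetical and are not taken from the paper.

```python
def sad(frame, template, top, left):
    """Sum of absolute gray-level differences at one candidate position."""
    return sum(
        abs(frame[top + r][left + c] - template[r][c])
        for r in range(len(template))
        for c in range(len(template[0]))
    )


def track(frame, template, prev_top, prev_left, radius):
    """Return the (top, left) position near the previous one with lowest SAD."""
    h, w = len(template), len(template[0])
    best = None
    # Clamp the search region so the window always stays inside the frame.
    for top in range(max(0, prev_top - radius),
                     min(len(frame) - h, prev_top + radius) + 1):
        for left in range(max(0, prev_left - radius),
                          min(len(frame[0]) - w, prev_left + radius) + 1):
            score = sad(frame, template, top, left)
            if best is None or score < best[0]:
                best = (score, top, left)
    return best[1], best[2]


# Made-up 6x6 frame containing a 2x2 bright patch at row 3, column 2.
template = [[9, 8], [7, 6]]
frame = [[0] * 6 for _ in range(6)]
for r in range(2):
    for c in range(2):
        frame[3 + r][2 + c] = template[r][c]

position = track(frame, template, prev_top=2, prev_left=2, radius=2)
print(position)  # (3, 2)
```

Restricting the search to a small region around the last known position keeps the work per frame proportional to the window size, which is consistent with the abstract's remark that the achievable rate depends on the size of the window.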
Pub Date: 2003-05-12 DOI: 10.1109/CAMP.2003.1598170
A. Utgikar, G. Seetharaman, H. Le
We design and implement an efficient architecture for geometric computation of the global position of an airborne video camera from images of known landmarks. A solution based on this analysis, a robust Hough transform-like method facilitated by a class of CORDIC-structured computations, is implemented within the framework of terrain navigation. It empowers aerial surveillance systems to navigate effectively when the global position and inertial navigation sensors are out of order. This is particularly useful when GPS functionality is disrupted by jamming and other techniques. Our architecture exploits parallelism among independent operations and pipelines critical components for superior performance. Double-precision division, being computationally expensive, is kept to a minimum. Correlation between data is exploited to reduce the complexity of flash ADCs, at a one-time cost of a few clock cycles to initialize Hough voting.
Title: VLSI architecture for video-assisted global positioning
Venue: 2003 IEEE International Workshop on Computer Architectures for Machine Perception
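The abstract above mentions CORDIC-structured computations supporting a Hough transform-like method but gives no circuit details. The Python sketch below shows the generic rotation-mode CORDIC iteration in floating point, and how a rotation by the negative angle yields the standard Hough line parameter rho = x cos(theta) + y sin(theta). It is an assumption-laden illustration: the function names, the iteration count, and the use of the standard line Hough parameterization are not taken from the paper, and a hardware version would use fixed-point shifts and a small arctangent table instead of `math` calls.

```python
import math


def cordic_rotate(x, y, angle, iterations=24):
    """Rotate (x, y) by `angle` radians with rotation-mode CORDIC.

    Converges for |angle| up to about 1.74 rad. Each step multiplies by
    2**-i, which is a plain right shift in fixed-point hardware.
    """
    # Accumulated scale factor of all micro-rotations (a constant in hardware).
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))

    z = angle  # residual angle still to be rotated
    for i in range(iterations):
        d = 1.0 if z >= 0.0 else -1.0
        step = 2.0 ** -i
        x, y = x - d * y * step, y + d * x * step
        z -= d * math.atan(step)
    return x / gain, y / gain


def hough_rho(x, y, theta):
    """rho = x*cos(theta) + y*sin(theta), obtained by rotating by -theta."""
    rho, _ = cordic_rotate(x, y, -theta)
    return rho


rho = hough_rho(3.0, 4.0, 0.5)
print(rho)
```

Because each micro-rotation needs only shifts, additions, and one table lookup, the same datapath can be pipelined and replicated, which fits the abstract's emphasis on parallelism and on avoiding expensive division.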