2003 IEEE International Workshop on Computer Architectures for Machine Perception: Latest Papers

Monitoring terrain database integrity through aircraft sensor consistency checking: architecture and flight test results
Pub Date : 2003-05-12 DOI: 10.1109/CAMP.2003.1598153
M. U. de Haag, J. Sayre, J. Campbell, S. Young
This paper discusses the architecture and flight test results of a digital elevation model (DEM) integrity monitor for a synthetic vision system (SVS). An SVS provides pilots with either a head-down display (HDD) or a head-up display (HUD) containing aircraft state, guidance and navigation information, and a virtual depiction of the terrain as viewed "from the cockpit". Introducing SVS technology on the aircraft flight deck has the potential to improve flight safety by raising situational awareness (SA) in low to near-zero visibility conditions to a level similar to that of daytime clear-weather flying. This SA improvement not only enables low-visibility operations but may also reduce the likelihood of controlled flight into terrain (CFIT).
Citations: 0
Memory considerations for high performance SIMD systems with on-chip control
Pub Date : 2003-05-12 DOI: 10.1109/CAMP.2003.1598157
M. Herbordt, Jade Cravy, Calvin Lin
Although arrays of SIMD PEs can be built with very high operating frequencies, it is difficult to keep the array busy. The inherent mismatch between host and array makes it hard to maintain high array utilization: either the rate of instruction issue is very low or PE data locality is compromised, which has the same effect. Our solution is based on an array control unit (ACU) design that expands macroinstructions in two stages, first by data tile and then into microinstructions. The expansion itself solves the instruction-issue problem; decoupling the two expansion stages maintains data locality. Several issues involving host/ACU interaction need to be resolved to effect this solution. We present experimental results showing that our approach delivers a substantial improvement in memory hierarchy performance: a cache of only one-fourth the size is sufficient to achieve the same performance as previous approaches.
Citations: 0
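The two-stage expansion described in the abstract above can be pictured with a small sketch. Everything here is an assumption made for illustration: the abstract does not give the instruction set, the tile model, or the expansion order, so the `MICRO_OPS` table and the function names are hypothetical.

```python
from typing import Iterator, List, Tuple

# Hypothetical microcode table: one macroinstruction maps to a fixed
# sequence of microinstructions. The real ACU encoding is not given.
MICRO_OPS = {"ADD": ["load_a", "load_b", "alu_add", "store"]}


def expand_by_tile(macro: str, num_tiles: int) -> Iterator[Tuple[str, int]]:
    """Stage 1: replicate one macroinstruction across the data tiles."""
    for tile in range(num_tiles):
        yield macro, tile


def expand_to_micro(macro: str, tile: int) -> Iterator[str]:
    """Stage 2: turn one (macroinstruction, tile) pair into microinstructions."""
    for op in MICRO_OPS[macro]:
        yield f"{op}@tile{tile}"


def acu_expand(macro: str, num_tiles: int) -> List[str]:
    """Chain both stages; the host only had to issue a single macroinstruction."""
    return [
        micro
        for mac, tile in expand_by_tile(macro, num_tiles)
        for micro in expand_to_micro(mac, tile)
    ]


stream = acu_expand("ADD", num_tiles=2)
```

In this toy model one host-issued macroinstruction becomes eight microinstructions, which is the sense in which expansion relieves the host of the issue-rate burden. Keeping the two stages as separate functions mirrors the decoupling the abstract mentions, without claiming to reproduce how the ACU schedules them.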
A parallel algorithm and architecture for object recognition in images
Pub Date : 2003-05-12 DOI: 10.1109/CAMP.2003.1598158
K. Sitaraman, A. Ejnioui, N. Ranganathan
Tree pattern matching for object recognition in images is computationally intensive. In two-dimensional images, objects can be represented as tree structures through multiscale decomposition. The pattern tree representing an object can be matched against a subject tree representing an image in order to detect the object within the image. Several sequential, parallel and hardware algorithms for tree pattern matching exist in the literature. In this paper, we describe a new parallel algorithm and its realization as a VLSI chip for tree pattern matching. The hardware algorithm is based on a linear array of processing elements (PEs) in which the pattern matching is done in a pipelined fashion, relying on nearest-neighbor communication between the PEs. Subject and pattern trees of arbitrary length can be processed using a fixed-size PE array. The algorithm has an improved execution time of $O(\lceil m/a \rceil n)$, where m, a and n are the sizes of the pattern tree, processor array and subject tree, respectively. A prototype CMOS VLSI chip implementing the proposed algorithm has been designed and verified. It is shown that the hardware algorithm proposed in this work represents a significant improvement in terms of computational complexity, data flow, and architecture over those previously proposed for this problem.
Citations: 0
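The execution-time bound quoted in the abstract above, O(ceil(m/a) * n), can be read as a pass count. The sketch below assumes one plausible interpretation that the abstract does not spell out: the pattern is processed in chunks of at most a nodes (one per PE), and each chunk needs one pipelined pass over the n nodes of the subject tree. The function name is hypothetical.

```python
def pipeline_steps(m: int, a: int, n: int) -> int:
    """Estimate steps for a pattern of size m on a PEs over a subject of size n.

    Assumes ceil(m / a) passes, each streaming all n subject nodes once.
    """
    if m <= 0 or a <= 0 or n <= 0:
        raise ValueError("m, a and n must be positive")
    passes = -(-m // a)  # integer ceiling of m / a
    return passes * n


# A 10-node pattern on a 4-PE array needs 3 passes over a 100-node subject.
steps = pipeline_steps(m=10, a=4, n=100)
```

Under this reading, a pattern that fits in the array (m <= a) is matched in a single pass, and larger patterns cost proportionally more passes rather than requiring a larger array.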
A comparison of hardware resources required by real-time stereo dense algorithms
Pub Date : 2003-05-12 DOI: 10.1109/CAMP.2003.1598176
M. Perez, F. Cabestaing
Many algorithms for computing correlation-based stereo correspondence have been proposed. Some of them can be implemented on specialized architectures in order to obtain results in real time. In this communication, we propose an experimental comparison of the amount of hardware resources required to implement these algorithms. An efficient architecture, STREAM, is presented (an acronym for système temps-réel d'extraction et d'analyse du mouvement, i.e. real-time motion extraction and analysis system); it is a processor dedicated to image sequence analysis.
Citations: 5
System on chip evolution of a SIMD architecture for image processing
Pub Date : 2003-05-12 DOI: 10.1109/CAMP.2003.1598175
J. Denoulet, A. Mérigot
This paper presents the current evolution of the associative mesh project, which aims at the design of a reconfigurable, asynchronous and massively parallel SIMD architecture targeted at image analysis. Its basic principle relies on global operations (associations) that, given any interpixel connection graph, are computed over the connected sets of that graph. One of our current objectives is the implementation of an associative mesh as a SoC-type circuit. In this paper, we examine which architectural modifications this approach would imply. We also consider the benefits brought by this technique and its repercussions on the design's performance.
Citations: 5
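The "association" idea in the abstract above, a global operation computed over each connected set of an interpixel graph, can be sketched in software. This is only a sequential illustration under stated assumptions: pixels are indexed 0..N-1, the graph is an undirected edge list, the operator is associative and commutative, and union-find stands in for whatever the asynchronous hardware actually does. The name `associate` is hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def associate(
    values: Sequence[int],
    edges: Sequence[Tuple[int, int]],
    op: Callable[[int, int], int],
) -> List[int]:
    """Give every pixel the reduction of `op` over its connected set."""
    parent = list(range(len(values)))

    def find(i: int) -> int:
        # Follow parent links to the set representative, halving the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the two sets joined by each edge of the connection graph.
    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    # Reduce the values of each connected set into one accumulator.
    acc = {}
    for i, v in enumerate(values):
        r = find(i)
        acc[r] = v if r not in acc else op(acc[r], v)

    # Broadcast each set's result back to all of its pixels.
    return [acc[find(i)] for i in range(len(values))]


pixels = [3, 1, 4, 1, 5]
graph = [(0, 1), (1, 2), (3, 4)]  # two connected sets: {0, 1, 2} and {3, 4}
region_max = associate(pixels, graph, max)
region_sum = associate(pixels, graph, lambda x, y: x + y)
```

Changing the edge list changes which pixels cooperate, which is the software analogue of reconfiguring the mesh for a different connection graph.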
Design of a language-independent parallel string matching unit for NLP
Pub Date : 2003-05-12 DOI: 10.1109/CAMP.2003.1598159
V. S. Murty, P. C. Reghu Raj, S. Raman
In natural language processing applications, string matching is the most time-consuming operation because of the large size of the lexicon. Data dependence is minimal in string matching operations, which makes them ideal for parallelization. Dedicated string matching hardware that uses memory interleaving and parallel processing techniques can relieve the host CPU of this burden, thereby making the system suitable for real-time applications. This paper reports the FPGA design of such a system with m parallel matching units. The time complexity of the proposed algorithm is $O(\log_2 n)$, where n is the total number of lexical entries. This has been achieved by a proper selection of the value of m. A special memory organization technique, which reduces the storage space by nearly 70%, has been adopted for storing lexical entries. The techniques used for matching and storing lexical entries make the system language independent.
Citations: 2
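The abstract above does not say how the m matching units and the interleaved memory combine to give a logarithmic lookup time, so the following is a sketch of one plausible arrangement rather than the paper's design. It assumes the sorted lexicon is interleaved across m banks and that each unit binary-searches its own bank; the function names are hypothetical, and the units run one after another here where hardware would run them concurrently.

```python
from bisect import bisect_left
from typing import List, Sequence


def build_banks(lexicon: Sequence[str], m: int) -> List[List[str]]:
    """Interleave the sorted lexicon across m banks (entry i goes to bank i % m)."""
    ordered = sorted(lexicon)
    # A strided slice of a sorted list is itself sorted.
    return [ordered[i::m] for i in range(m)]


def search_bank(bank: Sequence[str], word: str) -> bool:
    """Binary search within one bank: O(log2(len(bank))) comparisons."""
    i = bisect_left(bank, word)
    return i < len(bank) and bank[i] == word


def lookup(banks: Sequence[Sequence[str]], word: str) -> bool:
    """Ask every matching unit; a hit in any bank means the word is present."""
    return any(search_bank(bank, word) for bank in banks)


banks = build_banks(["cat", "dog", "ant", "bee", "emu"], m=2)
found = lookup(banks, "bee")
missing = lookup(banks, "cow")
```

Because the comparison only relies on string ordering, nothing in the sketch depends on the language of the lexicon, which is consistent with the language-independence claim, though the paper's storage format (the one that saves nearly 70% of the space) is not modelled here.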
A multiscale technique for optical flow computation using piecewise affine approximation
Pub Date : 2003-05-12 DOI: 10.1109/CAMP.2003.1598167
H. Le, G. Seetharaman, B. Zavidovique
We present a technique for estimating the optical flow in an image sequence, based on a piecewise affine model. In this piecewise approach, the area of interest in each image frame is divided into a set of small triangular patches. These triangular meshes are built over a set of feature points, which are extracted from the images and tracked from one frame to the next. The velocity field within each triangular patch is parameterized by an affine transform. A multiscale coarse-to-fine approach is employed to increase the robustness of the method as well as the accuracy of the optical flow resulting from the piecewise affine approximations. Finally, an adaptive filter is used to refine the estimated flow field. The filter is designed so that it reduces the noise caused by errors in the process described above while avoiding smoothing across discontinuities in the motion field. The method has been implemented, and some experimental results are presented in this paper. The method takes advantage of widely used MPEG-4 encoding hardware/software tools.
Citations: 2
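To make the per-triangle affine model in the abstract above concrete: an affine velocity field on a triangle is fully determined by the velocities at its three vertices, and it can be evaluated at any interior point by barycentric interpolation. The sketch below shows only that step, assuming the vertex velocities come from feature tracking; the multiscale scheme and the adaptive filter are not modelled, and the function names are hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def barycentric(p: Point, a: Point, b: Point, c: Point) -> Tuple[float, float, float]:
    """Barycentric coordinates of p with respect to triangle (a, b, c)."""
    (x, y), (xa, ya), (xb, yb), (xc, yc) = p, a, b, c
    det = (yb - yc) * (xa - xc) + (xc - xb) * (ya - yc)
    if det == 0:
        raise ValueError("degenerate triangle")
    l0 = ((yb - yc) * (x - xc) + (xc - xb) * (y - yc)) / det
    l1 = ((yc - ya) * (x - xc) + (xa - xc) * (y - yc)) / det
    return l0, l1, 1.0 - l0 - l1


def flow_at(p: Point, tri: Sequence[Point], vels: Sequence[Point]) -> Point:
    """Affine flow at p, interpolated from the three tracked vertex velocities."""
    weights = barycentric(p, *tri)
    u = sum(w * v[0] for w, v in zip(weights, vels))
    v = sum(w * v[1] for w, v in zip(weights, vels))
    return u, v


triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
# Vertex velocities sampled from the affine field u = 0.5 * x, v = -0.25 * y.
velocities = [(0.0, 0.0), (2.0, 0.0), (0.0, -1.0)]
u, v = flow_at((1.0, 1.0), triangle, velocities)
```

At the point (1, 1) the interpolated flow reproduces the underlying affine field exactly, which is the property that lets three tracked feature points stand in for a dense estimate inside each patch.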
Design and implementation of a high performance matrix multiplier core for Xilinx Virtex FPGAs
Pub Date : 2003-05-12 DOI: 10.1109/CAMP.2003.1598160
S. Belkacemi, K. Benkrid, D. Crookes, A. Benkrid
Matrix multiplication is a core operation in digital signal processing, with applications such as image processing, computer graphics, sonar processing and robotics. This paper presents the design and implementation of a high performance, fully parallel matrix multiplication core. The core is parameterised and scalable in terms of the matrices' dimensions (number of rows and columns) and the input data word length. Fully floorplanned FPGA configurations are generated automatically from high-level descriptions of the matrix multiplication operation, in the form of EDIF netlists, in less than one second. These are specifically optimised for Xilinx Virtex FPGA chips. By exploiting the abundance of logic resources in Xilinx Virtex FPGAs (look-up tables, fast carry logic, shift registers, flip-flops, etc.), a fully parallel implementation of the matrix multiplier core has been achieved, with a full matrix result generated every clock cycle. A $3 \times 3$ matrix multiplier instance consumes 2,448 Virtex slices and can run at 175 MHz on an XCV1000E-6 Virtex-E chip, thus performing over 4.7 billion MAC/sec. This corresponds to 175 million full $3 \times 3$ matrix results per second.
Citations: 13
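The throughput figures in the abstract above can be checked with simple arithmetic. This is a minimal sketch, assuming one multiply-accumulate (MAC) per scalar product term, so that a 3x3 by 3x3 product costs 27 MACs, and one full result per clock cycle as the abstract states. The function names are illustrative only.

```python
from typing import Tuple


def macs_per_product(rows: int, inner: int, cols: int) -> int:
    """MACs needed for a (rows x inner) by (inner x cols) matrix product."""
    return rows * inner * cols


def throughput(clock_hz: int, rows: int, inner: int, cols: int) -> Tuple[int, int]:
    """Return (matrix results per second, MACs per second).

    Assumes a fully parallel core that emits one full result every clock cycle.
    """
    results_per_sec = clock_hz
    return results_per_sec, results_per_sec * macs_per_product(rows, inner, cols)


results_per_sec, macs_per_sec = throughput(175_000_000, 3, 3, 3)
```

With a 175 MHz clock this gives 175 million results and 4.725 billion MACs per second, consistent with the "over 4.7 billion MAC/sec" quoted in the abstract.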
Parallelisation of a face tracking algorithm with the SKiPPER-II parallel programming environment
Pub Date : 2003-05-12 DOI: 10.1109/CAMP.2003.1598164
R. Coudarcher, F. Duculty, J. Sérot, F. Jurie, J. Derutin, M. Dhome
This paper sheds light on the parallelisation, using algorithmic skeletons, of a complete and realistic image processing application, for which we have identified a requirement for skeleton nesting. The image processing application we have chosen is an appearance-based 3D face tracking algorithm.
Citations: 1
Real-time dense stereo on a personal computer
Pub Date : 2003-05-12 DOI: 10.1109/CAMP.2003.1598169
L. Di Stefano, M. Marchionni, S. Mattoccia
This paper presents a stereo algorithm that enables real-time dense disparity measurements on standard personal computers. Unlike many other dense stereo algorithms, which are based on two matching phases, the proposed algorithm relies on a single matching phase and rejects unreliable matches by exploiting violations of the uniqueness constraint and analysing the behaviour of the correlation scores. The overall algorithm has been carefully optimised using very efficient calculation schemes and making extensive use of the SIMD parallel processing capabilities now available in state-of-the-art general-purpose microprocessors. The paper describes the algorithm and the optimisation strategies, and provides experimental results obtained on stereo pairs with ground truth as well as execution-time measurements.
Citations: 2
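The single-phase matching with a uniqueness check described in the abstract above can be illustrated on one scanline. The sketch is a simplification under stated assumptions: it uses a sum of absolute differences (SAD) over a small window as the matching cost, clamps window indices at the image border, and resolves a uniqueness violation by keeping the lower-cost claim. The analysis of correlation score behaviour and the SIMD optimisations are not modelled, and the function name is hypothetical.

```python
from typing import List, Optional, Sequence


def match_scanline(
    left: Sequence[int], right: Sequence[int], max_disp: int, half_win: int = 1
) -> List[Optional[int]]:
    """Left-to-right disparities for one scanline; None marks a rejected match."""
    n = len(left)

    def sad(x: int, d: int) -> int:
        # Sum of absolute differences over the window, clamped at the borders.
        total = 0
        for k in range(-half_win, half_win + 1):
            xl = min(max(x + k, 0), n - 1)
            xr = min(max(x + k - d, 0), n - 1)
            total += abs(left[xl] - right[xr])
        return total

    disparity: List[Optional[int]] = [None] * n
    owner = {}  # right pixel index -> (cost, left pixel index) of current claim
    for x in range(n):
        # Single matching phase: best disparity for this left pixel.
        cost, d = min((sad(x, d), d) for d in range(min(max_disp, x) + 1))
        xr = x - d
        if xr in owner:
            prev_cost, prev_x = owner[xr]
            if prev_cost <= cost:
                continue  # earlier claim is at least as good; reject this match
            disparity[prev_x] = None  # uniqueness violated; drop the weaker claim
        owner[xr] = (cost, x)
        disparity[x] = d
    return disparity


right_line = [10, 20, 30, 40, 50, 60, 70, 80]
# Left scanline is the right one shifted by two pixels (border value repeated).
left_line = [10, 10, 10, 20, 30, 40, 50, 60]
disp = match_scanline(left_line, right_line, max_disp=4)
```

In this example the first two left pixels compete with pixel 2 for the same right pixel and are rejected, while the remaining pixels recover the true shift of two. Only one pass over the left scanline is needed, which is the point of replacing a second matching phase with the uniqueness test.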