dsp阵列中的矩阵计算

[1992] Proceedings of the International Conference on Application Specific Array Processors Pub Date : 1992-08-04 DOI:10.1109/ASAP.1992.218549

Jeime Moreno, M. Medina

{"title":"dsp阵列中的矩阵计算","authors":"Jeime Moreno, M. Medina","doi":"10.1109/ASAP.1992.218549","DOIUrl":null,"url":null,"abstract":"The authors present the use of the multimesh graph representation to map matrix algorithms onto arrays of digital signal processors (DSPs), using the TMS 320C30 as example. This processor, as most DSPs, is characterized by a two-level memory subsystem and a built-in DMA controller. The mapping process focuses on large matrices which do not fit in the first level of memory. Good utilization of the DSP resources is achieved by programming the execution of the algorithms by prisms from the multimesh graph; the optimal size of the prisms is obtained. Performance estimates indicate that it is possible to program the DSP in such a way that the impact of slower second-level memory is not significant (around 7% degradation with five wait states). Good load balancing throughout the array is achieved by allocating to processors partitions of the problem of nonuniform size, as suggested in a previous publication. The schedule of operations proposed deviates from the conventional ordering, wherein the inner-product among two vectors is fully computed at once. Instead, the proposed schedule divides inner-products into portions which are executed in interleaved manner throughout the computation of the entire problem, each portion using as an input the partial result obtained earlier from the execution of the corresponding previous portion.<<ETX>>","PeriodicalId":265438,"journal":{"name":"[1992] Proceedings of the International Conference on Application Specific Array Processors","volume":"24 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1992-08-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Matrix computations in arrays of DSPs\",\"authors\":\"Jeime Moreno, M. Medina\",\"doi\":\"10.1109/ASAP.1992.218549\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The authors present the use of the multimesh graph representation to map matrix algorithms onto arrays of digital signal processors (DSPs), using the TMS 320C30 as example. This processor, as most DSPs, is characterized by a two-level memory subsystem and a built-in DMA controller. The mapping process focuses on large matrices which do not fit in the first level of memory. Good utilization of the DSP resources is achieved by programming the execution of the algorithms by prisms from the multimesh graph; the optimal size of the prisms is obtained. Performance estimates indicate that it is possible to program the DSP in such a way that the impact of slower second-level memory is not significant (around 7% degradation with five wait states). Good load balancing throughout the array is achieved by allocating to processors partitions of the problem of nonuniform size, as suggested in a previous publication. The schedule of operations proposed deviates from the conventional ordering, wherein the inner-product among two vectors is fully computed at once. Instead, the proposed schedule divides inner-products into portions which are executed in interleaved manner throughout the computation of the entire problem, each portion using as an input the partial result obtained earlier from the execution of the corresponding previous portion.<<ETX>>\",\"PeriodicalId\":265438,\"journal\":{\"name\":\"[1992] Proceedings of the International Conference on Application Specific Array Processors\",\"volume\":\"24 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1992-08-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"[1992] Proceedings of the International Conference on Application Specific Array Processors\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ASAP.1992.218549\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"[1992] Proceedings of the International Conference on Application Specific Array Processors","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASAP.1992.218549","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

作者以tms320c30为例，介绍了使用多网格图表示将矩阵算法映射到数字信号处理器(dsp)阵列上的方法。与大多数dsp一样，该处理器的特点是两级存储器子系统和内置DMA控制器。映射过程集中在不适合第一层内存的大矩阵上。利用多网格图中的棱镜对算法的执行进行编程，实现了DSP资源的有效利用;得到了棱镜的最佳尺寸。性能估计表明，有可能以这样一种方式对DSP进行编程，即较慢的二级存储器的影响并不显著(在五种等待状态下，性能下降约7%)。整个数组的良好负载平衡是通过将大小不一致的问题分区分配给处理器来实现的，这在以前的出版物中有过建议。所提出的操作计划偏离了传统的排序，其中两个向量之间的内积是一次完全计算的。相反，提议的调度将内积分成若干部分，这些部分在整个问题的计算过程中以交错的方式执行，每一部分使用从先前相应部分执行中获得的部分结果作为输入。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Matrix computations in arrays of DSPs

The authors present the use of the multimesh graph representation to map matrix algorithms onto arrays of digital signal processors (DSPs), using the TMS 320C30 as example. This processor, as most DSPs, is characterized by a two-level memory subsystem and a built-in DMA controller. The mapping process focuses on large matrices which do not fit in the first level of memory. Good utilization of the DSP resources is achieved by programming the execution of the algorithms by prisms from the multimesh graph; the optimal size of the prisms is obtained. Performance estimates indicate that it is possible to program the DSP in such a way that the impact of slower second-level memory is not significant (around 7% degradation with five wait states). Good load balancing throughout the array is achieved by allocating to processors partitions of the problem of nonuniform size, as suggested in a previous publication. The schedule of operations proposed deviates from the conventional ordering, wherein the inner-product among two vectors is fully computed at once. Instead, the proposed schedule divides inner-products into portions which are executed in interleaved manner throughout the computation of the entire problem, each portion using as an input the partial result obtained earlier from the execution of the corresponding previous portion.<>

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

[1992] Proceedings of the International Conference on Application Specific Array Processors

自引率

0.00%

发文量