Matrix computations in arrays of DSPs
Jeime Moreno, M. Medina
[1992] Proceedings of the International Conference on Application Specific Array Processors, 1992-08-04
DOI: 10.1109/ASAP.1992.218549 (https://doi.org/10.1109/ASAP.1992.218549)

The authors use the multimesh graph representation to map matrix algorithms onto arrays of digital signal processors (DSPs), taking the TMS 320C30 as an example. Like most DSPs, this processor is characterized by a two-level memory subsystem and a built-in DMA controller. The mapping process focuses on large matrices that do not fit in the first level of memory. Good utilization of the DSP resources is achieved by programming the algorithms to execute prism by prism, with the prisms taken from the multimesh graph; the optimal size of the prisms is obtained. Performance estimates indicate that the DSP can be programmed so that the impact of the slower second-level memory is not significant (around 7% degradation with five wait states). Good load balancing throughout the array is achieved by allocating partitions of nonuniform size to the processors, as suggested in a previous publication. The proposed schedule of operations deviates from the conventional ordering, in which the inner product of two vectors is computed in full at once. Instead, the proposed schedule divides inner products into portions that are executed in an interleaved manner throughout the computation of the entire problem, each portion taking as input the partial result produced earlier by the corresponding previous portion.
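The interleaved inner-product schedule described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the multimesh graph, the prism shapes, or the TMS 320C30 memory and DMA details, so none of them is modeled here. The function names and the `portion` parameter are hypothetical, chosen only for illustration: every inner product of a matrix product is cut into portions of length `portion`, and all output elements are advanced by one portion before any of them receives the next.

```python
def matmul_conventional(a, b):
    """Conventional ordering: each inner product is fully computed at once."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            total = 0.0
            for t in range(k):
                total += a[i][t] * b[t][j]
            c[i][j] = total
    return c


def matmul_interleaved(a, b, portion):
    """Illustrative interleaved ordering (an assumption, not the paper's code).

    The inner-product index range [0, k) is split into portions of length
    `portion` (a hypothetical parameter). For each portion, every output
    element is updated once, starting from the partial result left by the
    previous portion.
    """
    if portion < 1:
        raise ValueError("portion must be at least 1")
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]  # partial results, initially zero
    for k0 in range(0, k, portion):
        k1 = min(k0 + portion, k)
        for i in range(n):
            for j in range(m):
                partial = c[i][j]  # result of the previous portion
                for t in range(k0, k1):
                    partial += a[i][t] * b[t][j]
                c[i][j] = partial  # input to the next portion
    return c


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6]]
    b = [[7, 8], [9, 10], [11, 12]]
    print(matmul_interleaved(a, b, portion=2))
```

In this sketch each portion touches only `portion` columns of the first operand and `portion` rows of the second, which suggests why such a schedule could suit a small first-level memory; the actual sizes and data movement used by the authors are not given in the abstract.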