{"title":"SIMD处理器多组存储器中的软件可编程数据分配","authors":"Jian Wang, Joar Sohl, Olof Kraigher, Dake Liu","doi":"10.1109/DSD.2010.26","DOIUrl":null,"url":null,"abstract":"The host-SIMD style heterogeneous multi-processor architecture offers high computing performance and user friendly programmability. It explores both task level parallelism and data level parallelism by the on-chip multiple SIMD coprocessors. For embedded DSP applications with predictable computing feature, this architecture can be further optimized for performance, implementation cost and power consumption. The optimization could be done by improving the SIMD processing efficiency and reducing redundant memory accesses and data shuffle operations. This paper introduces one effective approach by designing a software programmable multi-bank memory system for SIMD processors. Both the hardware architecture and software programming model are described in this paper, with an implementation example of the BLAS syrk routine. The proposed memory system offers high SIMD data access flexibility by using lookup table based address generators, and applying data permutations on both DMA controller interface and SIMD data access. The evaluation results show that the SIMD processor with this memory system can achieve high execution efficiency, with only 10% to 30% overhead. The proposed memory system also saves the implementation cost on SIMD local registers, in our system, each SIMD core has only 8 128-bit vector registers.","PeriodicalId":356885,"journal":{"name":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools","volume":"23 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":"{\"title\":\"Software Programmable Data Allocation in Multi-bank Memory of SIMD Processors\",\"authors\":\"Jian Wang, Joar Sohl, Olof Kraigher, Dake Liu\",\"doi\":\"10.1109/DSD.2010.26\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The host-SIMD style heterogeneous multi-processor architecture offers high computing performance and user friendly programmability. It explores both task level parallelism and data level parallelism by the on-chip multiple SIMD coprocessors. For embedded DSP applications with predictable computing feature, this architecture can be further optimized for performance, implementation cost and power consumption. The optimization could be done by improving the SIMD processing efficiency and reducing redundant memory accesses and data shuffle operations. This paper introduces one effective approach by designing a software programmable multi-bank memory system for SIMD processors. Both the hardware architecture and software programming model are described in this paper, with an implementation example of the BLAS syrk routine. The proposed memory system offers high SIMD data access flexibility by using lookup table based address generators, and applying data permutations on both DMA controller interface and SIMD data access. The evaluation results show that the SIMD processor with this memory system can achieve high execution efficiency, with only 10% to 30% overhead. The proposed memory system also saves the implementation cost on SIMD local registers, in our system, each SIMD core has only 8 128-bit vector registers.\",\"PeriodicalId\":356885,\"journal\":{\"name\":\"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools\",\"volume\":\"23 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2010-09-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"7\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/DSD.2010.26\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2010 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/DSD.2010.26","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Software Programmable Data Allocation in Multi-bank Memory of SIMD Processors
The host-SIMD style heterogeneous multi-processor architecture offers high computing performance and user friendly programmability. It explores both task level parallelism and data level parallelism by the on-chip multiple SIMD coprocessors. For embedded DSP applications with predictable computing feature, this architecture can be further optimized for performance, implementation cost and power consumption. The optimization could be done by improving the SIMD processing efficiency and reducing redundant memory accesses and data shuffle operations. This paper introduces one effective approach by designing a software programmable multi-bank memory system for SIMD processors. Both the hardware architecture and software programming model are described in this paper, with an implementation example of the BLAS syrk routine. The proposed memory system offers high SIMD data access flexibility by using lookup table based address generators, and applying data permutations on both DMA controller interface and SIMD data access. The evaluation results show that the SIMD processor with this memory system can achieve high execution efficiency, with only 10% to 30% overhead. The proposed memory system also saves the implementation cost on SIMD local registers, in our system, each SIMD core has only 8 128-bit vector registers.