{"title":"移动cpu / gpu并行计算的平衡块设计体系结构","authors":"G. Mani, S. Berkovich, Duoduo Liao","doi":"10.1109/COMGEO.2013.27","DOIUrl":null,"url":null,"abstract":"To increase performance, processor manufacturers extract parallelism through shrinking transistors and adding more of them to single-core chips and create multi-core systems. Although microprocessors performance continues to grow at an exponential rate, this approach generates too much heat and consumes too much power. These architectures not only introduce several complications but require tremendous efforts for organization of special software for parallel processing. In many cases, these difficulties are insurmountable. The programmers have to write complex code to prioritize the tasks or perform the task in parallel like extracting parallelism through threads in GPUs. One of the key issues for the programmers is how to divide the tasks in to sub-tasks. A faulty calculation may lead to increased data dependency which will slow the processor. Processor that performs more parallel operations can simultaneously increase the queuing delays. In both of the scenarios mentioned above, the relative cost of communication (also known as data transportation energy) between processing elements in microprocessor (or objects in parallel programming) is increasing relative to that of computation. This trend is resulting in larger caches for every new processor generation and more complex and costly latency tolerant mechanisms. Here we introduce a combinatorial architecture that has a unique property-multi-core running on a sequential code. This architecture can be used for both CPUs and GPUs. Some minor adjustments to a regular compiler are needed for loading. Especially, current mobile GPUs technologies are still relatively immature and require substantial improvements to enable wireless devices to perform the complex graphics-related functions. Our new architecture is more suitable for mobile GPUs/CPUs, i.e., mobile heterogeneous computing, with limited resources and relative greater performance.","PeriodicalId":383309,"journal":{"name":"2013 Fourth International Conference on Computing for Geospatial Research and Application","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"Balanced Block Design Architecture for Parallel Computing in Mobile CPUs/GPUs\",\"authors\":\"G. Mani, S. Berkovich, Duoduo Liao\",\"doi\":\"10.1109/COMGEO.2013.27\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"To increase performance, processor manufacturers extract parallelism through shrinking transistors and adding more of them to single-core chips and create multi-core systems. Although microprocessors performance continues to grow at an exponential rate, this approach generates too much heat and consumes too much power. These architectures not only introduce several complications but require tremendous efforts for organization of special software for parallel processing. In many cases, these difficulties are insurmountable. The programmers have to write complex code to prioritize the tasks or perform the task in parallel like extracting parallelism through threads in GPUs. One of the key issues for the programmers is how to divide the tasks in to sub-tasks. A faulty calculation may lead to increased data dependency which will slow the processor. Processor that performs more parallel operations can simultaneously increase the queuing delays. In both of the scenarios mentioned above, the relative cost of communication (also known as data transportation energy) between processing elements in microprocessor (or objects in parallel programming) is increasing relative to that of computation. This trend is resulting in larger caches for every new processor generation and more complex and costly latency tolerant mechanisms. Here we introduce a combinatorial architecture that has a unique property-multi-core running on a sequential code. This architecture can be used for both CPUs and GPUs. Some minor adjustments to a regular compiler are needed for loading. Especially, current mobile GPUs technologies are still relatively immature and require substantial improvements to enable wireless devices to perform the complex graphics-related functions. Our new architecture is more suitable for mobile GPUs/CPUs, i.e., mobile heterogeneous computing, with limited resources and relative greater performance.\",\"PeriodicalId\":383309,\"journal\":{\"name\":\"2013 Fourth International Conference on Computing for Geospatial Research and Application\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-07-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2013 Fourth International Conference on Computing for Geospatial Research and Application\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/COMGEO.2013.27\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 Fourth International Conference on Computing for Geospatial Research and Application","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/COMGEO.2013.27","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Balanced Block Design Architecture for Parallel Computing in Mobile CPUs/GPUs
To increase performance, processor manufacturers extract parallelism through shrinking transistors and adding more of them to single-core chips and create multi-core systems. Although microprocessors performance continues to grow at an exponential rate, this approach generates too much heat and consumes too much power. These architectures not only introduce several complications but require tremendous efforts for organization of special software for parallel processing. In many cases, these difficulties are insurmountable. The programmers have to write complex code to prioritize the tasks or perform the task in parallel like extracting parallelism through threads in GPUs. One of the key issues for the programmers is how to divide the tasks in to sub-tasks. A faulty calculation may lead to increased data dependency which will slow the processor. Processor that performs more parallel operations can simultaneously increase the queuing delays. In both of the scenarios mentioned above, the relative cost of communication (also known as data transportation energy) between processing elements in microprocessor (or objects in parallel programming) is increasing relative to that of computation. This trend is resulting in larger caches for every new processor generation and more complex and costly latency tolerant mechanisms. Here we introduce a combinatorial architecture that has a unique property-multi-core running on a sequential code. This architecture can be used for both CPUs and GPUs. Some minor adjustments to a regular compiler are needed for loading. Especially, current mobile GPUs technologies are still relatively immature and require substantial improvements to enable wireless devices to perform the complex graphics-related functions. Our new architecture is more suitable for mobile GPUs/CPUs, i.e., mobile heterogeneous computing, with limited resources and relative greater performance.