Balanced Block Design Architecture for Parallel Computing in Mobile CPUs/GPUs

G. Mani, S. Berkovich, Duoduo Liao
2013 Fourth International Conference on Computing for Geospatial Research and Application
Published: 2013-07-22
DOI: 10.1109/COMGEO.2013.27 (https://doi.org/10.1109/COMGEO.2013.27)
Citations: 4

Abstract

To increase performance, processor manufacturers extract parallelism by shrinking transistors, packing more of them onto single-core chips, and building multi-core systems. Although microprocessor performance continues to grow at an exponential rate, this approach generates too much heat and consumes too much power. These architectures not only introduce several complications but also require tremendous effort to organize special software for parallel processing. In many cases, these difficulties are insurmountable. Programmers have to write complex code to prioritize tasks or to perform them in parallel, for example by extracting parallelism through threads on GPUs. One of the key issues for programmers is how to divide tasks into sub-tasks. A faulty calculation may lead to increased data dependency, which will slow the processor. A processor that performs more parallel operations can simultaneously increase queuing delays. In both scenarios, the cost of communication (also known as data transportation energy) between processing elements in a microprocessor (or objects in parallel programming) is increasing relative to that of computation. This trend is resulting in larger caches for every new processor generation and in more complex and costly latency-tolerant mechanisms. Here we introduce a combinatorial architecture that has a unique property: multiple cores running sequential code. This architecture can be used for both CPUs and GPUs. Some minor adjustments to a regular compiler are needed for loading. In particular, current mobile GPU technologies are still relatively immature and require substantial improvements to enable wireless devices to perform complex graphics-related functions. Our new architecture is more suitable for mobile GPUs/CPUs, i.e., mobile heterogeneous computing, with limited resources and relatively greater performance.
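
The abstract does not say which block design the architecture uses or how data is mapped onto it. As a minimal sketch of the combinatorial idea named in the title, the following checks the defining property of a balanced block design: every pair of points lies together in the same number of blocks. The choice of the (7, 3, 1) design (the Fano plane), the reading of points as data items and blocks as processing elements, and the names `BLOCKS`, `pair_counts`, `is_balanced`, and `block_for_pair` are all assumptions made for illustration, not details taken from the paper.

```python
from collections import Counter
from itertools import combinations

# Hypothetical example: the (7, 3, 1) balanced block design (Fano plane).
# Points 0..6 stand in for data items; each block stands in for one
# processing element holding three of them. This mapping is an assumption.
BLOCKS = [
    (0, 1, 2), (0, 3, 4), (0, 5, 6),
    (1, 3, 5), (1, 4, 6), (2, 3, 6), (2, 4, 5),
]


def pair_counts(blocks):
    """Count how many blocks contain each unordered pair of points."""
    counts = Counter()
    for block in blocks:
        for pair in combinations(sorted(block), 2):
            counts[pair] += 1
    return counts


def is_balanced(blocks, num_points, lam):
    """Return True if every pair of distinct points shares exactly lam blocks."""
    counts = pair_counts(blocks)
    return all(
        counts[pair] == lam
        for pair in combinations(range(num_points), 2)
    )


def block_for_pair(blocks, a, b):
    """Return the index of the first block that holds both a and b."""
    for index, block in enumerate(blocks):
        if a in block and b in block:
            return index
    raise ValueError("pair is not covered by any block")


if __name__ == "__main__":
    print(is_balanced(BLOCKS, 7, 1))
    print(block_for_pair(BLOCKS, 3, 6))
```

Under these assumptions, any binary operation on two data items can be assigned to the single block that already holds both operands, which is one way to read the abstract's concern with the cost of communication between processing elements.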