Ameliorating memory contention of OLAP operators on GPU processors

International Workshop on Data Management on New Hardware | Pub Date: 2012-05-21 | DOI: 10.1145/2236584.2236590

Evangelia A. Sitaridi, K. A. Ross


Citations: 29

Abstract

Implementations of database operators on GPU processors have shown dramatic performance improvements compared to multicore-CPU implementations. GPU threads can cooperate using shared memory, which is organized in interleaved banks and is fast only when threads read and modify addresses belonging to distinct memory banks. Therefore, data processing operators implemented on a GPU have to deal not only with contention caused by popular values but also with a new performance-limiting factor: thread serialization when accessing values that belong to the same bank. Here, we define the problem of bank and value conflict optimization for data processing operators using the CUDA platform. To analyze the impact of these two factors on operator performance, we use two database operations: foreign-key join and grouped aggregation. We suggest and evaluate techniques for optimizing the data arrangement offline by creating clones of values to reduce overall memory contention. Results indicate that columns used for writes, such as grouping columns, need to be optimized to fully exploit the maximum bandwidth of shared memory.
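
To make the two kinds of contention in the abstract concrete, the following Python sketch models the shared-memory accesses of a single warp. It is a simplified illustration under stated assumptions, not the authors' implementation: the 32 banks of 4-byte words, the function names, and the clone layout (`clone * num_groups + group`) are all hypothetical choices made here. A "bank conflict" is counted as the number of distinct words requested from one bank, and a "value conflict" as the number of threads updating the same word.

```python
from collections import Counter, defaultdict

# Assumption: 32 interleaved banks of 4-byte words (not stated in the abstract).
NUM_BANKS = 32


def bank_of(word_index, num_banks=NUM_BANKS):
    """Bank that holds a word when consecutive words go to consecutive banks."""
    return word_index % num_banks


def bank_conflict_degree(word_indices, num_banks=NUM_BANKS):
    """Largest number of distinct words one warp requests from a single bank."""
    words_per_bank = defaultdict(set)
    for w in word_indices:
        words_per_bank[bank_of(w, num_banks)].add(w)
    return max(len(words) for words in words_per_bank.values())


def value_conflict_degree(word_indices):
    """Largest number of threads in the warp that target the same word."""
    return max(Counter(word_indices).values())


def cloned_slots(group_ids, num_groups, clones):
    """Spread updates for each group over `clones` copies of its accumulator.

    Hypothetical layout: clone c of group g lives at word c * num_groups + g,
    and thread t uses clone t % clones.
    """
    return [(t % clones) * num_groups + g for t, g in enumerate(group_ids)]


# One warp of 32 threads.
sequential = list(range(32))            # each thread hits a different bank
strided = [t * 32 for t in range(32)]   # every thread hits bank 0
hot_group = [0] * 32                    # every thread updates group 0

print(bank_conflict_degree(sequential))
print(bank_conflict_degree(strided))
print(value_conflict_degree(hot_group))

# Four clones of each of 8 groups: the hot group's updates are split four ways.
slots = cloned_slots(hot_group, num_groups=8, clones=4)
print(value_conflict_degree(slots), bank_conflict_degree(slots))

# Same idea with 32 groups: the clones of group 0 all fall into bank 0.
slots_bad = cloned_slots(hot_group, num_groups=32, clones=4)
print(value_conflict_degree(slots_bad), bank_conflict_degree(slots_bad))
```

In this model, cloning lowers the value conflict degree in both cases, but with 32 groups the naive layout places every clone of a group in the same bank, so part of the value contention is merely converted into bank contention. This is one way to see why a data arrangement has to account for both factors together.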
