NUMA obliviousness through memory mapping

M. Gawade, M. Kersten
{"title":"通过内存映射实现NUMA遗忘","authors":"M. Gawade, M. Kersten","doi":"10.1145/2771937.2771948","DOIUrl":null,"url":null,"abstract":"With the rise of multi-socket multi-core CPUs a lot of effort is being put into how to best exploit their abundant CPU power. In a shared memory setting the multi-socket CPUs are equipped with their own memory module, and access memory modules across sockets in a non-uniform access pattern (NUMA). Memory access across socket is relatively expensive compared to memory access within a socket. One of the common solutions to minimize across socket memory access is to partition the data, such that the data affinity is maintained per socket. In this paper we explore the role of memory mapped storage to provide transparent data access in a NUMA environment, without the need of explicit data partitioning. We compare the performance of a database engine in a distributed setting in a multi-socket environment, with a database engine in a NUMA oblivious setting. We show that though the operating system tries to keep the data affinity to local sockets, a significant remote memory access still occurs, as the number of threads increase. Hence, setting explicit process and memory affinity results into a robust execution in NUMA oblivious plans. We use micro-experiments and SQL queries from the TPC-H benchmark to provide an in-depth experimental exploration of the landscape, in a four socket Intel machine.","PeriodicalId":267524,"journal":{"name":"Proceedings of the 11th International Workshop on Data Management on New Hardware","volume":"128 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":"{\"title\":\"NUMA obliviousness through memory mapping\",\"authors\":\"M. Gawade, M. Kersten\",\"doi\":\"10.1145/2771937.2771948\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"With the rise of multi-socket multi-core CPUs a lot of effort is being put into how to best exploit their abundant CPU power. In a shared memory setting the multi-socket CPUs are equipped with their own memory module, and access memory modules across sockets in a non-uniform access pattern (NUMA). Memory access across socket is relatively expensive compared to memory access within a socket. One of the common solutions to minimize across socket memory access is to partition the data, such that the data affinity is maintained per socket. In this paper we explore the role of memory mapped storage to provide transparent data access in a NUMA environment, without the need of explicit data partitioning. We compare the performance of a database engine in a distributed setting in a multi-socket environment, with a database engine in a NUMA oblivious setting. We show that though the operating system tries to keep the data affinity to local sockets, a significant remote memory access still occurs, as the number of threads increase. Hence, setting explicit process and memory affinity results into a robust execution in NUMA oblivious plans. 
We use micro-experiments and SQL queries from the TPC-H benchmark to provide an in-depth experimental exploration of the landscape, in a four socket Intel machine.\",\"PeriodicalId\":267524,\"journal\":{\"name\":\"Proceedings of the 11th International Workshop on Data Management on New Hardware\",\"volume\":\"128 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-05-31\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"8\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 11th International Workshop on Data Management on New Hardware\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2771937.2771948\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 11th International Workshop on Data Management on New Hardware","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2771937.2771948","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 8

Abstract

With the rise of multi-socket multi-core CPUs, much effort is being put into how best to exploit their abundant CPU power. In a shared-memory setting, each socket is equipped with its own memory module, and memory modules on other sockets are accessed in a non-uniform memory access (NUMA) pattern. Memory access across sockets is relatively expensive compared to memory access within a socket. A common solution to minimize cross-socket memory access is to partition the data so that data affinity is maintained per socket. In this paper we explore the role of memory-mapped storage in providing transparent data access in a NUMA environment, without the need for explicit data partitioning. We compare the performance of a database engine in a distributed setting in a multi-socket environment with that of a database engine in a NUMA-oblivious setting. We show that although the operating system tries to keep data affine to local sockets, significant remote memory access still occurs as the number of threads increases. Hence, setting explicit process and memory affinity results in robust execution of NUMA-oblivious plans. We use micro-experiments and SQL queries from the TPC-H benchmark to provide an in-depth experimental exploration of the landscape on a four-socket Intel machine.
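The two mechanisms the abstract contrasts, memory-mapped storage for transparent data access and explicit process and memory affinity, correspond to standard Linux interfaces. The sketch below is a minimal illustration of the affinity side, assuming libnuma is available; it is not the database engine evaluated in the paper. It pins the calling thread to one NUMA node with numa_run_on_node, marks that node as the preferred allocation target with numa_set_preferred, and then scans a memory-mapped file so that the pages it faults in tend to land on that node. The file name, node argument, and checksum loop are illustrative placeholders.

```c
/* Minimal sketch (not the paper's system): pin the calling thread to one
 * NUMA node and steer its memory allocations to the same node before
 * scanning a memory-mapped file. Build with: gcc -O2 affinity_sketch.c -lnuma */
#include <fcntl.h>
#include <numa.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    const char *path = argc > 1 ? argv[1] : "column.bin"; /* hypothetical data file */
    int node = argc > 2 ? atoi(argv[2]) : 0;              /* target socket / NUMA node */

    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not supported on this system\n");
        return 1;
    }

    /* Process affinity: restrict this thread to the CPUs of the chosen node. */
    if (numa_run_on_node(node) != 0)
        perror("numa_run_on_node");

    /* Memory affinity: prefer allocations (including pages this thread
     * faults in) from the chosen node, where memory is available. */
    numa_set_preferred(node);

    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) {
        fprintf(stderr, "cannot stat %s, or file is empty\n", path);
        return 1;
    }

    /* Memory-mapped storage: the file becomes directly addressable memory,
     * populated lazily by page faults as it is scanned. */
    char *base = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED) { perror("mmap"); return 1; }

    /* Touch one byte per page; with both affinities set, the faulting thread
     * runs on the target node and its pages are placed in that node's memory. */
    long sum = 0;
    long page = sysconf(_SC_PAGESIZE);
    for (off_t i = 0; i < st.st_size; i += page)
        sum += base[i];

    printf("touched %s on node %d (checksum %ld)\n", path, node, sum);
    munmap(base, (size_t)st.st_size);
    close(fd);
    return 0;
}
```

A comparable effect can be obtained from the shell with numactl, e.g. numactl --cpunodebind=0 --membind=0 <server command>, which restricts both CPU placement and memory allocation of the whole process to node 0.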