Ke Zhang, Yisong Chang, Lixin Zhang, Mingyu Chen, Lei Yu, Zhiwei Xu
{"title":"面向远程内存访问的ARM服务器高效硬件节点间链路","authors":"Ke Zhang, Yisong Chang, Lixin Zhang, Mingyu Chen, Lei Yu, Zhiwei Xu","doi":"10.1109/CCGrid.2016.66","DOIUrl":null,"url":null,"abstract":"The ever-growing need for fast big-data operations has made in-memory processing increasingly important in modern datacenters. To mitigate the capacity limitation of a single server node, techniques of inner-rack cross-node memory access have drawn attention recently. However, existing proposals exhibit inefficiency in remote memory access among server nodes due to inter-protocol conversions and non-transparent coarse-grained accesses. In this study, we propose the high-performance and efficient serialized AXI (sAXI) link and its associated cross-node memory access mechanism for emerging ARM-based servers. The key idea behind sAXI is directly extending the on-chip AMBA AXI-4.0 interconnection of the SoC in a local server node to the outside, and then bringing into remote server nodes via high-speed serial lanes. As a result, natively accessing remote memory in adjacent nodes in the same manner of local assets is supported by purely using existing software. Experimental results show that, using the sAXI data-path, performance of remote memory access in the user-level micro-benchmark is very promising (min. latency: 1.16μs, max. bandwidth: 1.52GB/s on our in-house FPGA prototype). In addition, through this efficient hardware inter-node link, performance of an in-memory key-value framework, Redis, can be improved up to 1.72x and large latency overhead of database query can be effectively hidden.","PeriodicalId":103641,"journal":{"name":"2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"sAXI: A High-Efficient Hardware Inter-Node Link in ARM Server for Remote Memory Access\",\"authors\":\"Ke Zhang, Yisong Chang, Lixin Zhang, Mingyu Chen, Lei Yu, Zhiwei Xu\",\"doi\":\"10.1109/CCGrid.2016.66\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The ever-growing need for fast big-data operations has made in-memory processing increasingly important in modern datacenters. To mitigate the capacity limitation of a single server node, techniques of inner-rack cross-node memory access have drawn attention recently. However, existing proposals exhibit inefficiency in remote memory access among server nodes due to inter-protocol conversions and non-transparent coarse-grained accesses. In this study, we propose the high-performance and efficient serialized AXI (sAXI) link and its associated cross-node memory access mechanism for emerging ARM-based servers. The key idea behind sAXI is directly extending the on-chip AMBA AXI-4.0 interconnection of the SoC in a local server node to the outside, and then bringing into remote server nodes via high-speed serial lanes. As a result, natively accessing remote memory in adjacent nodes in the same manner of local assets is supported by purely using existing software. Experimental results show that, using the sAXI data-path, performance of remote memory access in the user-level micro-benchmark is very promising (min. latency: 1.16μs, max. bandwidth: 1.52GB/s on our in-house FPGA prototype). 
In addition, through this efficient hardware inter-node link, performance of an in-memory key-value framework, Redis, can be improved up to 1.72x and large latency overhead of database query can be effectively hidden.\",\"PeriodicalId\":103641,\"journal\":{\"name\":\"2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)\",\"volume\":\"7 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-05-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CCGrid.2016.66\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CCGrid.2016.66","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
sAXI: A High-Efficient Hardware Inter-Node Link in ARM Server for Remote Memory Access
The ever-growing need for fast big-data operations has made in-memory processing increasingly important in modern datacenters. To mitigate the capacity limitation of a single server node, techniques for inner-rack cross-node memory access have drawn attention recently. However, existing proposals exhibit inefficiency in remote memory access among server nodes due to inter-protocol conversions and non-transparent, coarse-grained accesses. In this study, we propose the high-performance and efficient serialized AXI (sAXI) link and its associated cross-node memory access mechanism for emerging ARM-based servers. The key idea behind sAXI is to directly extend the on-chip AMBA AXI-4.0 interconnect of the SoC in a local server node to the outside and carry it into remote server nodes via high-speed serial lanes. As a result, remote memory in adjacent nodes can be accessed natively, in the same manner as local assets, purely with existing software. Experimental results show that, using the sAXI data path, the performance of remote memory access in a user-level micro-benchmark is very promising (min. latency: 1.16 μs, max. bandwidth: 1.52 GB/s on our in-house FPGA prototype). In addition, through this efficient hardware inter-node link, the performance of an in-memory key-value framework, Redis, can be improved by up to 1.72x, and the large latency overhead of database queries can be effectively hidden.
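To illustrate the access model the abstract describes, the sketch below shows how memory on an adjacent node, exposed through an sAXI-style window in the local physical address space, might be touched from user space with entirely standard software (ordinary loads and stores through a memory mapping). The base address, window size, and use of /dev/mem are assumptions made for illustration only; they are not values or interfaces taken from the paper or its FPGA prototype.

/*
 * Illustrative sketch only: the paper states that sAXI extends the local
 * AXI interconnect so remote memory appears in the local physical address
 * space and is usable by existing software. The window below is a
 * hypothetical placeholder, not a documented address from the paper.
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define SAXI_REMOTE_BASE 0x800000000ULL   /* hypothetical window into the remote node */
#define SAXI_REMOTE_SIZE (64UL << 20)     /* hypothetical 64 MiB mapping */

int main(void)
{
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0) { perror("open /dev/mem"); return 1; }

    /* Map the assumed sAXI window; plain loads/stores then become remote accesses. */
    volatile uint64_t *remote = mmap(NULL, SAXI_REMOTE_SIZE,
                                     PROT_READ | PROT_WRITE, MAP_SHARED,
                                     fd, SAXI_REMOTE_BASE);
    if (remote == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

    remote[0] = 0xdeadbeefULL;            /* store would travel over the serialized AXI link */
    printf("read back: 0x%llx\n", (unsigned long long)remote[0]);

    munmap((void *)remote, SAXI_REMOTE_SIZE);
    close(fd);
    return 0;
}

In such a scheme no protocol conversion or messaging library sits on the data path, which is consistent with the abstract's claim that remote memory is accessed in the same manner as local assets using existing software.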