Improving Parallel Write by Node-Level Request Scheduling

2009 9th IEEE/ACM International Symposium on Cluster Computing and the Grid
Pub Date: 2009-05-18
DOI: 10.1109/CCGRID.2009.71

Kazuki Ohta, Hiroya Matsuba, Y. Ishikawa

Citations: 12

Abstract

In a cluster built from multiprocessor or multi-core machines, many processes may run on each compute node. Each process tends to issue contiguous I/O requests, for example when writing snapshots or checkpoints. However, if a large number of processes enter the I/O phase at the same time, the requests from one process may be interleaved with requests from other processes, so the I/O nodes receive them in a non-contiguous order. This interleaved access pattern degrades performance in parallel file systems. To overcome the problem, we have designed the Gather-Arrange-Scatter (GAS) I/O architecture for optimizing parallel write performance. GAS captures write operations, buffers them in memory, and schedules them to reduce the I/O cost at the I/O nodes. The scheduling is done per compute node, and the requests are sent to the remote disks in parallel. This paper introduces the GAS architecture in detail and evaluates its efficiency and scalability using the NAS Parallel Benchmark BTIO. In BTIO with 16 nodes and 64 processes, GAS is 5.2% faster than ROMIO collective I/O on PVFS2 and 34.9% faster than MPI noncollective I/O in the same configuration.
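The gather, arrange, and scatter steps named in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the abstract does not specify the scheduling policy, so sorting by offset and merging adjacent requests is an assumed stand-in for "arrange", and round-robin striping is an assumed stand-in for how data is spread across I/O nodes. All names (`WriteRequest`, `interleaved_arrivals`, `arrange`, `scatter`) and the parameters `stripe_size` and `num_io_nodes` are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class WriteRequest(NamedTuple):
    offset: int  # byte offset in the shared file
    data: bytes  # payload to write


def interleaved_arrivals(num_procs, reqs_per_proc, req_size):
    """Gather: each process writes its own contiguous region, but the
    requests arrive round-robin across processes (assumed worst case)."""
    arrivals = []
    for step in range(reqs_per_proc):
        for rank in range(num_procs):
            offset = (rank * reqs_per_proc + step) * req_size
            arrivals.append(WriteRequest(offset, bytes([rank]) * req_size))
    return arrivals


def count_discontinuities(requests):
    """Count requests that do not start where the previous one ended."""
    jumps = 0
    expected = None
    for req in requests:
        if expected is not None and req.offset != expected:
            jumps += 1
        expected = req.offset + len(req.data)
    return jumps


def arrange(requests):
    """Arrange (assumed policy): sort by offset, merge adjacent requests."""
    merged = []
    for req in sorted(requests, key=lambda r: r.offset):
        if merged and merged[-1].offset + len(merged[-1].data) == req.offset:
            last = merged.pop()
            merged.append(WriteRequest(last.offset, last.data + req.data))
        else:
            merged.append(req)
    return merged


def scatter(requests, stripe_size, num_io_nodes):
    """Scatter: split requests at stripe boundaries and group the pieces
    by I/O node, assuming round-robin striping."""
    per_node = defaultdict(list)
    for req in requests:
        pos = 0
        while pos < len(req.data):
            offset = req.offset + pos
            stripe = offset // stripe_size
            room = (stripe + 1) * stripe_size - offset  # bytes left in stripe
            chunk = req.data[pos:pos + room]
            per_node[stripe % num_io_nodes].append(WriteRequest(offset, chunk))
            pos += len(chunk)
    return dict(per_node)


if __name__ == "__main__":
    arrivals = interleaved_arrivals(num_procs=4, reqs_per_proc=4, req_size=4)
    arranged = arrange(arrivals)
    print("jumps as received:", count_discontinuities(arrivals))
    print("jumps after arrange:", count_discontinuities(arranged))
    for node, reqs in sorted(scatter(arranged, 16, 2).items()):
        print("I/O node", node, [(r.offset, len(r.data)) for r in reqs])
```

In this toy setup each process's requests are contiguous on their own, yet the stream seen at the receiving side jumps between regions on almost every request; after the assumed arrange step the same bytes form one contiguous run that can be split per I/O node. A real system would also have to bound buffer memory and decide when to flush, which the sketch leaves out.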
