Tuning of the Checkpointing and Communication Library for optimistic simulation on Myrinet based NOWs

F. Quaglia, Andrea Santoro, B. Ciciani
Venue: MASCOTS 2001, Proceedings Ninth International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems
Publication date: 2001-08-15
DOI: 10.1109/MASCOT.2001.948874 (https://doi.org/10.1109/MASCOT.2001.948874)
Citations: 8

Abstract

Recently, a Checkpointing and Communication Library (CCL) for optimistic simulation on Myrinet-based networks of workstations (NOWs) was presented. CCL offloads checkpoint operations from the CPU by charging them to a programmable DMA engine on the Myrinet network card. CCL also includes functionality for freezing the simulation application on demand, which can be used to maintain data consistency (for example, when a state buffer must be modified while a DMA-based checkpoint operation involving it is still in progress). Programming the DMA engine to perform a checkpoint operation by transferring large data blocks in a single burst keeps the latency of each checkpoint operation low, which reduces the probability that application freezing actually occurs. On the other hand, transferring large data blocks in a single burst might interfere with communication, since the DMA engine (and other circuitry) cannot be used for communication until the data transfer currently in progress has completed. In this paper we identify in detail the effects of the burst length, and from them we outline a set of phenomena to take into account when choosing a suitable compile-time value for the burst length. We also report measurements quantifying these phenomena for the case of a PC cluster. The data indicate that communication does not suffer from the use of non-minimal burst lengths for checkpoint operations, showing that CCL, if well tuned, provides highly effective, CPU-offloaded checkpointing.
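
The burst-length trade-off described in the abstract can be illustrated with a small back-of-the-envelope model. The sketch below is not taken from the paper and does not use the CCL API. It assumes a hypothetical cost model in which every DMA burst pays a fixed setup overhead plus a transfer time proportional to its size, and in which a pending message must wait for at most one burst already in flight. All function names, parameters, and numbers are illustrative assumptions, and the model says nothing about the measured results reported by the authors.

```python
import math


def checkpoint_latency_us(state_bytes, burst_bytes, setup_us, bytes_per_us):
    """Time to copy one state buffer split into fixed-size DMA bursts.

    Assumed model (hypothetical): each burst costs setup_us of fixed
    overhead, and data moves at bytes_per_us.
    """
    bursts = math.ceil(state_bytes / burst_bytes)
    return bursts * setup_us + state_bytes / bytes_per_us


def worst_case_comm_wait_us(state_bytes, burst_bytes, setup_us, bytes_per_us):
    """Longest time a message waits for the DMA engine in this model.

    Assumption: the engine is released between bursts, so a message is
    blocked by at most one full burst that has just started.
    """
    return setup_us + min(burst_bytes, state_bytes) / bytes_per_us


if __name__ == "__main__":
    # Illustrative values only, not measurements from the paper.
    STATE_BYTES = 8192
    SETUP_US = 2.0
    BYTES_PER_US = 100.0

    print("burst_bytes  checkpoint_us  worst_comm_wait_us")
    for burst in (64, 256, 1024, 8192):
        latency = checkpoint_latency_us(STATE_BYTES, burst, SETUP_US, BYTES_PER_US)
        wait = worst_case_comm_wait_us(STATE_BYTES, burst, SETUP_US, BYTES_PER_US)
        print(f"{burst:11d}  {latency:13.2f}  {wait:18.2f}")
```

Under these assumptions, a larger burst shortens the checkpoint (fewer setup overheads, so a shorter window in which the application might need to be frozen) while lengthening the interval during which the DMA engine is unavailable for communication. Choosing a compile-time burst length amounts to picking a point on this curve for the target hardware.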