{"title":"Analysis and Optimisation of PBS/TORQUE Fault Tolerance Tools","authors":"A. Efimov, K. Pavsky","doi":"10.1109/OPCS.2019.8880236","DOIUrl":null,"url":null,"abstract":"This work is devoted to the problem of detecting and processing faults of computing nodes during execution of parallel programs on distributed computing systems. The fault tolerance tools of PBS/TORQUE are considered. The functional model for faults handling optimization are proposed.","PeriodicalId":288547,"journal":{"name":"2019 15th International Asian School-Seminar Optimization Problems of Complex Systems (OPCS)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 15th International Asian School-Seminar Optimization Problems of Complex Systems (OPCS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/OPCS.2019.8880236","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citation count: 0
Abstract
This work addresses the problem of detecting and handling faults of computing nodes during the execution of parallel programs on distributed computing systems. The fault tolerance tools of PBS/TORQUE are considered. A functional model for optimizing fault handling is proposed.