Analysis and Optimisation of PBS/TORQUE Fault Tolerance Tools
A. Efimov, K. Pavsky
2019 15th International Asian School-Seminar Optimization Problems of Complex Systems (OPCS), August 2019
DOI: 10.1109/OPCS.2019.8880236
This work addresses the problem of detecting and handling compute-node faults during the execution of parallel programs on distributed computing systems. The fault tolerance tools of PBS/TORQUE are examined, and a functional model for optimizing fault handling is proposed.
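The following Python sketch illustrates only the node-fault detection part of the problem. It is not the authors' functional model. It parses the text output of TORQUE's `pbsnodes -a` command and lists the nodes whose `state` field contains a flag treated here as unavailable. Several choices are assumptions made for this example: the unavailable set (`down` and `offline`), the helper names `parse_pbsnodes` and `unavailable_nodes`, and the abbreviated sample output, which is hypothetical.

```python
from typing import Dict, List, Optional, Set

# Assumption for illustration: node state flags treated as "unavailable".
# TORQUE reports a node's state as comma-separated flags, e.g. "down,offline".
UNAVAILABLE_STATES: Set[str] = {"down", "offline"}


def parse_pbsnodes(text: str) -> Dict[str, List[str]]:
    """Map each node name to its list of state flags from `pbsnodes -a` text."""
    nodes: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        if not line.strip():
            current = None  # a blank line ends the current node record
            continue
        if not line[0].isspace():
            current = line.strip()  # an unindented line starts a new node record
            nodes[current] = []
        elif current is not None:
            # Indented attribute lines look like "key = value"; split on the first "=".
            key, sep, value = line.strip().partition("=")
            if sep and key.strip() == "state":
                nodes[current] = [s.strip() for s in value.split(",") if s.strip()]
    return nodes


def unavailable_nodes(text: str) -> List[str]:
    """Return the sorted names of nodes whose state includes an unavailable flag."""
    return sorted(
        name
        for name, states in parse_pbsnodes(text).items()
        if UNAVAILABLE_STATES.intersection(states)
    )


# Hypothetical, abbreviated `pbsnodes -a` output used for the example.
SAMPLE = """\
n001
     state = free
     np = 8

n002
     state = down
     np = 8

n003
     state = down,offline
     np = 8

n004
     state = job-exclusive
     np = 8
"""

if __name__ == "__main__":
    print(unavailable_nodes(SAMPLE))  # ['n002', 'n003']
```

On a live cluster, the input text could come from `subprocess.run(["pbsnodes", "-a"], capture_output=True, text=True).stdout`. Parsing plain text keeps the sketch independent of site-specific output options. A real fault-handling policy would still need to decide what happens to jobs running on the flagged nodes.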