Author: E. A. Suvorova
Published in: The Second International Scientific Conference. Collection of reports (2021-04-14)
DOI: 10.31799/978-5-8088-1554-4-2021-2-339-343
THE PROBLEM OF PREVENTING THE PROPAGATION OF THE CONSEQUENCES OF FAULTS IN THE SOC
Accelerated aging in chips manufactured with fine design rules using ASIC technology causes recoverable and unrecoverable faults during operation. Various fault mitigation mechanisms are used to keep the SoC functioning correctly. These mechanisms differ in their overhead (area and energy consumption) and in the time they need to detect a fault and restore correct operation of the system. As a rule, there are strict limits both on the time for fault detection and recovery and on the allowable overhead (area and power consumption). These limits can be especially severe for aerospace systems and for systems with hard real-time requirements. However, mechanisms that detect faults quickly incur substantial hardware overhead. This paper discusses a way to resolve this contradiction by preventing the consequences of faults in individual components from propagating throughout the system. This approach can significantly reduce the time needed to restore correct operation of the system. The paper discusses methods for preventing the propagation of the consequences of faults and gives an example of their application.