Junchi Ma, Yuzhu Ding, Sulei Huang, Zongtao Duan, Lei Tang
TEMPORISE: Extracting semantic representations of varied input executions for silent data corruption evaluation
Future Generation Computer Systems (The International Journal of eScience), vol. 166, Article 107734, published 2025-01-27
DOI: 10.1016/j.future.2025.107734
https://www.sciencedirect.com/science/article/pii/S0167739X25000299
Citations: 0
Abstract
Continuing advances in technology have produced increasingly complex computing systems, but they have also made those systems more susceptible to soft errors. Among the effects of soft errors, silent data corruption (SDC) is particularly insidious because it often occurs without any warning. Estimating the SDC probability of a program is difficult: a program can receive many different inputs, and its SDC probability varies significantly across them. This paper introduces TEMPORISE, an approach designed to address this challenge. TEMPORISE uses the control data flow graph (CDFG) and the calling context tree (CCT) to represent both the commonalities and the differences between executions under different inputs. Embeddings of these graphs are learned with a structured graph attention network and AttrE2vec, and the combined embeddings are fed into a regression model that predicts SDC probabilities. In the experiments, TEMPORISE reduces the mean absolute error (MAE) of SDC-probability prediction by 78.4% compared with vTRIDENT, the state-of-the-art baseline model. TEMPORISE also improves the rank correlation of SDC probabilities across inputs by 11.4% over vTRIDENT, indicating that it better captures their relative ordering. In terms of computational efficiency, TEMPORISE reduces time cost by 91.3% compared with the traditional fault injection approach.
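To make the two accuracy metrics concrete, the sketch below computes MAE, the relative MAE reduction (the most natural reading of a "78.4% reduction"), and a rank correlation over SDC probabilities of one program under several inputs. The abstract does not say which rank-correlation coefficient is used; Spearman's rho is assumed here. The helper names and all numbers are hypothetical placeholders, not data or code from the paper.

```python
"""Illustrative evaluation metrics for SDC-probability prediction.

Assumption: rank correlation is Spearman's rho. All values are hypothetical.
"""
from statistics import mean


def mean_absolute_error(y_true: list[float], y_pred: list[float]) -> float:
    """Average of |measured - predicted| over all samples."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def relative_reduction(baseline: float, new: float) -> float:
    """Fractional reduction versus a baseline, e.g. 0.784 for a 78.4% reduction."""
    return (baseline - new) / baseline


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: Pearson correlation computed on average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical SDC probabilities of one program under five inputs.
    measured = [0.12, 0.30, 0.05, 0.22, 0.18]  # e.g. from fault injection
    model_a = [0.10, 0.28, 0.07, 0.20, 0.19]   # hypothetical predictor
    model_b = [0.20, 0.15, 0.10, 0.25, 0.12]   # hypothetical baseline

    mae_a = mean_absolute_error(measured, model_a)
    mae_b = mean_absolute_error(measured, model_b)
    print(f"MAE A={mae_a:.3f} B={mae_b:.3f} reduction={relative_reduction(mae_b, mae_a):.1%}")
    print(f"Spearman A={spearman(measured, model_a):.3f} B={spearman(measured, model_b):.3f}")
```

With these placeholder values, predictor A has an MAE of 0.018 against 0.074 for B (roughly a 75.7% relative reduction) and orders the five inputs exactly as the measurements do (rho = 1.0), while B reaches rho = 0.5. The rank metric rewards getting the ordering of inputs right even when the absolute probabilities are off, which is why it is reported separately from MAE.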
About the journal:
Computing infrastructures and systems are constantly evolving, resulting in increasingly complex and collaborative scientific applications. To cope with these advancements, there is a growing need for collaborative tools that can effectively map, control, and execute these applications.
Furthermore, the explosion of Big Data calls for new methods and infrastructures to collect and analyze the vast amounts of data being generated and to derive meaningful insights from them. This requires integrating computational and storage capabilities, databases, sensors, and human collaboration.
Future Generation Computer Systems aims to pioneer advancements in distributed systems, collaborative environments, high-performance computing, and Big Data analytics. It strives to stay at the forefront of developments in grids, clouds, and the Internet of Things (IoT) to effectively address the challenges posed by these wide-area, fully distributed sensing and computing systems.