A Topological Approach to Hardware Bug Triage

2015 16th International Workshop on Microprocessor and SOC Test and Verification (MTV). Publication date: 2015-12-01. DOI: 10.1109/MTV.2015.10

Rico Angell, Ben Oztalay, A. DeOrio

Citations: 4

Abstract

Verification is a critical bottleneck in the time to market of a new digital design. As complexity continues to increase, post-silicon validation shoulders an increasing share of the verification/validation effort. Post-silicon validation is burdened by large volumes of test failures, and is further complicated by root cause bugs that manifest in multiple test failures. At present, these failures are prioritized and assigned to validation engineers in an ad hoc fashion. When multiple failures caused by the same root cause bug are debugged by multiple engineers at the same time, scarce, time-critical engineering resources are wasted. Our scalable bug triage technique begins with a database of test failures. It extracts defining features from the failure reports, using a novel, topology-aware approach based on graph partitioning. It then leverages unsupervised machine learning to extract the structure of the failures, identifying groups of failures that are likely to be the result of a common root cause. With our technique, related failures can be debugged as a group, rather than individually. Additionally, we propose a metric for measuring verification efficiency as a result of bug triage called Unique Debugging Instances (UDI). We evaluated our approach on the industrial-size OpenSPARC T2 design with a set of injected bugs, and found that our approach increased average verification efficiency by 243%, at a 99% confidence level.
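
The abstract does not spell out the feature-extraction or clustering algorithms, so the following is only a hedged sketch of the general idea under stated assumptions. The design is modeled as an undirected module-connectivity graph (the hypothetical `DESIGN`), a simple breadth-first region-growing partition stands in for the paper's graph-partitioning step, each failure report is reduced to the list of modules where a mismatch was observed (the hypothetical `FAILURES`), and a single-linkage cosine-similarity threshold stands in for the unsupervised learner. None of these names, data, or algorithm choices come from the paper.

```python
from collections import deque
import math

# Hypothetical toy design: an undirected module-connectivity graph.
DESIGN = {
    "icache": ["fetch"],
    "fetch": ["icache", "decode"],
    "decode": ["fetch", "alu"],
    "alu": ["decode", "regfile", "lsu"],
    "regfile": ["alu"],
    "lsu": ["alu", "dcache"],
    "dcache": ["lsu"],
}

# Hypothetical failure reports: modules where each failing test saw a mismatch.
FAILURES = [
    ["fetch", "icache"],
    ["decode", "fetch"],
    ["alu", "lsu", "dcache"],
    ["regfile", "alu"],
    ["decode", "alu"],
]


def partition_graph(adj, seeds):
    """Assign each node to the seed that reaches it first in a simultaneous
    breadth-first search (a stand-in for a real graph-partitioning algorithm).
    Nodes unreachable from every seed are left unassigned."""
    owner = {s: i for i, s in enumerate(seeds)}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        for nbr in adj.get(node, ()):
            if nbr not in owner:
                owner[nbr] = owner[node]
                queue.append(nbr)
    return owner


def failure_features(failing_modules, owner, num_parts):
    """Count failing modules per partition, then L2-normalize so that
    reports of different sizes are comparable."""
    vec = [0.0] * num_parts
    for mod in failing_modules:
        if mod in owner:
            vec[owner[mod]] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cluster_failures(features, threshold=0.9):
    """Single-linkage grouping with union-find: two failures share a group if
    their cosine similarity (a dot product, since vectors are normalized)
    is at least `threshold`, directly or transitively."""
    parent = list(range(len(features)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(features)):
        for j in range(i + 1, len(features)):
            sim = sum(a * b for a, b in zip(features[i], features[j]))
            if sim >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(features)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    owner = partition_graph(DESIGN, seeds=["icache", "dcache"])
    feats = [failure_features(f, owner, num_parts=2) for f in FAILURES]
    print(cluster_failures(feats))  # → [[0, 1], [2, 3], [4]]
```

Failures 0 and 1 fall into one group because both touch only the front-end partition, failures 2 and 3 touch only the back-end partition, and failure 4 spans both partitions and stays on its own. The `threshold` of 0.9 and the choice of seed modules are arbitrary values for this toy example.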
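
The abstract names the Unique Debugging Instances (UDI) metric without defining it. As a hypothetical reading only, suppose each failure group is handed to one engineer and debugged once for every distinct root cause it contains, while without triage every failure is debugged separately. Because the evaluation uses injected bugs, the true root cause of each failure is known and UDI can be counted directly. The function names `unique_debugging_instances` and `efficiency_gain`, and expressing the gain as baseline/triaged - 1, are assumptions for illustration and not necessarily how the 243% figure was computed.

```python
from typing import Hashable, Sequence


def unique_debugging_instances(groups: Sequence[Sequence[int]],
                               root_cause: Sequence[Hashable]) -> int:
    """Hypothetical UDI: each group costs one debugging instance per
    distinct root cause among its failures."""
    return sum(len({root_cause[f] for f in group}) for group in groups)


def efficiency_gain(groups: Sequence[Sequence[int]],
                    root_cause: Sequence[Hashable]) -> float:
    """Relative reduction in debugging instances versus no triage, where
    every failure is its own instance (an assumed formulation). Assumes
    each failure index appears in exactly one group."""
    baseline = len(root_cause)
    triaged = unique_debugging_instances(groups, root_cause)
    return baseline / triaged - 1.0


if __name__ == "__main__":
    # Six failures caused by three injected bugs, grouped into two clusters.
    causes = ["A", "A", "A", "B", "B", "C"]
    groups = [[0, 1, 2], [3, 4, 5]]
    print(unique_debugging_instances(groups, causes))  # → 3
    print(f"{efficiency_gain(groups, causes):.0%}")    # → 100%
```

In this example UDI already equals the number of root causes even though the second group mixes bugs B and C, so this simple formulation does not penalize merging unrelated bugs into one group; a practical metric would likely need to account for that.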
