CRADLE: Cross-Backend Validation to Detect and Localize Bugs in Deep Learning Libraries

2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE). Pub Date: 2019-05-01. DOI: 10.1109/ICSE.2019.00107

H. Pham, Thibaud Lutellier, Weizhen Qi, Lin Tan

Pages: 1027-1038

Citations: 122

Abstract

Deep learning (DL) systems are widely used in domains including aircraft collision avoidance systems, Alzheimer's disease diagnosis, and autonomous driving cars. Despite the requirement for high reliability, DL systems are difficult to test. Existing DL testing work focuses on testing the DL models, not the implementations (e.g., DL software libraries) of the models. One key challenge of testing DL libraries is the difficulty of knowing the expected output of DL libraries given an input instance. Fortunately, there are multiple implementations of the same DL algorithms in different DL libraries. Thus, we propose CRADLE, a new approach that focuses on finding and localizing bugs in DL software libraries. CRADLE (1) performs cross-implementation inconsistency checking to detect bugs in DL libraries, and (2) leverages anomaly propagation tracking and analysis to localize faulty functions in DL libraries that cause the bugs. We evaluate CRADLE on three libraries (TensorFlow, CNTK, and Theano), 11 datasets (including ImageNet, MNIST, and KGS Go game), and 30 pre-trained models. CRADLE detects 12 bugs and 104 unique inconsistencies, and highlights functions relevant to the causes of inconsistencies for all 104 unique inconsistencies.
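The two steps named in the abstract, cross-implementation inconsistency checking and anomaly propagation tracking, can be illustrated with a toy sketch. Everything here is an assumption made for illustration, not the authors' implementation: the two "backends" are plain dictionaries of Python functions, `buggy_scale` stands in for a faulty library function, mean absolute difference is an assumed distance metric, and the localization rule (flag the layer where the deviation between backends grows the most) is a simplified stand-in for the paper's analysis.

```python
from typing import Callable, Dict, List, Optional, Tuple

Vector = List[float]
Layer = Callable[[Vector], Vector]
Backend = Dict[str, Layer]


def relu(xs: Vector) -> Vector:
    return [max(0.0, x) for x in xs]


def scale(xs: Vector) -> Vector:
    return [2.0 * x for x in xs]


def buggy_scale(xs: Vector) -> Vector:
    # Deliberately wrong: stands in for a faulty library function.
    return [2.0 * x + 0.5 for x in xs]


def shift(xs: Vector) -> Vector:
    return [x - 1.0 for x in xs]


# Two hypothetical backends that implement the same layer names.
BACKEND_A: Backend = {"relu": relu, "scale": scale, "shift": shift}
BACKEND_B: Backend = {"relu": relu, "scale": buggy_scale, "shift": shift}


def run_with_trace(backend: Backend, model: List[str], x: Vector) -> List[Vector]:
    """Run the model layer by layer and record every intermediate output."""
    trace: List[Vector] = []
    current = list(x)
    for name in model:
        current = backend[name](current)
        trace.append(current)
    return trace


def mean_abs_diff(a: Vector, b: Vector) -> float:
    """Assumed distance metric between two outputs of equal length."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def detect_and_localize(
    model: List[str],
    x: Vector,
    backend_a: Backend,
    backend_b: Backend,
    threshold: float = 1e-6,
) -> Tuple[bool, Optional[str], List[float]]:
    """Return (inconsistent?, most suspicious layer, per-layer deviations)."""
    trace_a = run_with_trace(backend_a, model, x)
    trace_b = run_with_trace(backend_b, model, x)
    deviations = [mean_abs_diff(p, q) for p, q in zip(trace_a, trace_b)]

    # Detection: the final outputs of the two backends disagree.
    inconsistent = deviations[-1] > threshold

    # Localization: pick the layer where the deviation grows the most
    # relative to the previous layer.
    suspect: Optional[str] = None
    if inconsistent:
        previous, best_jump = 0.0, 0.0
        for name, dev in zip(model, deviations):
            jump = dev - previous
            if jump > best_jump:
                best_jump, suspect = jump, name
            previous = dev
    return inconsistent, suspect, deviations


if __name__ == "__main__":
    result = detect_and_localize(
        ["relu", "scale", "shift"], [-1.0, 2.0, 3.0], BACKEND_A, BACKEND_B
    )
    print(result)
```

In this toy run the deviation first appears at the `scale` layer and merely propagates through `shift`, so `scale` is flagged. Because the two backends serve as each other's reference, no hand-written expected output is needed, which is the point the abstract makes about the difficulty of knowing the expected output.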


