CRADLE: Cross-Backend Validation to Detect and Localize Bugs in Deep Learning Libraries

H. Pham, Thibaud Lutellier, Weizhen Qi, Lin Tan
{"title":"CRADLE: Cross-Backend Validation to Detect and Localize Bugs in Deep Learning Libraries","authors":"H. Pham, Thibaud Lutellier, Weizhen Qi, Lin Tan","doi":"10.1109/ICSE.2019.00107","DOIUrl":null,"url":null,"abstract":"Deep learning (DL) systems are widely used in domains including aircraft collision avoidance systems, Alzheimer's disease diagnosis, and autonomous driving cars. Despite the requirement for high reliability, DL systems are difficult to test. Existing DL testing work focuses on testing the DL models, not the implementations (e.g., DL software libraries) of the models. One key challenge of testing DL libraries is the difficulty of knowing the expected output of DL libraries given an input instance. Fortunately, there are multiple implementations of the same DL algorithms in different DL libraries. Thus, we propose CRADLE, a new approach that focuses on finding and localizing bugs in DL software libraries. CRADLE (1) performs cross-implementation inconsistency checking to detect bugs in DL libraries, and (2) leverages anomaly propagation tracking and analysis to localize faulty functions in DL libraries that cause the bugs. We evaluate CRADLE on three libraries (TensorFlow, CNTK, and Theano), 11 datasets (including ImageNet, MNIST, and KGS Go game), and 30 pre-trained models. CRADLE detects 12 bugs and 104 unique inconsistencies, and highlights functions relevant to the causes of inconsistencies for all 104 unique inconsistencies.","PeriodicalId":6736,"journal":{"name":"2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)","volume":"40 1","pages":"1027-1038"},"PeriodicalIF":0.0000,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"122","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSE.2019.00107","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 122

Abstract

Deep learning (DL) systems are widely used in domains including aircraft collision avoidance systems, Alzheimer's disease diagnosis, and autonomous driving cars. Despite the requirement for high reliability, DL systems are difficult to test. Existing DL testing work focuses on testing the DL models, not the implementations (e.g., DL software libraries) of the models. One key challenge of testing DL libraries is the difficulty of knowing the expected output of DL libraries given an input instance. Fortunately, there are multiple implementations of the same DL algorithms in different DL libraries. Thus, we propose CRADLE, a new approach that focuses on finding and localizing bugs in DL software libraries. CRADLE (1) performs cross-implementation inconsistency checking to detect bugs in DL libraries, and (2) leverages anomaly propagation tracking and analysis to localize faulty functions in DL libraries that cause the bugs. We evaluate CRADLE on three libraries (TensorFlow, CNTK, and Theano), 11 datasets (including ImageNet, MNIST, and KGS Go game), and 30 pre-trained models. CRADLE detects 12 bugs and 104 unique inconsistencies, and highlights functions relevant to the causes of inconsistencies for all 104 unique inconsistencies.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
CRADLE:跨后端验证以检测和定位深度学习库中的错误
深度学习(DL)系统被广泛应用于飞机防撞系统、阿尔茨海默病诊断和自动驾驶汽车等领域。尽管对高可靠性有要求,但深度学习系统很难测试。现有的深度学习测试工作侧重于测试深度学习模型,而不是模型的实现(例如,深度学习软件库)。测试DL库的一个关键挑战是很难知道给定输入实例的DL库的预期输出。幸运的是,相同的深度学习算法在不同的深度学习库中有多种实现。因此,我们提出了CRADLE,这是一种专注于查找和定位DL软件库中的错误的新方法。CRADLE(1)执行跨实现不一致性检查以检测DL库中的错误,(2)利用异常传播跟踪和分析来定位导致错误的DL库中的错误函数。我们在三个库(TensorFlow, CNTK和Theano), 11个数据集(包括ImageNet, MNIST和KGS Go游戏)和30个预训练模型上评估CRADLE。CRADLE检测到12个bug和104个唯一的不一致,并突出显示与所有104个唯一不一致的不一致原因相关的函数。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
VFix: Value-Flow-Guided Precise Program Repair for Null Pointer Dereferences Search-Based Energy Testing of Android Scalable Approaches for Test Suite Reduction A System Identification Based Oracle for Control-CPS Software Fault Localization Training Binary Classifiers as Data Structure Invariants
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1