AUTOTRAINER:一个自动DNN训练问题检测和修复系统

2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE) Pub Date : 2021-05-01 DOI:10.1109/ICSE43902.2021.00043

Xiaoyu Zhang, Juan Zhai, Shiqing Ma, Chao Shen

{"title":"AUTOTRAINER:一个自动DNN训练问题检测和修复系统","authors":"Xiaoyu Zhang, Juan Zhai, Shiqing Ma, Chao Shen","doi":"10.1109/ICSE43902.2021.00043","DOIUrl":null,"url":null,"abstract":"With machine learning models especially Deep Neural Network (DNN) models becoming an integral part of the new intelligent software, new tools to support their engineering process are in high demand. Existing DNN debugging tools are either post-training which wastes a lot of time training a buggy model and requires expertises, or limited on collecting training logs without analyzing the problem not even fixing them. In this paper, we propose AUTOTRAINER, a DNN training monitoring and automatic repairing tool which supports detecting and auto repairing five commonly seen training problems. During training, it periodically checks the training status and detects potential problems. Once a problem is found, AUTOTRAINER tries to fix it by using built-in state-of-the-art solutions. It supports various model structures and input data types, such as Convolutional Neural Networks (CNNs) for image and Recurrent Neural Networks (RNNs) for texts. Our evaluation on 6 datasets, 495 models show that AUTOTRAINER can effectively detect all potential problems with 100% detection rate and no false positives. Among all models with problems, it can fix 97.33% of them, increasing the accuracy by 47.08% on average.","PeriodicalId":305167,"journal":{"name":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"30","resultStr":"{\"title\":\"AUTOTRAINER: An Automatic DNN Training Problem Detection and Repair System\",\"authors\":\"Xiaoyu Zhang, Juan Zhai, Shiqing Ma, Chao Shen\",\"doi\":\"10.1109/ICSE43902.2021.00043\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"With machine learning models especially Deep Neural Network (DNN) models becoming an integral part of the new intelligent software, new tools to support their engineering process are in high demand. Existing DNN debugging tools are either post-training which wastes a lot of time training a buggy model and requires expertises, or limited on collecting training logs without analyzing the problem not even fixing them. In this paper, we propose AUTOTRAINER, a DNN training monitoring and automatic repairing tool which supports detecting and auto repairing five commonly seen training problems. During training, it periodically checks the training status and detects potential problems. Once a problem is found, AUTOTRAINER tries to fix it by using built-in state-of-the-art solutions. It supports various model structures and input data types, such as Convolutional Neural Networks (CNNs) for image and Recurrent Neural Networks (RNNs) for texts. Our evaluation on 6 datasets, 495 models show that AUTOTRAINER can effectively detect all potential problems with 100% detection rate and no false positives. Among all models with problems, it can fix 97.33% of them, increasing the accuracy by 47.08% on average.\",\"PeriodicalId\":305167,\"journal\":{\"name\":\"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)\",\"volume\":\"5 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"30\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICSE43902.2021.00043\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSE43902.2021.00043","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 30

摘要

随着机器学习模型，特别是深度神经网络(DNN)模型成为新的智能软件的组成部分，支持其工程过程的新工具需求量很大。现有的DNN调试工具要么是训练后的，这浪费了大量时间来训练一个有bug的模型，需要专业知识，要么仅限于收集训练日志，而没有分析问题，甚至没有修复问题。在本文中，我们提出了AUTOTRAINER，一个DNN训练监测和自动修复工具，支持检测和自动修复五个常见的训练问题。在训练过程中，定期检查训练状态，发现潜在问题。一旦发现问题，AUTOTRAINER会尝试使用内置的最先进的解决方案来解决问题。它支持各种模型结构和输入数据类型，例如图像的卷积神经网络(cnn)和文本的循环神经网络(rnn)。我们对6个数据集，495个模型的评估表明，AUTOTRAINER可以有效地检测出所有潜在的问题，检出率为100%，无误报。在所有存在问题的模型中，修复率达到97.33%，平均提高准确率47.08%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

AUTOTRAINER: An Automatic DNN Training Problem Detection and Repair System

With machine learning models especially Deep Neural Network (DNN) models becoming an integral part of the new intelligent software, new tools to support their engineering process are in high demand. Existing DNN debugging tools are either post-training which wastes a lot of time training a buggy model and requires expertises, or limited on collecting training logs without analyzing the problem not even fixing them. In this paper, we propose AUTOTRAINER, a DNN training monitoring and automatic repairing tool which supports detecting and auto repairing five commonly seen training problems. During training, it periodically checks the training status and detects potential problems. Once a problem is found, AUTOTRAINER tries to fix it by using built-in state-of-the-art solutions. It supports various model structures and input data types, such as Convolutional Neural Networks (CNNs) for image and Recurrent Neural Networks (RNNs) for texts. Our evaluation on 6 datasets, 495 models show that AUTOTRAINER can effectively detect all potential problems with 100% detection rate and no false positives. Among all models with problems, it can fix 97.33% of them, increasing the accuracy by 47.08% on average.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE)

自引率

0.00%

发文量

期刊最新文献

MuDelta: Delta-Oriented Mutation Testing at Commit Time Verifying Determinism in Sequential Programs Data-Oriented Differential Testing of Object-Relational Mapping Systems IoT Bugs and Development Challenges Onboarding vs. Diversity, Productivity and Quality — Empirical Study of the OpenStack Ecosystem