An Empirical Study of Challenges in Converting Deep Learning Models

Moses Openja, Amin Nikanjam, Ahmed Haj Yahmed, Foutse Khomh, Zhengyong Jiang
{"title":"An Empirical Study of Challenges in Converting Deep Learning Models","authors":"Moses Openja, Amin Nikanjam, Ahmed Haj Yahmed, Foutse Khomh, Zhengyong Jiang","doi":"10.1109/ICSME55016.2022.00010","DOIUrl":null,"url":null,"abstract":"There is an increase in deploying Deep Learning (DL)-based software systems in real-world applications. Usually, DL models are developed and trained using DL frameworks like TensorFlow and PyTorch. Each framework has its own internal mechanisms/formats to represent and train DL models (deep neural networks), and usually those formats cannot be recognized by other frameworks. Moreover, trained models are usually deployed in environments different from where they were developed. To solve the interoperability issue and make DL models compatible with different frameworks/environments, some exchange formats are introduced for DL models, like ONNX and CoreML. However, ONNX and CoreML were never empirically evaluated by the community to reveal their prediction accuracy, performance, and robustness after conversion. Poor accuracy or non-robust behavior of converted models may lead to poor quality of deployed DL-based software systems. We conduct, in this paper, the first empirical study to assess ONNX and CoreML for converting trained DL models. In our systematic approach, two popular DL frameworks, Keras and PyTorch, are used to train five widely used DL models on three popular datasets. The trained models are then converted to ONNX and CoreML and transferred to two runtime environments designated for such formats, to be evaluated. We investigate the prediction accuracy before and after conversion. Our results unveil that the prediction accuracy of converted models are at the same level of originals. The performance (time cost and memory consumption) of converted models are studied as well. The size of models are reduced after conversion, which can result in optimized DL-based software deployment. We also study the adversarial robustness of converted models to make sure about the robustness of deployed DL-based software. Leveraging the state-of-the-art adversarial attack approaches, converted models are generally assessed robust at the same level of originals. However, obtained results show that CoreML models are more vulnerable to adversarial attacks compared to ONNX. The general message of our findings is that DL developers should be cautious on the deployment of converted models that may 1) perform poorly while switching from one framework to another, 2) have challenges in robust deployment, or 3) run slowly, leading to poor quality of deployed DL-based software, including DL-based software maintenance tasks, like bug prediction.","PeriodicalId":300084,"journal":{"name":"2022 IEEE International Conference on Software Maintenance and Evolution (ICSME)","volume":"239 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"9","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE International Conference on Software Maintenance and Evolution (ICSME)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICSME55016.2022.00010","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 9

Abstract

Deep Learning (DL)-based software systems are increasingly deployed in real-world applications. DL models are usually developed and trained with DL frameworks such as TensorFlow and PyTorch. Each framework has its own internal mechanisms/formats to represent and train DL models (deep neural networks), and these formats generally cannot be read by other frameworks. Moreover, trained models are often deployed in environments different from where they were developed. To address this interoperability issue and make DL models compatible with different frameworks/environments, exchange formats such as ONNX and CoreML have been introduced. However, ONNX and CoreML have never been empirically evaluated by the community to reveal the prediction accuracy, performance, and robustness of models after conversion. Poor accuracy or non-robust behavior of converted models may degrade the quality of deployed DL-based software systems. In this paper, we conduct the first empirical study assessing ONNX and CoreML for converting trained DL models. In our systematic approach, two popular DL frameworks, Keras and PyTorch, are used to train five widely used DL models on three popular datasets. The trained models are then converted to ONNX and CoreML and transferred to two runtime environments designated for those formats, where they are evaluated. We investigate the prediction accuracy before and after conversion. Our results show that the prediction accuracy of converted models is at the same level as that of the originals. The performance (time cost and memory consumption) of converted models is studied as well. Model size is reduced after conversion, which can lead to optimized deployment of DL-based software. We also study the adversarial robustness of converted models to verify the robustness of deployed DL-based software. Using state-of-the-art adversarial attack approaches, converted models are generally assessed to be as robust as the originals. However, our results show that CoreML models are more vulnerable to adversarial attacks than ONNX models. The general message of our findings is that DL developers should be cautious about deploying converted models, which may 1) perform poorly when switching from one framework to another, 2) pose challenges for robust deployment, or 3) run slowly, leading to poor quality of deployed DL-based software, including DL-based software maintenance tasks such as bug prediction.
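To make the conversion workflow described above concrete, the sketch below shows one possible way (not the authors' actual evaluation harness) to export a trained PyTorch model to ONNX, run it in ONNX Runtime, and check prediction parity against the original model. The pretrained ResNet-18, the file name "model.onnx", and the input shape are illustrative assumptions, not the models or datasets used in the study.

```python
# Minimal sketch, assuming a PyTorch image classifier.
# The pretrained ResNet-18, "model.onnx", and the 224x224 input are placeholders.
import numpy as np
import torch
import torchvision
import onnxruntime as ort

model = torchvision.models.resnet18(weights="IMAGENET1K_V1").eval()
dummy = torch.randn(1, 3, 224, 224)

# Export the trained model to the ONNX exchange format.
torch.onnx.export(
    model, dummy, "model.onnx",
    opset_version=13,
    input_names=["input"], output_names=["logits"],
)

# Run the converted model in the ONNX Runtime environment.
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
onnx_logits = session.run(None, {"input": dummy.numpy()})[0]

# Compare predictions of the original and the converted model on the same input.
with torch.no_grad():
    torch_logits = model(dummy).numpy()

print("max abs logit difference:", np.abs(onnx_logits - torch_logits).max())
print("same predicted class:",
      bool((np.argmax(onnx_logits, 1) == np.argmax(torch_logits, 1)).all()))
```

A CoreML conversion follows a similar pattern via coremltools (converting a traced TorchScript model), but the resulting model can only be executed on Apple platforms, which is why the study evaluates each exchange format in its own designated runtime environment.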