Uncovering Energy-Efficient Practices in Deep Learning Training: Preliminary Steps Towards Green AI
Pub Date: 2023-03-24 · DOI: 10.1109/CAIN58948.2023.00012
Tim Yarally, Luís Cruz, Daniel Feitosa, June Sallou, A. V. Deursen
Modern AI practices all strive towards the same goal: better results. In the context of deep learning, the term "results" often refers to the achieved accuracy on a competitive problem set. In this paper, we adopt an idea from the emerging field of $\color{green}{\text{Green AI}}$: to consider energy consumption as a metric of equal importance to accuracy and to reduce any irrelevant tasks or energy usage. We examine the training stage of the deep learning pipeline from a sustainability perspective, through the study of hyperparameter tuning strategies and of model complexity, two factors that heavily impact the pipeline's overall energy consumption. First, we investigate the effectiveness of grid search, random search and Bayesian optimisation during hyperparameter tuning, and we find that Bayesian optimisation significantly dominates the other strategies. Furthermore, we analyse the architecture of convolutional neural networks with respect to the energy consumption of three prominent layer types: convolutional, linear and ReLU layers. The results show that convolutional layers are the most computationally expensive by a wide margin. Additionally, we observe diminishing returns in accuracy for more energy-hungry models. The overall energy consumption of training can be halved by reducing the network complexity. In conclusion, we highlight innovative and promising energy-efficient practices for training deep learning models. To expand the application of $\color{green}{\text{Green AI}}$, we advocate a shift in the design of deep learning models that considers the trade-off between energy efficiency and accuracy.
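To make the comparison concrete, here is a minimal sketch of how such an experiment can be set up, assuming the Optuna library for the three tuning strategies and CodeCarbon for emissions tracking; the `train_and_evaluate` objective, the trial budget, and all parameter choices are illustrative stand-ins, not the authors' actual setup.

```python
# Sketch: compare the carbon/energy footprint of grid search, random search,
# and Bayesian optimisation (TPE) during hyperparameter tuning.
# Assumes the third-party packages `optuna` and `codecarbon`.
import math

import optuna
from codecarbon import EmissionsTracker


def train_and_evaluate(lr: float) -> float:
    # Stand-in for a real training run; replace with your model's training loop.
    return 1.0 - abs(math.log10(lr) + 3) / 5  # toy score peaking near lr = 1e-3


def objective(trial: optuna.Trial) -> float:
    lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)
    return train_and_evaluate(lr)


samplers = {
    "grid": optuna.samplers.GridSampler({"lr": [1e-4, 1e-3, 1e-2, 1e-1]}),
    "random": optuna.samplers.RandomSampler(seed=0),
    "bayesian": optuna.samplers.TPESampler(seed=0),  # sequential model-based (Bayesian-style) search
}

for name, sampler in samplers.items():
    tracker = EmissionsTracker(project_name=f"hpo-{name}", log_level="error")
    tracker.start()
    study = optuna.create_study(direction="maximize", sampler=sampler)
    study.optimize(objective, n_trials=4)
    emissions_kg = tracker.stop()  # kg CO2-eq, a proxy for the energy consumed
    print(f"{name}: best score {study.best_value:.3f}, emissions {emissions_kg:.6f} kg")
```

With a fixed trial budget like this, the comparison reduces to how quickly each strategy reaches a given accuracy, which is the trade-off the paper examines.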
{"title":"Uncovering Energy-Efficient Practices in Deep Learning Training: Preliminary Steps Towards Green AI","authors":"Tim Yarally, Luís Cruz, Daniel Feitosa, June Sallou, A. V. Deursen","doi":"10.1109/CAIN58948.2023.00012","DOIUrl":"https://doi.org/10.1109/CAIN58948.2023.00012","url":null,"abstract":"Modern AI practices all strive towards the same goal: better results. In the context of deep learning, the term \"results\" often refers to the achieved accuracy on a competitive problem set. In this paper, we adopt an idea from the emerging field of $color{green}{text{Green AI}}$ to consider energy consumption as a metric of equal importance to accuracy and to reduce any irrelevant tasks or energy usage. We examine the training stage of the deep learning pipeline from a sustainability perspective, through the study of hyperparameter tuning strategies and the model complexity, two factors vastly impacting the overall pipeline’s energy consumption. First, we investigate the effectiveness of grid search, random search and Bayesian optimisation during hyperparameter tuning, and we find that Bayesian optimisation significantly dominates the other strategies. Furthermore, we analyse the architecture of convolutional neural networks with the energy consumption of three prominent layer types: convolutional, linear and ReLU layers. The results show that convolutional layers are the most computationally expensive by a strong margin. Additionally, we observe diminishing returns in accuracy for more energy-hungry models. The overall energy consumption of training can be halved by reducing the network complexity. In conclusion, we highlight innovative and promising energy-efficient practices for training deep learning models. To expand the application of $color{green}{text{Green AI}}$, we advocate for a shift in the design of deep learning models, by considering the trade-off between energy efficiency and accuracy.","PeriodicalId":175580,"journal":{"name":"2023 IEEE/ACM 2nd International Conference on AI Engineering – Software Engineering for AI (CAIN)","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115647931","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A Case Study on AI Engineering Practices: Developing an Autonomous Stock Trading System
Pub Date: 2023-03-23 · DOI: 10.1109/CAIN58948.2023.00032
M. Grote, J. Bogner
Today, many systems use artificial intelligence (AI) to solve complex problems. While this often increases system effectiveness, developing a production-ready AI-based system is a difficult task. Thus, solid AI engineering practices are required to ensure the quality of the resulting system and to improve the development process. While several practices have already been proposed for the development of AI-based systems, detailed practical experiences of applying these practices are rare. In this paper, we aim to address this gap by collecting such experiences during a case study, namely the development of an autonomous stock trading system that uses machine learning functionality to invest in stocks. We selected 10 AI engineering practices from the literature and systematically applied them during development, with the goal of collecting evidence about their applicability and effectiveness. Using structured field notes, we documented our experiences. Furthermore, we also used field notes to document challenges that occurred during development and the solutions we applied to overcome them. Afterwards, we analyzed the collected field notes and evaluated how each practice improved the development. Lastly, we compared our evidence with the existing literature. Most applied practices improved our system, albeit to varying extents, and we were able to overcome all major challenges. The qualitative results provide detailed accounts of 10 AI engineering practices, as well as of the challenges and solutions associated with such a project. Our experiences therefore enrich the emerging body of evidence in this field, which may be especially helpful for practitioner teams new to AI engineering.
{"title":"A Case Study on AI Engineering Practices: Developing an Autonomous Stock Trading System","authors":"M. Grote, J. Bogner","doi":"10.1109/CAIN58948.2023.00032","DOIUrl":"https://doi.org/10.1109/CAIN58948.2023.00032","url":null,"abstract":"Today, many systems use artificial intelligence (AI) to solve complex problems. While this often increases system effectiveness, developing a production-ready AI-based system is a difficult task. Thus, solid AI engineering practices are required to ensure the quality of the resulting system and to improve the development process. While several practices have already been proposed for the development of AI-based systems, detailed practical experiences of applying these practices are rare.In this paper, we aim to address this gap by collecting such experiences during a case study, namely the development of an autonomous stock trading system that uses machine learning functionality to invest in stocks. We selected 10 AI engineering practices from the literature and systematically applied them during development, with the goal to collect evidence about their applicability and effectiveness. Using structured field notes, we documented our experiences. Furthermore, we also used field notes to document challenges that occurred during the development, and the solutions we applied to overcome them. Afterwards, we analyzed the collected field notes, and evaluated how each practice improved the development. Lastly, we compared our evidence with existing literature.Most applied practices improved our system, albeit to varying extent, and we were able to overcome all major challenges. The qualitative results provide detailed accounts about 10 AI engineering practices, as well as challenges and solutions associated with such a project. Our experiences therefore enrich the emerging body of evidence in this field, which may be especially helpful for practitioner teams new to AI engineering.","PeriodicalId":175580,"journal":{"name":"2023 IEEE/ACM 2nd International Conference on AI Engineering – Software Engineering for AI (CAIN)","volume":"82 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123071325","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Design Patterns for AI-based Systems: A Multivocal Literature Review and Pattern Repository
Pub Date: 2023-03-23 · DOI: 10.1109/CAIN58948.2023.00035
Lukas Heiland, Marius Hauser, J. Bogner
Systems with artificial intelligence components, so-called AI-based systems, have gained considerable attention recently. However, many organizations struggle to achieve production readiness with such systems. As a means to improve certain software quality attributes and to address frequently occurring problems, design patterns represent proven solution blueprints. While new patterns for AI-based systems are emerging, existing patterns have also been adapted to this new context. The goal of this study is to provide an overview of design patterns for AI-based systems, both new and adapted ones. We want to collect and categorize patterns, and make them accessible to researchers and practitioners. To this end, we first performed a multivocal literature review (MLR) to collect design patterns used with AI-based systems. We then integrated the resulting pattern collection into a web-based pattern repository to make the patterns browsable and easy to find. As a result, we selected 51 resources (35 white and 16 gray ones), from which we extracted 70 unique patterns used for AI-based systems. Among these are 34 new patterns and 36 traditional ones that have been adapted to this context. Popular pattern categories include architecture (25 patterns), deployment (16), implementation (9), and security & safety (9). While some patterns with four or more mentions already seem established, the majority of patterns (51) have been mentioned only once or twice. Our results in this emerging field can be used by researchers as a foundation for follow-up studies and by practitioners to discover relevant patterns for informing the design of AI-based systems.
{"title":"Design Patterns for AI-based Systems: A Multivocal Literature Review and Pattern Repository","authors":"Lukas Heiland, Marius Hauser, J. Bogner","doi":"10.1109/CAIN58948.2023.00035","DOIUrl":"https://doi.org/10.1109/CAIN58948.2023.00035","url":null,"abstract":"Systems with artificial intelligence components, so-called AI-based systems, have gained considerable attention recently. However, many organizations have issues with achieving production readiness with such systems. As a means to improve certain software quality attributes and to address frequently occurring problems, design patterns represent proven solution blueprints. While new patterns for AI-based systems are emerging, existing patterns have also been adapted to this new context.The goal of this study is to provide an overview of design patterns for AI-based systems, both new and adapted ones. We want to collect and categorize patterns, and make them accessible for researchers and practitioners. To this end, we first performed a multivocal literature review (MLR) to collect design patterns used with AI-based systems. We then integrated the created pattern collection into a web-based pattern repository to make the patterns browsable and easy to find.As a result, we selected 51 resources (35 white and 16 gray ones), from which we extracted 70 unique patterns used for AI-based systems. Among these are 34 new patterns and 36 traditional ones that have been adapted to this context. Popular pattern categories include architecture (25 patterns), deployment (16), implementation (9), or security & safety (9). While some patterns with four or more mentions already seem established, the majority of patterns have only been mentioned once or twice (51 patterns). Our results in this emerging field can be used by researchers as a foundation for follow-up studies and by practitioners to discover relevant patterns for informing the design of AI-based systems.","PeriodicalId":175580,"journal":{"name":"2023 IEEE/ACM 2nd International Conference on AI Engineering – Software Engineering for AI (CAIN)","volume":"99 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123177115","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Defining Quality Requirements for a Trustworthy AI Wildflower Monitoring Platform
Pub Date: 2023-03-23 · DOI: 10.1109/CAIN58948.2023.00029
P. Heck, Gerard Schouten
For an AI solution to evolve from a trained machine learning model into a production-ready AI system, many more things need to be considered than just the performance of the machine learning model. A production-ready AI system needs to be trustworthy, i.e. of high quality. But how can this be determined in practice? For traditional software, ISO 25000 and its predecessors have long been used to define and measure quality characteristics. Recently, quality models for AI systems, based on ISO 25000, have been introduced. This paper applies one such quality model to a real-life case study: a deep learning platform for monitoring wildflowers. The paper presents three realistic scenarios sketching what it means to use, extend, and incrementally improve the deep learning platform for wildflower identification and counting. Next, it shows how the quality model can be used as a structured dictionary to define quality requirements for data, model, and software. It remains future work to extend the quality model with metrics, tools, and best practices that aid AI engineering practitioners in implementing trustworthy AI systems.
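As a sketch of what using a quality model as a "structured dictionary" for requirements can look like, the hypothetical example below records one verifiable requirement per component (data, model, software); the characteristics, thresholds, and metrics are invented for illustration and are not taken from the paper or the ISO 25000 series.

```python
# Hypothetical sketch: quality requirements keyed by component and characteristic.
from dataclasses import dataclass


@dataclass
class QualityRequirement:
    component: str       # "data", "model", or "software"
    characteristic: str  # quality characteristic taken from the quality model
    requirement: str     # concrete, verifiable statement
    metric: str          # how the requirement will be measured

requirements = [
    QualityRequirement("data", "completeness",
                       "Each wildflower class has >= 500 labelled images",
                       "images per class"),
    QualityRequirement("model", "functional correctness",
                       "Top-1 identification accuracy >= 0.90 on the held-out test set",
                       "accuracy"),
    QualityRequirement("software", "maintainability",
                       "A new species can be added without retraining unrelated components",
                       "change impact"),
]

for req in requirements:
    print(f"[{req.component}/{req.characteristic}] {req.requirement} (metric: {req.metric})")
```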
{"title":"Defining Quality Requirements for a Trustworthy AI Wildflower Monitoring Platform","authors":"P. Heck, Gerard Schouten","doi":"10.1109/CAIN58948.2023.00029","DOIUrl":"https://doi.org/10.1109/CAIN58948.2023.00029","url":null,"abstract":"For an AI solution to evolve from a trained machine learning model into a production-ready AI system, many more things need to be considered than just the performance of the machine learning model. A production-ready AI system needs to be trustworthy, i.e. of high quality. But how to determine this in practiceƒ For traditional software, ISO25000 and its predecessors have since long time been used to define and measure quality characteristics. Recently, quality models for AI systems, based on ISO25000, have been introduced. This paper applies one such quality model to a real-life case study: a deep learning platform for monitoring wildflowers. The paper presents three realistic scenarios sketching what it means to respectively use, extend and incrementally improve the deep learning platform for wildflower identification and counting. Next, it is shown how the quality model can be used as a structured dictionary to define quality requirements for data, model and software. Future work remains to extend the quality model with metrics, tools and best practices to aid AI engineering practitioners in implementing trustworthy AI systems.","PeriodicalId":175580,"journal":{"name":"2023 IEEE/ACM 2nd International Conference on AI Engineering – Software Engineering for AI (CAIN)","volume":"42 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131205550","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Prevalence of Code Smells in Reinforcement Learning Projects
Pub Date: 2023-03-17 · DOI: 10.1109/CAIN58948.2023.00013
Nicolás Cardozo, Ivana Dusparic, Christian Cabrera
Reinforcement Learning (RL) is increasingly used to learn and adapt application behavior in many domains, including large-scale and safety-critical systems such as autonomous driving. With the advent of plug-and-play RL libraries, its applicability has further increased, enabling users to integrate RL algorithms into their own code. We note, however, that the majority of such code is not developed by RL engineers, which may lead to poor program quality, yielding bugs, suboptimal performance, and maintainability and evolution problems for RL-based projects. In this paper we begin exploring this hypothesis for code that uses RL, analyzing different projects found in the wild to assess their quality from a software engineering perspective. Our study includes 24 popular RL-based Python projects, analyzed with standard software engineering metrics. Our results, aligned with similar analyses of ML code in general, show that popular and widely reused RL repositories contain many code smells (3.95% of the code base on average), significantly affecting the projects' maintainability. The most common code smells detected are long method and long method chain, highlighting problems in the definition and interaction of agents. The detected code smells suggest problems with the separation of responsibilities and with the appropriateness of current abstractions for defining RL algorithms.
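The "long method" smell mentioned above is straightforward to screen for; a minimal sketch using only Python's standard ast module (3.8+ for `end_lineno`) follows. This is an illustration, not the detection tooling used in the study, and the 38-line threshold is an arbitrary assumption.

```python
# Sketch: flag the "long method" code smell by measuring the line span of each
# function definition in a Python source file. Standard library only.
import ast
import sys

LONG_METHOD_THRESHOLD = 38  # assumed cut-off; tune to your team's convention


def long_methods(source: str):
    """Yield (name, lineno, length) for functions spanning many source lines."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            length = (node.end_lineno or node.lineno) - node.lineno + 1
            if length > LONG_METHOD_THRESHOLD:
                yield node.name, node.lineno, length


if __name__ == "__main__":
    # Usage: python smells.py agent.py trainer.py ...
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as f:
            for name, lineno, length in long_methods(f.read()):
                print(f"{path}:{lineno}: long method '{name}' ({length} lines)")
```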
{"title":"Prevalence of Code Smells in Reinforcement Learning Projects","authors":"Nicolás Cardozo, Ivana Dusparic, Christian Cabrera","doi":"10.1109/CAIN58948.2023.00013","DOIUrl":"https://doi.org/10.1109/CAIN58948.2023.00013","url":null,"abstract":"Reinforcement Learning (RL) is being increasingly used to learn and adapt application behavior in many domains, including large-scale and safety critical systems, as for example, autonomous driving. With the advent of plug-n-play RL libraries, its applicability has further increased, enabling integration of RL algorithms by users. We note, however, that the majority of such code is not developed by RL engineers, which as a consequence, may lead to poor program quality yielding bugs, suboptimal performance, maintainability, and evolution problems for RL-based projects. In this paper we begin the exploration of this hypothesis, specific to code utilizing RL, analyzing different projects found in the wild, to assess their quality from a software engineering perspective. Our study includes 24 popular RL-based Python projects, analyzed with standard software engineering metrics. Our results, aligned with similar analyses for ML code in general, show that popular and widely reused RL repositories contain many code smells (3.95% of the code base on average), significantly affecting the projects’ maintainability. The most common code smells detected are long method and long method chain, highlighting problems in the definition and interaction of agents. Detected code smells suggest problems in responsibility separation, and the appropriateness of current abstractions for the definition of RL algorithms.","PeriodicalId":175580,"journal":{"name":"2023 IEEE/ACM 2nd International Conference on AI Engineering – Software Engineering for AI (CAIN)","volume":"291 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114837337","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dataflow graphs as complete causal graphs
Pub Date: 2023-03-16 · DOI: 10.1109/CAIN58948.2023.00010
Andrei Paleyes, Siyuan Guo, B. Scholkopf, Neil D. Lawrence
Component-based development is one of the core principles behind modern software engineering practices. An understanding of the causal relationships between the components of a software system can yield significant benefits to developers. Yet modern software design approaches make it difficult to track and discover such relationships at system scale, which leads to growing intellectual debt. In this paper we consider an alternative approach to software design, flow-based programming (FBP), and draw the community's attention to the connection between the dataflow graphs produced by FBP and structural causal models. With expository examples we show how this connection can be leveraged to improve day-to-day tasks in software projects, including fault localisation, business analysis and experimentation.
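An expository sketch of the core idea, using invented component names and assuming the networkx package: if edges encode where data flows, they also encode causal influence, so a fault observed at one component can only originate in that component or its graph ancestors, and fault localisation reduces to an ancestor query.

```python
# Sketch: treat a flow-based program's dataflow graph as a causal graph and use
# it for fault localisation. Component names are hypothetical.
import networkx as nx

# Edges point in the direction data flows between components.
dataflow = nx.DiGraph([
    ("ingest", "validate"),
    ("validate", "features"),
    ("features", "model"),
    ("pricing_rules", "model"),
    ("model", "report"),
])

faulty_component = "model"
# Causal reading: candidate root causes are the faulty node plus everything upstream.
candidates = nx.ancestors(dataflow, faulty_component) | {faulty_component}
print(f"Fault at '{faulty_component}'; inspect: {sorted(candidates)}")
# Components outside this set (here, 'report') can be ruled out for this fault.
```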
{"title":"Dataflow graphs as complete causal graphs","authors":"Andrei Paleyes, Siyuan Guo, B. Scholkopf, Neil D. Lawrence","doi":"10.1109/CAIN58948.2023.00010","DOIUrl":"https://doi.org/10.1109/CAIN58948.2023.00010","url":null,"abstract":"Component-based development is one of the core principles behind modern software engineering practices. Understanding of causal relationships between components of a software system can yield significant benefits to developers. Yet modern software design approaches make it difficult to track and discover such relationships at system scale, which leads to growing intellectual debt. In this paper we consider an alternative approach to software design, flow-based programming (FBP), and draw the attention of the community to the connection between dataflow graphs produced by FBP and structural causal models. With expository examples we show how this connection can be leveraged to improve day-to-day tasks in software projects, including fault localisation, business analysis and experimentation.","PeriodicalId":175580,"journal":{"name":"2023 IEEE/ACM 2nd International Conference on AI Engineering – Software Engineering for AI (CAIN)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114728744","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Automotive Perception Software Development: An Empirical Investigation into Data, Annotation, and Ecosystem Challenges
Pub Date: 2023-03-10 · DOI: 10.1109/CAIN58948.2023.00011
Hans-Martin Heyn, K. M. Habibullah, E. Knauss, Jennifer Horkoff, Markus Borg, Alessia Knauss, Polly Jing Li
Software that contains machine learning algorithms is an integral part of automotive perception, for example, in driving automation systems. The development of such software, specifically the training and validation of the machine learning components, requires large annotated datasets. An industry of data and annotation services has emerged to serve the development of such data-intensive automotive software components. Widespread difficulties in specifying data and annotation needs challenge collaborations between OEMs (Original Equipment Manufacturers) and their suppliers of software components, data, and annotations. This paper investigates why practitioners in the Swedish automotive industry find it difficult to arrive at clear specifications for data and annotations. The results from an interview study show that a lack of effective metrics for data quality aspects, ambiguities in the way of working, unclear definitions of annotation quality, and deficits in the business ecosystems are causes of the difficulty in deriving such specifications. We provide a list of recommendations that can mitigate these challenges when deriving specifications, and we propose future research opportunities to overcome them. Our work contributes to the ongoing research on the accountability of machine learning as applied to complex software systems, especially for high-stakes applications such as automated driving.
{"title":"Automotive Perception Software Development: An Empirical Investigation into Data, Annotation, and Ecosystem Challenges","authors":"Hans-Martin Heyn, K. M. Habibullah, E. Knauss, Jennifer Horkoff, Markus Borg, Alessia Knauss, Polly Jing Li","doi":"10.1109/CAIN58948.2023.00011","DOIUrl":"https://doi.org/10.1109/CAIN58948.2023.00011","url":null,"abstract":"Software that contains machine learning algorithms is an integral part of automotive perception, for example, in driving automation systems. The development of such software, specifically the training and validation of the machine learning components, requires large annotated datasets. An industry of data and annotation services has emerged to serve the development of such data-intensive automotive software components. Wide-spread difficulties to specify data and annotation needs challenge collaborations between OEMs (Original Equipment Manufacturers) and their suppliers of software components, data, and annotations.This paper investigates the reasons for these difficulties for practitioners in the Swedish automotive industry to arrive at clear specifications for data and annotations. The results from an interview study show that a lack of effective metrics for data quality aspects, ambiguities in the way of working, unclear definitions of annotation quality, and deficits in the business ecosystems are causes for the difficulty in deriving the specifications. We provide a list of recommendations that can mitigate challenges when deriving specifications and we propose future research opportunities to overcome these challenges. Our work contributes towards the on-going research on accountability of machine learning as applied to complex software systems, especially for high-stake applications such as automated driving.","PeriodicalId":175580,"journal":{"name":"2023 IEEE/ACM 2nd International Conference on AI Engineering – Software Engineering for AI (CAIN)","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-03-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131332223","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}