M. A. Raj, Jan Bosch, H. H. Olsson, Anders Jansson
{"title":"机器学习用例对工业数据管道的影响","authors":"M. A. Raj, Jan Bosch, H. H. Olsson, Anders Jansson","doi":"10.1109/APSEC53868.2021.00053","DOIUrl":null,"url":null,"abstract":"The impact of the Artificial Intelligence revolution is undoubtedly substantial in our society, life, firms, and employment. With data being a critical element, organizations are working towards obtaining high-quality data to train their AI models. Although data, data management, and data pipelines are part of industrial practice even before the introduction of ML models, the significance of data increased further with the advent of ML models, which force data pipeline developers to go beyond the traditional focus on data quality. The objective of this study is to analyze the impact of ML use cases on data pipelines. We assume that the data pipelines that serve ML models are given more importance compared to the conventional data pipelines. We report on a study that we conducted by observing software teams at three companies as they develop both conventional(Non-ML) data pipelines and data pipelines that serve ML-based applications. We study six data pipelines from three companies and categorize them based on their criticality and purpose. Further, we identify the determinants that can be used to compare the development and maintenance of these data pipelines. Finally, we map these factors in a two-dimensional space to illustrate their importance on a scale of low, moderate, and high.","PeriodicalId":143800,"journal":{"name":"2021 28th Asia-Pacific Software Engineering Conference (APSEC)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"On the Impact of ML use cases on Industrial Data Pipelines\",\"authors\":\"M. A. Raj, Jan Bosch, H. H. Olsson, Anders Jansson\",\"doi\":\"10.1109/APSEC53868.2021.00053\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The impact of the Artificial Intelligence revolution is undoubtedly substantial in our society, life, firms, and employment. With data being a critical element, organizations are working towards obtaining high-quality data to train their AI models. Although data, data management, and data pipelines are part of industrial practice even before the introduction of ML models, the significance of data increased further with the advent of ML models, which force data pipeline developers to go beyond the traditional focus on data quality. The objective of this study is to analyze the impact of ML use cases on data pipelines. We assume that the data pipelines that serve ML models are given more importance compared to the conventional data pipelines. We report on a study that we conducted by observing software teams at three companies as they develop both conventional(Non-ML) data pipelines and data pipelines that serve ML-based applications. We study six data pipelines from three companies and categorize them based on their criticality and purpose. Further, we identify the determinants that can be used to compare the development and maintenance of these data pipelines. Finally, we map these factors in a two-dimensional space to illustrate their importance on a scale of low, moderate, and high.\",\"PeriodicalId\":143800,\"journal\":{\"name\":\"2021 28th Asia-Pacific Software Engineering Conference (APSEC)\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 28th Asia-Pacific Software Engineering Conference (APSEC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/APSEC53868.2021.00053\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 28th Asia-Pacific Software Engineering Conference (APSEC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/APSEC53868.2021.00053","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
On the Impact of ML use cases on Industrial Data Pipelines
The impact of the Artificial Intelligence revolution is undoubtedly substantial in our society, life, firms, and employment. With data being a critical element, organizations are working towards obtaining high-quality data to train their AI models. Although data, data management, and data pipelines are part of industrial practice even before the introduction of ML models, the significance of data increased further with the advent of ML models, which force data pipeline developers to go beyond the traditional focus on data quality. The objective of this study is to analyze the impact of ML use cases on data pipelines. We assume that the data pipelines that serve ML models are given more importance compared to the conventional data pipelines. We report on a study that we conducted by observing software teams at three companies as they develop both conventional(Non-ML) data pipelines and data pipelines that serve ML-based applications. We study six data pipelines from three companies and categorize them based on their criticality and purpose. Further, we identify the determinants that can be used to compare the development and maintenance of these data pipelines. Finally, we map these factors in a two-dimensional space to illustrate their importance on a scale of low, moderate, and high.