Pub Date: 2021-07-06 DOI: 10.1109/SEAA53835.2021.00046
Yuchu Liu, D. I. Mattos, J. Bosch, H. H. Olsson, Jonn Lantz
A/B testing is gaining attention in the automotive sector as a promising tool for measuring the causal effects of software changes. Unlike web-facing businesses, where A/B testing is well established, the automotive domain often has only a limited number of users eligible to participate in online experiments. To address this shortcoming, we present a method for designing balanced control and treatment groups so that sound conclusions can be drawn from experiments with considerably small sample sizes. While the Balance Match Weighted method has been used in other domains such as medicine, this is the first paper to apply and evaluate it in the context of software development. We describe the Balance Match Weighted method in detail and, in a case study with an automotive manufacturer, apply the group design method to a fleet of vehicles. Finally, we discuss the benefits and limitations of the A/B group design method.
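To make the idea of "balanced control and treatment groups" concrete, the following is a minimal sketch of a much simpler stand-in technique: draw many random splits and keep the one with the smallest covariate imbalance. It is not the Balance Match Weighted method, which the abstract names but does not specify. The function names, the single covariate, and the use of the standardized mean difference as the balance score are all assumptions made for illustration.

```python
import random
from statistics import mean, pstdev


def standardized_mean_difference(a, b):
    """Absolute difference in group means, scaled by the standard
    deviation of the combined sample (0.0 if there is no spread)."""
    spread = pstdev(a + b)
    if spread == 0:
        return 0.0
    return abs(mean(a) - mean(b)) / spread


def pick_balanced_split(covariate, n_draws=1000, seed=0):
    """Hypothetical helper: try n_draws random half/half splits of the
    units and return (imbalance, control_ids, treatment_ids) for the
    split whose covariate means are closest."""
    rng = random.Random(seed)
    ids = list(range(len(covariate)))
    half = len(ids) // 2
    best = None
    for _ in range(n_draws):
        rng.shuffle(ids)
        control, treatment = ids[:half], ids[half:]
        smd = standardized_mean_difference(
            [covariate[i] for i in control],
            [covariate[i] for i in treatment],
        )
        if best is None or smd < best[0]:
            best = (smd, sorted(control), sorted(treatment))
    return best


# Illustrative data only: yearly mileage (in 1000 km) for 12 vehicles.
mileage = [5, 6, 7, 8, 9, 10, 20, 21, 22, 23, 24, 25]
imbalance, control_ids, treatment_ids = pick_balanced_split(mileage, n_draws=200)
```

With small fleets, a single random split can easily put most high-mileage vehicles in one group. Screening candidate splits on known covariates before the experiment starts is one way to reduce that risk.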
Title: Size matters? Or not: A/B testing with limited sample in automotive embedded software
Venue: 2021 47th Euromicro Conference on Software Engineering and Advanced Applications (SEAA)
Pub Date: 2021-06-30 DOI: 10.1109/SEAA53835.2021.00053
C. Berger
Machine learning (ML)-enabled software systems have been incorporated in many public demonstrations of automated driving (AD) systems. Such solutions are also considered a crucial approach toward SAE Level 5 systems, in which passengers no longer have to interact with the system at all. As early as 2016, Nvidia demonstrated a complete end-to-end approach for training the entire software stack, covering perception, planning and decision making, and the actual vehicle control. While such approaches show the great potential of ML-enabled systems, there have also been demonstrations in which changes to even single pixels in a video frame can lead to completely different decisions, with dangerous consequences in the worst case. In this paper, a structured analysis is conducted to explore the effects of video degradation on the performance of an ML-enabled pedestrian detector. First, a baseline was established by applying “You only look once” (YOLO) to 1,026 frames with pedestrian annotations from the KITTI Vision Benchmark Suite. Next, video degradation candidates for each of these frames were generated using the leading video compression codecs libx264, libx265, Nvidia HEVC, and AV1: 52 frames for the various compression presets for color frames and 52 frames for gray-scale frames, resulting in 104 degradation candidates per original KITTI frame and 426,816 images in total. YOLO was applied to each image to compute the intersection-over-union (IoU) metric and compare the performance with the original baseline. While aggressively lossy compression settings result in significant performance drops, as expected, some configurations actually yield slightly better IoU results than the baseline.
Hence, while related work in the literature has demonstrated the potentially negative consequences of even simple modifications to video data when using ML-enabled systems, the findings of this work show that carefully chosen lossy video configurations preserve decent performance of particular ML-enabled systems while allowing substantial savings when storing or transmitting data. Such aspects are crucially important when, for example, video data needs to be collected wirelessly from multiple vehicles and lossy video codecs are required to cope with bandwidth limitations.
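The IoU metric used in the comparison can be sketched as follows. This is a minimal illustration that assumes axis-aligned boxes given as `(x_min, y_min, x_max, y_max)`. The function name and box layout are assumptions, not the author's evaluation code. The last lines check the image count: the stated total of 426,816 equals 1,026 × 104 × 4, which suggests that the 104 candidates are produced per codec. That reading is an inference from the numbers, not something the abstract states.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes, each given as
    (x_min, y_min, x_max, y_max). Returns a value in [0, 1]."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    return intersection / union if union > 0 else 0.0


# Image count check (the per-codec reading is an assumption, see above).
FRAMES = 1026
CANDIDATES = 52 + 52   # color presets + gray-scale presets
CODECS = 4             # libx264, libx265, Nvidia HEVC, AV1
total_images = FRAMES * CANDIDATES * CODECS
```

A detection on a degraded frame can then be scored against the box found on the original frame, so a drop in IoU reflects the effect of the compression setting.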
Title: A Structured Analysis of the Video Degradation Effects on the Performance of a Machine Learning-enabled Pedestrian Detector
Venue: 2021 47th Euromicro Conference on Software Engineering and Advanced Applications (SEAA)
Pub Date: 2019-12-18 DOI: 10.1109/SEAA53835.2021.00010
B. Napoleão, K. Felizardo, É. Souza, Fábio Petrillo, N. Vijaykumar, E. Nakagawa, Sylvain Hallé
Context: A tertiary study can be performed to identify related reviews on a topic of interest. However, constructing an appropriate and effective search string to detect secondary studies is challenging for Software Engineering (SE) researchers. Objective: The main goal of this study is to propose a suitable search string for detecting secondary studies in SE, addressing issues such as the number of terms applied, relevance, recall, and precision. Method: We analyzed seven tertiary studies from two perspectives: (1) structure – the terms the strings use to detect secondary studies; and (2) field – where to search: titles alone, abstracts alone, or titles and abstracts together, among others. We validated our string through a two-step process. First, we evaluated its capability to retrieve secondary studies from a set of 1537 secondary studies included in 24 tertiary studies in SE. Second, we evaluated its general capacity to retrieve secondary studies through an automated search in the Scopus digital library. Results: Our string retrieved over 90% of the included secondary studies (recall) with a high general precision of almost 60%. Conclusion: The suitable search string for finding secondary studies in SE contains the terms “systematic review”, “literature review”, “systematic mapping”, “mapping study” and “systematic map”.
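The recall and precision figures above can be illustrated with a small sketch. The five terms come from the abstract. Joining them with `OR`, the substring matching, and the toy titles are assumptions made for illustration and do not reproduce the authors' Scopus query or data.

```python
TERMS = [
    "systematic review",
    "literature review",
    "systematic mapping",
    "mapping study",
    "systematic map",
]


def build_search_string(terms):
    """Join quoted terms with OR (the operator is an assumption)."""
    return " OR ".join(f'"{term}"' for term in terms)


def matches(text, terms=TERMS):
    """True if any term occurs in the text, ignoring case."""
    lowered = text.lower()
    return any(term in lowered for term in terms)


def recall_precision(retrieved, relevant):
    """Recall = hits / relevant, precision = hits / retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical titles; ids 1, 2 and 4 are treated as true secondary studies.
titles = {
    1: "A Systematic Review of Test Automation",
    2: "Mapping Study on Microservices",
    3: "An Empirical Study of Code Smells",
    4: "Survey of Requirements Elicitation Techniques",
}
relevant_ids = {1, 2, 4}
retrieved_ids = [doc_id for doc_id, title in titles.items() if matches(title)]
recall, precision = recall_precision(retrieved_ids, relevant_ids)
```

In this toy case the survey (id 4) is missed because its title uses none of the five terms. This is the kind of loss that keeps recall below 100% even for a well-chosen string.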
Title: Establishing a Search String to Detect Secondary Studies in Software Engineering
Venue: 2021 47th Euromicro Conference on Software Engineering and Advanced Applications (SEAA)