Automatic correction of programs is a challenging problem with numerous real world applications in security, verification, and education. One application that is becoming increasingly important is the correction of student submissions in online courses for providing feedback. Most existing program repair techniques analyze Abstract Syntax Trees (ASTs) of programs, which are unfortunately unavailable for programs with syntax errors. In this paper, we propose a novel Neuro-symbolic approach that combines neural networks with constraint-based reasoning. Specifically, our method first uses a Recurrent Neural Network (RNN) to perform syntax repairs for the buggy programs; subsequently, the resulting syntactically-fixed programs are repaired using constraint-based techniques to ensure functional correctness. The RNNs are trained using a corpus of syntactically correct submissions for a given programming assignment, and are then queried to fix syntax errors in an incorrect programming submission by replacing or inserting the predicted tokens at the error location. We evaluate our technique on a dataset comprising of over 14,500 student submissions with syntax errors. Our method is able to repair syntax errors in 60% (8689) of submissions, and finds functionally correct repairs for 23.8% (3455) submissions.
{"title":"Neuro-Symbolic Program Corrector for Introductory Programming Assignments","authors":"Sahil Bhatia, Pushmeet Kohli, Rishabh Singh","doi":"10.1145/3180155.3180219","DOIUrl":"https://doi.org/10.1145/3180155.3180219","url":null,"abstract":"Automatic correction of programs is a challenging problem with numerous real world applications in security, verification, and education. One application that is becoming increasingly important is the correction of student submissions in online courses for providing feedback. Most existing program repair techniques analyze Abstract Syntax Trees (ASTs) of programs, which are unfortunately unavailable for programs with syntax errors. In this paper, we propose a novel Neuro-symbolic approach that combines neural networks with constraint-based reasoning. Specifically, our method first uses a Recurrent Neural Network (RNN) to perform syntax repairs for the buggy programs; subsequently, the resulting syntactically-fixed programs are repaired using constraint-based techniques to ensure functional correctness. The RNNs are trained using a corpus of syntactically correct submissions for a given programming assignment, and are then queried to fix syntax errors in an incorrect programming submission by replacing or inserting the predicted tokens at the error location. We evaluate our technique on a dataset comprising of over 14,500 student submissions with syntax errors. Our method is able to repair syntax errors in 60% (8689) of submissions, and finds functionally correct repairs for 23.8% (3455) submissions.","PeriodicalId":6560,"journal":{"name":"2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE)","volume":"4 1","pages":"60-70"},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89273159","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The Android platform has been the dominant mobile platform in recent years resulting in millions of apps and security threats against those apps. Anti-malware products aim to protect smartphone users from these threats, especially from malicious apps. However, malware authors use code obfuscation on their apps to evade detection by anti-malware products. To assess the effects of code obfuscation on Android apps and anti-malware products, we have conducted a large-scale empirical study that evaluates the effectiveness of the top anti-malware products against various obfuscation tools and strategies. To that end, we have obfuscated 3,000 benign apps and 3,000 malicious apps and generated 73,362 obfuscated apps using 29 obfuscation strategies from 7 open-source, academic, and commercial obfuscation tools. The findings of our study indicate that (1) code obfuscation significantly impacts Android anti-malware products; (2) the majority of anti-malware products are severely impacted by even trivial obfuscations; (3) in general, combined obfuscation strategies do not successfully evade anti-malware products more than individual strategies; (4) the detection of anti-malware products depend not only on the applied obfuscation strategy but also on the leveraged obfuscation tool; (5) anti-malware products are slow to adopt signatures of malicious apps; and (6) code obfuscation often results in changes to an app's semantic behaviors.
{"title":"A Large-Scale Empirical Study on the Effects of Code Obfuscations on Android Apps and Anti-Malware Products","authors":"Mahmoud M. Hammad, Joshua Garcia, S. Malek","doi":"10.1145/3180155.3180228","DOIUrl":"https://doi.org/10.1145/3180155.3180228","url":null,"abstract":"The Android platform has been the dominant mobile platform in recent years resulting in millions of apps and security threats against those apps. Anti-malware products aim to protect smartphone users from these threats, especially from malicious apps. However, malware authors use code obfuscation on their apps to evade detection by anti-malware products. To assess the effects of code obfuscation on Android apps and anti-malware products, we have conducted a large-scale empirical study that evaluates the effectiveness of the top anti-malware products against various obfuscation tools and strategies. To that end, we have obfuscated 3,000 benign apps and 3,000 malicious apps and generated 73,362 obfuscated apps using 29 obfuscation strategies from 7 open-source, academic, and commercial obfuscation tools. The findings of our study indicate that (1) code obfuscation significantly impacts Android anti-malware products; (2) the majority of anti-malware products are severely impacted by even trivial obfuscations; (3) in general, combined obfuscation strategies do not successfully evade anti-malware products more than individual strategies; (4) the detection of anti-malware products depend not only on the applied obfuscation strategy but also on the leveraged obfuscation tool; (5) anti-malware products are slow to adopt signatures of malicious apps; and (6) code obfuscation often results in changes to an app's semantic behaviors.","PeriodicalId":6560,"journal":{"name":"2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE)","volume":"23 1","pages":"421-431"},"PeriodicalIF":0.0,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86534591","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Model-Driven Engineering (MDE) is a software engineering paradigm where models are actively used to specify, test, simulate, analyse and maintain the systems to be built, among other activities. Models can be defined using general-purpose modelling languages like the UML, but for particular domains, the use of domain-specific languages is pervasive. Either way, models must conform to a meta-model which defines their abstract syntax. In MDE, the definition of model management operations – often typed over project-specific meta-models – is recurrent. However, even if two operations are similar, they must be developed from scratch whenever they are applied to instances of different meta-models. This is so as operations defined (i.e., typed) over a meta-model cannot be directly reused for another. Part of this difficulty of reuse is because classes in meta-models are used in two ways: as templates to create objects and as static classifiers for them. These two aspects are inherently tied in most meta-modelling approaches, which results in unnecessarily rigid systems and hinders reusability of MDE artefacts. To enhance flexibility and reuse in MDE, we propose an approach to decouple object creation from typing [1]. The approach relies on standard mechanisms for object creation, and proposes the notion of a posteriori typing as a means to retype objects and enable multiple, partial, dynamic typings. A posteriori typing enhances flexibility because it allows models to be retyped with respect to other meta-models. Hence, we distinguish between creation meta-models used to construct models, and role meta-models into which models are retyped. This permits unanticipated reuse, as a model management operation defined for a role meta-model can be reused as-is with models built using a different creation meta-model, once such models are reclassified. Moreover, our approach permits expressing some types of bidirectional model transformations by reclassification. The transformations defined as reclassifications have better performance than the equivalent ones defined with traditional transformation languages, because reclassification does not require creating new objects. In [1], we propose two mechanisms to define a posteriori typings: type-level (mappings between meta-models) and instance-level (set of model queries). The paper presents the underlying theory and type correctness criteria of both mechanisms, defines some analysis methods, identifies practical restrictions for retyping specifications, and demonstrates the feasibility of the approach by an implementation atop our meta-modelling tool MetaDepth. We also explore application scenarios of a posteriori typing (to define transformations, for model transformation reuse, and to improve transformation expressiveness by dynamic type change), and present an experiment showing the potential performance gains when expressing transformations as retypings. The tool is available at http://metadepth.org/. A catalo
{"title":"[Journal First] A Posteriori Typing for Model-Driven Engineering: Concepts, Analysis, and Applications","authors":"J. de Lara, E. Guerra","doi":"10.1145/3180155.3182545","DOIUrl":"https://doi.org/10.1145/3180155.3182545","url":null,"abstract":"Model-Driven Engineering (MDE) is a software engineering paradigm where models are actively used to specify, test, simulate, analyse and maintain the systems to be built, among other activities. Models can be defined using general-purpose modelling languages like the UML, but for particular domains, the use of domain-specific languages is pervasive. Either way, models must conform to a meta-model which defines their abstract syntax. In MDE, the definition of model management operations – often typed over project-specific meta-models – is recurrent. However, even if two operations are similar, they must be developed from scratch whenever they are applied to instances of different meta-models. This is so as operations defined (i.e., typed) over a meta-model cannot be directly reused for another. Part of this difficulty of reuse is because classes in meta-models are used in two ways: as templates to create objects and as static classifiers for them. These two aspects are inherently tied in most meta-modelling approaches, which results in unnecessarily rigid systems and hinders reusability of MDE artefacts. To enhance flexibility and reuse in MDE, we propose an approach to decouple object creation from typing [1]. The approach relies on standard mechanisms for object creation, and proposes the notion of a posteriori typing as a means to retype objects and enable multiple, partial, dynamic typings. A posteriori typing enhances flexibility because it allows models to be retyped with respect to other meta-models. Hence, we distinguish between creation meta-models used to construct models, and role meta-models into which models are retyped. This permits unanticipated reuse, as a model management operation defined for a role meta-model can be reused as-is with models built using a different creation meta-model, once such models are reclassified. Moreover, our approach permits expressing some types of bidirectional model transformations by reclassification. The transformations defined as reclassifications have better performance than the equivalent ones defined with traditional transformation languages, because reclassification does not require creating new objects. In [1], we propose two mechanisms to define a posteriori typings: type-level (mappings between meta-models) and instance-level (set of model queries). The paper presents the underlying theory and type correctness criteria of both mechanisms, defines some analysis methods, identifies practical restrictions for retyping specifications, and demonstrates the feasibility of the approach by an implementation atop our meta-modelling tool MetaDepth. We also explore application scenarios of a posteriori typing (to define transformations, for model transformation reuse, and to improve transformation expressiveness by dynamic type change), and present an experiment showing the potential performance gains when expressing transformations as retypings. The tool is available at http://metadepth.org/. A catalo","PeriodicalId":6560,"journal":{"name":"2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE)","volume":"18 1","pages":"1136-1136"},"PeriodicalIF":0.0,"publicationDate":"2018-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81519720","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Background: Program comprehension is an important skill for programmers – extending and debugging existing source code is part of the daily routine. Syntax highlighting is one of the most common tools used to support developers in understanding algorithms. However, most research in this area originates from a time when programmers used a completely different tool chain. Objective: We examined the influence of syntax highlighting on novices' ability to comprehend source code. Additional analyses cover the influence of task type and programming experience on the code comprehension ability itself and its relation to syntax highlighting. Method: We conducted a controlled experiment with 390 undergraduate students in an introductory Java programming course. We measured the correctness with which they solved small coding tasks. Each test subject received some tasks with syntax highlighting and some without. Results: The data provided no evidence that syntax highlighting improves novices' ability to comprehend source code. Limitations: There are very few similar experiments and it is unclear as of yet which factors impact the effectiveness of syntax highlighting. One major limitation may be the types of tasks chosen for this experiment. Conclusion: The results suggest that syntax highlighting squanders a feedback channel from the IDE to the programmer that can be used more effectively.
{"title":"[Journal First] Does Syntax Highlighting Help Programming Novices?","authors":"C. Hannebauer, M. Hesenius, V. Gruhn","doi":"10.1145/3180155.3182554","DOIUrl":"https://doi.org/10.1145/3180155.3182554","url":null,"abstract":"Background: Program comprehension is an important skill for programmers – extending and debugging existing source code is part of the daily routine. Syntax highlighting is one of the most common tools used to support developers in understanding algorithms. However, most research in this area originates from a time when programmers used a completely different tool chain. Objective: We examined the influence of syntax highlighting on novices' ability to comprehend source code. Additional analyses cover the influence of task type and programming experience on the code comprehension ability itself and its relation to syntax highlighting. Method: We conducted a controlled experiment with 390 undergraduate students in an introductory Java programming course. We measured the correctness with which they solved small coding tasks. Each test subject received some tasks with syntax highlighting and some without. Results: The data provided no evidence that syntax highlighting improves novices' ability to comprehend source code. Limitations: There are very few similar experiments and it is unclear as of yet which factors impact the effectiveness of syntax highlighting. One major limitation may be the types of tasks chosen for this experiment. Conclusion: The results suggest that syntax highlighting squanders a feedback channel from the IDE to the programmer that can be used more effectively.","PeriodicalId":6560,"journal":{"name":"2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE)","volume":"5 1","pages":"704-704"},"PeriodicalIF":0.0,"publicationDate":"2018-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89408061","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wiesław Kopeć, Bartłomiej Balcerzak, R. Nielek, Grzegorz Kowalik, A. Wierzbicki, F. Casati
Globally observed trends in aging indicate that older adults constitute a growing share of the population and an increasing demographic in the modern technologies marketplace. Therefore, it has become important to address the issue of participation of older adults in the process of developing solutions suitable for their group. In this study, we approached this topic by organizing a hackathon involving teams of young programmers and older adult participants. In our paper we describe a case study of that hackathon, in which our objective was to motivate older adults to participate in software engineering processes. Based on our results from an array of qualitative methods, we propose a set of good practices that may lead to improved older adult participation in similar events and an improved process of developing apps that target older adults.
{"title":"[Journal First] Older Adults and Hackathons: A Qualitative Study","authors":"Wiesław Kopeć, Bartłomiej Balcerzak, R. Nielek, Grzegorz Kowalik, A. Wierzbicki, F. Casati","doi":"10.1145/3180155.3182547","DOIUrl":"https://doi.org/10.1145/3180155.3182547","url":null,"abstract":"Globally observed trends in aging indicate that older adults constitute a growing share of the population and an increasing demographic in the modern technologies marketplace. Therefore, it has become important to address the issue of participation of older adults in the process of developing solutions suitable for their group. In this study, we approached this topic by organizing a hackathon involving teams of young programmers and older adult participants. In our paper we describe a case study of that hackathon, in which our objective was to motivate older adults to participate in software engineering processes. Based on our results from an array of qualitative methods, we propose a set of good practices that may lead to improved older adult participation in similar events and an improved process of developing apps that target older adults.","PeriodicalId":6560,"journal":{"name":"2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE)","volume":"96 1","pages":"702-703"},"PeriodicalIF":0.0,"publicationDate":"2018-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90823084","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In practice, a program may contain multiple bugs. The simultaneous presence of these bugs may deteriorate the effectiveness of existing fault-localization techniques to locate program bugs. While it is acceptable to use all failed and successful tests to identify suspicious code for programs with exactly one bug, it is not appropriate to use the same approach for programs with multiple bugs because the due-to relationship between failed tests and underlying bugs cannot be easily identified. One solution is to generate fault-focused clusters by grouping failed tests caused by the same bug into the same clusters. We propose MSeer - an advanced fault localization technique for locating multiple bugs in parallel. Our major contributions include the use of (1) a revised Kendall tau distance to measure the distance between two failed tests, (2) an innovative approach to simultaneously estimate the number of clusters and assign initial medoids to these clusters, and (3) an improved K-medoids clustering algorithm to better identify the due-to relationship between failed tests and their corresponding bugs. Case studies on 840 multiple-bug versions of seven programs suggest that MSeer performs better in terms of effectiveness and efficiency than two other techniques for locating multiple bugs in parallel.
{"title":"[Journal First] MSeer – An Advanced Technique for Locating Multiple Bugs in Parallel","authors":"Ruizhi Gao, E. Wong","doi":"10.1145/3180155.3182552","DOIUrl":"https://doi.org/10.1145/3180155.3182552","url":null,"abstract":"In practice, a program may contain multiple bugs. The simultaneous presence of these bugs may deteriorate the effectiveness of existing fault-localization techniques to locate program bugs. While it is acceptable to use all failed and successful tests to identify suspicious code for programs with exactly one bug, it is not appropriate to use the same approach for programs with multiple bugs because the due-to relationship between failed tests and underlying bugs cannot be easily identified. One solution is to generate fault-focused clusters by grouping failed tests caused by the same bug into the same clusters. We propose MSeer - an advanced fault localization technique for locating multiple bugs in parallel. Our major contributions include the use of (1) a revised Kendall tau distance to measure the distance between two failed tests, (2) an innovative approach to simultaneously estimate the number of clusters and assign initial medoids to these clusters, and (3) an improved K-medoids clustering algorithm to better identify the due-to relationship between failed tests and their corresponding bugs. Case studies on 840 multiple-bug versions of seven programs suggest that MSeer performs better in terms of effectiveness and efficiency than two other techniques for locating multiple bugs in parallel.","PeriodicalId":6560,"journal":{"name":"2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE)","volume":"40 1","pages":"1064-1064"},"PeriodicalIF":0.0,"publicationDate":"2018-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74109656","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Eirini Kalliamvakou, C. Bird, Thomas Zimmermann, Andrew Begel, R. Deline, D. Germán
Having great managers is as critical to success as having a good team or organization. A great manager is seen as fuelling the team they manage, enabling it to use its full potential. Though software engineering research studies factors that may affect the performance and productivity of software engineers and teams (like tools and skill), it has overlooked the software engineering manager. On the one hand, experts are questioning how the abundant work in management applies to software engineering. On the other hand, practitioners are looking to researchers for evidence-based guidance on how to manage software teams. We conducted a mixed methods empirical study to investigate what manager attributes developers and engineering managers perceive important and why. We present a conceptual framework of manager attributes, and find that technical skills are not the sign of greatness for an engineering manager. Through statistical analysis we identify how engineers and managers relate in their views, and how software engineering differs from other knowledge work groups.
{"title":"[Journal First] What Makes a Great Manager of Software Engineers?","authors":"Eirini Kalliamvakou, C. Bird, Thomas Zimmermann, Andrew Begel, R. Deline, D. Germán","doi":"10.1145/3180155.3182525","DOIUrl":"https://doi.org/10.1145/3180155.3182525","url":null,"abstract":"Having great managers is as critical to success as having a good team or organization. A great manager is seen as fuelling the team they manage, enabling it to use its full potential. Though software engineering research studies factors that may affect the performance and productivity of software engineers and teams (like tools and skill), it has overlooked the software engineering manager. On the one hand, experts are questioning how the abundant work in management applies to software engineering. On the other hand, practitioners are looking to researchers for evidence-based guidance on how to manage software teams. We conducted a mixed methods empirical study to investigate what manager attributes developers and engineering managers perceive important and why. We present a conceptual framework of manager attributes, and find that technical skills are not the sign of greatness for an engineering manager. Through statistical analysis we identify how engineers and managers relate in their views, and how software engineering differs from other knowledge work groups.","PeriodicalId":6560,"journal":{"name":"2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE)","volume":"30 5","pages":"701-701"},"PeriodicalIF":0.0,"publicationDate":"2018-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72889967","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kostadin Damevski, Hui Chen, D. Shepherd, Nicholas A. Kraft, L. Pollock
Interaction data, gathered from developers' daily clicks and key presses in the IDE, has found use in both empirical studies and in recommendation systems for software engineering. We observe that this data has several characteristics, common across IDEs: 1) exponentially distributed - some events or commands dominate the trace (e.g., cursor movement commands), while most other commands occur relatively infrequently; 2) noisy - the traces include spurious commands (or clicks), or unrelated events, that may not be important to the behavior of interest; 3) comprise of overlapping events and commands - specific commands can be invoked by separate mechanisms, and similar events can be triggered by different sources. These characteristics of this data are analogous to the characteristics of synonymy and polysemy in natural language corpora. Therefore, this paper (and presentation) presents a new modeling approach for this type of data, leveraging topic models typically applied to streams of natural language text.
{"title":"[Journal First] Predicting Future Developer Behavior in the IDE Using Topic Models","authors":"Kostadin Damevski, Hui Chen, D. Shepherd, Nicholas A. Kraft, L. Pollock","doi":"10.1145/3180155.3182541","DOIUrl":"https://doi.org/10.1145/3180155.3182541","url":null,"abstract":"Interaction data, gathered from developers' daily clicks and key presses in the IDE, has found use in both empirical studies and in recommendation systems for software engineering. We observe that this data has several characteristics, common across IDEs: 1) exponentially distributed - some events or commands dominate the trace (e.g., cursor movement commands), while most other commands occur relatively infrequently; 2) noisy - the traces include spurious commands (or clicks), or unrelated events, that may not be important to the behavior of interest; 3) comprise of overlapping events and commands - specific commands can be invoked by separate mechanisms, and similar events can be triggered by different sources. These characteristics of this data are analogous to the characteristics of synonymy and polysemy in natural language corpora. Therefore, this paper (and presentation) presents a new modeling approach for this type of data, leveraging topic models typically applied to streams of natural language text.","PeriodicalId":6560,"journal":{"name":"2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE)","volume":"105 3","pages":"932-932"},"PeriodicalIF":0.0,"publicationDate":"2018-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72572749","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Just-In-Time (JIT) models identify fix-inducing code changes. JIT models are trained using techniques that assume that past fix-inducing changes are similar to future ones. However, this assumption may not hold, e.g., as system complexity tends to accrue, expertise may become more important as systems age. In this paper, we study JIT models as systems evolve. Through a longitudinal case study of 37,524 changes from the rapidly evolving Qt and OpenStack systems, we find that fluctuations in the properties of fix-inducing changes can impact the performance and interpretation of JIT models. More specifically: (a) the discriminatory power (AUC) and calibration (Brier) scores of JIT models drop considerably one year after being trained; (b) the role that code change properties (e.g., Size, Experience) play within JIT models fluctuates over time; and (c) those fluctuations yield over- and underestimates of the future impact of code change properties on the likelihood of inducing fixes. To avoid erroneous or misleading predictions, JIT models should be retrained using recently recorded data (within three months). Moreover, quality improvement plans should be informed by JIT models that are trained using six months (or more) of historical data, since they are more resilient to period-specific fluctuations in the importance of code change properties.
{"title":"[Journal First] Are Fix-Inducing Changes a Moving Target?: A Longitudinal Case Study of Just-in-Time Defect Prediction","authors":"Shane McIntosh, Yasutaka Kamei","doi":"10.1145/3180155.3182514","DOIUrl":"https://doi.org/10.1145/3180155.3182514","url":null,"abstract":"Just-In-Time (JIT) models identify fix-inducing code changes. JIT models are trained using techniques that assume that past fix-inducing changes are similar to future ones. However, this assumption may not hold, e.g., as system complexity tends to accrue, expertise may become more important as systems age. In this paper, we study JIT models as systems evolve. Through a longitudinal case study of 37,524 changes from the rapidly evolving Qt and OpenStack systems, we find that fluctuations in the properties of fix-inducing changes can impact the performance and interpretation of JIT models. More specifically: (a) the discriminatory power (AUC) and calibration (Brier) scores of JIT models drop considerably one year after being trained; (b) the role that code change properties (e.g., Size, Experience) play within JIT models fluctuates over time; and (c) those fluctuations yield over- and underestimates of the future impact of code change properties on the likelihood of inducing fixes. To avoid erroneous or misleading predictions, JIT models should be retrained using recently recorded data (within three months). Moreover, quality improvement plans should be informed by JIT models that are trained using six months (or more) of historical data, since they are more resilient to period-specific fluctuations in the importance of code change properties.","PeriodicalId":6560,"journal":{"name":"2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE)","volume":"34 1","pages":"560-560"},"PeriodicalIF":0.0,"publicationDate":"2018-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72997870","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mayy Habayeb, Syed Shariyar Murtaza, A. Miranskyy, A. Bener
A significant amount of time is spent by software developers in investigating bug reports. It is useful to indicate when a bug report will be closed, since it would help software teams to prioritise their work. Several studies have been conducted to address this problem in the past decade. Most of these studies have used the frequency of occurrence of certain developer activities as input attributes in building their prediction models. However, these approaches tend to ignore the temporal nature of the occurrence of these activities. In this paper, a novel approach using Hidden Markov models (HMMs) and temporal sequences of developer activities is proposed. The approach is empirically demonstrated in a case study using eight years of bug reports collected from the Firefox project.
{"title":"[Journal First] On the Use of Hidden Markov Model to Predict the Time to Fix Bugs","authors":"Mayy Habayeb, Syed Shariyar Murtaza, A. Miranskyy, A. Bener","doi":"10.1145/3180155.3182522","DOIUrl":"https://doi.org/10.1145/3180155.3182522","url":null,"abstract":"A significant amount of time is spent by software developers in investigating bug reports. It is useful to indicate when a bug report will be closed, since it would help software teams to prioritise their work. Several studies have been conducted to address this problem in the past decade. Most of these studies have used the frequency of occurrence of certain developer activities as input attributes in building their prediction models. However, these approaches tend to ignore the temporal nature of the occurrence of these activities. In this paper, a novel approach using Hidden Markov models (HMMs) and temporal sequences of developer activities is proposed. The approach is empirically demonstrated in a case study using eight years of bug reports collected from the Firefox project.","PeriodicalId":6560,"journal":{"name":"2018 IEEE/ACM 40th International Conference on Software Engineering (ICSE)","volume":"71 1","pages":"700-700"},"PeriodicalIF":0.0,"publicationDate":"2018-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77285486","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}