Jonathan Aldrich Carnegie Mellon University Aldeida Aleti Monash University Dalal Alrajeh Imperial College London Samik Basu Iowa State University Benoit Baudry KTH Gabriele Bavota Università della Svizzera Italiana (USI) Nelly Bencomo Aston University Ayse Bener Ryerson University Domenico Bianculli SnT Centre University of Luxembourg Christian Bird Microsoft Kelly Blincoe The University of Auckland Barbora Buhnova Masaryk University Marcel Böhme Monash University Maria Christakis MPI-SWS Siobhan Clarke Trinity College Dublin James Clause University of Delaware Marcelo d'Amorim Federal University of Pernambuco Cleidson De Souza Vale Institute of Technology and UFPA Robert Deline Microsoft Danny Dig Oregon State University Yvonne Dittrich IT University of Copenhagen Hakan Erdogmus Carnegie Mellon University Robert Feldt Blekinge Institute of Technology Maria Angela Ferrario Lancaster University Antonio Filieri Imperial College London Thomas Fritz University of Zurich, University of British Columbia Diego Garbervetsky Departamento de Computación. FCEyN. UBA Jaco Geldenhuys Stellenbosch University Milos Gligoric The University of Texas at Austin Mike Godfrey University of Waterloo Alex Groce Northern Arizona University John Grundy Monash University Lars Grunske Humboldt University Berlin Paul Grünbacher Johannes Kepler University Linz Arie Gurfinkel University of Waterloo William G.J. Halfond University of Southern California Tracy Hall Lancaster University Sylvain Hallé Université du Québec à Chicoutimi Dan Hao Peking University Mark Harman University College London Rachel Harrison University of Oxford Emily Hill Drew University
{"title":"Program Committee of ICSE 2019","authors":"Jonathan Aldrich","doi":"10.1109/icse.2019.00014","DOIUrl":"https://doi.org/10.1109/icse.2019.00014","url":null,"abstract":"Jonathan Aldrich Carnegie Mellon University Aldeida Aleti Monash University Dalal Alrajeh Imperial College London Samik Basu Iowa State University Benoit Baudry KTH Gabriele Bavota Università della Svizzera Italiana (USI) Nelly Bencomo Aston University Ayse Bener Ryerson University Domenico Bianculli SnT Centre University of Luxembourg Christian Bird Microsoft Kelly Blincoe The University of Auckland Barbora Buhnova Masaryk University Marcel Böhme Monash University Maria Christakis MPI-SWS Siobhan Clarke Trinity College Dublin James Clause University of Delaware Marcelo d'Amorim Federal University of Pernambuco Cleidson De Souza Vale Institute of Technology and UFPA Robert Deline Microsoft Danny Dig Oregon State University Yvonne Dittrich IT University of Copenhagen Hakan Erdogmus Carnegie Mellon University Robert Feldt Blekinge Institute of Technology Maria Angela Ferrario Lancaster University Antonio Filieri Imperial College London Thomas Fritz University of Zurich, University of British Columbia Diego Garbervetsky Departamento de Computación. FCEyN. UBA Jaco Geldenhuys Stellenbosch University Milos Gligoric The University of Texas at Austin Mike Godfrey University of Waterloo Alex Groce Northern Arizona University John Grundy Monash University Lars Grunske Humboldt University Berlin Paul Grünbacher Johannes Kepler University Linz Arie Gurfinkel University of Waterloo William G.J. 
Halfond University of Southern California Tracy Hall Lancaster University Sylvain Hallé Université du Québec à Chicoutimi Dan Hao Peking University Mark Harman University College London Rachel Harrison University of Oxford Emily Hill Drew University","PeriodicalId":6736,"journal":{"name":"2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)","volume":"04 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86079284","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Message from the Web Chairs of ICSE 2019","authors":"","doi":"10.1109/icse.2019.00010","DOIUrl":"https://doi.org/10.1109/icse.2019.00010","url":null,"abstract":"","PeriodicalId":6736,"journal":{"name":"2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)","volume":"58 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86716632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dynamic program slicing is useful for a variety of tasks, from testing to debugging to security. Prior slicing approaches have targeted traditional desktop/server platforms, rather than mobile platforms such as Android. Slicing mobile, event-based systems is challenging due to their asynchronous callback construction and the IPC (interprocess communication)-heavy, sensor-driven, timing-sensitive nature of the platform. To address these problems, we introduce AndroidSlicer, the first slicing approach for Android. AndroidSlicer combines a novel asynchronous slicing approach for modeling data and control dependences in the presence of callbacks with lightweight and precise instrumentation; this allows slicing for apps running on actual phones, and without requiring the app's source code. Our slicer is capable of handling a wide array of inputs that Android supports without adding any noticeable overhead. Experiments on 60 apps from Google Play show that AndroidSlicer is effective (reducing the number of instructions to be examined to 0.3% of executed instructions) and efficient (app instrumentation and post-processing combined takes 31 seconds); all while imposing a runtime overhead of just 4%. We present three applications of AndroidSlicer that are particularly relevant in the mobile domain: (1) finding and tracking input parts responsible for an error/crash, (2) fault localization, i.e., finding the instructions responsible for an error/crash, and (3) reducing the regression test suite. Experiments with these applications on an additional set of 18 popular apps indicate that AndroidSlicer is effective for Android testing and debugging.
{"title":"Dynamic Slicing for Android","authors":"Tanzirul Azim, Arash Alavi, Iulian Neamtiu, Rajiv Gupta","doi":"10.1109/ICSE.2019.00118","DOIUrl":"https://doi.org/10.1109/ICSE.2019.00118","url":null,"abstract":"Dynamic program slicing is useful for a variety of tasks, from testing to debugging to security. Prior slicing approaches have targeted traditional desktop/server platforms, rather than mobile platforms such as Android. Slicing mobile, event-based systems is challenging due to their asynchronous callback construction and the IPC (interprocess communication)-heavy, sensor-driven, timing-sensitive nature of the platform. To address these problems, we introduce AndroidSlicer, the first slicing approach for Android. AndroidSlicer combines a novel asynchronous slicing approach for modeling data and control dependences in the presence of callbacks with lightweight and precise instrumentation; this allows slicing for apps running on actual phones, and without requiring the app's source code. Our slicer is capable of handling a wide array of inputs that Android supports without adding any noticeable overhead. Experiments on 60 apps from Google Play show that AndroidSlicer is effective (reducing the number of instructions to be examined to 0.3% of executed instructions) and efficient (app instrumentation and post-processing combined takes 31 seconds); all while imposing a runtime overhead of just 4%. We present three applications of AndroidSlicer that are particularly relevant in the mobile domain: (1) finding and tracking input parts responsible for an error/crash, (2) fault localization, i.e., finding the instructions responsible for an error/crash, and (3) reducing the regression test suite. 
Experiments with these applications on an additional set of 18 popular apps indicate that AndroidSlicer is effective for Android testing and debugging.","PeriodicalId":6736,"journal":{"name":"2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)","volume":"25 1","pages":"1154-1164"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77872337","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
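The core idea of dynamic slicing mentioned in the abstract above can be illustrated with a minimal sketch. It is not AndroidSlicer's algorithm: it assumes a hypothetical, already-recorded execution trace in which each entry names the variable an instruction defines and the variables it uses, and it follows data dependences only (no control dependences, callbacks, or IPC).

```python
from typing import List, Set, Tuple

# Hypothetical trace entry: (instruction label, variable defined, variables used).
TraceEntry = Tuple[str, str, Tuple[str, ...]]


def backward_dynamic_slice(trace: List[TraceEntry], criterion_var: str) -> List[str]:
    """Return the labels of executed instructions that the final value of
    criterion_var depends on, following data dependences only."""
    relevant: Set[str] = {criterion_var}  # variables whose definitions we still need
    slice_labels: List[str] = []
    # Walk the trace backwards, from the last executed instruction to the first.
    for label, defined, used in reversed(trace):
        if defined in relevant:
            slice_labels.append(label)
            relevant.discard(defined)  # this definition is now accounted for
            relevant.update(used)      # ...but its operands become relevant
    slice_labels.reverse()             # report in execution order
    return slice_labels


# Illustrative trace of a straight-line run.
trace: List[TraceEntry] = [
    ("s1", "a", ()),      # a = read_input()
    ("s2", "b", ()),      # b = 10
    ("s3", "c", ("a",)),  # c = a * 2
    ("s4", "d", ("b",)),  # d = b + 1
    ("s5", "e", ("c",)),  # e = c - 3
]
dynamic_slice = backward_dynamic_slice(trace, "e")
print(dynamic_slice)
```

In this toy run, slicing on `e` keeps three of the five executed instructions; the reduction figures reported in the abstract come from the authors' experiments, not from a sketch like this one.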
{"title":"Message from the Journal-First Chair of ICSE 2019","authors":"","doi":"10.1109/icse.2019.00007","DOIUrl":"https://doi.org/10.1109/icse.2019.00007","url":null,"abstract":"","PeriodicalId":6736,"journal":{"name":"2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)","volume":"179 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87595449","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Zahra Shakeri Hossein Abad, V. Gervasi, D. Zowghi, B. Far
In many software development projects, analysts are required to deal with systems' requirements from unfamiliar domains. Familiarity with the domain is necessary in order to get full leverage from interaction with stakeholders and for extracting relevant information from the existing project documents. Accurate and timely extraction and classification of requirements knowledge support analysts in this challenging scenario. Our approach is to mine real-time interaction records and project documents for the relevant phrasal units about the requirements related topics being discussed during elicitation. We propose to use both generative and discriminating methods. To extract the relevant terms, we leverage the flexibility and power of Weighted Finite State Transducers (WFSTs) in dynamic modelling of natural language processing tasks. We used an extended version of Support Vector Machines (SVMs) with variable-sized feature vectors to efficiently and dynamically extract and classify requirements-related knowledge from the existing documents. To evaluate the performance of our approach intuitively and quantitatively, we used edit distance and precision/recall metrics. We show in three case studies that the snippets extracted by our method are intuitively relevant and reasonably accurate. Furthermore, we found that statistical and linguistic parameters such as smoothing methods, and words contiguity and order features can impact the performance of both extraction and classification tasks.
{"title":"Supporting Analysts by Dynamic Extraction and Classification of Requirements-Related Knowledge","authors":"Zahra Shakeri Hossein Abad, V. Gervasi, D. Zowghi, B. Far","doi":"10.1109/ICSE.2019.00057","DOIUrl":"https://doi.org/10.1109/ICSE.2019.00057","url":null,"abstract":"In many software development projects, analysts are required to deal with systems' requirements from unfamiliar domains. Familiarity with the domain is necessary in order to get full leverage from interaction with stakeholders and for extracting relevant information from the existing project documents. Accurate and timely extraction and classification of requirements knowledge support analysts in this challenging scenario. Our approach is to mine real-time interaction records and project documents for the relevant phrasal units about the requirements related topics being discussed during elicitation. We propose to use both generative and discriminating methods. To extract the relevant terms, we leverage the flexibility and power of Weighted Finite State Transducers (WFSTs) in dynamic modelling of natural language processing tasks. We used an extended version of Support Vector Machines (SVMs) with variable-sized feature vectors to efficiently and dynamically extract and classify requirements-related knowledge from the existing documents. To evaluate the performance of our approach intuitively and quantitatively, we used edit distance and precision/recall metrics. We show in three case studies that the snippets extracted by our method are intuitively relevant and reasonably accurate. 
Furthermore, we found that statistical and linguistic parameters such as smoothing methods, and words contiguity and order features can impact the performance of both extraction and classification tasks.","PeriodicalId":6736,"journal":{"name":"2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)","volume":"63 1","pages":"442-453"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76220008","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
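The evaluation metrics named in the abstract above (edit distance and precision/recall) can be sketched as follows. This is a generic illustration using the standard Levenshtein distance and set-based precision/recall; the abstract does not say which variant or granularity (characters or words) the authors used, so the function names and inputs here are assumptions.

```python
from typing import Hashable, Sequence, Set, Tuple


def edit_distance(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, item_a in enumerate(a, start=1):
        curr = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def precision_recall(extracted: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Precision and recall of an extracted set against a reference set."""
    true_positives = len(extracted & relevant)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Word-level distance between an extracted snippet and a reference phrase.
distance = edit_distance("user login must time out".split(),
                         "user session must time out".split())
precision, recall = precision_recall({"login", "timeout", "colour"},
                                     {"login", "timeout", "audit", "backup"})
print(distance, precision, recall)
```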
Exploiting machine learning techniques for analyzing programs has attracted much attention. One key problem is how to represent code fragments well for follow-up analysis. Traditional information retrieval based methods often treat programs as natural language texts, which could miss important semantic information of source code. Recently, state-of-the-art studies demonstrate that abstract syntax tree (AST) based neural models can better represent source code. However, the sizes of ASTs are usually large and the existing models are prone to the long-term dependency problem. In this paper, we propose a novel AST-based Neural Network (ASTNN) for source code representation. Unlike existing models that work on entire ASTs, ASTNN splits each large AST into a sequence of small statement trees, and encodes the statement trees to vectors by capturing the lexical and syntactical knowledge of statements. Based on the sequence of statement vectors, a bidirectional RNN model is used to leverage the naturalness of statements and finally produce the vector representation of a code fragment. We have applied our neural network based source code representation method to two common program comprehension tasks: source code classification and code clone detection. Experimental results on the two tasks indicate that our model is superior to state-of-the-art approaches.
{"title":"A Novel Neural Source Code Representation Based on Abstract Syntax Tree","authors":"Jian Zhang, Xu Wang, Hongyu Zhang, Hailong Sun, Kaixuan Wang, Xudong Liu","doi":"10.1109/ICSE.2019.00086","DOIUrl":"https://doi.org/10.1109/ICSE.2019.00086","url":null,"abstract":"Exploiting machine learning techniques for analyzing programs has attracted much attention. One key problem is how to represent code fragments well for follow-up analysis. Traditional information retrieval based methods often treat programs as natural language texts, which could miss important semantic information of source code. Recently, state-of-the-art studies demonstrate that abstract syntax tree (AST) based neural models can better represent source code. However, the sizes of ASTs are usually large and the existing models are prone to the long-term dependency problem. In this paper, we propose a novel AST-based Neural Network (ASTNN) for source code representation. Unlike existing models that work on entire ASTs, ASTNN splits each large AST into a sequence of small statement trees, and encodes the statement trees to vectors by capturing the lexical and syntactical knowledge of statements. Based on the sequence of statement vectors, a bidirectional RNN model is used to leverage the naturalness of statements and finally produce the vector representation of a code fragment. We have applied our neural network based source code representation method to two common program comprehension tasks: source code classification and code clone detection. 
Experimental results on the two tasks indicate that our model is superior to state-of-the-art approaches.","PeriodicalId":6736,"journal":{"name":"2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)","volume":"40 1","pages":"783-794"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78771330","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
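The splitting step described in the abstract above, turning one large AST into a sequence of statement-level units, can be sketched with Python's standard `ast` module. This is only an illustration under stated assumptions: it uses Python's AST as a stand-in for whatever parser the authors used, it records just the type of each statement node in preorder, and it does not implement the statement encoder or the bidirectional RNN.

```python
import ast
from typing import List


def statement_sequence(source: str) -> List[str]:
    """Flatten a program's AST into a preorder sequence of statement node types."""
    sequence: List[str] = []

    def visit(node: ast.AST) -> None:
        # Each statement node starts a new unit in the sequence.
        if isinstance(node, ast.stmt):
            sequence.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return sequence


code = """
def total(xs):
    s = 0
    for x in xs:
        if x > 0:
            s += x
    return s
"""
units = statement_sequence(code)
print(units)
```

A model in the spirit of the abstract would then encode each unit as a vector and feed the sequence to a recurrent network; those stages are omitted here.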
Gang Fan, Rongxin Wu, Qingkai Shi, Xiao Xiao, Jinguo Zhou, Charles Zhang
Detecting memory leak at industrial scale is still not well addressed, in spite of the tremendous effort from both industry and academia in the past decades. Existing work suffers from an unresolved paradox - a highly precise analysis limits its scalability and an imprecise one seriously hurts its precision or recall. In this work, we present SMOKE, a staged approach to resolve this paradox. In the first stage, instead of using a uniform precise analysis for all paths, we use a scalable but imprecise analysis to compute a succinct set of candidate memory leak paths. In the second stage, we leverage a more precise analysis to verify the feasibility of those candidates. The first stage is scalable, due to the design of a new sparse program representation, the use-flow graph (UFG), that models the problem as a polynomial-time state analysis. The second stage analysis is both precise and efficient, due to the smaller number of candidates and the design of a dedicated constraint solver. Experimental results show that SMOKE can finish checking industrial-sized projects, up to 8MLoC, in forty minutes with an average false positive rate of 24.4%. Besides, SMOKE is significantly faster than the state-of-the-art research techniques as well as the industrial tools, with the speedup ranging from 5.2X to 22.8X. In the twenty-nine mature and extensively checked benchmark projects, SMOKE has discovered thirty previously unknown memory leaks which were confirmed by developers, and one even assigned a CVE ID.
{"title":"SMOKE: Scalable Path-Sensitive Memory Leak Detection for Millions of Lines of Code","authors":"Gang Fan, Rongxin Wu, Qingkai Shi, Xiao Xiao, Jinguo Zhou, Charles Zhang","doi":"10.1109/ICSE.2019.00025","DOIUrl":"https://doi.org/10.1109/ICSE.2019.00025","url":null,"abstract":"Detecting memory leak at industrial scale is still not well addressed, in spite of the tremendous effort from both industry and academia in the past decades. Existing work suffers from an unresolved paradox - a highly precise analysis limits its scalability and an imprecise one seriously hurts its precision or recall. In this work, we present SMOKE, a staged approach to resolve this paradox. In the first stage, instead of using a uniform precise analysis for all paths, we use a scalable but imprecise analysis to compute a succinct set of candidate memory leak paths. In the second stage, we leverage a more precise analysis to verify the feasibility of those candidates. The first stage is scalable, due to the design of a new sparse program representation, the use-flow graph (UFG), that models the problem as a polynomial-time state analysis. The second stage analysis is both precise and efficient, due to the smaller number of candidates and the design of a dedicated constraint solver. Experimental results show that SMOKE can finish checking industrial-sized projects, up to 8MLoC, in forty minutes with an average false positive rate of 24.4%. Besides, SMOKE is significantly faster than the state-of-the-art research techniques as well as the industrial tools, with the speedup ranging from 5.2X to 22.8X. 
In the twenty-nine mature and extensively checked benchmark projects, SMOKE has discovered thirty previously unknown memory leaks which were confirmed by developers, and one even assigned a CVE ID.","PeriodicalId":6736,"journal":{"name":"2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)","volume":"20 1","pages":"72-82"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84163253","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Souti Chattopadhyay, Nicholas Nelson, Yenifer Ramirez Gonzalez, Annel Amelia Leon, Rahul Pandita, A. Sarma
In order to build efficient tools that support complex programming tasks, it is imperative that we understand how developers program. We know that developers create a context around their programming task by gathering relevant information. We also know that developers decompose their tasks recursively into smaller units. However, important gaps exist in our knowledge about: (1) the role that context plays in supporting smaller units of tasks, (2) the relationship that exists among these smaller units, and (3) how context flows across them. The goal of this research is to gain a better understanding of how developers structure their tasks and manage context through a field study of ten professional developers in an industrial setting. Our analysis reveals that developers decompose their tasks into smaller units with distinct goals, that specific patterns exist in how they sequence these smaller units, and that developers may maintain context between those smaller units with related goals.
{"title":"Latent Patterns in Activities: A Field Study of How Developers Manage Context","authors":"Souti Chattopadhyay, Nicholas Nelson, Yenifer Ramirez Gonzalez, Annel Amelia Leon, Rahul Pandita, A. Sarma","doi":"10.1109/ICSE.2019.00051","DOIUrl":"https://doi.org/10.1109/ICSE.2019.00051","url":null,"abstract":"In order to build efficient tools that support complex programming tasks, it is imperative that we understand how developers program. We know that developers create a context around their programming task by gathering relevant information. We also know that developers decompose their tasks recursively into smaller units. However, important gaps exist in our knowledge about: (1) the role that context plays in supporting smaller units of tasks, (2) the relationship that exists among these smaller units, and (3) how context flows across them. The goal of this research is to gain a better understanding of how developers structure their tasks and manage context through a field study of ten professional developers in an industrial setting. Our analysis reveals that developers decompose their tasks into smaller units with distinct goals, that specific patterns exist in how they sequence these smaller units, and that developers may maintain context between those smaller units with related goals.","PeriodicalId":6736,"journal":{"name":"2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)","volume":"88 1","pages":"373-383"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79979372","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Carmine Vassallo, Sebastian Proksch, H. Gall, M. D. Penta
Continuous Integration (CI) is a widely-used software engineering practice. The software is continuously built so that changes can be easily integrated and issues such as unmet quality goals or style inconsistencies get detected early. Unfortunately, it is not only hard to introduce CI into an existing project, but it is also challenging to live up to the CI principles when facing tough deadlines or business decisions. Previous work has identified common anti-patterns that reduce the promised benefits of CI. Typically, these anti-patterns slowly creep into a project over time before they are identified. We argue that automated detection can help with early identification and prevent such a process decay. In this work, we further analyze this assumption and survey 124 developers about CI anti-patterns. From the results, we build CI-Odor, a reporting tool for CI processes that detects the existence of four relevant anti-patterns by analyzing regular build logs and repository information. In a study on the 18,474 build logs of 36 popular JAVA projects, we reveal the presence of 3,823 high-severity warnings spread across projects. We validate our reports in a survey among 13 original developers of these projects and through general feedback from 42 developers that confirm the relevance of our reports.
{"title":"Automated Reporting of Anti-Patterns and Decay in Continuous Integration","authors":"Carmine Vassallo, Sebastian Proksch, H. Gall, M. D. Penta","doi":"10.1109/ICSE.2019.00028","DOIUrl":"https://doi.org/10.1109/ICSE.2019.00028","url":null,"abstract":"Continuous Integration (CI) is a widely-used software engineering practice. The software is continuously built so that changes can be easily integrated and issues such as unmet quality goals or style inconsistencies get detected early. Unfortunately, it is not only hard to introduce CI into an existing project, but it is also challenging to live up to the CI principles when facing tough deadlines or business decisions. Previous work has identified common anti-patterns that reduce the promised benefits of CI. Typically, these anti-patterns slowly creep into a project over time before they are identified. We argue that automated detection can help with early identification and prevent such a process decay. In this work, we further analyze this assumption and survey 124 developers about CI anti-patterns. From the results, we build CI-Odor, a reporting tool for CI processes that detects the existence of four relevant anti-patterns by analyzing regular build logs and repository information. In a study on the 18,474 build logs of 36 popular JAVA projects, we reveal the presence of 3,823 high-severity warnings spread across projects. 
We validate our reports in a survey among 13 original developers of these projects and through general feedback from 42 developers that confirm the relevance of our reports.","PeriodicalId":6736,"journal":{"name":"2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)","volume":"53 1","pages":"105-115"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84458613","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Automated build systems are routinely used by software engineers to minimize the number of objects that need to be recompiled after incremental changes to the source files of a project. In order to achieve efficient and correct builds, developers must provide the build tools with dependency information between the files and modules of a project, usually expressed in a macro language specific to each build tool. In order to guarantee correctness, the authors of these tools are responsible for enumerating all the files whose contents an output depends on. Unfortunately, this is a tedious process and not all dependencies are captured in practice, which leads to incorrect builds. We automatically uncover such missing dependencies through a novel method that we call build fuzzing. The correctness of build definitions is verified by modifying files in a project, triggering incremental builds and comparing the set of changed files to the set of expected changes. These sets are determined using a dependency graph inferred by tracing the system calls executed during a clean build. We evaluate our method by exhaustively testing build rules of open-source projects, uncovering issues leading to race conditions and faulty builds in 31 of them. We provide a discussion of the bugs we detect, identifying anti-patterns in the use of the macro languages. We fix some of the issues in projects where the features of build systems allow a clean solution.
{"title":"Detecting Incorrect Build Rules","authors":"N. Licker, A. Rice","doi":"10.1109/ICSE.2019.00125","DOIUrl":"https://doi.org/10.1109/ICSE.2019.00125","url":null,"abstract":"Automated build systems are routinely used by software engineers to minimize the number of objects that need to be recompiled after incremental changes to the source files of a project. In order to achieve efficient and correct builds, developers must provide the build tools with dependency information between the files and modules of a project, usually expressed in a macro language specific to each build tool. In order to guarantee correctness, the authors of these tools are responsible for enumerating all the files whose contents an output depends on. Unfortunately, this is a tedious process and not all dependencies are captured in practice, which leads to incorrect builds. We automatically uncover such missing dependencies through a novel method that we call build fuzzing. The correctness of build definitions is verified by modifying files in a project, triggering incremental builds and comparing the set of changed files to the set of expected changes. These sets are determined using a dependency graph inferred by tracing the system calls executed during a clean build. We evaluate our method by exhaustively testing build rules of open-source projects, uncovering issues leading to race conditions and faulty builds in 31 of them. We provide a discussion of the bugs we detect, identifying anti-patterns in the use of the macro languages. 
We fix some of the issues in projects where the features of build systems allow a clean solution.","PeriodicalId":6736,"journal":{"name":"2019 IEEE/ACM 41st International Conference on Software Engineering (ICSE)","volume":"7 1","pages":"1234-1244"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88228724","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
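The comparison at the heart of the build-fuzzing idea in the abstract above (touch a file, then compare what was rebuilt against what should have been rebuilt) can be sketched as follows. Everything here is a simplifying assumption: both dependency graphs are hand-written dictionaries, and the incremental build is simulated by walking the declared graph, whereas the abstract describes running real builds and inferring the graph from traced system calls.

```python
from typing import Dict, Set

# Maps a file to the files that are built directly from it.
Graph = Dict[str, Set[str]]


def reachable(graph: Graph, start: str) -> Set[str]:
    """All files transitively derived from `start`."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, set()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def missing_rebuilds(inferred: Graph, declared: Graph, touched: str) -> Set[str]:
    """Outputs that should change after touching a file but are not rebuilt."""
    expected = reachable(inferred, touched)  # from the (hypothetical) traced graph
    actual = reachable(declared, touched)    # stand-in for a real incremental build
    return expected - actual


# Hypothetical project: the declared rules forget that b.o depends on util.h.
inferred: Graph = {"util.h": {"a.o", "b.o"}, "a.c": {"a.o"}, "b.c": {"b.o"},
                   "a.o": {"app"}, "b.o": {"app"}}
declared: Graph = {"util.h": {"a.o"}, "a.c": {"a.o"}, "b.c": {"b.o"},
                   "a.o": {"app"}, "b.o": {"app"}}

stale = missing_rebuilds(inferred, declared, "util.h")
print(stale)
```

Repeating the check for every source file in the project approximates the exhaustive testing mentioned in the abstract; a non-empty result points at a rule with a missing dependency.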