Behavior Driven Development (BDD) is an agile approach that uses .feature files to describe the functionalities of a software system in natural language constructs (English-like phrases). Because of the English-like structure of .feature files, BDD specifications become an evolving documentation that helps all stakeholders, even non-technical ones, to understand and contribute to a software project. After specifying a .feature file, developers can use a BDD tool (e.g., Cucumber) to automatically generate test cases and implement the code of the specified functionality. However, maintaining traceability between .feature files and source code requires human effort. Therefore, .feature files can become out-of-date, reducing the advantages of using BDD. Furthermore, existing research does not attempt to improve the traceability between .feature files and source code. In this paper, we study the co-changes between .feature files and source code files to improve this traceability. Due to the English-like syntax of .feature files, we use natural language processing to identify co-changes, with an accuracy of 79%. We study the characteristics of BDD co-changes and build random forest models to predict when a .feature file should be modified before committing a code change. The random forest model obtains an AUC of 0.77. The model can assist developers in identifying when a .feature file should be modified in code commits. Once the traceability is up-to-date, BDD developers can write test code more efficiently and keep the software documentation up-to-date.
{"title":"Predicting Co-Changes between Functionality Specifications and Source Code in Behavior Driven Development","authors":"Aidan Z. H. Yang, D. A. D. Costa, Ying Zou","doi":"10.1109/MSR.2019.00080","DOIUrl":"https://doi.org/10.1109/MSR.2019.00080","url":null,"abstract":"Behavior Driven Development (BDD) is an agile approach that uses. feature files to describe the functionalities of a software system using natural language constructs (English-like phrases). Because of the English-like structure of. feature files, BDD specifications become an evolving documentation that helps all (even non-technical) stakeholders to understand and contribute to a software project. After specifying a. feature files, developers can use a BDD tool (e.g., Cucumber) to automatically generate test cases and implement the code of the specified functionality. However, maintaining traceability between. feature files and source code requires human efforts. Therefore,. feature files can be out-of-date, reducing the advantages of using BDD. Furthermore, existing research do not attempt to improve the traceability between. feature files and source code files. In this paper, we study the co-changes between. feature files and source code files to improve the traceability between. feature files and source code files. Due to the English-like syntax of. feature files, we use natural language processing to identify co-changes, with an accuracy of 79%. We study the characteristics of BDD co-changes and build random forest models to predict when a. feature files should be modified before committing a code change. The random forest model obtains an AUC of 0.77. The model can assist developers in identifying when a. feature files should be modified in code commits. Once the traceability is up-to-date, BDD developers can write test code more efficiently and keep the software documentation up-to-date.","PeriodicalId":6706,"journal":{"name":"2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)","volume":"15 1","pages":"534-544"},"PeriodicalIF":0.0,"publicationDate":"2019-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80100303","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Generating source code API sequences from an English query using Machine Translation (MT) has gained much interest in recent years. For any kind of MT, the model needs to be trained on a parallel corpus. In this paper, we clean StackOverflow, one of the most popular online discussion forums for programmers, to generate a parallel English-Code corpus from Android posts. We contrast three data cleaning approaches: standard NLP, title only, and software task extraction. We evaluate the quality of each corpus for MT. To provide indicators of how useful each corpus will be for machine translation, we provide researchers with measurements of the corpus size, percentage of unique tokens, and per-word maximum likelihood alignment entropy. We have used these corpus cleaning approaches to translate between English and Code [22, 23], to compare existing SMT approaches from word mapping to neural networks [24], and to re-examine the "natural software" hypothesis [29]. After cleaning and aligning the data, we create a simple maximum likelihood MT model to show that English words in the corpus map to a small number of specific code elements. This model provides a basis for the success of using StackOverflow for search and other tasks in the software engineering literature and paves the way for MT. Our scripts and corpora are publicly available on GitHub [1] as well as at https://search.datacite.org/works/10.5281/zenodo.2558551.
{"title":"Cleaning StackOverflow for Machine Translation","authors":"Musfiqur Rahman, Peter C. Rigby, Dharani Palani, T. Nguyen","doi":"10.1109/MSR.2019.00021","DOIUrl":"https://doi.org/10.1109/MSR.2019.00021","url":null,"abstract":"Generating source code API sequences from an English query using Machine Translation (MT) has gained much interest in recent years. For any kind of MT, the model needs to be trained on a parallel corpus. In this paper we clean StackOverflow, one of the most popular online discussion forums for programmers, to generate a parallel English-Code corpus from Android posts. We contrast three data cleaning approaches: standard NLP, title only, and software task extraction. We evaluate the quality of the each corpus for MT. To provide indicators of how useful each corpus will be for machine translation, we provide researchers with measurements of the corpus size, percentage of unique tokens, and per-word maximum likelihood alignment entropy. We have used these corpus cleaning approaches to translate between English and Code [22, 23], to compare existing SMT approaches from word mapping to neural networks [24], and to re-examine the \"natural software\" hypothesis [29]. After cleaning and aligning the data, we create a simple maximum likelihood MT model to show that English words in the corpus map to a small number of specific code elements. This model provides a basis for the success of using StackOverflow for search and other tasks in the software engineering literature and paves the way for MT. Our scripts and corpora are publicly available on GitHub [1] as well as at https://search.datacite.org/works/10.5281/zenodo.2558551.","PeriodicalId":6706,"journal":{"name":"2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)","volume":"18 1","pages":"79-83"},"PeriodicalIF":0.0,"publicationDate":"2019-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87249080","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
While several researchers have published bug data sets in the past, there has been less focus on bugs related to non-functional requirements. Non-functional requirements describe the quality attributes of a program. In this work, we introduce NFBugs, a data set of 133 non-functional bug fixes collected from 65 open-source projects written in Java and Python. NFBugs can be used to support code recommender systems focusing on non-functional properties.
{"title":"A Dataset of Non-Functional Bugs","authors":"Aida Radu, Sarah Nadi","doi":"10.1109/MSR.2019.00066","DOIUrl":"https://doi.org/10.1109/MSR.2019.00066","url":null,"abstract":"While several researchers have published bug data sets in the past, there has been less focus on bugs related to non-functional requirements. Non-functional requirements describe the quality attributes of a program. In this work, we introduce NFBugs, a data set of 133 non-functional bug fixes collected from 65 open-source projects written in Java and Python. NFBugs can be used to support code recommender systems focusing on non-functional properties.","PeriodicalId":6706,"journal":{"name":"2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)","volume":"4 1","pages":"399-403"},"PeriodicalIF":0.0,"publicationDate":"2019-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85591473","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cryptocurrencies have a significant open source development presence on GitHub. This presents a unique opportunity to observe their related developer effort and software growth. Individual cryptocurrency prices are partly driven by attractiveness, and we hypothesize that high-quality, actively developed software is one of the factors that makes a cryptocurrency attractive. Thus, we report on a study of a panel data set containing nearly a year of daily observations of development activity, popularity, and market capitalization for over two hundred open source cryptocurrencies. We find that open source project popularity is associated with higher market capitalization, though development activity and quality assurance practices are insignificant variables in our models. Using Granger causality tests, we find no compelling evidence for a dynamic relation between market capitalization and metrics such as daily stars, forks, watchers, commits, contributors, and lines of code changed.
{"title":"Striking Gold in Software Repositories? An Econometric Study of Cryptocurrencies on GitHub","authors":"Asher Trockman, R. V. Tonder, Bogdan Vasilescu","doi":"10.1109/MSR.2019.00036","DOIUrl":"https://doi.org/10.1109/MSR.2019.00036","url":null,"abstract":"Cryptocurrencies have a significant open source development presence on GitHub. This presents a unique opportunity to observe their related developer effort and software growth. Individual cryptocurrency prices are partly driven by attractiveness, and we hypothesize that high-quality, actively-developed software is one of its influences. Thus, we report on a study of a panel data set containing nearly a year of daily observations of development activity, popularity, and market capitalization for over two hundred open source cryptocurrencies. We find that open source project popularity is associated with higher market capitalization, though development activity and quality assurance practices are insignificant variables in our models. Using Granger causality tests, we find no compelling evidence for a dynamic relation between market capitalization and metrics such as daily stars, forks, watchers, commits, contributors, and lines of code changed.","PeriodicalId":6706,"journal":{"name":"2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)","volume":"69 1","pages":"181-185"},"PeriodicalIF":0.0,"publicationDate":"2019-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87142071","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
When developing with a version control system, understanding the differences between two versions of source code is important. Edit scripts (ESs) represent such differences. GumTree is one of the tools that generate ESs: it takes two versions of source code as input and generates an ES consisting of insert, delete, update, and move actions on abstract syntax tree (AST) nodes. However, the accuracy of the move and update actions generated by GumTree is insufficient, which makes its ESs more difficult to understand. One reason for the insufficient accuracy is that GumTree generates ESs from AST information only. In this research, we therefore propose to generate easier-to-understand ESs by using not only the AST structure but also line-difference information. To evaluate our methodology, we applied it to several open source software projects and confirmed that the ESs generated by our methodology are more helpful for understanding source code differences than those generated by GumTree.
{"title":"Beyond GumTree: A Hybrid Approach to Generate Edit Scripts","authors":"Junnosuke Matsumoto, Yoshiki Higo, S. Kusumoto","doi":"10.1109/MSR.2019.00082","DOIUrl":"https://doi.org/10.1109/MSR.2019.00082","url":null,"abstract":"On development using a version control system, understanding differences of source code is important. Edit scripts (in short, ES) represent differences between two versions of source code. One of the tools generating ESs is GumTree. GumTree takes two versions of source code as input and generates an ES consisting of insert, delete, update and move nodes of abstract syntax tree (in short, AST). However, the accuracy of move and update actions generated by GumTree is insufficient, which makes ESs more difficult to understand. A reason why the accuracy is insufficient is that GumTree generates ESs from only information of AST. Thus, in this research, we propose to generate easier-to-understand ESs by using not only structures of AST but also information of line differences. To evaluate our methodology, we applied it to some open source software, and we confirmed that ESs generated by our methodology are more helpful to understand the differences of source code than GumTree.","PeriodicalId":6706,"journal":{"name":"2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)","volume":"73 1","pages":"550-554"},"PeriodicalIF":0.0,"publicationDate":"2019-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84022045","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Application Programming Interfaces (APIs) often impose constraints such as call order or preconditions. API misuses, i.e., usages violating these constraints, may cause software crashes, data loss, and vulnerabilities. Researchers have developed several approaches to detect API misuses, but these typically still suffer from low recall and precision. In this work, we investigate ways to improve API-misuse detection. We design MUDetect, an API-misuse detector that builds on the strengths of existing detectors and tries to mitigate their weaknesses. MUDetect uses a new graph representation of API usages that captures different types of API misuses and a systematically designed ranking strategy that effectively improves precision. Evaluation shows that MUDetect identifies real-world API misuses with twice the recall of previous detectors and 2.5x higher precision. It even achieves almost 4x higher precision and recall when mining patterns across projects rather than from only the target project.
{"title":"Investigating Next Steps in Static API-Misuse Detection","authors":"Sven Amann, H. Nguyen, Sarah Nadi, T. Nguyen, M. Mezini","doi":"10.1109/MSR.2019.00053","DOIUrl":"https://doi.org/10.1109/MSR.2019.00053","url":null,"abstract":"Application Programming Interfaces (APIs) often impose constraints such as call order or preconditions. API misuses, i.e., usages violating these constraints, may cause software crashes, data-loss, and vulnerabilities. Researchers developed several approaches to detect API misuses, typically still resulting in low recall and precision. In this work, we investigate ways to improve API-misuse detection. We design MUDetect, an API-misuse detector that builds on the strengths of existing detectors and tries to mitigate their weaknesses. MUDetect uses a new graph representation of API usages that captures different types of API misuses and a systematically designed ranking strategy that effectively improves precision. Evaluation shows that MUDetect identifies real-world API misuses with twice the recall of previous detectors and 2.5x higher precision. It even achieves almost 4x higher precision and recall, when mining patterns across projects, rather than from only the target project.","PeriodicalId":6706,"journal":{"name":"2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)","volume":"55 1","pages":"265-275"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86570877","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
As mobile devices support more and more of our daily activities, it is vital to extend their battery up-time as much as possible. In fact, according to the Wall Street Journal, 9 out of 10 users suffer from low battery anxiety. The goal of our work is to understand how Android usage, apps, operating systems, hardware, and user habits influence battery lifespan. Our strategy is to collect anonymous raw data from devices all over the world through a mobile app, and to build and analyze a large-scale dataset containing real-world, day-to-day data representative of user practices. So far, the dataset we collected includes more than 12 million (anonymous) data samples across 900+ device brands and 5,000+ models, and it keeps growing. The data we collect, which is publicly available through different channels, is sufficiently heterogeneous to support studies with a wide range of focuses and research goals, thus opening the opportunity to inform and reshape user habits and even influence the development of both hardware and software for mobile devices.
{"title":"GreenHub Farmer: Real-World Data for Android Energy Mining","authors":"Hugo Matalonga, Bruno Cabral, F. C. Filho, Marco Couto, Rui Pereira, S. Sousa, J. Fernandes","doi":"10.1109/MSR.2019.00034","DOIUrl":"https://doi.org/10.1109/MSR.2019.00034","url":null,"abstract":"As mobile devices are supporting more and more of our daily activities, it is vital to widen their battery up-time as much as possible. In fact, according to the Wall Street Journal, 9/10 users suffer from low battery anxiety. The goal of our work is to understand how Android usage, apps, operating systems, hardware and user habits influence battery lifespan. Our strategy is to collect anonymous raw data from devices all over the world, through a mobile app, build and analyze a large-scale dataset containing real-world, day-to-day data, representative of user practices. So far, the dataset we collected includes 12 million+ (anonymous) data samples, across 900+ device brands and 5.000+ models. And, it keeps growing. The data we collect, which is publicly available and by different channels, is sufficiently heterogeneous for supporting studies with a wide range of focuses and research goals, thus opening the opportunity to inform and reshape user habits, and even influence the development of both hardware and software for mobile devices.","PeriodicalId":6706,"journal":{"name":"2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)","volume":"23 1","pages":"171-175"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77837581","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jupyter Notebooks have been widely adopted by many different communities, both in science and industry. They support the creation of literate programming documents that combine code, text, and execution results with visualizations and all sorts of rich media. The self-documenting aspects and the ability to reproduce results have been touted as significant benefits of notebooks. At the same time, there has been growing criticism that the way notebooks are being used leads to unexpected behavior and encourages poor coding practices, and that their results can be hard to reproduce. To understand good and bad practices used in the development of real notebooks, we studied 1.4 million notebooks from GitHub. We present a detailed analysis of their characteristics that impact reproducibility. We also propose a set of best practices that can improve the rate of reproducibility and discuss open challenges that require further research and development.
{"title":"A Large-Scale Study About Quality and Reproducibility of Jupyter Notebooks","authors":"J. F. Pimentel, Leonardo Gresta Paulino Murta, V. Braganholo, J. Freire","doi":"10.1109/MSR.2019.00077","DOIUrl":"https://doi.org/10.1109/MSR.2019.00077","url":null,"abstract":"Jupyter Notebooks have been widely adopted by many different communities, both in science and industry. They support the creation of literate programming documents that combine code, text, and execution results with visualizations and all sorts of rich media. The self-documenting aspects and the ability to reproduce results have been touted as significant benefits of notebooks. At the same time, there has been growing criticism that the way notebooks are being used leads to unexpected behavior, encourage poor coding practices, and that their results can be hard to reproduce. To understand good and bad practices used in the development of real notebooks, we studied 1.4 million notebooks from GitHub. We present a detailed analysis of their characteristics that impact reproducibility. We also propose a set of best practices that can improve the rate of reproducibility and discuss open challenges that require further research and development.","PeriodicalId":6706,"journal":{"name":"2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)","volume":"81 1","pages":"507-517"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78658011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}