We consider the problem of developing suitable learned representations (embeddings) for library packages that capture semantic similarity among libraries. Such representations are known to improve the performance of downstream learning tasks (e.g. classification) and of applications such as contextual search and analogical reasoning. We apply word embedding techniques from natural language processing (NLP) to train embeddings for library packages ("library vectors"). Library vectors represent each library by its context of use, as determined by the import statements present in source code, so that libraries used in similar contexts receive similar vectors. Experimental results obtained from training such embeddings on three large open source software corpora reveal that library vectors capture semantically meaningful relationships among software libraries, such as the relationship between frameworks and their plug-ins, and libraries commonly used together within ecosystems such as big data infrastructure projects (in Java), front-end and back-end web development frameworks (in JavaScript) and data science toolkits (in Python).
Import2vec: Learning Embeddings for Software Libraries. B. Theeten, Frederik Vandeputte, T. V. Cutsem. 2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR), pp. 18-28. Publication date: 2019-03-27. DOI: 10.1109/MSR.2019.00014 (https://doi.org/10.1109/MSR.2019.00014)
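The abstract describes library vectors learned from the co-occurrence of imports. As a rough, hedged illustration of the underlying distributional idea, the sketch below replaces the learned word-embedding model with plain co-occurrence count vectors: each library is represented by how often other libraries are imported in the same file, and libraries are compared by cosine similarity. Treating a source file's import set as the context is our assumption; the function names and toy import lists are hypothetical, and this is not the Import2vec training procedure.

```python
import math
from collections import Counter, defaultdict
from itertools import permutations


def cooccurrence_vectors(files_imports):
    """Map each library to a Counter of the libraries imported alongside it.

    Assumption for this sketch: the "context" of a library is the set of other
    libraries imported in the same source file.
    """
    vectors = defaultdict(Counter)
    for imports in files_imports:
        libs = sorted(set(imports))  # ignore duplicate imports within one file
        for target, context in permutations(libs, 2):
            vectors[target][context] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse count vectors stored as Counters."""
    dot = sum(u[key] * v[key] for key in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(vectors, library, top_n=3):
    """Rank the other libraries by the cosine similarity of their contexts."""
    target = vectors.get(library, Counter())  # .get avoids inserting unknown keys
    scores = [(other, cosine(target, vec))
              for other, vec in vectors.items() if other != library]
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:top_n]


if __name__ == "__main__":
    # Hypothetical toy corpus: one import list per source file.
    files = [
        ["numpy", "pandas", "matplotlib"],
        ["numpy", "pandas", "sklearn"],
        ["numpy", "scipy"],
        ["react", "redux"],
        ["react", "redux", "axios"],
    ]
    vectors = cooccurrence_vectors(files)
    print(most_similar(vectors, "react", top_n=2))
```

In this count-based toy, two libraries score as similar when they share co-imported neighbours (second-order similarity), so the Python and JavaScript groups stay separated; a trained embedding model can produce different neighbourhoods.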
Data from software repositories have become an important foundation for the empirical study of software engineering processes. A recurring theme in the repository mining literature is the inference of developer networks capturing, e.g., collaboration, coordination, or communication from the commit history of projects. Most of the studied networks are based on the co-authorship of software artefacts defined at the level of files, modules, or packages. While this approach has led to insights into the social aspects of software development, it neglects the detailed information on code changes and code ownership contained in the commit logs of software projects, e.g. which exact lines of code were authored by which developers. Addressing this issue, we introduce git2net, a scalable Python tool that facilitates the extraction of fine-grained co-editing networks from large git repositories. It uses text mining techniques to analyse the detailed history of textual modifications within files. This information allows us to construct directed, weighted, and time-stamped networks, where a link signifies that one developer has edited a block of source code originally written by another developer. We apply the tool in case studies of an Open Source and a commercial software project. We argue that it opens up a massive new source of high-resolution data on human collaboration patterns.
git2net - Mining Time-Stamped Co-Editing Networks from Large git Repositories. 2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR), pp. 433-444. Publication date: 2019-03-25. DOI: 10.1109/MSR.2019.00070 (https://doi.org/10.1109/MSR.2019.00070)
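The abstract defines a link as one developer editing a block of code originally written by another. A minimal sketch of that idea, assuming the line-level authorship of the previous file version is already known (for example from `git blame`), diffs the old and new versions with Python's standard `difflib` and emits one time-stamped, directed edge per overwritten author. Weighting an edge by the number of replaced or deleted lines is our simplification; git2net's own text-mining and weighting may differ, and `co_editing_edges` is a hypothetical helper, not part of git2net's API.

```python
import difflib
from collections import defaultdict


def co_editing_edges(old_lines, old_authors, new_lines, editor, timestamp):
    """Return directed, time-stamped co-editing edges for one file modification.

    old_lines   -- lines of the file before the commit
    old_authors -- author of each line in old_lines (e.g. taken from git blame)
    new_lines   -- lines of the file after the commit
    editor      -- author of the commit
    timestamp   -- commit time
    Each edge (editor, original_author, timestamp, weight) records that `editor`
    replaced or deleted `weight` lines originally written by `original_author`.
    """
    if len(old_lines) != len(old_authors):
        raise ValueError("need exactly one author per old line")
    weights = defaultdict(int)
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):  # pure insertions overwrite nobody
            for author in old_authors[i1:i2]:
                if author != editor:  # editing your own code creates no link
                    weights[author] += 1
    return sorted((editor, author, timestamp, w) for author, w in weights.items())


if __name__ == "__main__":
    old = ["def f():", "    return 1", "", "def g():", "    return 2"]
    authors = ["alice", "alice", "alice", "bob", "bob"]
    new = ["def f():", "    return 10", "", "def g():", "    return 2"]
    print(co_editing_edges(old, authors, new, "carol", "2019-03-25"))
```

Aggregating such edges over all commits of a repository yields a directed, weighted, time-stamped network of the kind the abstract describes.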
João Eduardo Montandon, L. L. Silva, M. T. Valente
Software development increasingly depends on libraries and frameworks to increase productivity and reduce time-to-market. Despite this fact, we still lack techniques to assess developers' expertise in widely popular libraries and frameworks. In this paper, we evaluate the performance of unsupervised (based on clustering) and supervised machine learning classifiers (Random Forest and SVM) for identifying experts in three popular JavaScript libraries: facebook/react, mongodb/node-mongodb, and socketio/socket.io. First, we collect 13 features about developers' activity on GitHub projects, including commits on source code files that depend on these libraries. We also build a ground truth including the expertise of 575 developers on the studied libraries, as self-reported by them in a survey. Based on our findings, we document the challenges of using machine learning classifiers to predict expertise in software libraries using features extracted from GitHub. Then, we propose a method to identify library experts based on clustering feature data from GitHub; by triangulating the results of this method with information available on LinkedIn profiles, we show that it is able to recommend dozens of GitHub users with evidence of being experts in the studied JavaScript libraries. We also provide a public dataset with the expertise of 575 developers on the studied libraries.
Identifying Experts in Software Libraries and Frameworks Among GitHub Users. 2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR), pp. 276-287. Publication date: 2019-03-19. DOI: 10.1109/MSR.2019.00054 (https://doi.org/10.1109/MSR.2019.00054)
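The abstract mentions identifying experts by clustering GitHub feature data without naming the clustering algorithm. The sketch below assumes plain k-means over standardized features; the two example features (commits and files touched on source files that depend on a library) and all values are hypothetical, chosen only to show how developers with similar activity profiles end up in the same cluster.

```python
import math
import random


def standardize(rows):
    """Z-score each feature column so that features on different scales are comparable."""
    columns = list(zip(*rows))
    means = [sum(col) / len(col) for col in columns]
    stds = [math.sqrt(sum((x - m) ** 2 for x in col) / len(col)) or 1.0
            for col, m in zip(columns, means)]  # "or 1.0" guards constant columns
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def kmeans(points, k, iterations=100, seed=0):
    """Plain k-means (Lloyd's algorithm); returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments no longer change: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


if __name__ == "__main__":
    # Hypothetical features per developer: [commits, files touched]
    # on source files that depend on the studied library.
    features = [[120, 40], [150, 55], [130, 45], [2, 1], [3, 2], [1, 1]]
    print(kmeans(standardize(features), k=2))
```

Which cluster corresponds to "experts" still has to be decided afterwards, e.g. by inspecting the centroids; the abstract reports validating its clustering results against LinkedIn profiles.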
Anwar Alqaimi, Patanamon Thongtanunam, Christoph Treude
When lambda expressions were introduced to the Java programming language with the release of Java 8 in 2014, they were the language's first step into functional programming. Since lambda expressions are still relatively new, not all developers use or understand them. In this paper, we first present the results of an empirical study to determine how frequently developers of GitHub repositories make use of lambda expressions and how they are documented. We find that 11% of Java GitHub repositories use lambda expressions, and that only 6% of the lambda expressions are accompanied by source code comments. We then present a tool called LambdaDoc which can automatically detect lambda expressions in a Java repository and generate natural language documentation for them. Our evaluation of LambdaDoc with 23 professional developers shows that they perceive the generated documentation to be complete, concise, and expressive, while the majority of the documentation produced by our participants without tool support was inadequate. Our contribution is an important step towards automatically generating documentation for functional programming constructs in an object-oriented language.
Automatically Generating Documentation for Lambda Expressions in Java. 2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR), pp. 310-320. Publication date: 2019-03-15. DOI: 10.1109/MSR.2019.00057 (https://doi.org/10.1109/MSR.2019.00057)
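LambdaDoc's first step is detecting lambda expressions in Java code, and the empirical study asks how many of them carry comments. The sketch below is a deliberately crude, regex-based approximation of both steps; the abstract does not describe LambdaDoc's detection mechanism, and `find_lambdas` with its comment heuristic is hypothetical. A real tool should use a Java parser, since string literals, comments, and switch rules can all fool a regular expression.

```python
import re

# A lambda-like arrow: a parenthesized parameter list "(a, b)" or a single
# identifier "x", followed by the arrow token "->".
LAMBDA_PATTERN = re.compile(r"(\([^()]*\)|\b[A-Za-z_$][\w$]*)\s*->")


def find_lambdas(java_source):
    """Return (line_number, has_comment) for each lambda-like arrow.

    has_comment is True when the line carries a // comment or the line
    immediately above starts a comment. Heuristic only: "//" inside string
    literals and lambdas whose parameters span several lines are not handled.
    """
    results = []
    lines = java_source.splitlines()
    for index, line in enumerate(lines):
        code = line.split("//", 1)[0]  # ignore text after a line comment
        if code.strip().startswith(("case ", "default ")):
            continue  # probably a Java 14+ switch rule, not a lambda
        matches = LAMBDA_PATTERN.findall(code)
        if not matches:
            continue
        previous = lines[index - 1].strip() if index > 0 else ""
        commented = "//" in line or previous.startswith(("//", "/*", "*"))
        results.extend([(index + 1, commented)] * len(matches))
    return results


if __name__ == "__main__":
    sample = """// sort by name
names.sort((a, b) -> a.compareTo(b));
names.forEach(n -> System.out.println(n));"""
    found = find_lambdas(sample)
    share = sum(commented for _, commented in found) / len(found)
    print(found, f"{share:.0%} commented")
```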
César Soto-Valero, Amine Benelallam, Nicolas Harrand, Olivier Barais, B. Baudry
Maven artifacts are immutable: an artifact that is uploaded to Maven Central can be neither removed nor modified. The only way for developers to upgrade their library is to release a new version. Consequently, Maven Central accumulates all the versions of all the libraries published there, and applications that declare a dependency on a library can pick any version. In this work, we hypothesize that the immutability of Maven artifacts and the ability to choose any version naturally support the emergence of software diversity within Maven Central. We analyze 1,487,956 artifacts that represent all the versions of 73,653 libraries. We observe that more than 30% of libraries have multiple versions that are actively used by the latest artifacts. In the case of popular libraries, more than 50% of their versions are used. We also observe that more than 17% of libraries have several versions that are used significantly more than the other versions. Our results indicate that the immutability of artifacts in Maven Central does support a sustained level of diversity among versions of libraries in the repository.
The Emergence of Software Diversity in Maven Central. 2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR), pp. 333-343. Publication date: 2019-03-13. DOI: 10.1109/MSR.2019.00059 (https://doi.org/10.1109/MSR.2019.00059)
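The central measurement in the abstract is how many versions of each library are still used by the latest artifacts. A minimal sketch of that computation, assuming the dependency edges have already been extracted as hypothetical `(client, library, version)` tuples with one client per artifact's latest release, counts the distinct versions in use per library and the share of libraries with more than one version in use. The coordinates are made up, and the paper's notions of "popular" libraries and "significantly more used" versions are not reproduced.

```python
from collections import defaultdict


def version_usage(dependencies, releases):
    """Summarize which released versions of each library are actually used.

    dependencies: iterable of (client, library, version) tuples, where each
        client is the latest release of some artifact (an assumption here).
    releases: dict mapping library -> list of all its released versions.
    Returns dict library -> (versions in use, fraction of released versions in use).
    """
    used = defaultdict(set)
    for _client, library, version in dependencies:
        used[library].add(version)
    report = {}
    for library, versions in releases.items():
        in_use = len(used.get(library, set()) & set(versions))
        report[library] = (in_use, in_use / len(versions) if versions else 0.0)
    return report


def share_with_multiple_versions_in_use(report):
    """Fraction of libraries for which at least two distinct versions are in use."""
    if not report:
        return 0.0
    return sum(1 for in_use, _ in report.values() if in_use >= 2) / len(report)


if __name__ == "__main__":
    releases = {"org.example:alpha": ["1.0", "1.1", "2.0"],
                "org.example:beta": ["0.1", "0.2"]}
    dependencies = [("org.example:app1:3.0", "org.example:alpha", "1.1"),
                    ("org.example:app2:1.2", "org.example:alpha", "2.0"),
                    ("org.example:app3:0.9", "org.example:beta", "0.2")]
    report = version_usage(dependencies, releases)
    print(report, share_with_multiple_versions_in_use(report))
```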
Serena Elisa Ponta, H. Plate, A. Sabetta, M. Bezzi, Cédric Dangremont
Advancing our understanding of software vulnerabilities and automating their identification, the analysis of their impact, and ultimately their mitigation are necessary to enable the development of more secure software. While operating a vulnerability assessment tool that we developed and that is currently used by hundreds of development units at SAP, we manually collected and curated a dataset of vulnerabilities of open-source software and the commits fixing them. The data were obtained both from the National Vulnerability Database (NVD) and from project-specific web resources, which we monitor on a continuous basis. From that data, we extracted a dataset that maps 624 publicly disclosed vulnerabilities affecting 205 distinct open-source Java projects, used in SAP products or internal tools, onto the 1,282 commits that fix them. Of the 624 vulnerabilities, 29 do not have a CVE (Common Vulnerabilities and Exposures) identifier at all, and 46 that do have such an identifier assigned by a numbering authority are not yet available in the NVD. The dataset is released under an open-source license, together with supporting scripts that allow researchers to automatically retrieve the actual content of the commits from the corresponding repositories and to augment the attributes available for each instance. Moreover, these scripts allow researchers to complement the dataset with additional instances that are not security fixes (which is useful, for example, in machine learning applications). Our dataset has been successfully used to train classifiers that could automatically identify security-relevant commits in code repositories. The release of this dataset and the supporting code as open source will allow future research to be based on data of industrial relevance; it also represents a concrete step towards making the maintenance of this dataset a shared effort involving open-source communities, academia, and industry.
A Manually-Curated Dataset of Fixes to Vulnerabilities of Open-Source Software. 2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR), pp. 383-387. Publication date: 2019-02-07. DOI: 10.1109/MSR.2019.00064 (https://doi.org/10.1109/MSR.2019.00064)
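Each instance in the dataset links a vulnerability to a repository and a fixing commit. The sketch below assumes a hypothetical record layout of `(vulnerability_id, repository_url, commit_hash)` tuples (the released files may be organized differently) and computes the kinds of summary figures quoted in the abstract: distinct vulnerabilities, projects, fix commits, and vulnerabilities without a CVE identifier. The identifiers, URLs, and hashes in the example are placeholders, not entries from the dataset.

```python
import re
from collections import defaultdict

# CVE identifiers have the form CVE-<year>-<sequence of at least 4 digits>.
CVE_ID = re.compile(r"^CVE-\d{4}-\d{4,}$")


def summarize(records):
    """Summarize (vulnerability_id, repository_url, commit_hash) records."""
    fixes = defaultdict(set)  # vulnerability -> {(repository, commit), ...}
    for vuln_id, repo_url, commit in records:
        fixes[vuln_id].add((repo_url, commit))
    # A commit fixing several vulnerabilities is counted once.
    all_fixes = {fix for commits in fixes.values() for fix in commits}
    return {
        "vulnerabilities": len(fixes),
        "projects": len({repo for repo, _ in all_fixes}),
        "fix_commits": len(all_fixes),
        "without_cve_id": sorted(v for v in fixes if not CVE_ID.match(v)),
    }


if __name__ == "__main__":
    records = [
        ("CVE-0000-0001", "https://example.org/repos/alpha", "a1b2c3d"),
        ("CVE-0000-0001", "https://example.org/repos/alpha", "e4f5a6b"),
        ("PLACEHOLDER-2019-01", "https://example.org/repos/beta", "0badc0d"),
    ]
    print(summarize(records))
```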
Amine Benelallam, Nicolas Harrand, César Soto-Valero, B. Baudry, Olivier Barais
The Maven Central Repository provides an extraordinary source of data for understanding complex architecture and evolution phenomena among Java applications. As of September 6, 2018, this repository includes 2.8M artifacts (compiled pieces of code implemented in a JVM-based language), each of which is characterized by metadata such as its exact version, date of upload, and list of dependencies on other artifacts. Today, anyone who wants to analyze the complete ecosystem of Maven artifacts and their dependencies faces two key challenges: (i) this is a huge dataset; and (ii) dependency relationships among artifacts are not modeled explicitly and cannot be queried. In this paper, we present the Maven Dependency Graph. This open source dataset provides two contributions: (i) a snapshot of the whole Maven Central taken on September 6, 2018, stored in a graph database in which we explicitly model all dependencies; and (ii) an open source infrastructure to query this huge dataset.
The Maven Dependency Graph: A Temporal Graph-Based Representation of Maven Central. 2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR), pp. 344-348. Publication date: 2019-01-16. DOI: 10.1109/MSR.2019.00060 (https://doi.org/10.1109/MSR.2019.00060)
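The abstract's second challenge is that dependencies are neither modeled explicitly nor queryable. Once they are stored as edges, questions such as "what does this artifact transitively depend on?" or "which artifacts depend on this one?" become graph traversals. The sketch below uses an in-memory adjacency dict as a stand-in for the graph database described in the abstract; the coordinates are hypothetical, and it does not reproduce the dataset's actual schema or query language.

```python
from collections import deque


def transitive_dependencies(graph, root):
    """All artifacts reachable from root via dependency edges, in BFS order.

    graph maps a "groupId:artifactId:version" coordinate to the list of its
    direct dependencies. The seen set makes this safe on cyclic graphs.
    """
    seen = {root}
    order = []
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for dependency in graph.get(current, []):
            if dependency not in seen:
                seen.add(dependency)
                order.append(dependency)
                queue.append(dependency)
    return order


def direct_dependents(graph, target):
    """Reverse lookup: artifacts that declare a direct dependency on target."""
    return sorted(artifact for artifact, deps in graph.items() if target in deps)


if __name__ == "__main__":
    graph = {
        "org.example:app:1.0": ["org.example:core:2.1", "org.example:util:1.3"],
        "org.example:core:2.1": ["org.example:util:1.3", "org.example:log:0.9"],
        "org.example:util:1.3": [],
    }
    print(transitive_dependencies(graph, "org.example:app:1.0"))
    print(direct_dependents(graph, "org.example:util:1.3"))
```

At the scale of millions of artifacts, the reverse lookup would normally rely on an index or reverse edges rather than a full scan as done here.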
Wellington de Oliveira Júnior, R. Oliveira dos Santos, Fernando José Castor de Lima Filho, Benito Fernandes de Araújo Neto, Gustavo Henrique Lima Pinto
Over recent years, increasing attention has been given to creating energy-efficient software systems. However, developers still lack the knowledge and the tools to support them in that task. In this work, we explore our vision that developers who are not energy-consumption specialists can build software that consumes less energy by alternating, at development time, between third-party, readily available, diversely-designed pieces of software, without increasing development complexity. To support our vision, we propose an approach to energy-aware development that combines application-independent energy profiles of Java collections with a static analysis that estimates how, and how intensively, a system uses these collections. By combining these two pieces of information, it is possible to produce energy-saving recommendations for alternative collection implementations to be used in different parts of the system. We implement this approach in a tool named CT+ that works with both desktop and mobile Java systems and is capable of analyzing 40 different collection implementations of lists, maps, and sets. We applied CT+ to twelve software systems: two mobile-based, seven desktop-based, and three that can run in both environments. Our evaluation infrastructure involved a high-end server, a notebook, and three mobile devices. When applying the (mostly trivial) recommendations, we achieved up to a 17.34% reduction in energy consumption just by replacing collection implementations. Even for a real-world, mature, highly-optimized system such as Xalan, CT+ achieved a 5.81% reduction in energy consumption. Our results indicate that some widely used collections, e.g., ArrayList, HashMap, and Hashtable, are not energy-efficient and should sometimes be avoided when energy consumption is a major concern.
Recommending Energy-Efficient Java Collections. 2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR), pp. 160-170. Publication date: 2019-01-01. DOI: 10.1109/MSR.2019.00033 (https://doi.org/10.1109/MSR.2019.00033)
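The recommendation step described in the abstract combines two inputs: a per-operation energy profile for each collection implementation and an estimate of how often the system performs each operation. A minimal sketch of that combination, assuming a linear cost model (estimated energy equals the sum over operations of call count times energy per call), picks the implementation with the lowest estimated energy. The implementation names, operation counts, and energy figures are hypothetical and are not measurements from the paper; CT+'s actual estimation may be more elaborate.

```python
def estimated_energy(profile, usage):
    """Linear cost model: sum of (calls of an operation) x (energy per call)."""
    return sum(count * profile[operation] for operation, count in usage.items())


def recommend(profiles, usage, current):
    """Return the implementation with the lowest estimated energy and the
    estimated saving relative to the current implementation (as a fraction)."""
    costs = {impl: estimated_energy(profile, usage) for impl, profile in profiles.items()}
    best = min(costs, key=costs.get)
    baseline = costs[current]
    saving = (baseline - costs[best]) / baseline if baseline else 0.0
    return best, saving


if __name__ == "__main__":
    # Hypothetical energy per call (arbitrary units) for two list implementations.
    profiles = {
        "ListImplA": {"add": 1.0, "get": 0.5, "contains": 4.0},
        "ListImplB": {"add": 0.8, "get": 0.4, "contains": 6.0},
    }
    # Hypothetical operation counts, e.g. estimated by static analysis of one call site.
    usage = {"add": 1000, "get": 5000, "contains": 10}
    best, saving = recommend(profiles, usage, current="ListImplA")
    print(best, f"{saving:.1%}")
```

Because the cost depends on the usage mix, the same pair of implementations can swap places at different call sites, which is why recommendations are made per part of the system rather than globally.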
Anderson Severo de Matos, João Bosco Ferreira Filho, Lincoln Souza Rocha
Software unbundling consists of dividing an existing software artifact into smaller ones. Unbundling can be useful for removing clutter from the original application, for separating features that may not share the same purpose, or simply for isolating an emergent functionality that merits being an application of its own. This phenomenon is frequent with mobile apps, and it is also spreading to APIs. This paper presents a first empirical study of unbundling to understand its effects on popular APIs. We explore the possibilities of splitting libraries into two or more bundles based on how their client projects use them. We mine more than 71,000 client projects of 10 open source APIs and automatically generate 2,090 sub-APIs to study their properties. We find that there are distinct sets of ways in which a given API is used, and that the API can be unbundled accordingly; the resulting bundles vary in representativeness and uniqueness, which this study analyzes in detail.
Splitting APIs: An Exploratory Study of Software Unbundling. 2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR), pp. 360-370. Publication date: 2019-01-01. DOI: 10.1109/MSR.2019.00062 (https://doi.org/10.1109/MSR.2019.00062)
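The abstract derives sub-APIs from the ways client projects use an API, without spelling out the grouping procedure. As one hedged illustration, the sketch below greedily groups clients whose sets of used API elements are similar (Jaccard similarity at or above a threshold) and takes the union of each group's elements as a candidate sub-API. The greedy strategy, the 0.5 threshold, and the client data are our assumptions, not the paper's method.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def unbundle(client_usage, threshold=0.5):
    """Greedy single-pass grouping of clients by the API elements they use.

    client_usage: dict client -> set of API elements it uses.
    A client joins the first bundle whose element set is at least `threshold`
    Jaccard-similar to its own; otherwise it starts a new bundle.
    Returns a list of (sub_api_elements, member_clients).
    """
    bundles = []
    for client in sorted(client_usage):  # fixed order keeps the result deterministic
        used = client_usage[client]
        for elements, members in bundles:
            if jaccard(used, elements) >= threshold:
                elements |= used  # grow the sub-API so it covers this client
                members.append(client)
                break
        else:  # no sufficiently similar bundle: start a new one
            bundles.append((set(used), [client]))
    return bundles


if __name__ == "__main__":
    # Hypothetical clients of one library and the API elements they call.
    usage = {
        "client1": {"get", "post"},
        "client2": {"get", "post", "put"},
        "client3": {"parse", "stringify"},
        "client4": {"parse"},
    }
    for elements, members in unbundle(usage):
        share = len(members) / len(usage)
        print(sorted(elements), members, f"{share:.0%} of clients")
```

A greedy single pass is order-dependent; a study at the scale of 71,000 clients would more likely use a proper clustering or frequent-itemset method.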
Rui Abreu, University of Lisbon, Portugal
Jun Ai, Beihang University, China
Domenico Amalfitano, University of Naples Federico II, Italy
Doo-Hwan Bae, Korea Advanced Institute of Science and Technology, Korea
Xiaoying Bai, Tsinghua University, China
Lingfeng Bao, Zhejiang University City College, China
David Benavides, University of Seville, Spain
Antonia Bertolino, Italian National Research Council, Italy
Mario Bravetti, Università di Bologna, Italy
Christof Budnik, Siemens, Germany
Yan Cai, Chinese Academy of Sciences, China
Emilia Cambronero, Universidad Castilla-La Mancha, Spain
Ana Cavalli, IT SudParis, France
Arun Chakrapani Rao, University of Warwick, UK
W.K. Chan, City University of Hong Kong, Hong Kong
Junjie Chen, Peking University, China
Yue Chen, Palo Alto Networks, USA
William Chu, Tunghai University, Taiwan
Sunita Chulani, Cisco, USA
Frederic Dadeau, University of Franche-Comté, France
Yuanshun Dai, University of Electronic Science and Technology of China, China
Junhua Ding, East Carolina University, USA
Tadashi Dohi, Hiroshima University, Japan
Wei Dong, National University of Defense Technology, China
Yunwei Dong, Northwestern Polytechnical University, China
Benedikt Eberhardinger, MHP — A Porsche Company, Germany
Khaled El-Fakih, American University of Sharjah, UAE
Sadik Esmelioglu, Middle East Technical University, Turkey
Hugues Evrard, Imperial College London, UK
Joao Pascoal Faria, University of Porto, Portugal
Thoshitha Gamage, Southern Illinois University Edwardsville, USA
Sudipto Ghosh, Colorado State University, USA
Arnaud Gotlieb, Simula Research Laboratory, Norway
Matthias Güdemann, Input Output Hong Kong, Hong Kong
Rajiv Gupta, University of California, Riverside, USA
Chin-Yu Huang, National Tsing-Hua University, Taiwan
Song Huang, Army Engineering University, China
Ali Hurson, Missouri University of Science and Technology, USA
Bo Jiang, Beihang University, China
He Jiang, Dalian University of Technology, China
Yu Jiang, Tsinghua University, China
Xiaoyuan Jing, Wuhan University, China
Roland Jochem, TU Berlin, Germany
Sun Jun, Singapore University of Technology and Design, Singapore
Jacky Keung, City University of Hong Kong, Hong Kong
Pavneet Kochhar, Microsoft, USA
Xuan-Bach Le, Carnegie Mellon University, USA
{"title":"Program Committee","authors":"Rui Abreu","doi":"10.1109/eitt.2018.00007","DOIUrl":"https://doi.org/10.1109/eitt.2018.00007","url":null,"abstract":"Rui Abreu, University of Lisbon, Portugal Jun Ai, Beihang University, China Domenico Amalfitano, University of Naples Federico II, Italy Doo-Hwan Bae, Korea Advanced Institute of Science and Technology, Korea Xiaoying Bai, Tsinghua University, China Lingfeng Bao, Zhejiang University City College, China David Benavides, University of Seville, Spain Antonia Bertolino, Italian National Research Council, Italy Mario Bravetti, Università di Bologna, Italy Christof Budnik, Siemens, Germany Yan Cai, Chinese Academy of Sciences, China Emilia Cambronero, Universidad Castilla-La Mancha, Spain Ana Cavalli, IT SudParis, France Arun Chakrapani Rao, University of Warwick, UK W.K. Chan, City University of Hong Kong, Hong Kong Junjie Chen, Peking University, China Yue Chen, Palo Alto Networks, USA William Chu, Tunghai University, Taiwan Sunita Chulani, Cisco, USA Frederic Dadeau, University of Franche-Comté, France Yuanshun Dai, University of Electronic Science and Technology of China, China Junhua Ding, East Carolina University, USA Tadashi Dohi, Hiroshima University, Japan Wei Dong, National University of Defense Technology, China Yunwei Dong, Northwestern Polytechnical University, China Benedikt Eberhardinger, MHP — A Porsche Company, Germany Khaled El-Fakih, American University of Sharjah, UAE Sadik Esmelioglu, Middle East Technical University, Turkey Hugues Evrard, Imperial College London, UK Joao Pascoal Faria, University of Porto, Portugal Thoshitha Gamage, Southern Illinois University Edwardsville, USA Sudipto Ghosh, Colorado State University, USA Arnaud Gotlieb, Simula Research Laboratory, Norway Matthias Güdemann, Input Output Hong Kong, Hong Kong Rajiv Gupta, University of California, Riverside, USA Chin-Yu Huang, National Tsing-Hua University, Taiwan Song Huang, Army Engineering University, China Ali Hurson, Missouri University of Science and Technology, USA Bo Jiang, Beihang University, China He Jiang, Dalian University of Technology, China Yu Jiang, Tsinghua University, China Xiaoyuan Jing, Wuhan University, China Roland Jochem, TU Berlin, Germany Sun Jun, Singapore University of Technology and Design, Singapore Jacky Keung, City University of Hong Kong, Hong Kong Pavneet Kochhar, Microsoft, USA Xuan-Bach Le, Carnegie Mellon University, USA","PeriodicalId":6706,"journal":{"name":"2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR)","volume":"13 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2018-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74509026","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}