2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR): Latest Publications

Import2vec: Learning Embeddings for Software Libraries
B. Theeten, Frederik Vandeputte, T. V. Cutsem
We consider the problem of developing suitable learning representations (embeddings) for library packages that capture semantic similarity among libraries. Such representations are known to improve the performance of downstream learning tasks (e.g. classification) or applications such as contextual search and analogical reasoning. We apply word embedding techniques from natural language processing (NLP) to train embeddings for library packages ("library vectors"). Library vectors represent libraries by similar context of use as determined by import statements present in source code. Experimental results obtained from training such embeddings on three large open source software corpora reveal that library vectors capture semantically meaningful relationships among software libraries, such as the relationship between frameworks and their plug-ins and libraries commonly used together within ecosystems such as big data infrastructure projects (in Java), front-end and back-end web development frameworks (in JavaScript) and data science toolkits (in Python).
DOI: 10.1109/MSR.2019.00014; pages 18-28; published 2019-03-27
Citations: 30
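The Import2vec abstract above describes libraries as being characterized by the import statements they share with other libraries. A minimal sketch of that first step, assuming Python source files and assuming that "context" means "imported in the same file" (the abstract does not spell out the authors' actual pipeline), could look like the following. The function names `imported_libraries` and `context_pairs` are hypothetical, and the resulting pairs would still need to be fed to a word-embedding trainer of your choice.

```python
import ast
from itertools import permutations


def imported_libraries(source: str) -> set[str]:
    """Return the top-level package names imported by one Python source string."""
    libs = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            libs.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 skips relative imports, which are not library packages
            libs.add(node.module.split(".")[0])
    return libs


def context_pairs(files: list[str]) -> list[tuple[str, str]]:
    """Emit (target, context) pairs for libraries imported together in one file.

    Assumption: co-occurrence within a file plays the role that a word window
    plays in NLP word-embedding training.
    """
    pairs = []
    for source in files:
        libs = sorted(imported_libraries(source))
        pairs.extend(permutations(libs, 2))
    return pairs


if __name__ == "__main__":
    example = ["import numpy as np\nfrom pandas import DataFrame\n"]
    print(context_pairs(example))
```

Sorting the libraries before pairing only makes the output deterministic; it has no bearing on the embedding itself.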
git2net - Mining Time-Stamped Co-Editing Networks from Large git Repositories
Christoph Gote, Ingo Scholtes, F. Schweitzer
Data from software repositories have become an important foundation for the empirical study of software engineering processes. A recurring theme in the repository mining literature is the inference of developer networks capturing e.g. collaboration, coordination, or communication, from the commit history of projects. Most of the studied networks are based on the co-authorship of software artefacts defined at the level of files, modules, or packages. While this approach has led to insights into the social aspects of software development, it neglects detailed information on code changes and code ownership that is contained in the commit log of software projects, e.g. which exact lines of code have been authored by which developers. Addressing this issue, we introduce git2net, a scalable Python tool that facilitates the extraction of fine-grained co-editing networks from large git repositories. It uses text mining techniques to analyse the detailed history of textual modifications within files. This information allows us to construct directed, weighted, and time-stamped networks, where a link signifies that one developer has edited a block of source code originally written by another developer. Our tool is applied in case studies of an open-source and a commercial software project. We argue that it opens up a massive new source of high-resolution data on human collaboration patterns.
DOI: 10.1109/MSR.2019.00070; pages 433-444; published 2019-03-25
Citations: 19
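The git2net abstract above describes directed, weighted, time-stamped networks in which a link records that one developer edited code originally written by another. The sketch below illustrates only that data structure. It does not use or reproduce git2net's own API; the `CoEdit` record, the edge direction (editor to original author), and the choice of edited line count as the weight are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class CoEdit(NamedTuple):
    """One hypothetical co-editing event extracted from a commit."""
    editor: str           # developer who changed the lines
    original_author: str  # developer who wrote the lines being changed
    timestamp: int        # commit time, e.g. Unix seconds
    lines: int            # number of lines touched


def build_coediting_network(edits):
    """Aggregate events into directed edges (editor, original_author)."""
    network = defaultdict(lambda: {"weight": 0, "timestamps": []})
    for e in edits:
        if e.editor == e.original_author:
            continue  # assumption: self-edits are ignored in this sketch
        edge = network[(e.editor, e.original_author)]
        edge["weight"] += e.lines
        edge["timestamps"].append(e.timestamp)
    return dict(network)


if __name__ == "__main__":
    events = [
        CoEdit("alice", "bob", 100, 3),
        CoEdit("alice", "bob", 200, 2),
        CoEdit("bob", "alice", 400, 1),
    ]
    for edge, data in build_coediting_network(events).items():
        print(edge, data)
```

Keeping the raw timestamps on each edge, rather than only the aggregate weight, is what would allow time-windowed analyses later.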
Identifying Experts in Software Libraries and Frameworks Among GitHub Users
João Eduardo Montandon, L. L. Silva, M. T. Valente
Software development increasingly depends on libraries and frameworks to increase productivity and reduce time-to-market. Despite this fact, we still lack techniques to assess developers' expertise in widely popular libraries and frameworks. In this paper, we evaluate the performance of unsupervised (based on clustering) and supervised machine learning classifiers (Random Forest and SVM) to identify experts in three popular JavaScript libraries: facebook/react, mongodb/node-mongodb, and socketio/socket.io. First, we collect 13 features about developers' activity on GitHub projects, including commits on source code files that depend on these libraries. We also build a ground truth including the expertise of 575 developers on the studied libraries, as self-reported by them in a survey. Based on our findings, we document the challenges of using machine learning classifiers to predict expertise in software libraries, using features extracted from GitHub. Then, we propose a method to identify library experts based on clustering feature data from GitHub; by triangulating the results of this method with information available on LinkedIn profiles, we show that it is able to recommend dozens of GitHub users with evidence of being experts in the studied JavaScript libraries. We also provide a public dataset with the expertise of 575 developers on the studied libraries.
DOI: 10.1109/MSR.2019.00054; pages 276-287; published 2019-03-19
Citations: 38
Automatically Generating Documentation for Lambda Expressions in Java
Anwar Alqaimi, Patanamon Thongtanunam, Christoph Treude
When lambda expressions were introduced to the Java programming language as part of the release of Java 8 in 2014, they were the language's first step into functional programming. Since lambda expressions are still relatively new, not all developers use or understand them. In this paper, we first present the results of an empirical study to determine how frequently developers of GitHub repositories make use of lambda expressions and how they are documented. We find that 11% of Java GitHub repositories use lambda expressions, and that only 6% of the lambda expressions are accompanied by source code comments. We then present a tool called LambdaDoc which can automatically detect lambda expressions in a Java repository and generate natural language documentation for them. Our evaluation of LambdaDoc with 23 professional developers shows that they perceive the generated documentation to be complete, concise, and expressive, while the majority of the documentation produced by our participants without tool support was inadequate. Our contribution is an important step towards automatically generating documentation for functional programming constructs in an object-oriented language.
DOI: 10.1109/MSR.2019.00057; pages 310-320; published 2019-03-15
Citations: 13
The Emergence of Software Diversity in Maven Central
César Soto-Valero, Amine Benelallam, Nicolas Harrand, Olivier Barais, B. Baudry
Maven artifacts are immutable: an artifact that is uploaded to Maven Central can be neither removed nor modified. The only way for developers to upgrade their library is to release a new version. Consequently, Maven Central accumulates all the versions of all the libraries that are published there, and applications that declare a dependency on a library can pick any version. In this work, we hypothesize that the immutability of Maven artifacts and the ability to choose any version naturally support the emergence of software diversity within Maven Central. We analyze 1,487,956 artifacts that represent all the versions of 73,653 libraries. We observe that more than 30% of libraries have multiple versions that are actively used by the latest artifacts. In the case of popular libraries, more than 50% of their versions are used. We also observe that more than 17% of libraries have several versions that are significantly more used than the other versions. Our results indicate that the immutability of artifacts in Maven Central does support a sustained level of diversity among versions of libraries in the repository.
DOI: 10.1109/MSR.2019.00059; pages 333-343; published 2019-03-13
Citations: 26
A Manually-Curated Dataset of Fixes to Vulnerabilities of Open-Source Software
Serena Elisa Ponta, H. Plate, A. Sabetta, M. Bezzi, Cédric Dangremont
Advancing our understanding of software vulnerabilities and automating their identification, the analysis of their impact, and ultimately their mitigation are necessary to enable the development of more secure software. While operating a vulnerability assessment tool that we developed and that is currently used by hundreds of development units at SAP, we manually collected and curated a dataset of vulnerabilities of open-source software and the commits fixing them. The data were obtained both from the National Vulnerability Database (NVD) and from project-specific web resources, which we monitor on a continuous basis. From that data, we extracted a dataset that maps 624 publicly disclosed vulnerabilities affecting 205 distinct open-source Java projects, used in SAP products or internal tools, onto the 1282 commits that fix them. Out of 624 vulnerabilities, 29 do not have a CVE (Common Vulnerabilities and Exposures) identifier at all, and 46, which do have such an identifier assigned by a numbering authority, are not yet available in the NVD. The dataset is released under an open-source license, together with supporting scripts that allow researchers to automatically retrieve the actual content of the commits from the corresponding repositories and to augment the attributes available for each instance. Moreover, these scripts make it possible to complement the dataset with additional instances that are not security fixes (which is useful, for example, in machine learning applications). Our dataset has been successfully used to train classifiers that could automatically identify security-relevant commits in code repositories. The release of this dataset and the supporting code as open source will allow future research to be based on data of industrial relevance; it also represents a concrete step towards making the maintenance of this dataset a shared effort involving open-source communities, academia, and the industry.
DOI: 10.1109/MSR.2019.00064; pages 383-387; published 2019-02-07
Citations: 82
The Maven Dependency Graph: A Temporal Graph-Based Representation of Maven Central
Amine Benelallam, Nicolas Harrand, César Soto-Valero, B. Baudry, Olivier Barais
The Maven Central Repository provides an extraordinary source of data to understand complex architecture and evolution phenomena among Java applications. As of September 6, 2018, this repository includes 2.8M artifacts (compiled pieces of code implemented in a JVM-based language), each of which is characterized by metadata such as exact version, date of upload and list of dependencies on other artifacts. Today, anyone who wants to analyze the complete ecosystem of Maven artifacts and their dependencies faces two key challenges: (i) this is a huge data set; and (ii) dependency relationships among artifacts are not modeled explicitly and cannot be queried. In this paper, we present the Maven Dependency Graph. This open source data set provides two contributions: (i) a snapshot of the whole of Maven Central taken on September 6, 2018, stored in a graph database in which we explicitly model all dependencies; and (ii) an open source infrastructure to query this huge dataset.
DOI: 10.1109/MSR.2019.00060; pages 344-348; published 2019-01-16
Citations: 30
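The Maven Dependency Graph abstract above argues that dependencies become queryable once they are modeled as explicit edges between versioned artifacts. The sketch below illustrates that idea only: a plain dictionary stands in for the graph database, the `group:artifact:version` coordinates are invented for the example, and the helper names `transitive_dependencies` and `dependents` are hypothetical rather than part of the published infrastructure.

```python
from collections import deque

# Hypothetical artifacts, keyed by "group:artifact:version", with direct dependencies.
DEPENDENCIES = {
    "org.example:app:1.0": ["org.example:core:2.1", "org.example:util:1.3"],
    "org.example:core:2.1": ["org.example:util:1.2"],
    "org.example:util:1.3": [],
    "org.example:util:1.2": [],
}


def transitive_dependencies(graph, root):
    """Breadth-first traversal returning every artifact reachable from root."""
    seen = set()
    queue = deque(graph.get(root, []))
    while queue:
        artifact = queue.popleft()
        if artifact in seen:
            continue
        seen.add(artifact)
        queue.extend(graph.get(artifact, []))
    return seen


def dependents(graph, target):
    """Return the artifacts that declare a direct dependency on target."""
    return {artifact for artifact, deps in graph.items() if target in deps}


if __name__ == "__main__":
    print(sorted(transitive_dependencies(DEPENDENCIES, "org.example:app:1.0")))
    print(sorted(dependents(DEPENDENCIES, "org.example:util:1.2")))
```

Because each version is its own node, two versions of the same library (here `util:1.2` and `util:1.3`) can coexist in one dependency closure, which is the kind of question an explicit graph makes easy to ask.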
Recommending Energy-Efficient Java Collections
Wellington de Oliveira Júnior, R. Oliveira dos Santos, Fernando José Castor de Lima Filho, Benito Fernandes de Araújo Neto, Gustavo Henrique Lima Pinto
In recent years, increasing attention has been given to creating energy-efficient software systems. However, developers still lack the knowledge and the tools to support them in that task. In this work, we explore our vision that energy consumption non-specialists can build software that consumes less energy by alternating, at development time, between third-party, readily available, diversely-designed pieces of software, without increasing the development complexity. To support our vision, we propose an approach for energy-aware development that combines the construction of application-independent energy profiles of Java collections with static analysis that estimates in which ways and how intensively a system employs these collections. By combining these two pieces of information, it is possible to produce energy-saving recommendations for alternative collection implementations to be used in different parts of the system. We implement this approach in a tool named CT+ that works with both desktop and mobile Java systems, and is capable of analyzing 40 different collection implementations of lists, maps, and sets. We applied CT+ to twelve software systems: two mobile-based, seven desktop-based, and three that can run in both environments. Our evaluation infrastructure involved a high-end server, a notebook, and three mobile devices. When applying the (mostly trivial) recommendations, we achieved up to 17.34% reduction in energy consumption just by replacing collection implementations. Even for a real-world, mature, highly-optimized system such as Xalan, CT+ could achieve a 5.81% reduction in energy consumption. Our results indicate that some widely used collections, e.g., ArrayList, HashMap, and Hashtable, are not energy-efficient and sometimes should be avoided when energy consumption is a major concern.
Citations: 0
Splitting APIs: An Exploratory Study of Software Unbundling
Anderson Severo de Matos, João Bosco Ferreira Filho, Lincoln Souza Rocha
Software unbundling consists of dividing an existing software artifact into smaller ones. Unbundling can be useful for removing clutter from the original application, for separating features that do not share the same purpose, or simply for isolating an emergent functionality that merits being an application in its own right. This phenomenon is frequent in mobile apps and is also spreading to APIs. This paper presents a first empirical study of unbundling to understand its effects on popular APIs. We explore the possibilities of splitting libraries into two or more bundles based on how their client projects use them. We mine more than 71,000 client projects of 10 open source APIs and automatically generate 2,090 sub-APIs, then study their properties. We find that a given API can be used in distinct sets of ways and can be unbundled accordingly; the resulting bundles vary in representativeness and uniqueness, which we analyze thoroughly in this study.
Citations: 0
Program Committee
Rui Abreu
Rui Abreu, University of Lisbon, Portugal
Jun Ai, Beihang University, China
Domenico Amalfitano, University of Naples Federico II, Italy
Doo-Hwan Bae, Korea Advanced Institute of Science and Technology, Korea
Xiaoying Bai, Tsinghua University, China
Lingfeng Bao, Zhejiang University City College, China
David Benavides, University of Seville, Spain
Antonia Bertolino, Italian National Research Council, Italy
Mario Bravetti, Università di Bologna, Italy
Christof Budnik, Siemens, Germany
Yan Cai, Chinese Academy of Sciences, China
Emilia Cambronero, Universidad Castilla-La Mancha, Spain
Ana Cavalli, IT SudParis, France
Arun Chakrapani Rao, University of Warwick, UK
W.K. Chan, City University of Hong Kong, Hong Kong
Junjie Chen, Peking University, China
Yue Chen, Palo Alto Networks, USA
William Chu, Tunghai University, Taiwan
Sunita Chulani, Cisco, USA
Frederic Dadeau, University of Franche-Comté, France
Yuanshun Dai, University of Electronic Science and Technology of China, China
Junhua Ding, East Carolina University, USA
Tadashi Dohi, Hiroshima University, Japan
Wei Dong, National University of Defense Technology, China
Yunwei Dong, Northwestern Polytechnical University, China
Benedikt Eberhardinger, MHP - A Porsche Company, Germany
Khaled El-Fakih, American University of Sharjah, UAE
Sadik Esmelioglu, Middle East Technical University, Turkey
Hugues Evrard, Imperial College London, UK
Joao Pascoal Faria, University of Porto, Portugal
Thoshitha Gamage, Southern Illinois University Edwardsville, USA
Sudipto Ghosh, Colorado State University, USA
Arnaud Gotlieb, Simula Research Laboratory, Norway
Matthias Güdemann, Input Output Hong Kong, Hong Kong
Rajiv Gupta, University of California, Riverside, USA
Chin-Yu Huang, National Tsing-Hua University, Taiwan
Song Huang, Army Engineering University, China
Ali Hurson, Missouri University of Science and Technology, USA
Bo Jiang, Beihang University, China
He Jiang, Dalian University of Technology, China
Yu Jiang, Tsinghua University, China
Xiaoyuan Jing, Wuhan University, China
Roland Jochem, TU Berlin, Germany
Sun Jun, Singapore University of Technology and Design, Singapore
Jacky Keung, City University of Hong Kong, Hong Kong
Pavneet Kochhar, Microsoft, USA
Xuan-Bach Le, Carnegie Mellon University, USA
Citations: 0