Derek Banks, Camille Leonard, Shilpa Narayan, Nicholas Thompson, Brandon L. Kramer, Gizem Korkmaz
Title: Measuring the Impact of Open Source Software Innovation Using Network Analysis on GitHub Hosted Python Packages
Venue: 2022 Systems and Information Engineering Design Symposium (SIEDS)
Published: 2022-04-28
DOI: 10.1109/sieds55548.2022.9799290
Citations: 2
Abstract
Open Source Software (OSS) is computer software whose source code is publicly available under a license in which the copyright holder grants anyone the rights to study, change, and distribute the software for any purpose. Despite its extensive use, reliable measures of the scope and impact of OSS are scarce. In this paper, we focus on packages developed for the Python programming language, one of the most widely used languages, mainly because its flexibility and simple syntax make it easy to learn and share. We aim to develop a framework to measure the impact of Python packages listed on the Python Package Index (PyPI.org). We use data from the GitHub repositories where these packages are developed to obtain information about their development activity, e.g., lines of code. Our goal is to identify influential actors (e.g., packages, developers, and countries) using these impact measures. We use both network-based measures and OSS-based measures such as the number of downloads. Network-based statistics include centrality measures such as degree and eigenvector centrality. Moreover, we calculate the cost of OSS as intangible capital, using the COCOMO II model [1] to estimate development cost, and study the relationship between the development cost and the impact of Python projects. The findings show that the number of downloads of a package is correlated with its centrality statistics, supporting the hypothesis that the most influential packages are also the most downloaded. We also show which packages save on development cost by leveraging dependencies. This framework and these measures can be applied more broadly to the OSS ecosystem and can contribute to the National Science Foundation (NSF) policy indicators for measuring innovation.
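As a rough illustration of the network-based measures named in the abstract, the sketch below computes degree and eigenvector centrality on a small dependency graph and correlates one of them with download counts. It is not the authors' pipeline. The package names, dependency edges, and download figures are hypothetical toy data, the choice of in-degree and of an undirected graph for eigenvector centrality is an assumption, and Pearson correlation is used only because the abstract does not say which correlation statistic was applied.

```python
from collections import defaultdict

# Hypothetical toy data: (package, dependency) means "package depends on dependency".
DEPENDS_ON = [
    ("framelib", "arraylib"),
    ("scilib", "arraylib"),
    ("mllib", "arraylib"),
    ("mllib", "scilib"),
    ("plotlib", "framelib"),
    ("plotlib", "arraylib"),
]

# Hypothetical download counts, invented purely for illustration.
DOWNLOADS = {"arraylib": 900, "framelib": 400, "scilib": 350, "mllib": 300, "plotlib": 100}


def in_degree_centrality(edges):
    """Fraction of the other packages that depend directly on each package."""
    nodes = {n for edge in edges for n in edge}
    counts = defaultdict(int)
    for _, dependency in edges:
        counts[dependency] += 1
    return {n: counts[n] / (len(nodes) - 1) for n in nodes}


def eigenvector_centrality(edges, iterations=200, tol=1e-10):
    """Power iteration on the undirected version of the dependency graph."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    score = {n: 1.0 for n in neighbours}
    for _ in range(iterations):
        # Iterate with (A + I) so the scores do not oscillate on bipartite graphs.
        new = {n: score[n] + sum(score[m] for m in neighbours[n]) for n in neighbours}
        norm = sum(v * v for v in new.values()) ** 0.5
        new = {n: v / norm for n, v in new.items()}
        if max(abs(new[n] - score[n]) for n in new) < tol:
            return new
        score = new
    return score


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


degree = in_degree_centrality(DEPENDS_ON)
eigen = eigenvector_centrality(DEPENDS_ON)
packages = sorted(DOWNLOADS)
r = pearson([eigen[p] for p in packages], [DOWNLOADS[p] for p in packages])
print("most central package:", max(eigen, key=eigen.get))
print("correlation between eigenvector centrality and downloads:", round(r, 3))
```

On real data the edge list would come from declared package dependencies and the download counts from a package index. A rank-based statistic may be a better fit than Pearson when download counts are heavily skewed.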