Measuring the Impact of Open Source Software Innovation Using Network Analysis on GitHub Hosted Python Packages

Derek Banks, Camille Leonard, Shilpa Narayan, Nicholas Thompson, Brandon L. Kramer, Gizem Korkmaz
Published in: 2022 Systems and Information Engineering Design Symposium (SIEDS), 2022-04-28
DOI: https://doi.org/10.1109/sieds55548.2022.9799290
Citations: 2

Abstract

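Open Source Software (OSS) is computer software whose source code is publicly available under a license in which the copyright holder grants the rights to study, change, and distribute the software to anyone and for any purpose. Despite its extensive use, reliable measures of the scope and impact of OSS are scarce. In this paper, we focus on packages developed for the Python programming language, one of the most widely used languages, mainly because of its flexibility and simple syntax, which make it easy to learn and share. We aim to develop a framework to measure the impact of Python packages listed on the Python Package Index (PyPI.org). We use data from the GitHub repositories where these packages are developed to obtain information about their development activity, e.g., lines of code. Our goal is to identify influential actors (e.g., packages, developers, countries) by using the impact measures. We use both network-based measures and OSS-based measures such as number of downloads. Network-based statistics include centrality measures such as degree and eigenvector centrality. Moreover, we calculate the cost of OSS as intangible capital using the COCOMO II model [1] to determine the cost of development, and we study the relationship between development cost and the impact of Python projects. The findings show that the number of downloads for a package is correlated with the centrality statistics, supporting the hypothesis that the most influential packages are also the most downloaded. We show which packages save on development cost by leveraging dependencies. The framework and measures can be applied more broadly to the OSS ecosystem and contribute to the National Science Foundation (NSF) policy indicators for the measurement of innovation.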
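The measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the package names and edges are hypothetical toy data, the eigenvector centrality is computed on the undirected version of the dependency graph by plain power iteration, and the cost function uses the commonly cited COCOMO II.2000 nominal calibration (A = 2.94, B = 0.91, nominal scale factors summing to 18.97, all effort multipliers set to 1.0). The abstract does not state which parameters the paper uses, so these values are assumptions.

```python
import math

# Hypothetical toy data: (dependent, dependency) pairs, not from the paper.
EDGES = [
    ("app_a", "requests_like"),
    ("app_b", "requests_like"),
    ("app_c", "requests_like"),
    ("app_c", "numeric_core"),
    ("requests_like", "url_utils"),
    ("app_a", "url_utils"),
]


def in_degree(edges):
    """Count how many packages depend directly on each package."""
    counts = {node: 0 for edge in edges for node in edge}
    for _dependent, dependency in edges:
        counts[dependency] += 1
    return counts


def eigenvector_centrality(edges, iterations=500, tol=1e-10):
    """Eigenvector centrality on the undirected graph via power iteration."""
    nodes = sorted({node for edge in edges for node in edge})
    neighbors = {node: set() for node in nodes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    score = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        # Multiply by (A + I); the identity shift keeps the iteration from
        # oscillating and does not change the leading eigenvector.
        new = {n: score[n] + sum(score[m] for m in neighbors[n]) for n in nodes}
        norm = math.sqrt(sum(v * v for v in new.values()))
        new = {n: v / norm for n, v in new.items()}
        converged = max(abs(new[n] - score[n]) for n in nodes) < tol
        score = new
        if converged:
            break
    return score


def cocomo2_nominal_person_months(ksloc, a=2.94, b=0.91, scale_factor_sum=18.97):
    """Nominal COCOMO II effort: PM = A * KSLOC ** (B + 0.01 * sum(SF)).

    Defaults are the assumed COCOMO II.2000 nominal values; effort
    multipliers are taken as 1.0 and therefore omitted.
    """
    exponent = b + 0.01 * scale_factor_sum
    return a * ksloc ** exponent


if __name__ == "__main__":
    degrees = in_degree(EDGES)
    centrality = eigenvector_centrality(EDGES)
    for name in sorted(centrality, key=centrality.get, reverse=True):
        print(f"{name:15s} in-degree={degrees[name]} eigenvector={centrality[name]:.3f}")
    print(f"effort for 10 KSLOC: {cocomo2_nominal_person_months(10.0):.1f} person-months")
```

In this toy graph the package that most others depend on ranks highest on both measures. Comparing such rankings with download counts is the kind of correlation the abstract reports, and the effort estimate grows faster than linearly with code size because the exponent is above 1.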