PIMan: A Comprehensive Approach for Establishing Plausible Influence among Software Repositories

Md Omar Faruk Rokon, Risul Islam, Md Rayhanul Masud, M. Faloutsos
Published in: 2022 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM)
Publication date: 2022-11-10
DOI: 10.1109/ASONAM55673.2022.10068629 (https://doi.org/10.1109/ASONAM55673.2022.10068629)

Abstract

How can we quantify the influence among repositories in online archives like GitHub? Determining repository influence is an essential building block for understanding the dynamics of GitHub-like software archives. The key challenge is to define an appropriate representation model of influence that captures the nuances of the concept and accounts for its diverse manifestations. We propose PIMan, a systematic approach to quantify the influence among the repositories in a software archive by focusing on social-level interactions. As our key novelty, we introduce the concept of Plausible Influence, which considers three types of information: (a) repository-level interactions, (b) author-level interactions, and (c) temporal considerations. We evaluate and apply our method using 2089 malware repositories from GitHub spanning approximately 12 years. First, we show that our approach provides a powerful and flexible way to generate a plausible influence graph whose density is determined by the Plausible Influence Threshold (PIT), which can be adjusted to meet the needs of a study. Second, we find that there is significant collaboration and influence among the repositories in our dataset. We identify 28 connected components in the plausible influence graph (PIT = 0.25), with 7% of the components containing at least 15 repositories. Furthermore, we find 19 repositories that directly influenced at least 10 other repositories and spawned at least two "families" of repositories. In addition, the results show that our influence metrics capture manifold aspects of the interactions that are not captured by typical repository popularity metrics (e.g., number of stars). Overall, our work is a fundamental building block for identifying the influence and lineage of repositories in online software platforms.
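The abstract does not give the scoring formula, so the following is only a minimal sketch of the thresholding idea it describes, not the authors' implementation. It assumes that each ordered pair of repositories already has a plausible influence score in [0, 1], that an edge is kept when its score is at least the PIT, and that components are counted without regard to edge direction. The repository names, the scores, and all function names are hypothetical.

```python
from collections import defaultdict


def build_influence_graph(scores, pit):
    """Keep only directed edges whose score is at least the threshold `pit`.

    Assumption: a higher score means stronger influence, so lowering `pit`
    yields a denser graph.
    """
    return {(src, dst): s for (src, dst), s in scores.items() if s >= pit}


def connected_components(nodes, edges):
    """Group nodes into components with union-find, ignoring edge direction."""
    parent = {n: n for n in nodes}

    def find(x):
        # Walk to the root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for src, dst in edges:
        parent[find(src)] = find(dst)

    groups = defaultdict(set)
    for n in nodes:
        groups[find(n)].add(n)
    return list(groups.values())


def direct_influence_counts(edges):
    """Count how many repositories each source repository influences directly."""
    counts = defaultdict(int)
    for src, _dst in edges:
        counts[src] += 1
    return dict(counts)


# Toy data (hypothetical): (influencer, influenced) -> score.
scores = {
    ("repo_a", "repo_b"): 0.60,
    ("repo_a", "repo_c"): 0.30,
    ("repo_c", "repo_d"): 0.20,
    ("repo_e", "repo_f"): 0.10,
}
repos = sorted({r for pair in scores for r in pair})

if __name__ == "__main__":
    for pit in (0.25, 0.10):
        graph = build_influence_graph(scores, pit)
        comps = connected_components(repos, graph)
        print(pit, len(graph), len(comps), direct_influence_counts(graph))
```

In this toy setup, isolated repositories count as single-node components; whether the paper's component counts include such singletons is not stated in the abstract. Running the sketch at two thresholds shows how a lower PIT merges more repositories into fewer, larger components.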