How Bad Can It Git? Characterizing Secret Leakage in Public GitHub Repositories

Michael Meli, Matthew R. McNiece, Bradley Reaves
{"title":"How Bad Can It Git? Characterizing Secret Leakage in Public GitHub Repositories","authors":"Michael Meli, Matthew R. McNiece, Bradley Reaves","doi":"10.14722/ndss.2019.23418","DOIUrl":null,"url":null,"abstract":"—GitHub and similar platforms have made public collaborative development of software commonplace. However, a problem arises when this public code must manage authentication secrets, such as API keys or cryptographic secrets. These secrets must be kept private for security, yet common development practices like adding these secrets to code make accidental leakage frequent. In this paper, we present the first large-scale and longitudinal analysis of secret leakage on GitHub. We examine billions of files collected using two complementary approaches: a nearly six-month scan of real-time public GitHub commits and a public snapshot covering 13% of open-source repositories. We focus on private key files and 11 high-impact platforms with distinctive API key formats. This focus allows us to develop conservative detection techniques that we manually and automatically evaluate to ensure accurate results. We find that not only is secret leakage pervasive — affecting over 100,000 repositories — but that thousands of new, unique secrets are leaked every day. We also use our data to explore possible root causes of leakage and to evaluate potential mitigation strategies. This work shows that secret leakage on public repository platforms is rampant and far from a solved problem, placing developers and services at persistent risk of compromise and abuse.","PeriodicalId":20444,"journal":{"name":"Proceedings 2019 Network and Distributed System Security Symposium","volume":"8 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"56","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings 2019 Network and Distributed System Security Symposium","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.14722/ndss.2019.23418","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 56

Abstract

—GitHub and similar platforms have made public collaborative development of software commonplace. However, a problem arises when this public code must manage authentication secrets, such as API keys or cryptographic secrets. These secrets must be kept private for security, yet common development practices like adding these secrets to code make accidental leakage frequent. In this paper, we present the first large-scale and longitudinal analysis of secret leakage on GitHub. We examine billions of files collected using two complementary approaches: a nearly six-month scan of real-time public GitHub commits and a public snapshot covering 13% of open-source repositories. We focus on private key files and 11 high-impact platforms with distinctive API key formats. This focus allows us to develop conservative detection techniques that we manually and automatically evaluate to ensure accurate results. We find that not only is secret leakage pervasive — affecting over 100,000 repositories — but that thousands of new, unique secrets are leaked every day. We also use our data to explore possible root causes of leakage and to evaluate potential mitigation strategies. This work shows that secret leakage on public repository platforms is rampant and far from a solved problem, placing developers and services at persistent risk of compromise and abuse.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
它能变得多糟糕?描述GitHub公共存储库中的秘密泄漏
-GitHub和类似的平台使得软件的公共协作开发变得司空见惯。但是,当此公共代码必须管理身份验证秘密(如API密钥或加密秘密)时,就会出现问题。为了安全起见,这些秘密必须保持私密性,但是将这些秘密添加到代码中的常见开发实践经常会导致意外泄漏。在本文中,我们首次对GitHub上的秘密泄漏进行了大规模的纵向分析。我们使用两种互补的方法来检查收集的数十亿个文件:近六个月的实时GitHub公共提交扫描和覆盖13%开源存储库的公共快照。我们专注于私钥文件和11个具有独特API密钥格式的高影响力平台。这种专注使我们能够开发保守的检测技术,我们手动和自动评估以确保准确的结果。我们发现,不仅秘密泄露无处不在——影响了超过10万个存储库——而且每天都有数千个新的、独特的秘密被泄露。我们还使用我们的数据来探索泄漏的可能根本原因,并评估潜在的缓解策略。这项工作表明,公共存储库平台上的秘密泄漏非常猖獗,远远没有解决问题,将开发人员和服务置于妥协和滥用的持续风险中。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Network and System Security: 17th International Conference, NSS 2023, Canterbury, UK, August 14–16, 2023, Proceedings Network and System Security: 16th International Conference, NSS 2022, Denarau Island, Fiji, December 9–12, 2022, Proceedings Network and System Security: 15th International Conference, NSS 2021, Tianjin, China, October 23, 2021, Proceedings Network and System Security: 14th International Conference, NSS 2020, Melbourne, VIC, Australia, November 25–27, 2020, Proceedings Neuro-Symbolic Execution: Augmenting Symbolic Execution with Neural Constraints
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1