A Large-Scale Study About Quality and Reproducibility of Jupyter Notebooks

J. F. Pimentel, Leonardo Gresta Paulino Murta, V. Braganholo, J. Freire
DOI: 10.1109/MSR.2019.00077 (https://doi.org/10.1109/MSR.2019.00077)
Published in: 2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR), pp. 507-517
Publication date: 2019-05-01
Citations: 148

Abstract

Jupyter Notebooks have been widely adopted by many different communities, both in science and industry. They support the creation of literate programming documents that combine code, text, and execution results with visualizations and all sorts of rich media. The self-documenting aspects and the ability to reproduce results have been touted as significant benefits of notebooks. At the same time, there has been growing criticism that the way notebooks are being used leads to unexpected behavior and encourages poor coding practices, and that their results can be hard to reproduce. To understand good and bad practices used in the development of real notebooks, we studied 1.4 million notebooks from GitHub. We present a detailed analysis of their characteristics that impact reproducibility. We also propose a set of best practices that can improve the rate of reproducibility and discuss open challenges that require further research and development.