JupySim: Jupyter Notebook Similarity Search System
Authors: Misato Horiuchi, Yuya Sasaki, Chuan Xiao, Makoto Onizuka
Venue: Advances in Database Technology: Proceedings of the International Conference on Extending Database Technology, pages 2:554-2:557
Published: 2022-01-01
DOI: https://doi.org/10.48786/edbt.2022.49
Citations: 3
Abstract
Computational notebooks such as Jupyter notebooks are popular for machine learning and data analytics tasks. Numerous computational notebooks are available on the Web and are reusable; however, searching for them manually is tedious, and so far there are no tools for searching computational notebooks effectively and efficiently. In this paper, we develop JupySim, a system for similarity search on Jupyter notebooks. In JupySim, users specify contents (code, tabular data, libraries, and output formats) of Jupyter notebooks as a query and then retrieve the top-𝑘 Jupyter notebooks whose contents are most similar to the query. The distinguishing characteristic of JupySim is that both queries and Jupyter notebooks are modeled as graphs to capture the relationships between code, data, and outputs. JupySim provides intuitive user interfaces through which users can easily specify the Jupyter notebooks they are targeting. Our demonstration scenarios show that JupySim is effective at finding Jupyter notebooks shared on Kaggle for data science.
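To make the query-and-retrieve idea concrete, the following is a minimal sketch of top-𝑘 retrieval over notebook contents. It is not JupySim's method: the abstract states that JupySim models queries and notebooks as graphs, whereas this sketch swaps in a much simpler stand-in, namely Jaccard similarity over sets of labels per content type. The notebook names, the content types `libraries` and `outputs`, and the functions `jaccard`, `similarity`, and `top_k` are all hypothetical and chosen only for illustration.

```python
import heapq


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity(query, notebook):
    """Average Jaccard similarity over the content types named in the query.

    Assumption: a plain set-overlap score, not the graph-based similarity
    that the paper describes.
    """
    scores = [
        jaccard(set(values), set(notebook.get(kind, ())))
        for kind, values in query.items()
    ]
    return sum(scores) / len(scores) if scores else 0.0


def top_k(query, notebooks, k):
    """Return the k (score, name) pairs most similar to the query, best first."""
    scored = ((similarity(query, nb), name) for name, nb in notebooks.items())
    return heapq.nlargest(k, scored)


# Hypothetical notebook collection, described by libraries and output formats.
notebooks = {
    "titanic-eda": {"libraries": {"pandas", "seaborn"}, "outputs": {"table", "plot"}},
    "house-prices": {"libraries": {"pandas", "sklearn"}, "outputs": {"table"}},
    "mnist-cnn": {"libraries": {"torch"}, "outputs": {"plot"}},
}

# Hypothetical query: notebooks that use pandas and seaborn and produce a plot.
query = {"libraries": {"pandas", "seaborn"}, "outputs": {"plot"}}

results = top_k(query, notebooks, k=2)
for score, name in results:
    print(f"{name}: {score:.2f}")
```

A set-based score like this ignores which code produced which output from which data; capturing those relationships is the motivation the abstract gives for modeling notebooks and queries as graphs.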