Lodestar:通过数据驱动的分析建议，支持数据科学工作流的快速原型

IF 1.8 4区计算机科学 Q3 COMPUTER SCIENCE, SOFTWARE ENGINEERING Information Visualization Pub Date : 2023-08-14 DOI:10.1177/14738716231190429

Deepthi Raghunandan, Zhe Cui, Kartik Krishnan, Segen Tirfe, Shenzhi Shi, Tejaswi Darshan Shrestha, L. Battle, N. Elmqvist

{"title":"Lodestar:通过数据驱动的分析建议，支持数据科学工作流的快速原型","authors":"Deepthi Raghunandan, Zhe Cui, Kartik Krishnan, Segen Tirfe, Shenzhi Shi, Tejaswi Darshan Shrestha, L. Battle, N. Elmqvist","doi":"10.1177/14738716231190429","DOIUrl":null,"url":null,"abstract":"Keeping abreast of current trends, technologies, and best practices in visualization and data analysis is becoming increasingly difficult, especially for fledgling data scientists. In this paper, we propose lodestar, an interactive computational notebook that allows users to quickly explore and construct new data science workflows by selecting from a list of automated analysis recommendations. We derive our recommendations from directed graphs of known analysis states, with two input sources: one manually curated from online data science tutorials, and another extracted through semi-automatic analysis of a corpus of over 6000 Jupyter notebooks. We validated Lodestar through three separate user studies: first a formative evaluation involving novices learning data science using the tool. We used the feedback from this study to improve the tool. This was followed by a summative study involving both new and returning participants from the formative evaluation to test the efficacy of our improvements. We also engaged professional data scientists in an expert review assessing the utility of the different recommendations. Overall, our results suggest that both novice and professional users find Lodestar useful for rapidly creating data science workflows.","PeriodicalId":50360,"journal":{"name":"Information Visualization","volume":"1 1","pages":""},"PeriodicalIF":1.8000,"publicationDate":"2023-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Lodestar: Supporting rapid prototyping of data science workflows through data-driven analysis recommendations\",\"authors\":\"Deepthi Raghunandan, Zhe Cui, Kartik Krishnan, Segen Tirfe, Shenzhi Shi, Tejaswi Darshan Shrestha, L. Battle, N. Elmqvist\",\"doi\":\"10.1177/14738716231190429\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Keeping abreast of current trends, technologies, and best practices in visualization and data analysis is becoming increasingly difficult, especially for fledgling data scientists. In this paper, we propose lodestar, an interactive computational notebook that allows users to quickly explore and construct new data science workflows by selecting from a list of automated analysis recommendations. We derive our recommendations from directed graphs of known analysis states, with two input sources: one manually curated from online data science tutorials, and another extracted through semi-automatic analysis of a corpus of over 6000 Jupyter notebooks. We validated Lodestar through three separate user studies: first a formative evaluation involving novices learning data science using the tool. We used the feedback from this study to improve the tool. This was followed by a summative study involving both new and returning participants from the formative evaluation to test the efficacy of our improvements. We also engaged professional data scientists in an expert review assessing the utility of the different recommendations. Overall, our results suggest that both novice and professional users find Lodestar useful for rapidly creating data science workflows.\",\"PeriodicalId\":50360,\"journal\":{\"name\":\"Information Visualization\",\"volume\":\"1 1\",\"pages\":\"\"},\"PeriodicalIF\":1.8000,\"publicationDate\":\"2023-08-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Information Visualization\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1177/14738716231190429\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, SOFTWARE ENGINEERING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Visualization","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1177/14738716231190429","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}

引用次数: 0

摘要

跟上当前的趋势、技术和最佳实践在可视化和数据分析变得越来越困难,特别是对于羽翼未丰的数据科学家。在本文中，我们提出了lodestar，这是一个交互式计算笔记本，允许用户通过从自动分析建议列表中进行选择来快速探索和构建新的数据科学工作流。我们从已知分析状态的有向图中得出我们的建议，有两个输入源:一个来自在线数据科学教程的手动策划，另一个通过对6000多个Jupyter笔记本的语料库的半自动分析提取。我们通过三个独立的用户研究验证了Lodestar:首先是一个涉及使用该工具学习数据科学的新手的形成性评估。我们利用这项研究的反馈来改进工具。接下来是一项总结性研究，涉及新参与者和从形成性评估中返回的参与者，以测试我们改进的有效性。我们还聘请了专业数据科学家进行专家评审，评估不同建议的效用。总的来说，我们的结果表明新手和专业用户都发现Lodestar对于快速创建数据科学工作流很有用。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Lodestar: Supporting rapid prototyping of data science workflows through data-driven analysis recommendations

Keeping abreast of current trends, technologies, and best practices in visualization and data analysis is becoming increasingly difficult, especially for fledgling data scientists. In this paper, we propose lodestar, an interactive computational notebook that allows users to quickly explore and construct new data science workflows by selecting from a list of automated analysis recommendations. We derive our recommendations from directed graphs of known analysis states, with two input sources: one manually curated from online data science tutorials, and another extracted through semi-automatic analysis of a corpus of over 6000 Jupyter notebooks. We validated Lodestar through three separate user studies: first a formative evaluation involving novices learning data science using the tool. We used the feedback from this study to improve the tool. This was followed by a summative study involving both new and returning participants from the formative evaluation to test the efficacy of our improvements. We also engaged professional data scientists in an expert review assessing the utility of the different recommendations. Overall, our results suggest that both novice and professional users find Lodestar useful for rapidly creating data science workflows.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Information Visualization COMPUTER SCIENCE, SOFTWARE ENGINEERING-

CiteScore

5.40

自引率

0.00%

发文量

审稿时长

>12 weeks

期刊介绍： Information Visualization is essential reading for researchers and practitioners of information visualization and is of interest to computer scientists and data analysts working on related specialisms. This journal is an international, peer-reviewed journal publishing articles on fundamental research and applications of information visualization. The journal acts as a dedicated forum for the theories, methodologies, techniques and evaluations of information visualization and its applications. The journal is a core vehicle for developing a generic research agenda for the field by identifying and developing the unique and significant aspects of information visualization. Emphasis is placed on interdisciplinary material and on the close connection between theory and practice. This journal is a member of the Committee on Publication Ethics (COPE).

期刊最新文献

Multidimensional data visualization and synchronization for revealing hidden pandemic information Interactive visual formula composition of multidimensional data classifiers Exploring annotation taxonomy in grouped bar charts: A qualitative classroom study Designing complex network visualisations using the wayfinding map metaphor Graph & Network Visualization and Beyond