Ferry: Toward Better Understanding of Input/Output Space for Data Wrangling Scripts

Zhongsu Luo;Kai Xiong;Jiajun Zhu;Ran Chen;Xinhuan Shu;Di Weng;Yingcai Wu
Journal: IEEE Transactions on Visualization and Computer Graphics, vol. 31, no. 1, pp. 1202-1212
Publication type: Journal Article
Published: 2024-09-10
DOI: 10.1109/TVCG.2024.3456328
URL: https://ieeexplore.ieee.org/document/10670464/

Abstract

Understanding the input and output of data wrangling scripts is crucial for tasks such as debugging code and onboarding new data. However, existing research on script understanding focuses primarily on revealing the process of data transformations and cannot analyze a script's potential scope, i.e., the space of its inputs and outputs. Constructing this input/output space during script analysis is challenging because wrangling scripts can be semantically complex and diverse, and the associations between data objects are intricate. To help data workers understand the input and output space of wrangling scripts, we summarize ten types of constraints that express table space and build a mapping between data transformations and these constraints to guide the construction of the input/output space for individual transformations. We then propose a constraint generation model that integrates table constraints across multiple transformations. Based on the model, we develop Ferry, an interactive system that extracts and visualizes the data constraints describing the input and output space of data wrangling scripts, enabling users to grasp the high-level semantics of complex scripts and locate the origins of faulty data transformations. Ferry also provides example input and output data to help users interpret the extracted constraints and to check and resolve conflicts between these constraints and any uploaded dataset. We evaluate Ferry's effectiveness and usability through two usage scenarios and two case studies that cover understanding, debugging, and checking both single and multiple scripts, with and without executable data. An illustrative application further demonstrates Ferry's flexibility.
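
To make the idea of an input space concrete, the following minimal Python sketch derives input-table constraints from a short list of transformation steps and checks a dataset against them. It is an illustration only, not Ferry's implementation: the step names (`rename`, `filter_positive`), the two constraint kinds, and all function names are hypothetical, and the ten constraint types summarized in the paper are not reproduced here.

```python
from dataclasses import dataclass


# Hypothetical constraint kinds, chosen for illustration only; they are
# not the ten constraint types defined in the paper.
@dataclass(frozen=True)
class Constraint:
    kind: str    # "column_exists" or "column_numeric"
    column: str  # column name as it appears in the script's INPUT table


def input_constraints(script):
    """Collect constraints on the input table implied by a list of steps.

    Each step is a tuple: ("rename", old, new) or ("filter_positive", col).
    Renames are tracked so that a constraint on a renamed column is
    expressed in terms of the original input column.
    (Simplification: a reference to a name that was renamed away is not
    detected as an error.)
    """
    origin = {}  # current column name -> name in the input table
    found = []

    def require(kind, column):
        constraint = Constraint(kind, origin.get(column, column))
        if constraint not in found:
            found.append(constraint)

    for step in script:
        op = step[0]
        if op == "rename":
            _, old, new = step
            require("column_exists", old)
            origin[new] = origin.pop(old, old)
        elif op == "filter_positive":
            _, column = step
            require("column_exists", column)
            require("column_numeric", column)
        else:
            raise ValueError(f"unknown step: {op}")
    return found


def violations(rows, constraints):
    """Return the constraints that a table (a list of dict rows) violates."""
    bad = []
    for c in constraints:
        if c.kind == "column_exists":
            ok = all(c.column in row for row in rows)
        else:  # "column_numeric"
            ok = all(isinstance(row.get(c.column), (int, float)) for row in rows)
        if not ok:
            bad.append(c)
    return bad


script = [("rename", "amt", "amount"), ("filter_positive", "amount")]
constraints = input_constraints(script)
conflicts = violations([{"amt": 3}, {"amt": "n/a"}], constraints)
```

Here the filter on `amount` is traced back through the rename, so both derived constraints refer to the input column `amt`, and the second row of the sample table conflicts with the numeric constraint. This loosely mirrors the two activities the abstract describes: integrating constraints across multiple transformations and checking them against an uploaded dataset.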