Learning to Restructure Tables Automatically

ACM SIGMOD Record, Pub Date: 2024-05-14, DOI: 10.1145/3665252.3665268

J. M. Hellerstein


Abstract

By now, it is widely accepted folk wisdom that "half of the time in any data analysis project is spent wrangling the data". Analytic algorithms and tools, built on mathematical foundations of matrices and relations, require their data to be lined up in particular rows and columns. In the relational model (known in data science circles as "tidy data"), each row is an independent observation, and each column is a distinct attribute of the phenomenon described by the data. While there are many thorny aspects to data wrangling, perhaps none is more basic than the challenge of getting data reorganized, positionally, into the right form for analysis.
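To make the "tidy data" layout concrete, the sketch below reorganizes a small table positionally so that each row is one observation and each column is one attribute. It is only an illustration of the row-and-column convention the abstract describes, not the method of the paper. The table contents and the names `wide`, `to_tidy`, `year`, and `mean_temp_c` are invented for this example.

```python
# Hypothetical "wide" table: one row per city, one column per year.
# The year columns hold values of the same attribute, so a single row
# mixes several observations.
wide = [
    {"city": "Oslo", "2022": 5.1, "2023": 5.4},
    {"city": "Lima", "2022": 19.2, "2023": 19.6},
]


def to_tidy(rows, id_column, variable_name, value_name):
    """Unpivot rows so that each output row is a single observation.

    Every column other than `id_column` is treated as a value of the
    attribute `variable_name`, and its cell becomes `value_name`.
    """
    tidy_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the identifier is repeated on each output row
            tidy_rows.append(
                {
                    id_column: row[id_column],
                    variable_name: column,
                    value_name: value,
                }
            )
    return tidy_rows


tidy = to_tidy(wide, "city", "year", "mean_temp_c")
for record in tidy:
    print(record)
```

After the reshape, every row carries exactly one (city, year, temperature) observation, which is the form that relational operators and most analytic libraries expect as input.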



