dcbench

Proceedings of the Sixth Workshop on Data Management for End-To-End Machine Learning. Pub Date: 2022-06-12. DOI: 10.1145/3533028.3533310

Sabri Eyuboglu, Bojan Karlas, Christopher Ré, Ce Zhang, James Zou

Citations: 12

Abstract

The development workflow for today's AI applications has grown far beyond the standard model training task. This workflow typically consists of various data and model management tasks. It includes a "data cycle" aimed at producing high-quality training data, and a "model cycle" aimed at managing trained models on their way to production. This broadened workflow has opened a space for already emerging tools and systems for AI development. However, as a research community, we are still missing standardized ways to evaluate these tools and systems. In a humble effort to get this wheel turning, we developed dcbench, a benchmark for evaluating systems for data-centric AI development. In this report, we present the main ideas behind dcbench, some benchmark tasks that we included in the initial release, and a short summary of its implementation.
