Label, Segment, Featurize: A Cross Domain Framework for Prediction Engineering

2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA). Pub Date: 2016-10-01. DOI: 10.1109/DSAA.2016.54

James Max Kanter, O. Gillespie, K. Veeramachaneni

Citations: 20

Abstract

In this paper, we introduce "prediction engineering" as a formal step in the predictive modeling process. We define a generalizable three-part framework, Label, Segment, Featurize (L-S-F), to address the growing demand for predictive models. The framework provides abstractions that let data scientists customize the process for unique prediction problems. We describe how to apply the L-S-F framework to characteristic problems in two domains and demonstrate an implementation over five unique prediction problems defined on a dataset of crowdfunding projects from DonorsChoose.org. The results demonstrate how the L-S-F framework complements existing tools, allowing us to rapidly build and evaluate 26 distinct predictive models. L-S-F enables the development of models that provide value to all parties involved (donors, teachers, and the people running the platform).
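As a rough illustration of the three steps named in the abstract, the sketch below shows one plausible reading of Label, Segment, and Featurize on a toy donation log. It is not taken from the paper: the event data, the function names (label, segment, featurize, build_examples), the cutoff-time convention, and the "raised at least a goal amount after the cutoff" outcome are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical toy event log: (project_id, timestamp, donation_amount).
EVENTS = [
    ("p1", datetime(2016, 1, 1), 10.0),
    ("p1", datetime(2016, 1, 5), 25.0),
    ("p1", datetime(2016, 1, 20), 100.0),
    ("p2", datetime(2016, 1, 2), 5.0),
    ("p2", datetime(2016, 1, 25), 5.0),
]


def label(events, cutoff, goal):
    """Label (assumed outcome): did donations after the cutoff reach `goal`?"""
    return sum(amount for _, ts, amount in events if ts > cutoff) >= goal


def segment(events, cutoff, window):
    """Segment: keep only events inside the window that ends at the cutoff."""
    start = cutoff - window
    return [e for e in events if start <= e[1] <= cutoff]


def featurize(events):
    """Featurize: summarize a segment as a fixed set of named features."""
    amounts = [amount for _, _, amount in events]
    return {"n_donations": len(amounts), "total": sum(amounts)}


def build_examples(all_events, cutoff, window, goal):
    """Produce one (project_id, features, label) row per project."""
    by_project = {}
    for e in all_events:
        by_project.setdefault(e[0], []).append(e)
    rows = []
    for pid, evs in sorted(by_project.items()):
        feats = featurize(segment(evs, cutoff, window))
        rows.append((pid, feats, label(evs, cutoff, goal)))
    return rows


rows = build_examples(EVENTS, datetime(2016, 1, 10), timedelta(days=7), 50.0)
```

In this sketch, features are computed only from events at or before the cutoff, while the label looks only at events after it. Keeping the two apart avoids leaking the outcome into the features. Changing the labeling function, the window, or the feature set yields a different prediction problem over the same data.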


