捕获:自动化特征工程的协同特征集搜索

Proceedings of the ACM Web Conference 2023 Pub Date : 2023-04-30 DOI:10.1145/3543507.3583527

Guoshan Lu, Haobo Wang, Saisai Yang, Jing Yuan, Guozheng Yang, Cheng Zang, Gang Chen, J. Zhao

{"title":"捕获:自动化特征工程的协同特征集搜索","authors":"Guoshan Lu, Haobo Wang, Saisai Yang, Jing Yuan, Guozheng Yang, Cheng Zang, Gang Chen, J. Zhao","doi":"10.1145/3543507.3583527","DOIUrl":null,"url":null,"abstract":"Feature engineering often plays a crucial role in building mining systems for tabular data, which traditionally requires experienced human experts to perform. Thanks to the rapid advances in reinforcement learning, it has offered an automated alternative, i.e. automated feature engineering (AutoFE). In this work, through scrutiny of the prior AutoFE methods, we characterize several research challenges that remained in this regime, concerning system-wide efficiency, efficacy, and practicality toward production. We then propose Catch, a full-fledged new AutoFE framework that comprehensively addresses the aforementioned challenges. The core to Catch composes a hierarchical-policy reinforcement learning scheme that manifests a collaborative feature engineering exploration and exploitation grounded on the granularity of the whole feature set. At a higher level of the hierarchy, a decision-making module controls the post-processing of the attained feature engineering transformation. We extensively experiment with Catch on 26 academic standardized tabular datasets and 9 industrialized real-world datasets. Measured by numerous metrics and analyses, Catch establishes a new state-of-the-art, from perspectives performance, latency as well as its practicality towards production. Source code1 can be found at https://github.com/1171000709/Catch.","PeriodicalId":296351,"journal":{"name":"Proceedings of the ACM Web Conference 2023","volume":"94 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-04-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Catch: Collaborative Feature Set Search for Automated Feature Engineering\",\"authors\":\"Guoshan Lu, Haobo Wang, Saisai Yang, Jing Yuan, Guozheng Yang, Cheng Zang, Gang Chen, J. Zhao\",\"doi\":\"10.1145/3543507.3583527\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Feature engineering often plays a crucial role in building mining systems for tabular data, which traditionally requires experienced human experts to perform. Thanks to the rapid advances in reinforcement learning, it has offered an automated alternative, i.e. automated feature engineering (AutoFE). In this work, through scrutiny of the prior AutoFE methods, we characterize several research challenges that remained in this regime, concerning system-wide efficiency, efficacy, and practicality toward production. We then propose Catch, a full-fledged new AutoFE framework that comprehensively addresses the aforementioned challenges. The core to Catch composes a hierarchical-policy reinforcement learning scheme that manifests a collaborative feature engineering exploration and exploitation grounded on the granularity of the whole feature set. At a higher level of the hierarchy, a decision-making module controls the post-processing of the attained feature engineering transformation. We extensively experiment with Catch on 26 academic standardized tabular datasets and 9 industrialized real-world datasets. Measured by numerous metrics and analyses, Catch establishes a new state-of-the-art, from perspectives performance, latency as well as its practicality towards production. Source code1 can be found at https://github.com/1171000709/Catch.\",\"PeriodicalId\":296351,\"journal\":{\"name\":\"Proceedings of the ACM Web Conference 2023\",\"volume\":\"94 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-04-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the ACM Web Conference 2023\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3543507.3583527\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ACM Web Conference 2023","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3543507.3583527","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 1

摘要

特征工程通常在构建表格数据挖掘系统中起着至关重要的作用，传统上这需要有经验的人类专家来执行。由于强化学习的快速发展，它提供了一个自动化的替代方案，即自动化特征工程(AutoFE)。在这项工作中，通过对先前AutoFE方法的审查，我们描述了在该制度下仍然存在的几个研究挑战，涉及系统范围的效率，功效和生产的实用性。然后，我们提出Catch，这是一个全面解决上述挑战的完整的新AutoFE框架。Catch的核心组成了一个分层策略强化学习方案，该方案体现了基于整个特征集粒度的协作特征工程探索和利用。在层次结构的更高层次上，决策模块控制所获得的特征工程转换的后处理。我们在26个学术标准化表格数据集和9个工业化真实世界数据集上广泛地实验了Catch。通过大量的指标和分析，Catch从性能、延迟和生产实用性的角度建立了一个新的技术水平。源代码1可以在https://github.com/1171000709/Catch上找到。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

Catch: Collaborative Feature Set Search for Automated Feature Engineering

Feature engineering often plays a crucial role in building mining systems for tabular data, which traditionally requires experienced human experts to perform. Thanks to the rapid advances in reinforcement learning, it has offered an automated alternative, i.e. automated feature engineering (AutoFE). In this work, through scrutiny of the prior AutoFE methods, we characterize several research challenges that remained in this regime, concerning system-wide efficiency, efficacy, and practicality toward production. We then propose Catch, a full-fledged new AutoFE framework that comprehensively addresses the aforementioned challenges. The core to Catch composes a hierarchical-policy reinforcement learning scheme that manifests a collaborative feature engineering exploration and exploitation grounded on the granularity of the whole feature set. At a higher level of the hierarchy, a decision-making module controls the post-processing of the attained feature engineering transformation. We extensively experiment with Catch on 26 academic standardized tabular datasets and 9 industrialized real-world datasets. Measured by numerous metrics and analyses, Catch establishes a new state-of-the-art, from perspectives performance, latency as well as its practicality towards production. Source code1 can be found at https://github.com/1171000709/Catch.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the ACM Web Conference 2023

自引率

0.00%

发文量

期刊最新文献

CurvDrop: A Ricci Curvature Based Approach to Prevent Graph Neural Networks from Over-Smoothing and Over-Squashing Learning to Simulate Crowd Trajectories with Graph Networks Word Sense Disambiguation by Refining Target Word Embedding Curriculum Graph Poisoning Optimizing Guided Traversal for Fast Learned Sparse Retrieval