Graph Feature Management: Impact, Challenges and Opportunities

Proceedings of the 6th Joint Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA)

Pub Date: 2023-06-18

DOI: 10.1145/3594778.3596882

James Cheng


Citations: 0

Abstract




Graph features are crucial to many applications, such as recommender systems and risk management systems. The process of obtaining useful graph features involves ingesting data from various upstream data sources, defining the desired graph features for the required applications, constructing a feature engineering workflow to compute the features, and storing and managing the resulting features for downstream tasks (e.g., graph AI and graph BI) and for future reuse. To the majority of users, especially SMEs and non-tech companies, this process poses daunting challenges: it requires users not only to learn various methods (e.g., graph analytical algorithms, non-GNN graph embeddings, GNNs) to define graph features and program their computation, but also to learn many infrastructures (e.g., upstream databases, downstream ML systems, graph analytics systems) to compute, manage and use the graph features in production. These challenges have significantly restricted the wider application of graph technologies such as graph AI and graph BI in industry.

The current solution provided by major graph database vendors (e.g., Amazon Neptune, Neo4j, TigerGraph) is to connect various upstream and downstream systems to their own graph database, which is used to compute and manage graph features. However, such a solution ties users to a specific graph infrastructure that may not be their preferred infrastructure, and may even require them to re-develop their applications on a new infrastructure. In addition, a specific graph database or infrastructure often does not have the best performance for all workloads and certainly does not support the computation of all types of graph features. As a result, the existing solution limits users' flexibility in choosing their own infrastructure and their productivity in developing their applications.

In Part 1 of this talk, I will introduce various types of graph features and their applications. Then I will present some trends in using graph databases for graph feature computation and management, analyze the limitations of the existing methods, and identify the requirements of a graph feature management solution that is practical and highly usable for average users.

In Part 2 of this talk, I will introduce our ongoing project, which aims at providing a highly usable graph feature platform. Our solution decouples graph feature logic specification and management (i.e., how features are defined, coded and managed) from the generation and execution of the workflow for feature computation (i.e., execution plan generation and the actual execution), so that users can flexibly select different infrastructures suitable for the computation of specific types of graph features. It also manages the upstream, downstream, and feature engineering and serving infrastructures, so as to free users from tedious tasks associated with deploying infrastructures and connecting them in a feature engineering dataflow. Thus, users can focus on creating and delivering innovative feature workflow logic. Finally, I will highlight some possible future directions for graph feature management.
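The decoupling described for Part 2 can be illustrated with a small sketch. This is not the speaker's platform or its API: every name here (`FeatureSpec`, `Backend`, `DegreeBackend`, `TriangleBackend`, `plan`, `compute`) is hypothetical, the two backends are toy in-memory stand-ins for real infrastructures, and the graph is assumed to be simple and undirected. The point is only that a feature is declared once, and a separate planning step chooses where it runs.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Protocol, Set, Tuple

Edge = Tuple[str, str]


@dataclass(frozen=True)
class FeatureSpec:
    """What to compute, with no reference to where it runs (hypothetical)."""
    name: str  # name under which downstream tasks read the feature
    kind: str  # illustrative feature type: "degree" or "triangle_count"


class Backend(Protocol):
    """Any infrastructure able to execute some kinds of feature (hypothetical)."""
    supported: Set[str]

    def run(self, spec: FeatureSpec, edges: List[Edge]) -> Dict[str, int]: ...


def _adjacency(edges: List[Edge]) -> Dict[str, Set[str]]:
    """Build an undirected adjacency map from an edge list."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


class DegreeBackend:
    """Toy stand-in for an engine that only handles degree features."""
    supported = {"degree"}

    def run(self, spec: FeatureSpec, edges: List[Edge]) -> Dict[str, int]:
        return {n: len(nbrs) for n, nbrs in _adjacency(edges).items()}


class TriangleBackend:
    """Toy stand-in for an engine that only handles triangle counts."""
    supported = {"triangle_count"}

    def run(self, spec: FeatureSpec, edges: List[Edge]) -> Dict[str, int]:
        adj = _adjacency(edges)
        # Summing shared neighbours over all neighbours of n sees each
        # triangle through n twice, hence the division by 2.
        return {n: sum(len(adj[n] & adj[m]) for m in adj[n]) // 2 for n in adj}


def plan(spec: FeatureSpec, backends: List[Backend]) -> Backend:
    """Execution planning: pick the first backend that supports the feature."""
    for backend in backends:
        if spec.kind in backend.supported:
            return backend
    raise ValueError(f"no backend supports feature kind {spec.kind!r}")


def compute(specs: List[FeatureSpec], edges: List[Edge],
            backends: List[Backend]) -> Dict[str, Dict[str, int]]:
    """Run every declared feature on whichever backend the planner selects."""
    return {s.name: plan(s, backends).run(s, edges) for s in specs}


edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
specs = [FeatureSpec("deg", "degree"), FeatureSpec("tri", "triangle_count")]
features = compute(specs, edges, [DegreeBackend(), TriangleBackend()])
```

In this sketch, replacing or adding a backend changes only the list passed to `compute`; the `FeatureSpec` definitions stay untouched, which is the property the abstract attributes to separating feature logic from execution.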

