Title: Distributed Training of Knowledge Graph Embedding Models using Ray
Authors: Nasrullah Sheikh, Xiao Qin, B. Reinwald
Published in: Advances in Database Technology: Proceedings of the International Conference on Extending Database Technology (EDBT), 2022, pages 2:549-2:553
DOI: 10.48786/edbt.2022.48 (https://doi.org/10.48786/edbt.2022.48)
Abstract
Knowledge graphs are at the core of numerous consumer and enterprise applications where learned graph embeddings are used to derive insights for the users of these applications. Since knowledge graphs can be very large, the process of learning embeddings is time- and resource-intensive and needs to be done in a distributed manner to leverage the compute resources of multiple machines. Therefore, these applications demand performance and scalability at both the development and deployment stages, and require these models to be developed and deployed in frameworks that address these requirements. Ray is an example of such a framework: it offers ease of both development and deployment, and enables running tasks in a distributed manner using simple APIs. In this work, we use Ray to build an end-to-end system for data preprocessing and distributed training of graph neural network-based knowledge graph embedding models. We apply our system to the link prediction task, i.e., using knowledge graph embeddings to discover links between nodes in graphs. We evaluate our system on a real-world industrial dataset and demonstrate significant speedups of both distributed data preprocessing and distributed model training. Compared to non-distributed learning, we achieved a training speedup of 12× with 4 Ray workers without any deterioration in the evaluation metrics.