Unifying Structured Data as Graph for Data-to-Text Pre-Training

Transactions of the Association for Computational Linguistics · Pub Date: 2024-01-02 · DOI: 10.1162/tacl_a_00641
Shujie Li, Liang Li, Ruiying Geng, Min Yang, Binhua Li, Guanghu Yuan, Wanwei He, Shao Yuan, Can Ma, Fei Huang, Yongbin Li
Citations: 1

Abstract

Data-to-text (D2T) generation aims to transform structured data into natural language text. Data-to-text pre-training has proved powerful in enhancing D2T generation and yields impressive performance. However, previous pre-training methods either oversimplified structured data into a sequence without considering input structures, or designed training objectives tailored to a specific data structure (e.g., table or knowledge graph). In this paper, we unify different types of structured data (i.e., tables, key-value data, knowledge graphs) into the graph format and cast different D2T generation tasks as graph-to-text generation. To effectively exploit the structural information of the input graph, we propose a structure-enhanced pre-training method for D2T generation built on a structure-enhanced Transformer. Concretely, we devise a position matrix for the Transformer that encodes the relative positional information of connected nodes in the input graph. In addition, we propose a new attention matrix that incorporates graph structure into the original Transformer by taking the available explicit connectivity structure into account. Extensive experiments on six benchmark datasets show the effectiveness of our model. Our source code is available at https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/unid2t.
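To make the abstract's pipeline concrete, the sketch below (not the authors' code; all function and variable names are hypothetical, and the paper's exact formulation may differ) shows one plausible way to unify a table into graph triples and then derive the two structural inputs described above: a relative position matrix over connected nodes (here, clipped shortest-path distances) and a connectivity-based attention mask.

```python
# Illustrative sketch of graph unification plus the two structural
# matrices described in the abstract. This is an assumption-laden
# reconstruction, not the released UniD2T implementation.
from collections import deque


def table_to_triples(header, rows):
    """Unify a table into (subject, predicate, object) triples:
    one triple per non-key cell, keyed by the row's first column."""
    triples = []
    for row in rows:
        for col, cell in zip(header[1:], row[1:]):
            triples.append((row[0], col, cell))
    return triples


def build_graph(triples):
    """Treat every subject/predicate/object as a node and connect the
    elements of each triple, so structure survives linearization."""
    nodes, edges, index = [], set(), {}

    def nid(x):
        if x not in index:
            index[x] = len(nodes)
            nodes.append(x)
        return index[x]

    for s, p, o in triples:
        si, pi, oi = nid(s), nid(p), nid(o)
        edges.update({(si, pi), (pi, si), (pi, oi), (oi, pi)})
    return nodes, edges


def position_matrix(n, edges, max_dist=4):
    """Relative positions as shortest-path distances via BFS,
    clipped at max_dist; unreachable pairs get max_dist + 1."""
    adj = [[] for _ in range(n)]
    for a, b in edges:
        adj[a].append(b)
    pos = [[max_dist + 1] * n for _ in range(n)]
    for src in range(n):
        dist, q = {src: 0}, deque([src])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        for v, d in dist.items():
            pos[src][v] = min(d, max_dist)
    return pos


def attention_mask(n, edges):
    """1 where attention is allowed (self or directly connected), else 0."""
    return [[1 if i == j or (i, j) in edges else 0 for j in range(n)]
            for i in range(n)]


header = ["Team", "City", "Wins"]
rows = [["Eagles", "Boston", "10"]]
triples = table_to_triples(header, rows)
# [("Eagles", "City", "Boston"), ("Eagles", "Wins", "10")]
nodes, edges = build_graph(triples)
pos = position_matrix(len(nodes), edges)
mask = attention_mask(len(nodes), edges)
```

In a structure-enhanced Transformer along these lines, `pos` would index learned relative-position embeddings added to attention logits, while `mask` restricts (or biases) attention to explicitly connected nodes rather than treating the linearized input as a flat sequence.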