SAN-T2T: An automated table-to-text generator based on selective attention network

Natural Language Engineering · IF 2.3 · Zone 3 (Computer Science) · JCR Q3, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE · Pub Date: 2023-05-05 · DOI: 10.1017/s135132492300013x

Haijie Ding, Xiaolong Xu


Citations: 0


Abstract

Table-to-text generation aims to generate descriptions for structured data (i.e., tables) and has been applied in many fields like question-answering systems and search engines. Current approaches mostly use neural language models to learn alignment between output and input based on the attention mechanisms, which are still flawed by the gradual weakening of attention when processing long texts and the inability to utilize the records’ structural information. To solve these problems, we propose a novel generative model SAN-T2T, which consists of a field-content selective encoder and a descriptive decoder, connected with a selective attention network. In the encoding phase, the table’s structure is integrated into its field representation, and a content selector with self-aligned gates is applied to take advantage of the fact that different records can determine each other’s importance. In the decoding phase, the content selector’s semantic information enhances the alignment between description and records, and a featured copy mechanism is applied to solve the rare word problem. Experiments on WikiBio and WeatherGov datasets show that SAN-T2T outperforms the baselines by a large margin, and the content selector indeed improves the model’s performance.
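The abstract names two components without giving their equations: a content selector with self-aligned gates, and a copy mechanism for rare words. The sketch below is only an illustration of those two general ideas, not the authors' formulation. The function names (`self_aligned_gate`, `copy_mixture`), the use of a plain dot-product attention, and the gate form are all assumptions made for this example; learned weight matrices are omitted, and the copy step is a generic pointer-style mixture rather than the paper's "featured" variant.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def self_aligned_gate(records):
    """Scale each record vector by a gate computed from all records.

    Hypothetical form: each record attends over the whole table, and the
    resulting context decides how much of the record passes through.
    """
    gated = []
    for r in records:
        # Attention of this record over every record in the table.
        weights = softmax([dot(r, other) for other in records])
        # Context vector: attention-weighted sum of all record vectors.
        context = [
            sum(w * other[d] for w, other in zip(weights, records))
            for d in range(len(r))
        ]
        # Element-wise sigmoid gate in (0, 1); no learned parameters here.
        gate = [1.0 / (1.0 + math.exp(-(x * c))) for x, c in zip(r, context)]
        gated.append([g * x for g, x in zip(gate, r)])
    return gated


def copy_mixture(p_gen, vocab_probs, attention, source_tokens):
    """Mix a vocabulary distribution with attention-based copy probabilities.

    p_gen is the probability of generating from the vocabulary; the
    remaining mass is spread over source tokens by their attention weight,
    so a token missing from the vocabulary can still be produced.
    """
    mixed = {tok: p_gen * p for tok, p in vocab_probs.items()}
    for weight, tok in zip(attention, source_tokens):
        mixed[tok] = mixed.get(tok, 0.0) + (1.0 - p_gen) * weight
    return mixed


# Toy usage with made-up vectors and tokens.
records = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
gated = self_aligned_gate(records)
mixed = copy_mixture(0.5, {"born": 0.6, "in": 0.4}, [0.7, 0.3], ["Ada", "born"])
```

In this toy setup the rare token "Ada" is absent from the vocabulary distribution but still receives probability mass through the copy path, which is the behaviour a copy mechanism is meant to provide.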


Source journal

Natural Language Engineering (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

CiteScore

5.90

Self-citation rate

12.00%

Articles published

Review time

>12 weeks

Journal description: Natural Language Engineering meets the needs of professionals and researchers working in all areas of computerised language processing, whether from the perspective of theoretical or descriptive linguistics, lexicology, computer science or engineering. Its aim is to bridge the gap between traditional computational linguistics research and the implementation of practical applications with potential real-world use. As well as publishing research articles on a broad range of topics - from text analysis, machine translation, information retrieval and speech analysis and generation to integrated systems and multimodal interfaces - it also publishes special issues on specific areas and technologies within these topics, an industry watch column and book reviews.