Graph Receptive Transformer Encoder for Text Classification

IF 3 · Region 3 (Computer Science) · JCR Q2 (Engineering, Electrical & Electronic) · IEEE Transactions on Signal and Information Processing over Networks · Pub Date: 2024-03-21 · DOI: 10.1109/TSIPN.2024.3380362

Arda Can Aras;Tuna Alikaşifoğlu;Aykut Koç

Arda Can Aras, Tuna Alikaşifoğlu, and Aykut Koç, "Graph Receptive Transformer Encoder for Text Classification," IEEE Transactions on Signal and Information Processing over Networks, vol. 10, pp. 347-359, 2024. https://ieeexplore.ieee.org/document/10477516/

Citations: 0

Abstract


By employing attention mechanisms, transformers have made great improvements in nearly all NLP tasks, including text classification. However, the context of the transformer's attention mechanism is limited to single sequences, and their fine-tuning stage can utilize only inductive learning. Focusing on broader contexts by representing texts as graphs, previous works have generalized transformer models to graph domains to employ attention mechanisms beyond single sequences. However, these approaches either require exhaustive pre-training stages, learn only transductively, or can learn inductively without utilizing pre-trained models. To address these problems simultaneously, we propose the Graph Receptive Transformer Encoder (GRTE), which combines graph neural networks (GNNs) with large-scale pre-trained models for text classification in both inductive and transductive fashions. By constructing heterogeneous and homogeneous graphs over given corpora and not requiring a pre-training stage, GRTE can utilize information from both large-scale pre-trained models and graph-structured relations. Our proposed method retrieves global and contextual information in documents and generates word embeddings as a by-product of inductive inference. We compared the proposed GRTE with a wide range of baseline models through comprehensive experiments. Compared to the state-of-the-art, we demonstrated that GRTE improves model performances and offers computational savings up to ~100×.
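The abstract mentions constructing heterogeneous and homogeneous graphs over a corpus but does not say how. As a rough illustration only, the sketch below builds a document-word (heterogeneous) graph with TF-IDF edge weights and then derives a document-document (homogeneous) view from shared words. The TF-IDF weighting, the whitespace tokenizer, and the function names `build_doc_word_graph` and `doc_neighbors` are assumptions made for this example; they are not taken from the paper and may differ from GRTE's actual construction.

```python
import math
from collections import Counter


def build_doc_word_graph(docs):
    """Return (vocab, edges) for a bipartite document-word graph.

    edges maps (doc_index, word) -> TF-IDF weight. This weighting is an
    assumption for illustration, not a detail confirmed by the abstract.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each word.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    vocab = sorted(df)
    edges = {}
    for i, tokens in enumerate(tokenized):
        counts = Counter(tokens)
        for word, count in counts.items():
            tf = count / len(tokens)
            idf = math.log(n_docs / df[word])
            if idf > 0:  # a word present in every document gets no edge
                edges[(i, word)] = tf * idf
    return vocab, edges


def doc_neighbors(edges, n_docs):
    """Homogeneous view: link two documents that share a weighted word."""
    by_word = {}
    for (i, word) in edges:
        by_word.setdefault(word, set()).add(i)
    neighbors = {i: set() for i in range(n_docs)}
    for members in by_word.values():
        for i in members:
            neighbors[i] |= members - {i}
    return neighbors


# Toy corpus (hypothetical data, for illustration only).
docs = [
    "graph neural networks",
    "transformer attention networks",
    "text classification",
]
vocab, edges = build_doc_word_graph(docs)
neighbors = doc_neighbors(edges, len(docs))
```

In this toy corpus the first two documents become neighbors because they share the word "networks", while the third stays isolated. A graph-based classifier could then pass information along such edges, which is one way attention can reach beyond a single sequence.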


Source journal

IEEE Transactions on Signal and Information Processing over Networks Computer Science-Computer Networks and Communications

CiteScore: 5.80

Self-citation rate: 12.50%


Journal description: The IEEE Transactions on Signal and Information Processing over Networks publishes high-quality papers that extend the classical notions of processing of signals defined over vector spaces (e.g. time and space) to processing of signals and information (data) defined over networks, potentially dynamically varying. In signal processing over networks, the topology of the network may define structural relationships in the data, or may constrain processing of the data. Topics include distributed algorithms for filtering, detection, estimation, adaptation and learning, model selection, data fusion, and diffusion or evolution of information over such networks, and applications of distributed signal processing.