{"title":"关系抽取的同构多模态句子表示","authors":"Kai Wang, Yanping Chen, WeiZhe Yang, Yongbin Qin","doi":"10.1016/j.inffus.2025.102968","DOIUrl":null,"url":null,"abstract":"<div><div>Deep neural networks enable a sentence to be transformed into different multimodalities such as a token sequence representation (a one-dimensional semantic representation) or a semantic plane (a two-dimensional semantic representation). Sequence representation has the advantage of learning sequential dependencies of a sentence. Semantic plane is built by organizing all spans of a sentence, which is effective in resolving complicated sentence semantic structures. The two representations are derived from a homogeneous resource (the same sentence), but they are separately used in related works. In this paper, a homogeneous multimodality sentence representation is proposed to make full use of semantic information in a sentence. We construct a homomodality model, which is composed of three components: a sequential encoder to generate sequential modality, a plane encoder to build plane modality, and a multimodality fusion component aligning homogeneous multimodalities for learning a multi-granularity semantic representation. Our model is evaluated on four public datasets to support the relation extraction task. Compared with related works, it achieves state-of-the-art performance on all datasets. Analytical experiments show that fusing homogeneous multimodalities is effective in making full use of sentence information for advancing the discriminability of a deep neural network.</div></div>","PeriodicalId":50367,"journal":{"name":"Information Fusion","volume":"118 ","pages":"Article 102968"},"PeriodicalIF":15.5000,"publicationDate":"2025-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A homogeneous multimodality sentence representation for relation extraction\",\"authors\":\"Kai Wang, Yanping Chen, WeiZhe Yang, Yongbin Qin\",\"doi\":\"10.1016/j.inffus.2025.102968\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Deep neural networks enable a sentence to be transformed into different multimodalities such as a token sequence representation (a one-dimensional semantic representation) or a semantic plane (a two-dimensional semantic representation). Sequence representation has the advantage of learning sequential dependencies of a sentence. Semantic plane is built by organizing all spans of a sentence, which is effective in resolving complicated sentence semantic structures. The two representations are derived from a homogeneous resource (the same sentence), but they are separately used in related works. In this paper, a homogeneous multimodality sentence representation is proposed to make full use of semantic information in a sentence. We construct a homomodality model, which is composed of three components: a sequential encoder to generate sequential modality, a plane encoder to build plane modality, and a multimodality fusion component aligning homogeneous multimodalities for learning a multi-granularity semantic representation. Our model is evaluated on four public datasets to support the relation extraction task. Compared with related works, it achieves state-of-the-art performance on all datasets. Analytical experiments show that fusing homogeneous multimodalities is effective in making full use of sentence information for advancing the discriminability of a deep neural network.</div></div>\",\"PeriodicalId\":50367,\"journal\":{\"name\":\"Information Fusion\",\"volume\":\"118 \",\"pages\":\"Article 102968\"},\"PeriodicalIF\":15.5000,\"publicationDate\":\"2025-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Information Fusion\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S1566253525000417\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2025/2/3 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Information Fusion","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1566253525000417","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/2/3 0:00:00","PubModel":"Epub","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
A homogeneous multimodality sentence representation for relation extraction
Deep neural networks enable a sentence to be transformed into different multimodalities such as a token sequence representation (a one-dimensional semantic representation) or a semantic plane (a two-dimensional semantic representation). Sequence representation has the advantage of learning sequential dependencies of a sentence. Semantic plane is built by organizing all spans of a sentence, which is effective in resolving complicated sentence semantic structures. The two representations are derived from a homogeneous resource (the same sentence), but they are separately used in related works. In this paper, a homogeneous multimodality sentence representation is proposed to make full use of semantic information in a sentence. We construct a homomodality model, which is composed of three components: a sequential encoder to generate sequential modality, a plane encoder to build plane modality, and a multimodality fusion component aligning homogeneous multimodalities for learning a multi-granularity semantic representation. Our model is evaluated on four public datasets to support the relation extraction task. Compared with related works, it achieves state-of-the-art performance on all datasets. Analytical experiments show that fusing homogeneous multimodalities is effective in making full use of sentence information for advancing the discriminability of a deep neural network.
期刊介绍:
Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers dealing with fundamental theoretical analyses as well as those demonstrating their application to real-world problems will be welcome.