Title: Panoptic segmentation-based semantic embedding matching model for scene graph generation
Authors: Ming Zhao, Jing Zhang
Journal: Pattern Recognition Letters, volume 193, pages 56-63 (impact factor 3.3; JCR Q2, Computer Science, Artificial Intelligence)
DOI: 10.1016/j.patrec.2025.04.005
Published: 2025-07-01 (Epub 2025-04-19)
Publication type: Journal Article
URL: https://www.sciencedirect.com/science/article/pii/S0167865525001382
Citations: 0
Abstract
Scene Graph Generation aims to construct a structured representation of the entities in an image and the relationships between them. Traditional methods localize entities with object detection but struggle to model relationships in complex scenes. Most approaches also have difficulty with predicate classification because of inter-class similarity and intra-class variability. In addition, when an image contains multiple entities, the contextual information among them is crucial. To address these challenges, this paper proposes a Panoptic Segmentation-based Semantic Embedding Matching Network, which optimizes the entire pipeline from entity localization to entity-pair and predicate prediction. Specifically, a panoptic segmentation module locates all entities, both foreground and background, providing comprehensive support for predicate prediction in complex scenes. A semantic embedding module fuses the visual and semantic features of entities and of predicates separately, yielding a similarity-based matching mechanism. Furthermore, a graph attention network placed before the semantic embedding of entities captures contextual information among multiple entities and dynamically adjusts the semantic embedding module. Experiments on the PSG dataset validate the method’s effectiveness: the model outperforms existing methods in relationship detection and generation in complex scenes.
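The similarity-based matching idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the similarity function, the embedding dimensions, or how features are fused, so the sketch assumes cosine similarity between an already-fused entity-pair embedding and one embedding per predicate class. All names (`match_predicate`, `predicate_embeddings`, `pair_embedding`) and all numeric values are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def match_predicate(pair_embedding, predicate_embeddings):
    """Score a fused entity-pair embedding against every predicate embedding.

    Assumption: matching is a nearest-neighbour lookup under cosine
    similarity. Returns the best-scoring label and the full score table.
    """
    scores = {
        label: cosine_similarity(pair_embedding, emb)
        for label, emb in predicate_embeddings.items()
    }
    best_label = max(scores, key=scores.get)
    return best_label, scores


# Hypothetical toy embeddings; a real model would learn these.
predicate_embeddings = {
    "on": [1.0, 0.0, 0.0],
    "holding": [0.0, 1.0, 0.0],
    "beside": [0.0, 0.0, 1.0],
}
pair_embedding = [0.9, 0.1, 0.2]  # stand-in for a fused subject-object feature

best_label, scores = match_predicate(pair_embedding, predicate_embeddings)
print(best_label)  # prints "on"
```

Framing predicate prediction as matching in a shared embedding space, rather than as a fixed classifier head, is one plausible way to handle predicates that look alike across classes, since semantically close predicates can sit near each other in that space. Whether the paper uses this exact formulation is not stated in the abstract.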
About the journal:
Pattern Recognition Letters aims at rapid publication of concise articles of broad interest in pattern recognition.
Subject areas include all the current fields of interest represented by the Technical Committees of the International Association for Pattern Recognition, and other developing themes involving learning and recognition.