Panoptic segmentation-based semantic embedding matching model for scene graph generation

Pattern Recognition Letters · IF 3.3 · CAS Tier 3, JCR Q2 (Computer Science, Artificial Intelligence) · Volume 193, Pages 56-63 · Pub Date: 2025-07-01 · Epub Date: 2025-04-19 · DOI: 10.1016/j.patrec.2025.04.005
Ming Zhao, Jing Zhang
Citations: 0

Abstract

Scene Graph Generation aims to construct a structured representation of entities and their relationships in an image. Traditional methods use object detection for entity localization but struggle with relationship modeling in complex scenes. Most approaches also face challenges in predicate classification due to inter-class similarity and intra-class variability. Additionally, when multiple entities are present in an image, the contextual information between them is crucial. To address these challenges, this paper proposes a Panoptic Segmentation-based Semantic Embedding Matching Network, which optimizes the entire pipeline from entity localization to entity-pair and predicate prediction. Specifically, we use a panoptic segmentation module to locate all entities (both foreground and background), providing comprehensive support for predicate prediction in complex scenes. Simultaneously, a semantic embedding module is introduced to fuse the visual and semantic features of entities and predicates, respectively, constructing a similarity-based matching mechanism. Furthermore, we incorporate a graph attention network before the semantic embedding of entities, effectively capturing contextual information among multiple entities and dynamically adjusting the semantic embedding module. Experiments on the PSG dataset validate the proposed method's effectiveness. The results show that our model outperforms existing methods in relationship detection and generation in complex scenes.
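The similarity-based matching mechanism described in the abstract can be sketched as follows. This is a minimal illustration only: the fusion weight `alpha`, the additive fusion form, and the function names are assumptions for exposition, not the paper's exact formulation, which fuses visual and semantic features and then scores candidates by similarity against class embeddings.

```python
import numpy as np

def l2norm(x, axis=-1):
    """Normalize vectors to unit length along the given axis."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def similarity_matching(visual_feat, semantic_feat, class_embeds, alpha=0.5):
    """Fuse an entity's (or predicate's) visual and semantic features, then
    score each candidate class by cosine similarity against its embedding.

    visual_feat, semantic_feat: (d,) feature vectors for one entity/predicate
    class_embeds: (num_classes, d) embedding per candidate class
    alpha: assumed fusion weight between the two modalities
    Returns a (num_classes,) vector of cosine similarities; the predicted
    class is the argmax.
    """
    fused = l2norm(alpha * visual_feat + (1.0 - alpha) * semantic_feat)
    class_embeds = l2norm(class_embeds)
    return fused @ class_embeds.T
```

In practice the class embeddings would come from a learned or pretrained word-embedding table, and the similarities would feed a softmax for training; the sketch shows only the matching step.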
Source Journal

Pattern Recognition Letters (Engineering & Technology, Computer Science: Artificial Intelligence)
CiteScore: 12.40
Self-citation rate: 5.90%
Annual articles: 287
Review time: 9.1 months
Journal description: Pattern Recognition Letters aims at rapid publication of concise articles of a broad interest in pattern recognition. Subject areas include all the current fields of interest represented by the Technical Committees of the International Association of Pattern Recognition, and other developing themes involving learning and recognition.