Industrial exploded views (IEVs) integrate images, text, and part–assembly relations, making them essential resources for advancing intelligent manufacturing. However, semantic ambiguities, structural inconsistencies, and fragmented annotations hinder effective knowledge extraction and reuse. We cast extraction from IEVs as constrained inference over scene graphs and present a Scene-aware Cascade Expert Chain (SACEC) that incrementally resolves entities, relations, and assembly context. A Visual–Structural–Rule (VSR) validator then enforces domain rules and semantic consistency on every triple. A dynamic triple-cutting strategy selects credible triples by jointly balancing local evidence, contextual coherence, and assembly order, yielding a multimodal knowledge graph (MMKG). We also introduce the Industrial Exploded-View (IEV) dataset, with fine-grained component and relation annotations and assembly-order metadata. Experiments on VRD, VG150, and the IEV dataset demonstrate significant improvements over state-of-the-art baselines, achieving R@100 of 73.2%, 63.9%, and 67.4%, and TripleAcc of 31.8%, 20.2%, and 24.9%. At the triple level, we further obtain P@100 of 54.9%, 39.8%, and 49.6%, and F1@100 of 46.2%, 34.1%, and 45.1%. Against strong path- and context-based baselines, our method improves by up to +7.4 pp in R@100, +2.7 pp in TripleAcc, +15.8 pp in P@100, and +13.5 pp in F1@100. The approach reduces manual annotation effort and yields interpretable, audit-ready outputs for intelligent design and process planning, offering a practical route to automated and interpretable knowledge extraction in industrial environments.