Harnessing unsupervised learning for retrieving CAD assembly models from public datasets

Advanced Engineering Informatics. IF 9.9; Zone 1 (Engineering Technology); JCR Q1 (Computer Science, Artificial Intelligence). Pub Date: 2025-05-01; Epub Date: 2025-02-17; DOI: 10.1016/j.aei.2025.103182

Yixuan Li, Jie Zhang, Jiazhen Pang, Ya Yao

Journal: Advanced Engineering Informatics, Volume 65, Article 103182. Publication type: Journal Article. Open access: no. Full text: https://www.sciencedirect.com/science/article/pii/S1474034625000758

Citations: 0

Abstract

Retrieving assembly models from public datasets can yield enriched outcomes and broaden the spectrum of insights. However, public datasets often present unique challenges, such as variance in the quality and granularity of assembly models and a lack of standardized methods for organizing and labeling, which hinder efficient and accurate retrieval. To address these issues, this paper presents a robust two-step retrieval method tailored for CAD assembly models from public datasets. The first phase utilizes hierarchical clustering in an unsupervised learning framework to systematically organize CAD assembly models. Each assembly model is represented by a feature vector that encapsulates geometrical and topological features derived from its Boundary Representation (B-rep), and reflects hierarchical relationships among parts and components. These feature vectors serve as the basis for systematic indexing via hierarchical clustering, grouping models based on similarity measurement. Each cluster’s centroid, representing the collective feature vector, facilitates efficient and targeted retrieval. In the second phase, the query model is directly compared to cluster centroids, enabling rapid identification of similar assembly collections. To enhance precision within identified clusters, we introduce a fine-grained retrieval technique that integrates Optimal Subsequence Bijection (OSB) with Maximum Mean Discrepancy (MMD). Evaluations on a heterogeneous dataset demonstrate that our method not only streamlines dataset organization but also effectively addresses quality variations, significantly improving retrieval efficiency across extensive collections.
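The two-step idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation, and several things are assumptions made only for illustration: each assembly is reduced to a list of toy 2-D part feature vectors, the assembly-level feature vector is taken to be the mean of its part vectors, clustering uses average linkage with a distance threshold, and the fine-grained step uses only a biased RBF-kernel MMD estimate (the OSB component is omitted). The names `agglomerative`, `mmd2`, `retrieve`, and the toy dataset are hypothetical.

```python
import math


def mean_vec(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def agglomerative(vectors, threshold):
    """Average-linkage agglomerative clustering (assumed linkage choice).

    Repeatedly merges the two closest clusters until the closest pair is
    farther apart than `threshold`. Returns a list of index lists.
    """
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        total = sum(math.dist(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        d, a, b = min(
            (linkage(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        if d > threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def mmd2(X, Y, gamma=1.0):
    """Biased estimate of squared MMD between two vector sets (RBF kernel)."""
    def k(x, y):
        return math.exp(-gamma * math.dist(x, y) ** 2)

    kxx = sum(k(a, b) for a in X for b in X) / len(X) ** 2
    kyy = sum(k(a, b) for a in Y for b in Y) / len(Y) ** 2
    kxy = sum(k(a, b) for a in X for b in Y) / (len(X) * len(Y))
    return kxx + kyy - 2 * kxy


def retrieve(query_parts, assemblies, threshold=1.0, gamma=1.0):
    """Coarse step: pick the cluster with the nearest centroid.
    Fine step: rank that cluster's members by MMD to the query."""
    names = list(assemblies)
    vecs = [mean_vec(assemblies[n]) for n in names]  # assumed assembly descriptor
    clusters = agglomerative(vecs, threshold)
    centroids = [mean_vec([vecs[i] for i in c]) for c in clusters]
    q = mean_vec(query_parts)
    best = min(range(len(clusters)), key=lambda c: math.dist(q, centroids[c]))
    ranked = sorted(
        clusters[best],
        key=lambda i: mmd2(query_parts, assemblies[names[i]], gamma),
    )
    return [names[i] for i in ranked]


# Toy data: hypothetical part-level feature vectors per assembly.
assemblies = {
    "gearbox_a": [[1.0, 0.0], [1.2, 0.1]],
    "gearbox_b": [[0.9, 0.2], [1.1, 0.3]],
    "bracket_a": [[5.0, 5.0], [5.2, 4.9]],
    "bracket_b": [[4.8, 5.1], [5.1, 5.3]],
}
query = [[1.0, 0.05], [1.15, 0.1]]
result = retrieve(query, assemblies)
print(result)  # ['gearbox_a', 'gearbox_b']
```

Comparing the query against cluster centroids first means the costlier set-to-set comparison runs only inside one cluster rather than over the whole collection. For larger collections, an established routine such as `linkage` and `fcluster` from `scipy.cluster.hierarchy` would likely replace the quadratic pure-Python clustering loop used here.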

Source journal

Advanced Engineering Informatics (Engineering Technology: Engineering, Multidisciplinary)

CiteScore: 12.40
Self-citation rate: 18.20%
Articles published: 292
Review time: 45 days

Journal description: Advanced Engineering Informatics is an international Journal that solicits research papers with an emphasis on 'knowledge' and 'engineering applications'. The Journal seeks original papers that report progress in applying methods of engineering informatics. These papers should have engineering relevance and help provide a scientific base for more reliable, spontaneous, and creative engineering decision-making. Additionally, papers should demonstrate the science of supporting knowledge-intensive engineering tasks and validate the generality, power, and scalability of new methods through rigorous evaluation, preferably both qualitatively and quantitatively. Abstracting and indexing for Advanced Engineering Informatics include Science Citation Index Expanded, Scopus and INSPEC.