Harnessing unsupervised learning for retrieving CAD assembly models from public datasets
Authors: Yixuan Li, Jie Zhang, Jiazhen Pang, Ya Yao
Journal: Advanced Engineering Informatics, vol. 65, Article 103182 (published 2025-05-01; Epub 2025-02-17)
DOI: 10.1016/j.aei.2025.103182
URL: https://www.sciencedirect.com/science/article/pii/S1474034625000758
Impact factor: 9.9; JCR: Q1 (Computer Science, Artificial Intelligence)
Retrieving assembly models from public datasets can yield richer results and a broader range of insights. However, public datasets pose particular challenges, such as variation in the quality and granularity of assembly models and the lack of standardized methods for organizing and labeling them, which hinder efficient and accurate retrieval. To address these issues, this paper presents a robust two-step retrieval method tailored for CAD assembly models from public datasets. The first phase uses hierarchical clustering, an unsupervised learning technique, to systematically organize CAD assembly models. Each assembly model is represented by a feature vector that encapsulates geometrical and topological features derived from its Boundary Representation (B-rep) and reflects the hierarchical relationships among parts and components. These feature vectors serve as the basis for indexing via hierarchical clustering, which groups models according to a similarity measure. Each cluster’s centroid, representing the collective feature vector of its members, facilitates efficient and targeted retrieval. In the second phase, the query model is compared directly to the cluster centroids, enabling rapid identification of collections of similar assemblies. To enhance precision within the identified clusters, we introduce a fine-grained retrieval technique that integrates Optimal Subsequence Bijection (OSB) with Maximum Mean Discrepancy (MMD). Evaluations on a heterogeneous dataset demonstrate that the method not only streamlines dataset organization but also handles quality variations effectively, significantly improving retrieval efficiency across extensive collections.
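The two-phase flow described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the feature extraction, linkage criterion, stopping threshold, or kernel, so the code below assumes toy 2-D feature vectors, centroid linkage with a distance threshold, and an RBF kernel for MMD. The OSB step is omitted, and all function names (`agglomerate`, `rbf_mmd2`, `retrieve`) are hypothetical.

```python
import math

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def agglomerate(vectors, threshold):
    """Naive agglomerative clustering with centroid linkage (an assumption).

    Repeatedly merges the two clusters whose centroids are closest and stops
    once the closest pair is farther apart than `threshold`.
    Returns clusters as lists of indices into `vectors`.
    """
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        cents = [centroid([vectors[i] for i in c]) for c in clusters]
        d, a, b = min(
            (math.dist(cents[a], cents[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        if d > threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters

def rbf_mmd2(xs, ys, gamma=1.0):
    """Biased estimate of squared MMD between two sets of vectors (RBF kernel)."""
    def mean_k(us, vs):
        total = sum(math.exp(-gamma * math.dist(u, v) ** 2) for u in us for v in vs)
        return total / (len(us) * len(vs))
    return mean_k(xs, xs) + mean_k(ys, ys) - 2.0 * mean_k(xs, ys)

def retrieve(query_vec, query_parts, vectors, part_sets, clusters):
    """Coarse step: pick the cluster with the nearest centroid.
    Fine step: rank that cluster's models by MMD between part-feature sets."""
    cents = [centroid([vectors[i] for i in c]) for c in clusters]
    nearest = min(range(len(clusters)), key=lambda j: math.dist(query_vec, cents[j]))
    return sorted(clusters[nearest], key=lambda i: rbf_mmd2(query_parts, part_sets[i]))

# Toy data (made up for illustration): one vector per assembly model,
# plus a small set of per-part feature vectors for each model.
model_vectors = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.2, 5.0]]
part_sets = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.2], [0.5, 3.0]],
    [[4.0, 4.0], [6.0, 6.0]],
    [[4.1, 4.0], [6.0, 6.2]],
]

clusters = agglomerate(model_vectors, threshold=1.0)
ranked = retrieve([0.05, 0.1], [[1.0, 0.05], [0.0, 1.0]],
                  model_vectors, part_sets, clusters)
print(clusters, ranked)
```

On this toy data the two tight groups of models are recovered as clusters, and within the cluster nearest to the query, model 0 ranks ahead of model 1 because its part features are closer in distribution to the query's. Comparing the query against a few centroids first is what keeps the more expensive set-to-set comparison confined to one cluster.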
About the journal:
Advanced Engineering Informatics is an international journal that solicits research papers with an emphasis on 'knowledge' and 'engineering applications'. The journal seeks original papers that report progress in applying methods of engineering informatics. These papers should have engineering relevance and help provide a scientific base for more reliable, spontaneous, and creative engineering decision-making. Additionally, papers should demonstrate the science of supporting knowledge-intensive engineering tasks and validate the generality, power, and scalability of new methods through rigorous evaluation, preferably both qualitative and quantitative. Advanced Engineering Informatics is abstracted and indexed in Science Citation Index Expanded, Scopus, and INSPEC.