Towards a self-cognitive complex product design system: A fine-grained multi-modal feature recognition and semantic understanding approach using large language models in mechanical engineering

Advanced Engineering Informatics (IF 9.9; Zone 1, Engineering Technology; JCR Q1, Computer Science, Artificial Intelligence). Pub Date: 2025-05-01. Epub Date: 2025-03-22. DOI: 10.1016/j.aei.2025.103265

Xinxin Liang, Zuoxu Wang, Jihong Liu

Advanced Engineering Informatics, Volume 65, Article 103265. Publication type: Journal Article. Full text: https://www.sciencedirect.com/science/article/pii/S1474034625001582

Citations: 0

Abstract

With the growing trend toward human-artificial intelligence (AI) collaborative product design, fine-grained, multi-modal mechanical part recognition and semantic understanding have become a basic task for achieving a self-cognitive product design system. However, traditional semantic understanding approaches for mechanical parts can handle only single-modal data, either text or images, which leads to two limitations: 1) insufficient mining of fine-grained functional, behavioral, and structural information about parts, and 2) ineffective alignment of multi-modal part information. These limitations restrict the intelligence level of previous product design assistants. To mitigate these challenges, this paper proposes a fine-grained multimodal reasoning approach for mechanical part semantic understanding. The proposed approach uses a pre-trained Convolutional Neural Network (CNN) for visual feature extraction, the large language model (LLM) LLaMA3 for advanced textual analysis, and a Unified Feature Fusion Module (UFFM) to facilitate robust cross-modal interactions. A positive and negative sample generation mechanism refines the model’s ability to discern subtle variations in complex components. Experimental evaluations on the Industrial Part Multimodal Dataset (IPMD) demonstrate a significant improvement in classification accuracy, providing a more precise and intelligent solution for semantic understanding in complex product design systems.
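The abstract does not say how the generated positive and negative samples enter training. One common way to use such pairs is an InfoNCE-style contrastive loss, which rewards an image embedding for being closer to its matching text embedding than to mismatched ones. The sketch below illustrates that idea only; it is an assumption, not the authors' method, and the function names (`cosine`, `info_nce`), the temperature value, and the toy vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(image_vec, positive_text, negative_texts, temperature=0.1):
    """InfoNCE-style loss (assumed, not taken from the paper).

    The loss is small when the image embedding is more similar to its
    positive text embedding than to any of the negative ones.
    """
    logits = [cosine(image_vec, positive_text) / temperature]
    logits += [cosine(image_vec, t) / temperature for t in negative_texts]
    # Numerically stable log-sum-exp over the positive and all negatives.
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_denominator - logits[0]


# Toy embeddings, made up for illustration.
image = [0.9, 0.1, 0.0]
matching = [1.0, 0.0, 0.0]
mismatched = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

aligned_loss = info_nce(image, matching, mismatched)
swapped_loss = info_nce(image, mismatched[0], [matching, mismatched[1]])
print(aligned_loss < swapped_loss)  # prints True
```

In a real system the vectors would come from the visual and textual encoders rather than being written by hand, and hard negatives (parts that differ only subtly from the positive) would be the ones that make the loss informative for fine-grained recognition.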

Source journal

Advanced Engineering Informatics (Engineering Technology; Engineering: Multidisciplinary)

CiteScore: 12.40
Self-citation rate: 18.20%
Articles published: 292
Review time: 45 days

About the journal: Advanced Engineering Informatics is an international journal that solicits research papers with an emphasis on 'knowledge' and 'engineering applications'. The journal seeks original papers that report progress in applying methods of engineering informatics. These papers should have engineering relevance and help provide a scientific base for more reliable, spontaneous, and creative engineering decision-making. Additionally, papers should demonstrate the science of supporting knowledge-intensive engineering tasks and validate the generality, power, and scalability of new methods through rigorous evaluation, preferably both qualitatively and quantitatively. Abstracting and indexing for Advanced Engineering Informatics include Science Citation Index Expanded, Scopus, and INSPEC.