Towards a self-cognitive complex product design system: A fine-grained multi-modal feature recognition and semantic understanding approach using large language models in mechanical engineering

IF 9.9 | Zone 1, Engineering Technology | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Advanced Engineering Informatics | Pub Date: 2025-05-01 | Epub Date: 2025-03-22 | DOI: 10.1016/j.aei.2025.103265
Xinxin Liang, Zuoxu Wang, Jihong Liu
Advanced Engineering Informatics, volume 65, Article 103265. Journal Article. https://www.sciencedirect.com/science/article/pii/S1474034625001582
Citation count: 0

Abstract

With human-artificial intelligence (AI) collaborative product design gaining momentum, fine-grained, multi-modal mechanical part recognition and semantic understanding have become a basic task for achieving a self-cognitive product design system. However, traditional semantic understanding approaches for mechanical parts can handle only single-modal data, either text or images, which leads to two limitations: 1) insufficient mining of fine-grained functional, behavioral, and structural information about parts, and 2) ineffective alignment of multi-modal part information. Both restrict the intelligence level of previous product design assistants. To mitigate these challenges, this paper proposes a fine-grained multimodal reasoning approach for mechanical part semantic understanding. The proposed approach uses a pre-trained Convolutional Neural Network (CNN) for visual feature extraction, the large language model (LLM) LLaMA3 for advanced textual analysis, and a Unified Feature Fusion Module (UFFM) to facilitate robust cross-modal interactions. A positive and negative sample generation mechanism refines the model's ability to discern subtle variations in complex components. Experimental evaluations on the Industrial Part Multimodal Dataset (IPMD) demonstrate a significant improvement in classification accuracy, providing a more precise and intelligent solution for semantic understanding in complex product design systems.
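
The abstract does not describe the internals of the UFFM or of the positive and negative sample generation mechanism, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes that image and text features have already been projected into a shared embedding space of equal dimension, uses plain concatenation as a hypothetical stand-in for a fusion step, and uses an InfoNCE-style contrastive loss as one common way to train with positive and negative samples. All function names and the `temperature` parameter are assumptions made for illustration.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def fuse(image_feat, text_feat):
    """Hypothetical fusion stand-in: normalize each modality, then concatenate.

    This is NOT the paper's UFFM; it only shows where a fusion step would sit.
    """
    return l2_normalize(image_feat) + l2_normalize(text_feat)


def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss (an assumption, not confirmed by the source).

    The loss is small when the anchor is more similar to the positive
    sample than to every negative sample.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all candidates.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]


if __name__ == "__main__":
    # Toy embeddings: an image of a part, its matching description,
    # and two descriptions of different parts (made-up numbers).
    image = [1.0, 0.0, 0.2]
    matching_text = [0.9, 0.1, 0.2]
    other_texts = [[0.0, 1.0, 0.0], [-1.0, 0.2, 0.0]]

    print("loss with correct pairing:", contrastive_loss(image, matching_text, other_texts))
    print("fused feature length:", len(fuse(image, matching_text)))
```

In a sketch like this, minimizing the loss over many image and text pairs pushes matching pairs together and mismatched pairs apart, which is one way a model could learn to separate visually similar parts.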
Source journal
Advanced Engineering Informatics (Engineering Technology; Engineering, Multidisciplinary)
CiteScore: 12.40
Self-citation rate: 18.20%
Articles published: 292
Review time: 45 days
About the journal: Advanced Engineering Informatics is an international journal that solicits research papers with an emphasis on 'knowledge' and 'engineering applications'. The journal seeks original papers that report progress in applying methods of engineering informatics. These papers should have engineering relevance and help provide a scientific base for more reliable, spontaneous, and creative engineering decision-making. Additionally, papers should demonstrate the science of supporting knowledge-intensive engineering tasks and validate the generality, power, and scalability of new methods through rigorous evaluation, preferably both qualitatively and quantitatively. Abstracting and indexing for Advanced Engineering Informatics include Science Citation Index Expanded, Scopus, and INSPEC.