Dual data mapping with fine-tuned large language models and asset administration shells toward interoperable knowledge representation

Impact factor: 11.4; Zone 1, Computer Science; JCR Q1, Computer Science, Interdisciplinary Applications; Robotics and Computer-integrated Manufacturing; published 2024-07-26; DOI: 10.1016/j.rcim.2024.102837

Dachuan Shi, Olga Meyer, Michael Oberle, Thomas Bauernhansl

Robotics and Computer-integrated Manufacturing, Volume 91, Article 102837. https://www.sciencedirect.com/science/article/pii/S0736584524001248

Citations: 0

Abstract

In the context of Industry 4.0, ensuring the compatibility of digital twins (DTs) with existing software systems in the manufacturing sector presents a significant challenge. The Asset Administration Shell (AAS), conceptualized as the standardized DT of an asset, offers a powerful framework that connects the DT with the established software infrastructure through interoperable knowledge representation. Although the IEC 63278 series specifies the AAS metamodel, it lacks a matching strategy for automating the mapping between proprietary data from existing software and AAS information models. To address this gap, we introduce a novel dual data mapping system (DDMS) that uses a fine-tuned open-source large language model (LLM) for entity matching. The system supports not only the mapping between existing software and AAS models but also the mapping between AAS models and standardized vocabulary dictionaries, thereby enhancing the semantic interoperability of the models. A case study in the injection molding domain illustrates the practical application of DDMS to the automated creation of AAS instances that seamlessly integrate the manufacturer's existing data. Furthermore, we extensively investigate the potential of fine-tuning decoder-only LLMs both as generative classifiers and as encoding-based classifiers for the entity matching task. To this end, we build two AAS-specific datasets by collecting and compiling AAS-related resources. In addition, we perform supplementary experiments on general entity matching benchmark datasets to verify that our empirical conclusions and insights are generally applicable. The experimental results indicate that the fine-tuned generative LLM classifier achieves slightly better results, whereas the encoding-based classifier enables much faster inference. Moreover, the fine-tuned LLM surpasses all state-of-the-art entity matching approaches, including GPT-4 enhanced with in-context learning and chain-of-thought prompting. This evidence highlights the effectiveness of the proposed DDMS in bridging the interoperability gap in DT applications, offering a scalable solution for the manufacturing industry.
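The abstract describes the two mapping stages only at a high level. The sketch below is a minimal illustration, assuming that both stages (proprietary field to AAS property, then AAS property to a dictionary entry) can be treated as pairwise entity matching with a score threshold. It is not the authors' implementation: `jaccard_matcher` is a hypothetical token-overlap stand-in for the fine-tuned LLM, `serialize_pair` only shows one possible way of phrasing a candidate pair for a generative yes/no classifier, and all field names, dictionary entries, `urn:example:` identifiers and the 0.3 threshold are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Entity:
    """A named data element: a proprietary field, an AAS property, or a dictionary entry."""
    name: str
    description: str = ""
    semantic_id: Optional[str] = None  # only set for dictionary entries in this sketch


# A matcher scores how likely two entities denote the same concept (0..1).
Matcher = Callable[[Entity, Entity], float]


def serialize_pair(left: Entity, right: Entity) -> str:
    """Phrase a candidate pair as a yes/no question for a generative classifier.
    The wording is a hypothetical example, not the paper's prompt."""
    return (
        "Do the following two entries refer to the same entity? "
        f"Entry A: {left.name}, {left.description}. "
        f"Entry B: {right.name}, {right.description}. "
        "Answer Yes or No."
    )


def _tokens(entity: Entity) -> set[str]:
    """Lower-cased word set from name and description; underscores split words."""
    text = f"{entity.name} {entity.description}".lower().replace("_", " ")
    return set(text.split())


def jaccard_matcher(left: Entity, right: Entity) -> float:
    """Token-overlap score used here as a placeholder for a fine-tuned LLM matcher."""
    a, b = _tokens(left), _tokens(right)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_match(source: Entity, candidates: list[Entity], matcher: Matcher,
               threshold: float) -> Optional[Entity]:
    """Return the highest-scoring candidate, or None if its score is below threshold."""
    best: Optional[Entity] = None
    best_score = -1.0
    for cand in candidates:
        score = matcher(source, cand)
        if score > best_score:
            best, best_score = cand, score
    return best if best is not None and best_score >= threshold else None


def dual_map(fields: list[Entity], aas_properties: list[Entity],
             dictionary: list[Entity], matcher: Matcher = jaccard_matcher,
             threshold: float = 0.3) -> dict[str, tuple[str, Optional[str]]]:
    """Stage 1 maps each proprietary field to an AAS property; stage 2 maps that
    property to a dictionary entry and keeps its semantic ID (None if unmatched).
    Fields without any AAS match are left out of the result."""
    mapping: dict[str, tuple[str, Optional[str]]] = {}
    for src in fields:
        prop = best_match(src, aas_properties, matcher, threshold)
        if prop is None:
            continue
        entry = best_match(prop, dictionary, matcher, threshold)
        mapping[src.name] = (prop.name, entry.semantic_id if entry else None)
    return mapping


# Invented injection-molding example data; the urn:example: IDs are placeholders.
FIELDS = [
    Entity("inj_press_max", "maximum injection pressure bar"),
    Entity("clamp_force", "clamping force kN"),
    Entity("operator_shift", "shift of the machine operator"),
]
AAS_PROPERTIES = [
    Entity("MaxInjectionPressure", "maximum injection pressure"),
    Entity("ClampingForce", "clamping force"),
]
DICTIONARY = [
    Entity("Maximum injection pressure", "highest pressure during injection",
           "urn:example:dict:max-injection-pressure"),
    Entity("Clamping force", "clamping force of the mold",
           "urn:example:dict:clamping-force"),
]

if __name__ == "__main__":
    for name, (prop, sem_id) in dual_map(FIELDS, AAS_PROPERTIES, DICTIONARY).items():
        print(f"{name} -> {prop} -> {sem_id}")
```

Passing the matcher as a callable keeps the two-stage logic independent of the scoring model, so a fine-tuned generative classifier (scoring the answer to a prompt like the one from `serialize_pair`) or an encoding-based classifier could, in principle, replace the placeholder without changing `dual_map`. In this toy run, `operator_shift` has no AAS counterpart and is simply left unmapped.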




Source journal: Robotics and Computer-integrated Manufacturing (Engineering & Technology: Engineering, Manufacturing)

CiteScore: 24.10
Self-citation rate: 13.50%
Articles published: 160
Review time: 50 days

Journal description: The journal Robotics and Computer-Integrated Manufacturing focuses on sharing research applications that contribute to the development of new or enhanced robotics, manufacturing technologies, and innovative manufacturing strategies that are relevant to industry. Papers that combine theory and experimental validation are preferred, while review papers on current robotics and manufacturing issues are also considered. However, papers on traditional machining processes, modeling and simulation, supply chain management, and resource optimization are generally not within the scope of the journal, as there are more appropriate journals for these topics. Similarly, papers that are overly theoretical or mathematical will be directed to other suitable journals. The journal welcomes original papers in areas such as industrial robotics, human-robot collaboration in manufacturing, cloud-based manufacturing, cyber-physical production systems, big data analytics in manufacturing, smart mechatronics, machine learning, adaptive and sustainable manufacturing, and other fields involving unique manufacturing technologies.