Domain-adaptation-based named entity recognition with information enrichment for equipment fault knowledge graph

IET Collaborative Intelligent Manufacturing, vol. 6, no. 4 | IF 2.5 | JCR Q2 (Engineering, Industrial) | Pub Date: 2024-11-25 | DOI: 10.1049/cim2.70003
Dengrui Xiong, Xinyu Li, Liang Gao, Yiping Gao
Citation count: 0

Abstract

Equipment diagnosis and maintenance (D&M) generates numerous files, such as records and logs, that contain large amounts of unstructured plain text. The knowledge in these files could be reused for similar equipment faults, but in practice knowledge presented as plain text is hard to acquire. Automated named entity recognition (NER) and relation extraction (RE) methods based on pretrained encoders can be used to extract entities and relations and build a structured knowledge graph (KG), thus facilitating intelligent manufacturing. However, equipment fault NER performs suboptimally with existing encoders pretrained on general-domain corpora. In this paper, domain-adaptation-based NER with information enrichment is proposed for developing an equipment fault KG. A domain-adapted encoder is tailored for equipment fault NER through domain-adaptive pretraining (DAPT). For information enrichment, the word segmentation dictionary is updated and the masking approach is adjusted during DAPT, which helps make the most of the limited domain-specific pretraining corpus. Experimental results show that the domain-adapted encoder improves the NER F1 score by 1.22% over the encoder pretrained on a general-domain corpus. Furthermore, a reliable and robust question answering (QA) application of the developed equipment fault KG is also shown.
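The abstract does not detail how the segmentation dictionary and the masking approach interact during DAPT. The sketch below shows one plausible reading, under stated assumptions: the corpus is Chinese (which a word segmentation dictionary suggests but the abstract does not state), segmentation is done by simple forward maximum matching, and the adjusted masking is whole-word masking over the segmented words. The function names `segment` and `whole_word_mask`, the example terms, and the 15% masking rate are all hypothetical and are not taken from the paper.

```python
import random

def segment(text, vocab, max_len=6):
    """Forward maximum matching: at each position take the longest
    dictionary word, falling back to a single character."""
    words, i = [], 0
    while i < len(text):
        for j in range(min(len(text), i + max_len), i, -1):
            if j - i == 1 or text[i:j] in vocab:
                words.append(text[i:j])
                i = j
                break
    return words

def whole_word_mask(words, mask_prob=0.15, rng=None, mask_token="[MASK]"):
    """Mask every character of a selected word together, so the encoder
    must predict the full (domain) term rather than one character."""
    rng = rng or random.Random()
    tokens, labels = [], []
    for word in words:
        masked = rng.random() < mask_prob
        for ch in word:
            tokens.append(mask_token if masked else ch)
            labels.append(ch if masked else None)  # None = not predicted
    return tokens, labels

# Hypothetical example: "spindle bearing wear".
general_vocab = {"主轴", "轴承", "磨损"}          # spindle, bearing, wear
domain_vocab = general_vocab | {"主轴轴承"}       # adds "spindle bearing"

print(segment("主轴轴承磨损", general_vocab))     # three general words
print(segment("主轴轴承磨损", domain_vocab))      # domain term kept whole
print(whole_word_mask(segment("主轴轴承磨损", domain_vocab),
                      rng=random.Random(0)))
```

Under these assumptions, adding a domain term to the dictionary changes the word boundaries, and whole-word masking then hides the entire term at once. That is one way a small domain corpus could be made to carry more term-level signal during pretraining.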

Source journal
IET Collaborative Intelligent Manufacturing (Engineering: Industrial and Manufacturing Engineering)
CiteScore: 9.10
Self-citation rate: 2.40%
Articles published: 25
Review time: 20 weeks
Journal description: IET Collaborative Intelligent Manufacturing is a Gold Open Access journal that focuses on the development of efficient and adaptive production and distribution systems. It aims to meet the ever-changing market demands by publishing original research on methodologies and techniques for the application of intelligence, data science, and emerging information and communication technologies in various aspects of manufacturing, such as design, modeling, simulation, planning, and optimization of products, processes, production, and assembly. The journal is indexed in COMPENDEX (Elsevier), Directory of Open Access Journals (DOAJ), Emerging Sources Citation Index (Clarivate Analytics), INSPEC (IET), SCOPUS (Elsevier) and Web of Science (Clarivate Analytics).