{"title":"通过语义搜索揭示工业文本中的维护见解","authors":"Syed Meesam Raza Naqvi , Mohammad Ghufran , Christophe Varnier , Jean-Marc Nicod , Kamran Javed , Noureddine Zerhouni","doi":"10.1016/j.compind.2024.104083","DOIUrl":null,"url":null,"abstract":"<div><p>Maintenance records in Computerized Maintenance Management Systems (CMMS) contain valuable human knowledge on maintenance activities. These records primarily consist of noisy and unstructured texts written by maintenance experts. The technical nature of the text, combined with a concise writing style and frequent use of abbreviations, makes it difficult to be processed through classical Natural Language Processing (NLP) pipelines. Due to these complexities, this text must be normalized before feeding to classical machine learning models. Developing these custom normalization pipelines requires manual labor and domain expertise and is a time-consuming process that demands constant updates. This leads to the under-utilization of this valuable source of information to generate insights to help with maintenance decision support. This study proposes a Technical Language Processing (TLP) pipeline for semantic search in industrial text using BERT (Bidirectional Encoder Representations), a transformer-based Large Language Model (LLM). The proposed pipeline can automatically process complex unstructured industrial text and does not require custom preprocessing. To adapt the BERT model for the target domain, three unsupervised domain fine-tuning techniques are compared to identify the best strategy for leveraging available tacit knowledge in industrial text. The proposed approach is validated on two industrial maintenance records from the mining and aviation domains. Semantic search results are analyzed from a quantitative and qualitative perspective. Analysis shows that TSDAE, a state-of-the-art unsupervised domain fine-tuning technique, can efficiently identify intricate patterns in the industrial text regardless of associated complexities. BERT model fine-tuned with TSDAE on industrial text achieved a precision of 0.94 and 0.97 for mining excavators and aviation maintenance records, respectively.</p></div>","PeriodicalId":55219,"journal":{"name":"Computers in Industry","volume":"157 ","pages":"Article 104083"},"PeriodicalIF":8.2000,"publicationDate":"2024-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Unlocking maintenance insights in industrial text through semantic search\",\"authors\":\"Syed Meesam Raza Naqvi , Mohammad Ghufran , Christophe Varnier , Jean-Marc Nicod , Kamran Javed , Noureddine Zerhouni\",\"doi\":\"10.1016/j.compind.2024.104083\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Maintenance records in Computerized Maintenance Management Systems (CMMS) contain valuable human knowledge on maintenance activities. These records primarily consist of noisy and unstructured texts written by maintenance experts. The technical nature of the text, combined with a concise writing style and frequent use of abbreviations, makes it difficult to be processed through classical Natural Language Processing (NLP) pipelines. Due to these complexities, this text must be normalized before feeding to classical machine learning models. Developing these custom normalization pipelines requires manual labor and domain expertise and is a time-consuming process that demands constant updates. This leads to the under-utilization of this valuable source of information to generate insights to help with maintenance decision support. This study proposes a Technical Language Processing (TLP) pipeline for semantic search in industrial text using BERT (Bidirectional Encoder Representations), a transformer-based Large Language Model (LLM). The proposed pipeline can automatically process complex unstructured industrial text and does not require custom preprocessing. To adapt the BERT model for the target domain, three unsupervised domain fine-tuning techniques are compared to identify the best strategy for leveraging available tacit knowledge in industrial text. The proposed approach is validated on two industrial maintenance records from the mining and aviation domains. Semantic search results are analyzed from a quantitative and qualitative perspective. Analysis shows that TSDAE, a state-of-the-art unsupervised domain fine-tuning technique, can efficiently identify intricate patterns in the industrial text regardless of associated complexities. BERT model fine-tuned with TSDAE on industrial text achieved a precision of 0.94 and 0.97 for mining excavators and aviation maintenance records, respectively.</p></div>\",\"PeriodicalId\":55219,\"journal\":{\"name\":\"Computers in Industry\",\"volume\":\"157 \",\"pages\":\"Article 104083\"},\"PeriodicalIF\":8.2000,\"publicationDate\":\"2024-03-21\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computers in Industry\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0166361524000113\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computers in Industry","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0166361524000113","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
Unlocking maintenance insights in industrial text through semantic search
Maintenance records in Computerized Maintenance Management Systems (CMMS) contain valuable human knowledge on maintenance activities. These records primarily consist of noisy and unstructured texts written by maintenance experts. The technical nature of the text, combined with a concise writing style and frequent use of abbreviations, makes it difficult to be processed through classical Natural Language Processing (NLP) pipelines. Due to these complexities, this text must be normalized before feeding to classical machine learning models. Developing these custom normalization pipelines requires manual labor and domain expertise and is a time-consuming process that demands constant updates. This leads to the under-utilization of this valuable source of information to generate insights to help with maintenance decision support. This study proposes a Technical Language Processing (TLP) pipeline for semantic search in industrial text using BERT (Bidirectional Encoder Representations), a transformer-based Large Language Model (LLM). The proposed pipeline can automatically process complex unstructured industrial text and does not require custom preprocessing. To adapt the BERT model for the target domain, three unsupervised domain fine-tuning techniques are compared to identify the best strategy for leveraging available tacit knowledge in industrial text. The proposed approach is validated on two industrial maintenance records from the mining and aviation domains. Semantic search results are analyzed from a quantitative and qualitative perspective. Analysis shows that TSDAE, a state-of-the-art unsupervised domain fine-tuning technique, can efficiently identify intricate patterns in the industrial text regardless of associated complexities. BERT model fine-tuned with TSDAE on industrial text achieved a precision of 0.94 and 0.97 for mining excavators and aviation maintenance records, respectively.
期刊介绍:
The objective of Computers in Industry is to present original, high-quality, application-oriented research papers that:
• Illuminate emerging trends and possibilities in the utilization of Information and Communication Technology in industry;
• Establish connections or integrations across various technology domains within the expansive realm of computer applications for industry;
• Foster connections or integrations across diverse application areas of ICT in industry.