Text mining techniques for identifying failure modes

IF 2.4, Q3 (Engineering, Industrial). Journal of Quality in Maintenance Engineering. Pub Date: 2023-02-06. DOI: 10.1108/jqme-02-2020-0012

Francina Malan, J. L. Jooste


Citations: 0

Abstract




Purpose: The purpose of this paper is to compare the effectiveness of the various text mining techniques that can be used to classify maintenance work-order records into their respective failure modes, focussing on the choice of algorithm and preprocessing transforms. Three algorithms are evaluated, namely Bernoulli Naïve Bayes, multinomial Naïve Bayes and support vector machines.

Design/methodology/approach: The paper has both a theoretical and an experimental component. In the literature review, the various algorithms and preprocessing techniques used in text classification are considered from three perspectives: the domain-specific maintenance literature, the broader short-form literature and the general text classification literature. The experimental component consists of a 5 × 2 nested cross-validation with an inner optimisation loop performed using a randomised search procedure.

Findings: From the literature review, the aspects most affected by short document length are identified as the feature representation scheme, higher-order n-grams, document length normalisation, stemming, stop-word removal and algorithm selection. However, from the experimental analysis, the selection of preprocessing transforms seemed more dependent on the particular algorithm than on short document length. Multinomial Naïve Bayes performs marginally better than the other algorithms, but overall, the performances of the optimised models are comparable.

Originality/value: This work highlights the importance of model optimisation, including the selection of preprocessing transforms. Not only did the optimisation improve the performance of all the algorithms substantially, but it also affected model comparisons, with multinomial Naïve Bayes going from the worst-performing to the best-performing algorithm.
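The evaluation design described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not name a toolkit, so this uses only the Python standard library, and it assumes that "5 × 2" means five repetitions of 2-fold cross-validation in the outer loop and that the inner optimisation loop also uses 2 folds. The work-order texts, the stop-word list, the search space and all function names are hypothetical. Only multinomial Naïve Bayes is implemented for brevity; Bernoulli Naïve Bayes or a support vector machine would slot into the same loops.

```python
import math
import random
from collections import Counter, defaultdict

STOP_WORDS = {"the", "on", "of", "and", "at"}  # tiny illustrative list (assumption)

def tokenize(text, remove_stop_words=False, use_bigrams=False):
    """Preprocessing transforms: optional stop-word removal and bigram features."""
    words = text.lower().split()
    if remove_stop_words:
        words = [w for w in words if w not in STOP_WORDS]
    tokens = list(words)
    if use_bigrams:
        tokens += [a + "_" + b for a, b in zip(words, words[1:])]
    return tokens

class MultinomialNB:
    """Multinomial Naive Bayes with additive smoothing (alpha)."""
    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {t for d in docs for t in d}
        self.prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, y in zip(docs, labels):
            self.counts[y].update(doc)
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        def log_posterior(c):
            denom = self.totals[c] + self.alpha * len(self.vocab)
            return self.prior[c] + sum(
                math.log((self.counts[c][t] + self.alpha) / denom)
                for t in doc if t in self.vocab)  # tokens unseen in training are ignored
        return max(self.classes, key=log_posterior)

def stratified_halves(indices, labels, rng):
    """Shuffle and split indices into two folds with similar class proportions."""
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    folds = ([], [])
    for members in by_class.values():
        rng.shuffle(members)
        for k, i in enumerate(members):
            folds[k % 2].append(i)
    return folds

def accuracy(params, train, test, texts, labels):
    """Train on `train` with the given settings and return accuracy on `test`."""
    docs = {i: tokenize(texts[i], params["remove_stop_words"], params["use_bigrams"])
            for i in train + test}
    model = MultinomialNB(params["alpha"]).fit([docs[i] for i in train],
                                               [labels[i] for i in train])
    return sum(model.predict(docs[i]) == labels[i] for i in test) / len(test)

def random_search(train, texts, labels, rng, n_iter=6):
    """Inner loop: sample settings at random and score each by 2-fold CV."""
    a, b = stratified_halves(train, labels, rng)
    best_params, best_score = None, -1.0
    for _ in range(n_iter):
        params = {"remove_stop_words": rng.choice([True, False]),
                  "use_bigrams": rng.choice([True, False]),
                  "alpha": rng.choice([0.01, 0.1, 1.0])}
        score = (accuracy(params, a, b, texts, labels)
                 + accuracy(params, b, a, texts, labels)) / 2
        if score > best_score:
            best_params, best_score = params, score
    return best_params

def nested_5x2_cv(texts, labels, seed=0):
    """Outer loop: 5 repetitions of 2-fold CV, tuning only on each training half."""
    rng = random.Random(seed)
    scores = []
    for _ in range(5):
        f0, f1 = stratified_halves(range(len(texts)), labels, rng)
        for train, test in ((f0, f1), (f1, f0)):
            params = random_search(train, texts, labels, rng)
            scores.append(accuracy(params, train, test, texts, labels))
    return scores

# Hypothetical work-order descriptions, two failure modes.
texts = [
    "pump bearing seized noisy", "bearing worn on pump motor",
    "replace seized bearing", "motor bearing noisy and hot",
    "bearing failure on conveyor", "gearbox bearing worn",
    "fan bearing seized", "noisy bearing replaced",
    "hydraulic hose leak", "oil leak at pump seal",
    "seal leaking on cylinder", "leak at valve gasket",
    "gasket leak replaced seal", "coolant leak on hose",
    "valve seal leaking oil", "pipe leak at flange",
]
labels = ["bearing"] * 8 + ["leak"] * 8

scores = nested_5x2_cv(texts, labels)
print(len(scores), sum(scores) / len(scores))
```

The point of the nesting is that preprocessing choices and smoothing are tuned only on the training half of each outer split, so the ten outer scores estimate the performance of the whole optimisation procedure rather than of one hand-picked configuration.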


Source journal

Journal of Quality in Maintenance Engineering (Engineering: Safety, Risk, Reliability and Quality)

CiteScore

4.00

Self-citation rate

13.30%

Number of articles published

Journal description: This exciting journal looks at maintenance engineering from a positive standpoint, and clarifies its recently elevated status as a highly technical, scientific, and complex field. Typical areas examined include:
■ Budget and control
■ Equipment management
■ Maintenance information systems
■ Process capability and maintenance
■ Process monitoring techniques
■ Reliability-based maintenance
■ Replacement and life cycle costs
■ TQM and maintenance