Natural Language Processing and Classification Methods for the Maintenance and Optimization of US Weapon Systems
Authors: Nicola Bruno, Tommy Jun, Henry Tessier
Published in: 2019 Systems and Information Engineering Design Symposium (SIEDS), 2019-04-01
DOI: 10.1109/SIEDS.2019.8735587 (https://doi.org/10.1109/SIEDS.2019.8735587)
Citations: 4
Abstract
The Logistics Management Institute (LMI) works with the US Department of Defense (DoD) to analyze maintenance logs for US weapon systems. A major challenge in processing this data is extracting useful information from disorganized short-form text in order to optimize the maintenance of these systems. Unlike text from other corpora, these entries are only a few words long and do not conform to lexical conventions. LMI has provided a subset of about 10 million of these maintenance logs, each labeled with an action-object pair. The goals of this research are to construct a model that predicts action-object pairs and to provide a metric for assessing its validity. Prior to analysis, the entries are vectorized using either TF-IDF with truncated SVD (TSVD), or Word2vec. Several models are applied, including logistic regression, k-NN, SVM, decision trees, LSA, and DBSCAN clustering. Unsupervised models are tested alongside supervised models because the validity of the provided ground-truth labels is uncertain. These tests yield accuracy scores of about 0.53 for action words and 0.73 for object words. Furthermore, the clustering results provide evidence of discrepancies in the ground-truth values. Taking this into consideration, the prior models are adjusted, and accuracy for action words increases to 0.78.
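As a rough illustration of the vectorize-then-classify idea described in the abstract, the sketch below builds TF-IDF vectors for a handful of short log entries and predicts an action-object pair with k-NN over cosine similarity. It is not the authors' implementation: the toy entries and labels are invented for illustration, the helper names (`fit_idf`, `tfidf`, `predict_pair`) are hypothetical, the TSVD and Word2vec steps are omitted, and everything is written in plain Python rather than with whichever libraries the paper used.

```python
import math
from collections import Counter

# Hypothetical toy entries with (action, object) labels.
# Invented for illustration; not taken from the LMI maintenance logs.
TRAIN = [
    ("repl hyd pump lh", ("replace", "pump")),
    ("replaced pump assy", ("replace", "pump")),
    ("insp brake line", ("inspect", "brake")),
    ("inspected brake pads", ("inspect", "brake")),
    ("repl brake line rh", ("replace", "brake")),
]


def tokenize(text):
    return text.lower().split()


def fit_idf(docs):
    """Smoothed inverse document frequency for every token in the corpus."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(tokenize(doc)))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}


def tfidf(text, idf):
    """L2-normalized sparse TF-IDF vector; tokens unseen in training are dropped."""
    counts = Counter(tok for tok in tokenize(text) if tok in idf)
    vec = {tok: c * idf[tok] for tok, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {tok: v / norm for tok, v in vec.items()} if norm else {}


def cosine(a, b):
    # Both vectors are unit length, so the dot product is the cosine similarity.
    return sum(v * b.get(tok, 0.0) for tok, v in a.items())


def predict(query_vec, train_vecs, labels, k):
    """Majority vote over the k training entries most similar to the query."""
    ranked = sorted(
        range(len(train_vecs)),
        key=lambda i: cosine(query_vec, train_vecs[i]),
        reverse=True,
    )
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


TEXTS = [text for text, _ in TRAIN]
IDF = fit_idf(TEXTS)
VECS = [tfidf(text, IDF) for text in TEXTS]
ACTIONS = [label[0] for _, label in TRAIN]
OBJECTS = [label[1] for _, label in TRAIN]


def predict_pair(text, k=3):
    """Predict the action and the object independently, then pair them."""
    query = tfidf(text, IDF)
    return (
        predict(query, VECS, ACTIONS, k),
        predict(query, VECS, OBJECTS, k),
    )


if __name__ == "__main__":
    print(predict_pair("repl pump"))   # ('replace', 'pump')
    print(predict_pair("insp brake"))  # ('inspect', 'brake')
```

Predicting the action and the object separately mirrors the way the abstract reports a separate accuracy score for each; a joint classifier over whole pairs would be an equally plausible design.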