Natural Language Processing and Classification Methods for the Maintenance and Optimization of US Weapon Systems
Nicola Bruno, Tommy Jun, Henry Tessier
2019 Systems and Information Engineering Design Symposium (SIEDS), 2019-04-01
DOI: 10.1109/SIEDS.2019.8735587 (https://doi.org/10.1109/SIEDS.2019.8735587)
The Logistics Management Institute (LMI) works with the US Department of Defense (DoD) to analyze maintenance logs for US weapon systems. A major challenge in processing these data is extracting useful information from disorganized short-form text in order to optimize the maintenance of these systems. Unlike text from other corpora, these entries are only a few words long and do not follow standard lexical conventions. LMI has provided a subset of about 10 million of these maintenance logs, each labeled with an action-object pair. The goals of this research are to construct a model that predicts action-object pairs and to provide a metric for assessing its validity. Prior to analysis, the entries are vectorized using either TF-IDF with truncated SVD (TSVD), or Word2vec. Several models are applied, including logistic regression, k-NN, SVM, decision trees, LSA, and DBSCAN clustering. Unsupervised models are tested alongside supervised models because the validity of the provided ground truth values is uncertain. These tests yield accuracy scores of about 0.53 for action words and 0.73 for object words. Furthermore, the clustering results provide evidence of discrepancies in the ground truth values. Taking this into consideration, the prior models are adjusted, and accuracy scores for action words increase to 0.78.
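As a rough illustration of the workflow the abstract describes, the sketch below vectorizes short log entries with TF-IDF, predicts the action and the object separately with a cosine-similarity k-NN vote, and reports a separate accuracy for each. It is a minimal sketch under stated assumptions, not the authors' implementation: the log entries and labels are invented for illustration, the TSVD and Word2vec steps are omitted, only one of the listed models (k-NN) is shown, and all function names (`fit_idf`, `tfidf`, `knn_predict`, `accuracy`) are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy logs (text, action, object); NOT taken from the LMI data.
TRAIN = [
    ("repl hyd pump", "replace", "pump"),
    ("replaced fuel pump", "replace", "pump"),
    ("insp hyd line", "inspect", "line"),
    ("inspected fuel line", "inspect", "line"),
    ("repl brake line", "replace", "line"),
    ("insp oil pump", "inspect", "pump"),
]

def tokenize(text):
    # Whitespace tokenization; short-form logs rarely follow normal grammar.
    return text.lower().split()

def fit_idf(docs):
    # Smoothed inverse document frequency for every token seen in training.
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(tokenize(doc)))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}

def tfidf(text, idf):
    # L2-normalized sparse TF-IDF vector; unseen tokens are ignored.
    counts = Counter(tok for tok in tokenize(text) if tok in idf)
    vec = {tok: c * idf[tok] for tok, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {tok: v / norm for tok, v in vec.items()} if norm else {}

def cosine(a, b):
    # Dot product of two unit-length sparse vectors.
    return sum(v * b.get(tok, 0.0) for tok, v in a.items())

def knn_predict(text, train_vecs, labels, idf, k=3):
    # Majority vote among the k most similar training entries.
    query = tfidf(text, idf)
    ranked = sorted(range(len(train_vecs)),
                    key=lambda i: cosine(query, train_vecs[i]),
                    reverse=True)
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]

def accuracy(predicted, expected):
    # Fraction of exact matches between predicted and expected labels.
    return sum(p == e for p, e in zip(predicted, expected)) / len(expected)

texts = [t for t, _, _ in TRAIN]
actions = [a for _, a, _ in TRAIN]
objects = [o for _, _, o in TRAIN]
idf = fit_idf(texts)
train_vecs = [tfidf(t, idf) for t in texts]

# Hypothetical held-out entries, also invented for illustration.
TEST = [("repl fuel pump", "replace", "pump"),
        ("insp brake line", "inspect", "line")]
pred_actions = [knn_predict(t, train_vecs, actions, idf) for t, _, _ in TEST]
pred_objects = [knn_predict(t, train_vecs, objects, idf) for t, _, _ in TEST]

# Actions and objects are scored separately, as in the abstract.
action_acc = accuracy(pred_actions, [a for _, a, _ in TEST])
object_acc = accuracy(pred_objects, [o for _, _, o in TEST])
print(f"action accuracy: {action_acc:.2f}, object accuracy: {object_acc:.2f}")
```

Scoring the two label components separately mirrors how the abstract reports results (one figure for action words, another for object words), and it makes it easier to see which component suffers more from questionable ground truth labels.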