Authors: G. A. Drachev
DOI: 10.17587/it.29.351-359 (https://doi.org/10.17587/it.29.351-359)
Journal: Radioelektronika, Nanosistemy, Informacionnye Tehnologii
Volume: 38 1
Published: 2023-07-11
Publication type: Journal Article
Development of an Algorithm for Extracting and Encoding Data from Log Messages of a Computing System for Anomaly Detection Systems
This article presents an algorithm that automatically analyzes a log message, transforms it into a list of features in the form of a fixed-length vector, and accumulates the resulting vectors into a single dataset. The dataset is intended for use in machine-learning-based anomaly detection systems. An additional requirement is support for the variety of protocols used to collect log messages in a computing system. These goals were achieved by developing a software package that collects and parses log messages in order to isolate and encode their features. The package can collect log messages through several channels: syslog, SNMP, SQL, and reading text and binary files. The article examines the data that can be extracted from the log messages of a computing system, describes the use of Lua scripts for data enrichment, and builds the list of features. It proposes a method for encoding text data extracted from log messages and an algorithm for transforming an arbitrary log message into a feature vector of fixed dimension. It also provides a methodology for forming a dataset for subsequent training of a machine-learning anomaly detection system and gives an example of a dataset storage structure.
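The abstract does not specify the encoding itself, so the following is only a minimal sketch of the general idea of turning an arbitrary log line into a fixed-length vector. It assumes a simplified RFC 3164-style syslog line and swaps in feature hashing for the text part; this is not the article's method. The names `SYSLOG_RE`, `TEXT_BUCKETS`, `hash_tokens`, and `encode`, and the choice of fields, are hypothetical.

```python
import re
import zlib

# Assumed, simplified pattern for an RFC 3164-style line:
# "<PRI>Mon DD HH:MM:SS host tag[pid]: message"
SYSLOG_RE = re.compile(
    r"^<(?P<pri>\d{1,3})>"
    r"(?P<ts>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) "
    r"(?P<tag>[^:\[\s]+)(?:\[\d+\])?: "
    r"(?P<msg>.*)$"
)

NUM_FIELDS = 5    # parsed flag, facility, severity, host id, tag id
TEXT_BUCKETS = 8  # assumed width of the hashed text part


def stable_id(value, modulus=1024):
    """Map a string to a small integer id; crc32 is stable across runs."""
    return float(zlib.crc32(value.encode("utf-8")) % modulus)


def hash_tokens(text, buckets=TEXT_BUCKETS):
    """Feature hashing: count each token in one of `buckets` slots."""
    vec = [0.0] * buckets
    for token in re.findall(r"[a-z0-9_]+", text.lower()):
        vec[zlib.crc32(token.encode("utf-8")) % buckets] += 1.0
    return vec


def encode(line):
    """Return a vector of length NUM_FIELDS + TEXT_BUCKETS for any line."""
    m = SYSLOG_RE.match(line)
    if m is None:
        # Unparsed lines keep the same length; the first slot flags them.
        return [0.0] * NUM_FIELDS + hash_tokens(line)
    # The syslog PRI value is facility * 8 + severity.
    facility, severity = divmod(int(m.group("pri")), 8)
    fields = [
        1.0,
        float(facility),
        float(severity),
        stable_id(m.group("host")),
        stable_id(m.group("tag")),
    ]
    return fields + hash_tokens(m.group("msg"))


# Accumulate vectors from several messages into one dataset (list of rows).
lines = [
    "<34>Oct 11 22:14:15 mymachine su: 'su root' failed for lonvick on /dev/pts/8",
    "free-form line without a syslog header",
]
dataset = [encode(line) for line in lines]
```

Every row in `dataset` has the same length regardless of how much text the original message contained, which is the property a downstream machine-learning model needs. Hash collisions between tokens are the price of the fixed width; a larger `TEXT_BUCKETS` reduces them.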
About the journal:
The journal “Radioelectronics. Nanosystems. Information Technologies” (abbreviated RENSIT) publishes original, previously unpublished articles, reviews, and brief reports on topical problems in radioelectronics (including biomedical radioelectronics), the fundamentals of information, nano-, and biotechnologies, and adjacent areas of physics and mathematics. Its authors are academicians, corresponding members, and foreign members of the Russian Academy of Natural Sciences (RANS) and their colleagues, as well as other Russian and foreign authors recommended by a RANS member. An author may obtain this recommendation before submitting an article; otherwise, after the article arrives, it may come from a member of the editorial board or another RANS member who gives an opinion on the article at the editor's request. The editors accept articles in both Russian and English. Articles undergo internal double-blind peer review by members of the Editorial Board, and some are also reviewed externally when necessary. The journal is intended for researchers, graduate students, senior physics students, and teachers. It is published twice a year (2 issues).