Data Validation Algorithm Based on Vector Rank Analysis
O. R. Kivchun
Informacionnye Tehnologii, journal article, published 2024-04-24
DOI: 10.17587/it.30.198-205
Abstract
In managing large technical systems, data obtained from various measuring devices are processed by known methods. Analysis of these data yields a set of acceptable solutions, from which the best one is chosen. Some of the data are parametrized and stochastic, i.e. they are random variables. However, the information used to make management decisions must be strictly deterministic. The main task of stochastic data processing is therefore to obtain deterministic invariants suitable for use as information in the decision-making process. The article presents a data verification algorithm that determines whether the data are Gaussian or non-Gaussian. The result of this test makes it possible to choose the right mathematical apparatus for obtaining deterministic invariants.

The scientific novelty of the algorithm is that its mathematical apparatus is developed within the framework of vector rank analysis. The procedure is as follows. A sample is drawn from the "general population" of available data, and its mean and standard deviation are computed. Part of the remaining data from the "general population" is then added to this sample, and the mean and standard deviation are computed again. The sample keeps growing in this way until the "general population" is exhausted. Next, the normalized dependence of the mean and standard deviation on sample size is constructed. If this dependence shows a pronounced tendency to stabilize, the data are classified as Gaussian; otherwise, they are considered non-Gaussian. The effectiveness of the algorithm has been confirmed by studies of a large number of power consumption data samples from various large technical systems.
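The abstract describes the growing-sample test only in words and does not spell out the vector rank analysis apparatus, so the following Python sketch covers just the procedure as summarized above, under assumptions of our own. The sample grows from the front of the data in fixed increments (hypothetical parameters `start_size` and `step`). Each curve is normalized by its full-population value. "Pronounced tendency to stabilize" is replaced by a hypothetical rule: every point in the last half of both curves must lie within 5% of 1.0. The running statistics use Welford's online update instead of recomputing each sample from scratch; this is a swapped-in technique for efficiency, not something the source specifies. The function names `growth_curves`, `is_stable` and `classify` are illustrative only.

```python
import math
import random


def growth_curves(population, start_size=100, step=100):
    """Grow a sample from the front of `population` and record its mean and
    standard deviation at each size, normalized by the full-population values.

    `start_size` and `step` are hypothetical: the abstract does not say how
    large the initial sample or each added portion should be.
    """
    if start_size < 2 or step < 1 or len(population) < start_size:
        raise ValueError("need start_size >= 2, step >= 1 and enough data")
    sizes, means, stds = [], [], []
    count, mean, m2 = 0, 0.0, 0.0
    checkpoint = start_size
    for x in population:
        # Welford's online update of the running mean and squared deviations.
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
        # Record statistics once each added portion is in, and at exhaustion.
        if count == checkpoint or count == len(population):
            sizes.append(count)
            means.append(mean)
            stds.append(math.sqrt(m2 / (count - 1)))  # sample standard deviation
            checkpoint += step
    # Assumed normalization: divide by the full-population mean and deviation.
    mean_ref, std_ref = means[-1], stds[-1]
    if mean_ref == 0 or std_ref == 0:
        raise ValueError("cannot normalize by a zero mean or standard deviation")
    norm_means = [m / mean_ref for m in means]
    norm_stds = [s / std_ref for s in stds]
    return sizes, norm_means, norm_stds


def is_stable(curve, tail_fraction=0.5, tol=0.05):
    """Hypothetical stabilization rule: every point in the last
    `tail_fraction` of the normalized curve lies within `tol` of 1.0."""
    tail = curve[int(len(curve) * (1 - tail_fraction)):]
    return all(abs(v - 1.0) <= tol for v in tail)


def classify(population, start_size=100, step=100, tail_fraction=0.5, tol=0.05):
    """Return "gaussian" if both normalized curves pass the stabilization
    rule, else "non-gaussian"."""
    _, norm_means, norm_stds = growth_curves(population, start_size, step)
    stable = (is_stable(norm_means, tail_fraction, tol)
              and is_stable(norm_stds, tail_fraction, tol))
    return "gaussian" if stable else "non-gaussian"


if __name__ == "__main__":
    rng = random.Random(42)
    normal_data = [rng.gauss(100.0, 10.0) for _ in range(10_000)]
    trending_data = [float(i) for i in range(1, 10_001)]  # steady upward drift
    print(classify(normal_data))
    print(classify(trending_data))
```

With these illustrative settings, a seeded normal sample should give flat curves, while a steadily rising series fails the check because its running mean keeps climbing toward the final value. The tail fraction and tolerance largely determine the verdict, so they would need calibrating on real measurement data rather than being taken from this sketch.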