
Radio Electronics Computer Science Control: Latest Articles

COMPARISON OF SHORT-TERM FORECASTING METHODS OF ELECTRICITY CONSUMPTION IN MICROGRIDS
IF 0.5 Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date : 2023-02-24 DOI: 10.15588/1607-3274-2023-1-2
Yuliia Parfenenko, V. Shendryk, Yevhenii Kholiavka, P. M. Pavlenko
Context. The electric power industry is currently undergoing intensive development of microgrids and their management. Microgrids are attractive because they offer a number of advantages over classical methods of energy generation, transmission, and distribution. It is much easier to ensure reliable electricity supply within a microgrid than in large energy systems. Energy consumers in a microgrid can influence power balancing by regulating their loads and by generating, storing, and releasing electricity. One of the main tasks of a microgrid is to supply consumers with electrical energy while keeping generation and consumption in balance. This is achieved through intelligent management of microgrid operation, which relies on energy consumption forecasts and thereby increases the efficiency of energy infrastructure management.
Objective. The purpose of this work is to develop short-term electricity consumption forecasting models for various types of microgrid electricity consumers, which will improve the efficiency of energy infrastructure management and reduce electricity consumption.
Method. The SARIMA autoregressive model and the LSTM machine learning model are used to forecast electricity consumption. The AIC and BIC information criteria are used to compare autoregressive models. Forecast accuracy is evaluated using the MAE, RMSE, and MAPE error metrics.
Results. Experiments were conducted to forecast electricity consumption for different types of consumers. Forecasting was carried out with both LSTM and AR models on the prepared data sets at intervals of 6 hours, 1 day, and 3 days. The LSTM model met the forecasting requirements and provided better forecasting quality than the AR models.
Conclusions. The study identified universal forecasting models that meet the forecasting quality requirements. A comparative analysis of the developed time series forecasting models revealed the advantages of ML models over AR models. The LSTM model achieved a MAPE of 0.1% for a private house, 3.74% for a dairy plant, and 3.67% for a gas station. The results will make it possible to increase the efficiency of microgrid management and of electricity distribution among consumers, reducing energy consumption and preventing peak loads on the power grid.
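The three error metrics named in the abstract (MAE, RMSE, MAPE) can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names and the sample consumption values are assumptions made up for the example.

```python
import math


def mae(actual, predicted):
    """Mean absolute error, in the units of the series."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error; penalizes large misses more than MAE."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Undefined when an actual value is zero, so such series need
    special handling that this sketch does not attempt.
    """
    return 100.0 * sum(
        abs((a - p) / a) for a, p in zip(actual, predicted)
    ) / len(actual)


# Hypothetical hourly consumption (kWh) and a forecast for the same hours.
actual = [10.0, 20.0, 40.0]
predicted = [11.0, 18.0, 40.0]

print(f"MAE  = {mae(actual, predicted):.3f}")
print(f"RMSE = {rmse(actual, predicted):.3f}")
print(f"MAPE = {mape(actual, predicted):.2f}%")
```

Because MAPE is scale-free, it is the metric that lets consumers of very different sizes (a house, a plant, a gas station) be compared on one scale.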
Citations: 1
SYNTHESIS OF THE SYMBOLOGIES OF MULTICOLOR INTERFERENCE-RESISTANT BAR CODES ON THE BASE OF MULTI-VALUED BCH CODES
IF 0.5 Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date : 2022-12-15 DOI: 10.15588/1607-3274-2022-4-9
Ye. S. Sulema, L. V. Drozdenko, A. Dychka
Context. The paper considers the problem of constructing a set of barcode patterns for multicolor barcodes that are resistant to distortions of one or two elements within each pattern.
Objective. The goal of the work is to ensure reliable reading of multicolor barcode images.
Method. A multicolor barcode pattern is interference-immune if its digital equivalent (vector) is a codeword of a multi-valued (non-binary) error-correcting code capable of correcting errors (distortions of the pattern elements). It is shown that barcode patterns should be constructed on the basis of a multi-valued BCH code capable of correcting two errors. A method is proposed for constructing a set of interference-resistant barcode patterns of a given capacity, which ensures reliable reproduction of data read from a carrier. A procedure has been developed for encoding data with a multi-valued BCH code based on the generator matrix of the code, using arithmetic modulo a prime number. A new method is proposed for constructing the check matrix of a multi-valued BCH code based on the vector representation of the elements of a finite field. A generalized algorithm has been developed for generating symbologies of a multicolor barcode that can correct double errors in barcode patterns. The method also makes it possible to build a symbology of a given capacity based on shortened BCH codes. A method is proposed for reducing the generator and check matrices of a full multi-valued BCH code to obtain a shortened code of a given length. It is shown that, in addition to correcting double errors, multi-valued BCH codes can also detect errors of higher multiplicity; this property is enhanced when shortened BCH codes are used. The method provides for the construction of a family of multicolor noise-immune barcodes.
Results. Using the developed software tools, statistical data were obtained that characterize the ability of multi-valued BCH codes to detect and correct errors; these data serve as a basis for designing multicolor interference-resistant barcodes.
Conclusions. The experiments confirmed the operability of the proposed algorithmic tools, which can be recommended for practical use in developing interference-resistant multicolor barcodes for automatic identification systems.
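Encoding with a generator matrix and checking with a check matrix, both modulo a prime, can be illustrated with a toy example. The sketch below is an assumption-laden stand-in, not the code from the paper: it uses a small [4, 2] code over GF(5) (a Reed-Solomon code, which is a special case of a BCH code) that corrects only a single error, whereas the paper targets double-error correction. The matrices `G` and `H`, the choice of five colors, and the function names are all chosen here for illustration.

```python
P = 5  # prime modulus; think of five barcode colors numbered 0..4

# Generator matrix from g(x) = (x - 2)(x - 4) = x^2 + 4x + 3 over GF(5),
# coefficients listed from the lowest degree upward.
G = [
    [3, 4, 1, 0],
    [0, 3, 4, 1],
]

# Check matrix: row i holds the powers of alpha^(i+1), with alpha = 2.
H = [
    [1, 2, 4, 3],
    [1, 4, 1, 4],
]


def encode(message, generator=G, p=P):
    """Multiply the message row vector by the generator matrix modulo p."""
    length = len(generator[0])
    return [
        sum(m * row[j] for m, row in zip(message, generator)) % p
        for j in range(length)
    ]


def syndrome(word, check=H, p=P):
    """Multiply the check matrix by the word modulo p; all zeros means valid."""
    return [sum(h * w for h, w in zip(row, word)) % p for row in check]


codeword = encode([2, 3])
print("codeword:", codeword, "syndrome:", syndrome(codeword))

# Distort one pattern element (one color read incorrectly).
distorted = list(codeword)
distorted[2] = 0
print("distorted:", distorted, "syndrome:", syndrome(distorted))
```

A nonzero syndrome signals that the pattern was distorted; an actual decoder would go on to locate and correct the error, which this sketch leaves out.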
Citations: 0
INFORMATION TECHNOLOGY OF TRANSPORT INFRASTRUCTURE MONITORING BASED ON REMOTE SENSING DATA
IF 0.5 Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date : 2022-12-15 DOI: 10.15588/1607-3274-2022-4-7
S. Danshyna, A. Nechausov, S. Andrieiev
Context. In light of current road network monitoring practices, this study explores the capability of remote sensing technologies to make preliminary evaluations of the condition of the infrastructure as a whole more objective. The object of the study is the process of monitoring transport infrastructure (TI), examined to find ways to improve it when implementing development projects.
Objective. The goal of the work is to increase the objectivity of decision-making on the evaluation, reconstruction, and development of the transport network structure through the visual presentation and disclosure of open data for monitoring the transport value.
Method. Existing approaches to TI monitoring and to evaluating its condition are analyzed. The identified shortcomings, together with the development of remote sensing technologies, open up prospects for using remote sensing data in the TI monitoring process. A set-theoretic model of the information flows of the monitoring process is proposed; consistent refinement of its elements made it possible to develop an information technology (IT). The set of input and output parameters of the IT, the set of its operations, and their representation as a set of IDEFX models explain how heterogeneous (graphic, text, digital, cartographic, etc.) data about TI elements from different sources are processed and presented to support decision-making on surveying and improving existing infrastructure. The developed IT makes it possible to obtain complex indicators for analyzing the TI of a particular area and to solve the problems of inventorying objects and of modeling TI and its elements with their physical and geographical location taken into account. It can therefore be regarded as an auxiliary tool that complements existing methods of TI monitoring.
Results. The developed IT was studied by monitoring a TI section of the Kharkiv region using satellite imagery of medium (Sentinel-2) and high (SuperView-1) resolution and the results of a laser survey of the road bridge across the Mzha river (as an element of infrastructure).
Conclusions. The experiments confirmed the operability of the proposed information technology and showed that it is expedient to use in practice for obtaining generalized characteristics of the infrastructure, inventorying TI objects, and modeling them. This opens up opportunities for substantiating project decisions on reconstructing the transport network and for planning procedures to examine its condition. Prospects for further research may include creating reference models of TI objects, expanding the table of decryption signs of road transport infrastructure objects, and integrating remote data, survey results of TI sections, and engineering surveys of objects to evaluate the condition of TI in general.
Citations: 1
REWRITING IDENTIFICATION TECHNOLOGY FOR TEXT CONTENT BASED ON MACHINE LEARNING METHODS
IF 0.5 Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date : 2022-12-13 DOI: 10.15588/1607-3274-2022-4-11
N. Kholodna, V. Vysotska
Context. Paraphrased textual content, or rewriting, is one of the difficult problems in detecting academic plagiarism. Most plagiarism detection systems are designed to detect common words, sequences of linguistic units, and minor changes, but are unable to detect significant semantic and structural changes. Therefore, most cases of plagiarism by paraphrasing remain unnoticed.
Objective. The objective of the study is to develop a technology for detecting paraphrasing in text, based on a classification model and machine learning methods, that uses a recurrent Siamese neural network and a Transformer-type network, RoBERTa, to analyze the level of similarity between sentences of text content.
Method. The following semantic similarity metrics were chosen as features: the Jaccard coefficient for shared N-grams, the cosine distance between vector representations of sentences, Word Mover's Distance, distances according to WordNet dictionaries, and the predictions of two ML models: a recurrent Siamese neural network and a Transformer-type network, RoBERTa.
Results. An intelligent system for detecting paraphrasing in text based on a classification model and machine learning methods has been developed. The system uses model stacking and feature engineering. Additional features indicate the semantic affiliation of the sentences or the normalized number of common N-grams. An additionally fine-tuned RoBERTa neural network (with additional fully connected layers) is less sensitive to pairs of sentences that are not paraphrases of each other. This property of the model may contribute to incorrect accusations of plagiarism or incorrect association of user-generated content. Additional features increase both the overall classification accuracy and the model's sensitivity to pairs of sentences that are not paraphrases of each other.
Conclusions. The created model shows excellent classification results on PAWS test data: precision 93%, recall 92%, F1 score 92%, accuracy 92%. The results of the study showed that Transformer-type NNs can be successfully applied to detect paraphrasing in a pair of texts with fairly high accuracy without the need for additional feature generation.
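Two of the hand-crafted features listed in the Method section, the Jaccard coefficient over shared N-grams and the cosine distance between sentence vectors, can be sketched as below. This is only an illustration under stated assumptions: whitespace tokenization, word bigrams, and bag-of-words count vectors stand in for whatever tokenizer and sentence representations the authors actually used, and the function names are made up for the example.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Set of word n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard_ngrams(sentence_a, sentence_b, n=2):
    """Jaccard coefficient |A & B| / |A | B| over shared word n-grams."""
    a = ngrams(sentence_a.lower().split(), n)
    b = ngrams(sentence_b.lower().split(), n)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cosine_distance(sentence_a, sentence_b):
    """1 - cosine similarity of bag-of-words count vectors (an assumption;
    the paper's sentence vectors are not specified in the abstract)."""
    ca = Counter(sentence_a.lower().split())
    cb = Counter(sentence_b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return 1.0 - dot / norm if norm else 1.0


s1 = "the cat sat on the mat"
s2 = "the cat lay on the mat"
print(f"Jaccard (bigrams): {jaccard_ngrams(s1, s2):.3f}")
print(f"Cosine distance:   {cosine_distance(s1, s2):.3f}")
```

Surface features like these react mainly to shared wording, which is why the abstract pairs them with learned models that can pick up semantic and structural changes.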
Citations: 0
RISK ASSESSMENT MODELING OF ERP-SYSTEMS
IF 0.5 Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date : 2022-12-13 DOI: 10.15588/1607-3274-2022-4-12
A. D. Kozhukhivskyi, O. Kozhukhivska
Context. Because assessing security risks is a complex process carried out under considerable uncertainty, and uncertainty is a major factor influencing the quality of the assessment, it is advisable to use fuzzy methods and models that adapt to noncomputed data. Vague assessments of risk factors are formed subjectively, and risk assessment depends on the practical results obtained while processing the risks of threats that have already arisen during the functioning of the organization, as well as on the experience of security professionals. It is therefore advisable to use models that can adequately assess fuzzy factors and can adjust their impact on the risk assessment. Neuro-fuzzy models show the best performance on such problems; they combine fuzzy logic with artificial neural networks, i.e. the "human-like" reasoning style of fuzzy systems with the training and simulation of mental phenomena characteristic of neural networks. To build a model for calculating the security risk assessment, it is proposed to use a fuzzy product model. Fuzzy product models (Rule-Based Fuzzy Models/Systems) are a common type of fuzzy model used to describe, analyze, and simulate complex systems and processes that are poorly formalized.
Objective. Development of a fuzzy model of the quality of security risk assessment and protection of ERP systems through the use of fuzzy neural models.
Method. To build a model for calculating the security risk assessment, it is proposed to use a fuzzy product model. Fuzzy product models are a common kind of fuzzy model used to describe, analyze, and model complex systems and processes that are poorly formalized.
Results. The identified factors influencing risk assessment call for linguistic variables to describe them, fuzzy variables to assess their qualities, and a system of qualitative assessments. The choice of parameters was substantiated. A fuzzy product model of risk assessment and a rule base for fuzzy inference were implemented using the MATLAB application package and the Fuzzy Logic Toolbox extension package, and then improved by introducing neuro-fuzzy components that make the model adaptive to experimental data. The use of fuzzy models for security risk assessment is considered, along with the concept and construction of ERP systems and an analysis of their security problems and vulnerabilities.
Conclusions. A fuzzy model for risk assessment of the ERP system has been developed. A list of factors affecting security risk has been selected. Methods are proposed for assessing the risk of information resources and of ERP systems in general, for assessing financial losses from the implementation of threats, and for determining the type of risk from its assessment in order to form recommendations on processing risks and maintain the level of protection of the ERP system. The list of linguistic variables of the model is defined. The MISO structure was chosen for the database of fuzzy product rules. The structure of the fuzzy model was built. The fuzzy variable models were identified.
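A rule-based fuzzy model with a MISO (multiple inputs, single output) structure can be sketched in a few lines. The example below is a simplified stand-in and not the authors' MATLAB Fuzzy Logic Toolbox model: the two input factors (`likelihood`, `impact`), the triangular terms, the rule table, and the use of singleton rule outputs with a weighted average are all assumptions chosen to keep the sketch short.

```python
def tri(x, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    if x < a or x > c:
        return 0.0
    if x == b:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Linguistic terms on the [0, 1] scale (hypothetical partition).
TERMS = {
    "low": (0.0, 0.0, 0.5),
    "medium": (0.0, 0.5, 1.0),
    "high": (0.5, 1.0, 1.0),
}

# Rule base: (likelihood term, impact term) -> crisp risk level.
# The values 0.1 / 0.5 / 0.9 stand for low / medium / high risk.
RULES = {
    ("low", "low"): 0.1, ("low", "medium"): 0.1, ("low", "high"): 0.5,
    ("medium", "low"): 0.1, ("medium", "medium"): 0.5, ("medium", "high"): 0.9,
    ("high", "low"): 0.5, ("high", "medium"): 0.9, ("high", "high"): 0.9,
}


def assess_risk(likelihood, impact):
    """Two inputs in [0, 1] -> one risk score in [0, 1]."""
    weighted_sum = 0.0
    total_weight = 0.0
    for (term_l, term_i), risk in RULES.items():
        # Firing strength of the rule: fuzzy AND taken as the minimum.
        weight = min(tri(likelihood, *TERMS[term_l]), tri(impact, *TERMS[term_i]))
        weighted_sum += weight * risk
        total_weight += weight
    # Weighted average of the rule outputs; 0.0 if no rule fires.
    return weighted_sum / total_weight if total_weight else 0.0


for pair in [(0.0, 0.0), (0.25, 0.75), (1.0, 1.0)]:
    print(pair, "->", round(assess_risk(*pair), 3))
```

In a neuro-fuzzy variant, the term boundaries and rule outputs would be tuned from data instead of being fixed by hand, which is the adaptability the abstract refers to.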
Citations: 0
PERMANENT DECOMPOSITION ALGORITHM FOR THE COMBINATORIAL OBJECTS GENERATION
IF 0.5 Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date : 2022-12-11 DOI: 10.15588/1607-3274-2022-4-10
Y. Turbal, S. Babych, N. Kunanets
Context. The problem of generating vectors composed of distinct representatives of a given family of sets is considered. Such problems arise, in particular, in scheduling theory, for example when scheduling appointments. A special case of this problem is the generation of permutations. Objective. The problem is examined from the standpoint of a permanent-based approach and of a well-known approach based on lexicographic order. Method. Many tasks require generating various combinatorial objects: permutations, combinations with and without repetitions, and various subsets. This paper considers a new approach to generating combinatorial objects that is based on the permanent decomposition procedure. The permanent is built for a special incidence matrix. The main idea is to add, to the row-wise algebraic expansion of the permanent, a function that writes the column identifiers into corresponding data structures. In this case the algebraic permanent is not calculated; instead, a specific recursive algorithm for generating a combinatorial object is obtained. The computational complexity of this algorithm is analyzed. Results. A new approach to generating complex combinatorial objects is investigated, based on expanding the modified permanent of the incidence matrix by row while memorizing the index elements. Conclusions. Permanent-based algorithms for generating combinatorial objects are investigated. For permutations, the complexity of our approach is compared with the lexicographic algorithm and the Johnson-Trotter algorithm. The results show that our algorithm belongs to the same complexity class as the lexicographic algorithm and the Johnson-Trotter method. Numerical results confirmed the effectiveness of our approach.
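The row-wise expansion described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the function name `distinct_representatives`, the use of a sorted universe as column order, and the data structures are all assumptions made for illustration. The recursion mirrors the expansion of a permanent by rows, but instead of multiplying and summing matrix entries it records the identifier of the chosen column at each level.

```python
from typing import Hashable, Iterator, List, Sequence, Set, Tuple


def distinct_representatives(sets: Sequence[Set[Hashable]]) -> Iterator[Tuple]:
    """Yield every vector (x_1, ..., x_n) with x_i in sets[i] and all x_i distinct.

    Hypothetical helper: the traversal follows the row-wise expansion of the
    permanent of the 0/1 incidence matrix (rows = sets, columns = elements),
    recording column identifiers instead of computing the permanent's value.
    """
    universe = sorted(set().union(*sets))           # column identifiers
    incidence = [[elem in s for elem in universe]   # incidence[i][j]: element j in set i
                 for s in sets]

    def expand(row: int, used_cols: Set[int], chosen: List) -> Iterator[Tuple]:
        if row == len(sets):                        # all rows expanded: one object is complete
            yield tuple(chosen)
            return
        for col, present in enumerate(incidence[row]):
            if present and col not in used_cols:    # nonzero entry in a column not yet struck out
                used_cols.add(col)
                chosen.append(universe[col])
                yield from expand(row + 1, used_cols, chosen)
                chosen.pop()
                used_cols.remove(col)

    yield from expand(0, set(), [])


print(list(distinct_representatives([{1, 2}, {2, 3}])))
# [(1, 2), (1, 3), (2, 3)]
```

When every set equals the whole universe, the incidence matrix is all ones and the generator enumerates all permutations in lexicographic order. The number of generated objects equals the value of the permanent of the incidence matrix.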
Citations: 0
DETERMINATION OF INHERITANCE RELATIONS AND RESTRUCTURING OF SOFTWARE CLASS MODELS IN THE PROCESS OF DEVELOPING INFORMATION SYSTEMS
IF 0.5 Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date : 2022-12-10 DOI: 10.15588/1607-3274-2022-4-8
O. Kungurtsev, A. I. Vytnova
Context. Different use cases may be implemented by different development teams at different times. This results in poorly structured code. The problem is exacerbated when medium and large projects are developed in a short time. Objective. Since inheritance is one of the effective ways to structure code and improve its quality, the aim of the study is to determine possible inheritance relationships for a variety of class models. Method. It is proposed to select, from the entire set of classes representing the class model at a certain design stage, subsets for which a common parent class (in a particular case, an abstract class) is possible. To solve the problem, indicators of commonality between classes have been formulated. The mathematical model of the conceptual class has been improved by including information about the responsibilities of the class, its methods, and its attributes. The connection of each class with the script items for which it is used has been established. A system of data types for class model elements is proposed. The description of class method signatures has been extended. A three-stage method for restructuring the class model has been developed. At the first stage, the proximity coefficients of classes are determined. At the second stage, subsets of possible child classes are created. At the third stage, an automated transformation of the class structure is performed, taking into account the identified inheritance relationships. Results. A software product has been developed for conducting experiments that identify possible inheritance relationships depending on the number of classes and the degree of their similarity. The results of the tests showed the effectiveness of the decisions made. Conclusions. The method uses an algorithm for forming subsets of classes that can have one parent and an algorithm for automatically creating and converting classes to build a two-level class hierarchy. An experiment showed a threefold reduction in errors in detecting inheritance and a severalfold reduction in time compared with the existing technology.
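The three stages named in this abstract (proximity coefficients, candidate child subsets, automated restructuring) can be sketched as follows. The abstract does not give the authors' formulas, so everything specific here is an assumption: the proximity coefficient is taken to be the Jaccard index over attributes and method signatures, grouping is a simple greedy pass with a hypothetical `threshold`, and `ClassModel`, `proximity`, `candidate_children`, and `extract_parent` are illustrative names.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class ClassModel:
    """Simplified class description; members are plain signature strings."""
    name: str
    attributes: FrozenSet[str]
    methods: FrozenSet[str]

    def members(self) -> FrozenSet[Tuple[str, str]]:
        # Tag members so an attribute and a method with equal text never collide.
        return frozenset({("attr", a) for a in self.attributes}
                         | {("method", m) for m in self.methods})


def proximity(a: ClassModel, b: ClassModel) -> float:
    """Stage 1 (assumed formula): Jaccard index of the two member sets."""
    union = a.members() | b.members()
    return len(a.members() & b.members()) / len(union) if union else 0.0


def candidate_children(classes: List[ClassModel], threshold: float) -> List[List[ClassModel]]:
    """Stage 2 (assumed rule): greedily group classes that are pairwise close."""
    groups: List[List[ClassModel]] = []
    for cls in classes:
        for group in groups:
            if all(proximity(cls, other) >= threshold for other in group):
                group.append(cls)
                break
        else:
            groups.append([cls])
    return [g for g in groups if len(g) > 1]   # a parent needs at least two children


def extract_parent(group: List[ClassModel], parent_name: str):
    """Stage 3: move the shared members into a new parent class."""
    attrs = frozenset.intersection(*(c.attributes for c in group))
    methods = frozenset.intersection(*(c.methods for c in group))
    parent = ClassModel(parent_name, attrs, methods)
    children = [ClassModel(c.name, c.attributes - attrs, c.methods - methods) for c in group]
    return parent, children
```

The result is a two-level hierarchy, matching the scope stated in the abstract; deeper hierarchies would need the grouping step to be applied recursively, which this sketch does not attempt.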
Citations: 0
DATA CLUSTERING BASED ON INDUCTIVE LEARNING OF NEURO-FUZZY NETWORK WITH DISTANCE HASHING
IF 0.5 Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date : 2022-12-09 DOI: 10.15588/1607-3274-2022-4-6
S. Subbotin
Context. Cluster analysis is widely used to analyze data of various nature and dimensions. However, the known methods of cluster analysis are slow and demanding on computer memory because they need to calculate pairwise distances between instances in a multidimensional feature space. In addition, when the number of features is large, the results of known methods of cluster analysis are difficult for a human to perceive and analyze. Objective. The purpose of the work is to increase the speed of cluster analysis and the interpretability of the resulting partition into clusters, and to reduce the computer memory requirements of cluster analysis. Method. A method for cluster analysis of multidimensional data is proposed. For each instance, it calculates a hash based on the distance to the conditional center of coordinates and uses the one-dimensional coordinate along the hash axis to determine the distances between instances. It treats the resulting hash as a pseudo-output feature and breaks it into intervals that are matched to the labels of pseudo-classes (clusters). Having obtained a rough crisp partition of the feature space and the sample instances, it automatically generates a partition of the input features into fuzzy terms, determines the rules for assigning instances to clusters and, as a result, forms a fuzzy inference system of the Mamdani-Zadeh classifier type, which is then trained as a neuro-fuzzy network to ensure acceptable values of the clustering quality functional. This makes it possible to reduce the number of terms and features used, to evaluate their contribution to decisions about assigning instances to clusters, to increase the speed of cluster analysis, and to increase the interpretability of the resulting partition into clusters. Results. Mathematical support for cluster analysis of data of large dimension has been developed. Experiments confirming the operability of the developed mathematical support have been carried out. Conclusions. The developed method and its software implementation can be recommended for practical use in problems of analyzing data of various nature and dimensions.
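The first step of the method in this abstract, the rough crisp partition along the hash axis, can be sketched as below. Only that step is shown; the fuzzy terms, rule generation, and neuro-fuzzy training are not. The abstract does not define the conditional center or how the intervals are chosen, so the per-feature minimum as the center and equal-width intervals are assumptions, as are the names `distance_hash` and `rough_partition`.

```python
import math
from typing import List, Sequence, Tuple


def distance_hash(instance: Sequence[float], center: Sequence[float]) -> float:
    """One-dimensional hash: Euclidean distance from the instance to the center."""
    return math.dist(instance, center)


def rough_partition(instances: Sequence[Sequence[float]],
                    n_intervals: int) -> Tuple[List[float], List[int]]:
    """Return (hashes, labels); each label is the index of an equal-width hash interval.

    Assumptions: the conditional center is the per-feature minimum, and the
    hash axis is cut into `n_intervals` intervals of equal width.
    """
    center = [min(column) for column in zip(*instances)]
    hashes = [distance_hash(x, center) for x in instances]
    low, high = min(hashes), max(hashes)
    width = (high - low) / n_intervals or 1.0      # avoid division by zero when all hashes coincide
    labels = [min(int((h - low) / width), n_intervals - 1) for h in hashes]
    return hashes, labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
_, labels = rough_partition(points, n_intervals=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Computing one hash per instance costs time proportional to the number of instances times the number of features, whereas a full pairwise distance matrix grows with the square of the number of instances. The trade-off is that instances far apart in feature space can share similar hashes, which is presumably why the abstract treats this partition only as a rough starting point for later training.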
Citations: 0
CLUSTERIZATION OF DATA ARRAYS BASED ON COMBINED OPTIMIZATION OF DISTRIBUTION DENSITY FUNCTIONS AND THE EVOLUTIONARY METHOD OF CAT SWARM
IF 0.5 Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date : 2022-12-05 DOI: 10.15588/1607-3274-2022-4-5
Y. Bodyanskiy, I. Pliss, A. Shafronenko
Context. The task of clustering arrays of observations of an arbitrary nature is an integral part of Data Mining and, more generally, of Data Science. A huge number of approaches have been proposed for its solution, which differ from each other both in a priori assumptions about the physical nature of the data and the problem, and in the mathematical apparatus used. From a computational point of view, the clustering problem turns into a problem of finding local extrema of a multiextremal density function of a vector argument using gradient procedures that are repeatedly launched from different points of the initial data array. The search for these extrema can be sped up by using the ideas of evolutionary optimization, which include nature-inspired algorithms, swarm algorithms, population algorithms, etc. Objective. The purpose of the work is to introduce a data clustering procedure based on the peaks of the data distribution density and the evolutionary method of cat swarms that combines the main advantages of methods for working with data under overlapping classes and is characterized by high-quality clustering, high speed, and accuracy of the obtained results. Method. A method for clustering data arrays based on the combined optimization of distribution density functions and the evolutionary method of cat swarms is proposed. The advantage of the proposed approach is that it reduces the time needed to solve optimization problems when clusters overlap. Results. The results of the experiments confirm the effectiveness of the proposed approach in clustering problems with overlapping classes and allow the proposed method to be recommended for practical use in automatic clustering of big data. Conclusions. A method for clustering data arrays based on the combined optimization of distribution density functions and the evolutionary method of cat swarms is proposed. The advantage of the proposed approach is that it reduces the time needed to solve optimization problems when clusters overlap. The method is quite simple to implement numerically and is not sensitive to the choice of optimization procedure. The experimental results confirm the effectiveness of the proposed approach in clustering problems with overlapping clusters.
Citations: 2
RESTORATION OF DISCONTINUOUS FUNCTIONS BY DISCONTINUOUS INTERLINATION SPLINES
IF 0.5 Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Pub Date : 2022-12-04 DOI: 10.15588/1607-3274-2022-4-3
I. Pershyna
Context. The paper addresses the development and study of methods for approximating discontinuous functions by discontinuous interlination splines and their further application to problems of computed tomography. The object of the study is the modeling of objects with a discontinuous internal structure. Objective. The aim of this study is to develop a general method for constructing discontinuous interlination polynomial splines, which include discontinuous and continuously differentiable splines as special cases. Method. Modern methods of restoring functions are characterized by new approaches to obtaining, processing, and analyzing information. There is a need to build mathematical models in which information can be represented not only by function values at points, but also by a set of function traces on planes or straight lines. At the same time, practice shows that among the multidimensional objects that need to be investigated, more problems are described by discontinuous functions. The paper develops a general method for constructing discontinuous interlination polynomial splines, which include discontinuous and continuously differentiable splines as special cases. The domain of the required two-dimensional function is assumed to be divided into rectangular elements. Theorems on the interlination and approximation properties of such discontinuous constructions are formulated and proved. A method is developed for approximating discontinuous functions of two variables based on the constructed discontinuous splines. The input data are the traces of an unknown function along a given system of mutually perpendicular straight lines. The proposed method has not only theoretical significance but also practical application in the IT domain, especially in computed tomography, where it allows the internal structure of the body to be restored more accurately. Results. The discontinuous interlination operator built from known traces of a function of two variables on a system of mutually perpendicular straight lines is studied. Conclusions. Functions of two variables that are discontinuous at some points or on some lines are better approximated by discontinuous spline interlinants. At the same time, equally high approximation estimates can be obtained. The results obtained have significant advantages over existing methods of interpolation and approximation of discontinuous functions. In further research, the authors plan to develop a theory of discontinuous splines on areas of complex shape bounded by arcs of known curves.
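The idea of letting a spline keep a jump at an element boundary can be shown with a one-dimensional analogue. This is a deliberately reduced sketch, not the two-dimensional interlination operator studied in the paper: it interpolates point values rather than traces along lines, it is piecewise linear, and the function name `discontinuous_linear_spline` and its arguments are assumptions. Each node carries two values, the one-sided limits from the left and from the right, so neighbouring elements need not agree at their shared node.

```python
from bisect import bisect_right
from typing import Callable, Sequence


def discontinuous_linear_spline(nodes: Sequence[float],
                                left_limits: Sequence[float],
                                right_limits: Sequence[float]) -> Callable[[float], float]:
    """Build a piecewise-linear spline that may jump at the nodes.

    nodes:        strictly increasing x_0 < ... < x_n
    left_limits:  left_limits[k]  = f(x_k - 0), used by the element ending at x_k
    right_limits: right_limits[k] = f(x_k + 0), used by the element starting at x_k
    """
    def spline(x: float) -> float:
        if not nodes[0] <= x <= nodes[-1]:
            raise ValueError("x lies outside the spline domain")
        # Element index: a node belongs to the element on its right,
        # except the last node, which belongs to the last element.
        k = min(bisect_right(nodes, x) - 1, len(nodes) - 2)
        t = (x - nodes[k]) / (nodes[k + 1] - nodes[k])
        return (1 - t) * right_limits[k] + t * left_limits[k + 1]

    return spline


# Step function: 0 for x < 1 and 2 for x >= 1, with the jump at the node x = 1.
step = discontinuous_linear_spline([0.0, 1.0, 2.0],
                                   left_limits=[0.0, 0.0, 2.0],
                                   right_limits=[0.0, 2.0, 2.0])
print(step(0.5), step(1.0), step(1.5))  # 0.0 2.0 2.0
```

A continuous linear interpolant through the values 0, 2, 2 at the same nodes would return 1.0 at x = 0.5, smearing the jump across the whole first element; storing both one-sided limits reproduces the step exactly.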
Citations: 0