Estimating the response time of a data-intensive computing environment

Informatsionno-Upravliaiushchie Sistemy (Q3, Mathematics); Pub Date: 2022-09-12; DOI: 10.31799/1684-8853-2022-4-12-19

A. Gorbunova, V. Vishnevsky


Citations: 1

Abstract

Introduction: The volume of digital data is growing constantly, and so is the need to store and process it. Data analysis relies on high-performance computing environments that parallelize work and on the data-intensive applications that run on them. Because good tools for evaluating the efficiency of parallel data or task processing are lacking, resources are often over-allocated.

Purpose: To develop mathematical models of data-intensive computing environments and methods for analyzing their performance, that is, for estimating the average system response time from performance data at the subtask level.

Results: We present a mathematical model of a parallel computing system as a queueing system with parallel query processing on various architectures, including non-Poisson input flows and non-exponential service times. To analyze the average response time, we combine simulation modeling with a machine learning method (artificial neural networks). Numerical experiments confirm the effectiveness of the method, which depends neither on the type of input flow, nor on the distribution of query service times, nor on the number of servers in the system's nodes. The approximation error of the average response time does not exceed 10%, which makes it possible to optimize the commonly used resource allocation and significantly reduce the amount of resources required.

Practical relevance: The presented models and the method for analyzing them can be used for efficient planning and allocation of resources in data-intensive systems.
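To make the simulation side of the method concrete, here is a minimal sketch of estimating the mean response time of a queueing system with parallel query processing. It is not the authors' model: it assumes a fork-join structure in which every query splits into one subtask per node, each node is a single-server FIFO queue, and the query completes when its slowest subtask does. The function name `simulate_fork_join`, the Erlang-2 interarrival times, and the uniform service times are all illustrative choices. The neural network step mentioned in the abstract is not sketched; estimates like the one below are only the kind of data such a model might be fitted to.

```python
import random
from statistics import mean


def simulate_fork_join(num_nodes, num_jobs, interarrival, service, seed=0):
    """Estimate the mean response time of a fork-join queueing system.

    Assumptions (illustrative, not taken from the paper): each job forks
    into one subtask per node, every node is a single-server FIFO queue,
    and a job is done when its last subtask finishes.
    """
    rng = random.Random(seed)
    node_free_at = [0.0] * num_nodes  # time at which each node becomes idle
    clock = 0.0
    response_times = []
    for _ in range(num_jobs):
        clock += interarrival(rng)  # arrival time of the next job
        finish_times = []
        for k in range(num_nodes):
            # A subtask starts once it has arrived and the node is free.
            start = max(clock, node_free_at[k])
            node_free_at[k] = start + service(rng)
            finish_times.append(node_free_at[k])
        # The job leaves when its slowest subtask completes.
        response_times.append(max(finish_times) - clock)
    warmup = num_jobs // 10  # discard the initial transient
    return mean(response_times[warmup:])


def erlang2_interarrival(rng):
    # Non-Poisson input flow: Erlang-2 interarrival times with mean 1.0.
    return rng.gammavariate(2.0, 0.5)


def uniform_service(rng):
    # Non-exponential service: uniform on [0.2, 0.8], mean 0.5.
    return rng.uniform(0.2, 0.8)


estimate = simulate_fork_join(4, 20_000, erlang2_interarrival, uniform_service)
print(f"estimated mean response time: {estimate:.3f}")
```

Passing the distributions as functions keeps the simulator independent of the input flow and service-time type, which mirrors the generality claimed in the abstract. Multi-server nodes would need a per-node pool of server-free times instead of the single value used here.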


Source journal: Informatsionno-Upravliaiushchie Sistemy (Mathematics, Control and Optimization)

CiteScore: 1.40

Self-citation rate: 0.00%
