Estimating the response time of a data-intensive computing environment

A. Gorbunova, V. Vishnevsky
Journal: Informatsionno-Upravliaiushchie Sistemy
Published: 2022-09-12
DOI: 10.31799/1684-8853-2022-4-12-19 (https://doi.org/10.31799/1684-8853-2022-4-12-19)
Citations: 1

Abstract

Introduction: The volume of digital data keeps growing, and with it the need to store and process that data for various purposes. Data analysis relies on high-performance computing environments built around parallelization methods and, accordingly, on data-intensive applications. The lack of good tools for evaluating the efficiency of parallel data or task processing leads to excessive allocation of resources.

Purpose: To develop mathematical models of data-intensive computing environments and methods for analyzing their performance, i.e., for estimating the average system response time from performance data at the subtask level.

Results: We present a mathematical model of a parallel computing system in the form of a queueing system with parallel query processing on various architectures, including non-Poisson input flows and non-exponential service times. To analyze the average response time, we combine simulation modeling with a machine learning method (artificial neural networks). Numerical experiments confirm the effectiveness of the method, which depends neither on the type of input flow, nor on the distribution of query service times, nor on the number of servers in the nodes of the system. The approximation error of the average response time does not exceed 10%, which makes it possible to optimize the generally accepted resource allocation and significantly reduce the amount of resources used.

Practical relevance: The presented models and the method of their analysis can be used for efficient planning and allocation of resources in data-intensive systems.
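
The abstract does not spell out the model, so the following is only a minimal sketch of the simulation half of such an approach, under loudly stated assumptions: a fork-join queueing system in which every query is split into one subtask per node, each node is a single-server FIFO queue, arrivals are Poisson, and service times are exponential. The function name `simulate_fork_join` and all of its parameters are hypothetical and not taken from the paper; the neural-network regression step that the authors combine with simulation is not shown.

```python
import random


def simulate_fork_join(num_nodes, arrival_rate, service_rate, num_jobs, seed=0):
    """Estimate the mean response time of a fork-join system by simulation.

    Assumptions (illustrative, not from the paper): Poisson arrivals,
    exponential service, one single-server FIFO queue per node, and each
    job forks into exactly one subtask per node.
    """
    rng = random.Random(seed)
    # Time at which each node finishes the subtask it accepted most recently.
    node_free_at = [0.0] * num_nodes
    arrival = 0.0
    total_response = 0.0

    for _ in range(num_jobs):
        # Next job arrives after an exponentially distributed gap.
        arrival += rng.expovariate(arrival_rate)
        job_done = arrival
        for k in range(num_nodes):
            # A subtask starts when it has arrived and the node is idle.
            start = max(arrival, node_free_at[k])
            node_free_at[k] = start + rng.expovariate(service_rate)
            # The job completes (joins) when its slowest subtask completes.
            job_done = max(job_done, node_free_at[k])
        total_response += job_done - arrival

    return total_response / num_jobs


if __name__ == "__main__":
    for nodes in (1, 2, 4, 8):
        estimate = simulate_fork_join(nodes, 0.5, 1.0, 100_000)
        print(f"nodes={nodes}: mean response time ~ {estimate:.3f}")
```

With a single node the sketch reduces to an M/M/1 queue, whose mean response time is 1/(service_rate - arrival_rate), which gives a convenient sanity check. To explore non-Poisson input or non-exponential service, as the abstract mentions, the two `expovariate` calls could be replaced by other samplers.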
Source journal: Informatsionno-Upravliaiushchie Sistemy (Mathematics: Control and Optimization)
CiteScore: 1.40; self-citation rate: 0.00%; articles published: 35