dQUOB: managing large data flows using dynamic embedded queries

Proceedings the Ninth International Symposium on High-Performance Distributed Computing
Pub Date: 2000-08-01
DOI: 10.1109/HPDC.2000.868658

Beth Plale, K. Schwan


Citations: 51

Abstract

The dQUOB system satisfies clients' need for specific information from high-volume data streams. The data streams in question are the flows of data that arise during large-scale visualizations, video streaming to large numbers of distributed users, and high-volume business transactions. We introduce the notion of conceptualizing a data stream as a set of relational database tables, so that a scientist can request information with an SQL-like query. Transformation or computation that often needs to be performed on the data en route can be conceptualized as computation performed on consecutive views of the data, with computation associated with each view. The dQUOB system moves the query code into the data stream as a quoblet, that is, as compiled code. The relational data model has the significant advantage of presenting opportunities for efficient reoptimization of queries and sets of queries. Using examples from global atmospheric modeling, we illustrate the usefulness of the dQUOB system. We carry the examples through the experiments to establish, with a baseline benchmark, the viability of the approach for high-performance computing. We define a cost metric of end-to-end latency that can be used to determine realistic cases where optimization should be applied. Finally, we show that end-to-end latency can be controlled through a probability, assigned to a query, that the query will evaluate to true.
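The idea of treating a stream as rows of a table, filtering it with a query, and attaching a computation to the resulting view can be illustrated with a minimal sketch. This is not the dQUOB API: the class `EmbeddedQuery`, its fields, the `grid` sample data, and the temperature threshold are all hypothetical names and values chosen for illustration. The `selectivity` property is an assumed, simple empirical estimate of the probability that the query evaluates to true, the quantity the abstract relates to end-to-end latency.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator

# One "row" of a stream that is being viewed as a relational table.
Event = Dict[str, float]


@dataclass
class EmbeddedQuery:
    """Hypothetical stand-in for a quoblet: a condition plus an attached action."""
    condition: Callable[[Event], bool]  # plays the role of a WHERE clause
    action: Callable[[Event], Event]    # computation attached to the resulting view
    seen: int = 0
    matched: int = 0

    def run(self, stream: Iterable[Event]) -> Iterator[Event]:
        """Evaluate the query on each event in flight; forward only matches."""
        for event in stream:
            self.seen += 1
            if self.condition(event):
                self.matched += 1
                yield self.action(event)

    @property
    def selectivity(self) -> float:
        """Observed fraction of events for which the condition was true."""
        return self.matched / self.seen if self.seen else 0.0


# Roughly: SELECT layer, temp FROM grid WHERE temp > 280, then convert K to deg C.
query = EmbeddedQuery(
    condition=lambda e: e["temp"] > 280.0,
    action=lambda e: {"layer": e["layer"], "temp_c": e["temp"] - 273.15},
)

# Made-up sample events standing in for atmospheric model output.
grid = [
    {"layer": 1, "temp": 275.0},
    {"layer": 2, "temp": 285.0},
    {"layer": 3, "temp": 290.0},
    {"layer": 4, "temp": 270.0},
]

results = list(query.run(grid))
```

In this sketch two of the four sample events satisfy the condition, so only those two are transformed and forwarded. A query that is true less often forwards less data downstream, which is one intuitive reason the probability of a true evaluation would bear on end-to-end latency.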



Latest papers from these proceedings

Event services for high performance computing
The Modeler's Workbench: a system for dynamically distributed simulation and data collection
Probe - a distributed storage testbed
Grid-based file access: the Legion I/O model
Creating large scale database servers