Scientific Analysis by Queries in Extended SPARQL over a Scalable e-Science Data Store

Andrej Andrejev, S. Toor, A. Hellander, S. Holmgren, T. Risch
Published in: 2013 IEEE 9th International Conference on e-Science, 2013-10-22
DOI: 10.1109/ESCIENCE.2013.19 (https://doi.org/10.1109/ESCIENCE.2013.19)
Citations: 9

Abstract

Data-intensive applications in e-Science require scalable storage solutions as well as interactive tools for analyzing scientific data. It is important to be able to query the data in a storage-independent way and to obtain the results of the data analysis incrementally, in contrast to traditional batch solutions. We use the RDF data model, extended with multidimensional numeric arrays, to represent the results, parameters, and other metadata describing scientific experiments, and we use SciSPARQL, an extension of the SPARQL language, to combine massive numeric array data and metadata in queries. To address the scalability problem, we present an architecture that enables the same SciSPARQL queries to be executed on the RDF dataset whether it is stored in a relational DBMS or mapped over a specialized, geographically distributed e-Science data store. To minimize access and communication costs, we represent the arrays with proxy objects and retrieve their content lazily. We formulate typical analysis tasks from a computational biology application as SciSPARQL queries and compare their query processing performance with that of manually written MATLAB scripts.
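The proxy-object idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation and does not use SciSPARQL; the class `ArrayProxy`, the in-memory `STORE`, the function `fetch_from_store`, and the experiment records are all hypothetical names chosen for illustration. The sketch only shows the general pattern: metadata is filtered first, and array content is fetched from the store only when a computation actually needs it.

```python
from typing import Callable, List, Optional, Tuple


class ArrayProxy:
    """Hypothetical stand-in for a numeric array kept in an external store."""

    def __init__(self, array_id: str, shape: Tuple[int, ...],
                 fetch: Callable[[str], List[float]]) -> None:
        self.array_id = array_id
        self.shape = shape            # metadata is usable without fetching content
        self._fetch = fetch
        self._data: Optional[List[float]] = None
        self.fetch_count = 0          # counts round trips to the store

    def _materialize(self) -> List[float]:
        # Fetch the content on first use only, then reuse the cached copy.
        if self._data is None:
            self._data = self._fetch(self.array_id)
            self.fetch_count += 1
        return self._data

    def __getitem__(self, index: int) -> float:
        return self._materialize()[index]

    def mean(self) -> float:
        data = self._materialize()
        return sum(data) / len(data)


# Assumption: a dictionary plays the role of the remote array store.
STORE = {"run1": [1.0, 2.0, 3.0], "run2": [10.0, 20.0, 30.0]}


def fetch_from_store(array_id: str) -> List[float]:
    return list(STORE[array_id])


# Experiment metadata with array-valued results represented by proxies.
experiments = [
    {"name": "run1", "temperature": 300,
     "result": ArrayProxy("run1", (3,), fetch_from_store)},
    {"name": "run2", "temperature": 350,
     "result": ArrayProxy("run2", (3,), fetch_from_store)},
]

# Filter on metadata first; only the matching experiment's array is fetched.
selected = [e for e in experiments if e["temperature"] > 320]
means = {e["name"]: e["result"].mean() for e in selected}
```

In this sketch the array for `run1` is never transferred, because the metadata condition excludes it before any array content is touched, and the array for `run2` is transferred once and then served from the cached copy.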