SciServer Compute: Bringing Analysis Close to the Data

Dmitry Medvedev, G. Lemson, M. Rippin
Proceedings of the 28th International Conference on Scientific and Statistical Database Management, published 2016-07-18. DOI: 10.1145/2949689.2949700
Citations: 25

Abstract

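SciServer Compute uses Jupyter notebooks running in server-side Docker containers, attached to large relational databases and file storage, to bring advanced analysis capabilities close to the data. SciServer Compute is a component of SciServer, a big-data infrastructure project developed at Johns Hopkins University that will provide a common environment for computational research. It integrates with large existing databases in astronomy, cosmology, turbulence, genomics, oceanography, and materials science, which can be queried directly with SQL through the CasJobs service. SciServer Compute adds interactive server-side computation through notebooks in Python, R, and MATLAB, an API for running asynchronous tasks, and a very large scratch space (hundreds of terabytes) for storing intermediate results. Science-ready results can be stored on SciDrive, a Dropbox-like service, for sharing with collaborators and dissemination to the public. Notebooks and batch jobs run inside Docker containers owned by the users. This provides security and isolation, and allows computational contexts to be configured flexibly through domain-specific images and by mounting domain-specific data sets. We present a demo that illustrates the capabilities of SciServer Compute: using Jupyter notebooks, performing analyses on data selections from diverse scientific fields, and running asynchronous jobs in a Docker container. The demo will highlight the data flow between the file storage, database, and compute components.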
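The abstract describes a data flow in which a SQL selection runs against a server-side database, an analysis step runs as an asynchronous job and writes intermediate output to scratch space, and a science-ready result is then shared with collaborators. The Python sketch below mimics that flow using only standard-library stand-ins, under loudly stated assumptions: an in-memory `sqlite3` table plays the role of a CasJobs-accessible database, a `ThreadPoolExecutor` stands in for the asynchronous task API, and two local directories stand in for the scratch space and SciDrive. The table, its columns, the magnitude cut, and every function name are hypothetical illustrations, not the SciServer API.

```python
"""Hypothetical sketch of the SciServer Compute data flow:
database selection -> asynchronous analysis job -> scratch -> shared storage.
All names are illustrative stand-ins, NOT the real SciServer API."""
from __future__ import annotations

import csv
import shutil
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def select_bright_objects(conn: sqlite3.Connection, mag_limit: float) -> list[tuple[int, float]]:
    """Step 1: server-side SQL selection (stand-in for a CasJobs query)."""
    cur = conn.execute(
        "SELECT obj_id, mag FROM objects WHERE mag < ? ORDER BY obj_id", (mag_limit,)
    )
    return cur.fetchall()


def analysis_job(rows: list[tuple[int, float]], scratch: Path) -> Path:
    """Step 2: analysis run as an asynchronous job; writes an intermediate CSV to scratch."""
    out = scratch / "bright_objects.csv"
    with out.open("w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["obj_id", "mag"])
        writer.writerows(rows)
    return out


def publish(result: Path, shared: Path) -> Path:
    """Step 3: copy a science-ready result to a shared folder (stand-in for SciDrive)."""
    shared.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy(result, shared / result.name))


def run_pipeline(base: Path, mag_limit: float = 20.0) -> Path:
    """Run all three steps under `base`; returns the path of the shared result."""
    scratch = base / "scratch"
    scratch.mkdir(parents=True, exist_ok=True)

    # Hypothetical toy catalog standing in for a large relational database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE objects (obj_id INTEGER, mag REAL)")
    conn.executemany("INSERT INTO objects VALUES (?, ?)", [(1, 17.2), (2, 21.9), (3, 18.4)])
    rows = select_bright_objects(conn, mag_limit)
    conn.close()

    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(analysis_job, rows, scratch)  # submission returns immediately
        result = future.result()  # block until the job has finished
    return publish(result, base / "shared")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        published = run_pipeline(Path(tmp))
        print(published.read_text())
```

Separating `submit` from `result` mirrors the point of an asynchronous task API: the caller can continue working and collect the output later. In SciServer Compute as described above, such a batch job would run inside a user-owned Docker container rather than a local thread.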