剪刀服务器计算:让分析更接近数据

Proceedings of the 28th International Conference on Scientific and Statistical Database Management Pub Date : 2016-07-18 DOI:10.1145/2949689.2949700

Dmitry Medvedev, G. Lemson, M. Rippin

{"title":"剪刀服务器计算:让分析更接近数据","authors":"Dmitry Medvedev, G. Lemson, M. Rippin","doi":"10.1145/2949689.2949700","DOIUrl":null,"url":null,"abstract":"SciServer Compute uses Jupyter notebooks running within server-side Docker containers attached to large relational databases and file storage to bring advanced analysis capabilities close to the data. SciServer Compute is a component of SciServer, a big-data infrastructure project developed at Johns Hopkins University that will provide a common environment for computational research. SciServer Compute integrates with large existing databases in the fields of astronomy, cosmology, turbulence, genomics, oceanography and materials science. These are accessible through the CasJobs service for direct SQL queries. SciServer Compute adds interactive server-side computational capabilities through notebooks in Python, R and MATLAB, an API for running asynchronous tasks, and a very large (hundreds of terabytes) scratch space for storing intermediate results. Science-ready results can be stored on a Dropbox-like service, SciDrive, for sharing with collaborators and dissemination to the public. Notebooks and batch jobs run inside Docker containers owned by the users. This provides security and isolation and allows flexible configuration of computational contexts through domain specific images and mounting of domain specific data sets. We present a demo that illustrates the capabilities of SciServer Compute: using Jupyter notebooks, performing analyses on data selections from diverse scientific fields, and running asynchronous jobs in a Docker container. The demo will highlight the data flow between file storage, database, and compute components.","PeriodicalId":254803,"journal":{"name":"Proceedings of the 28th International Conference on Scientific and Statistical Database Management","volume":"64 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"25","resultStr":"{\"title\":\"SciServer Compute: Bringing Analysis Close to the Data\",\"authors\":\"Dmitry Medvedev, G. Lemson, M. Rippin\",\"doi\":\"10.1145/2949689.2949700\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"SciServer Compute uses Jupyter notebooks running within server-side Docker containers attached to large relational databases and file storage to bring advanced analysis capabilities close to the data. SciServer Compute is a component of SciServer, a big-data infrastructure project developed at Johns Hopkins University that will provide a common environment for computational research. SciServer Compute integrates with large existing databases in the fields of astronomy, cosmology, turbulence, genomics, oceanography and materials science. These are accessible through the CasJobs service for direct SQL queries. SciServer Compute adds interactive server-side computational capabilities through notebooks in Python, R and MATLAB, an API for running asynchronous tasks, and a very large (hundreds of terabytes) scratch space for storing intermediate results. Science-ready results can be stored on a Dropbox-like service, SciDrive, for sharing with collaborators and dissemination to the public. Notebooks and batch jobs run inside Docker containers owned by the users. This provides security and isolation and allows flexible configuration of computational contexts through domain specific images and mounting of domain specific data sets. We present a demo that illustrates the capabilities of SciServer Compute: using Jupyter notebooks, performing analyses on data selections from diverse scientific fields, and running asynchronous jobs in a Docker container. The demo will highlight the data flow between file storage, database, and compute components.\",\"PeriodicalId\":254803,\"journal\":{\"name\":\"Proceedings of the 28th International Conference on Scientific and Statistical Database Management\",\"volume\":\"64 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-07-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"25\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 28th International Conference on Scientific and Statistical Database Management\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2949689.2949700\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 28th International Conference on Scientific and Statistical Database Management","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2949689.2949700","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 25

摘要

SciServer Compute使用在服务器端Docker容器中运行的Jupyter笔记本，这些容器附加到大型关系数据库和文件存储中，从而为数据提供高级分析功能。剪刀服务器计算是剪刀服务器的一个组成部分，剪刀服务器是约翰霍普金斯大学开发的一个大数据基础设施项目，将为计算研究提供一个公共环境。SciServer Compute集成了天文学、宇宙学、湍流、基因组学、海洋学和材料科学领域的大型现有数据库。这些都可以通过CasJobs服务进行直接SQL查询。SciServer Compute通过Python、R和MATLAB中的笔记本增加了交互式服务器端计算能力，用于运行异步任务的API，以及用于存储中间结果的非常大(数百tb)的刮刮空间。为科学准备的结果可以存储在一个类似于dropbox的服务scirive上，以便与合作者分享并向公众传播。笔记本和批处理作业在用户拥有的Docker容器中运行。这提供了安全性和隔离性，并允许通过特定于域的映像和装载特定于域的数据集灵活地配置计算上下文。我们提供了一个演示，演示了SciServer Compute的功能:使用Jupyter笔记本，对来自不同科学领域的数据选择执行分析，以及在Docker容器中运行异步作业。该演示将突出显示文件存储、数据库和计算组件之间的数据流。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文

微信好友朋友圈 QQ好友复制链接

本刊更多论文

SciServer Compute: Bringing Analysis Close to the Data

SciServer Compute uses Jupyter notebooks running within server-side Docker containers attached to large relational databases and file storage to bring advanced analysis capabilities close to the data. SciServer Compute is a component of SciServer, a big-data infrastructure project developed at Johns Hopkins University that will provide a common environment for computational research. SciServer Compute integrates with large existing databases in the fields of astronomy, cosmology, turbulence, genomics, oceanography and materials science. These are accessible through the CasJobs service for direct SQL queries. SciServer Compute adds interactive server-side computational capabilities through notebooks in Python, R and MATLAB, an API for running asynchronous tasks, and a very large (hundreds of terabytes) scratch space for storing intermediate results. Science-ready results can be stored on a Dropbox-like service, SciDrive, for sharing with collaborators and dissemination to the public. Notebooks and batch jobs run inside Docker containers owned by the users. This provides security and isolation and allows flexible configuration of computational contexts through domain specific images and mounting of domain specific data sets. We present a demo that illustrates the capabilities of SciServer Compute: using Jupyter notebooks, performing analyses on data selections from diverse scientific fields, and running asynchronous jobs in a Docker container. The demo will highlight the data flow between file storage, database, and compute components.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 28th International Conference on Scientific and Statistical Database Management

自引率

0.00%

发文量

期刊最新文献

SMS: Stable Matching Algorithm using Skylines Graph-based modelling of query sets for differential privacy Efficient Feedback Collection for Pay-as-you-go Source Selection Multi-Assignment Single Joins for Parallel Cross-Match of Astronomic Catalogs on Heterogeneous Clusters Compact and queryable representation of raster datasets