
Latest publications: 2018 IEEE 14th International Conference on e-Science (e-Science)

Scientific Partnership: A Pledge for a New Level of Collaboration between Scientists and IT Specialists
Pub Date: 2018-10-01 | DOI: 10.1109/eScience.2018.00122
J. Weismüller, A. Frank
ICT plays an increasing role in almost every aspect of science. Adopting new technologies, however, consumes an increasing amount of researchers' time, time they could better spend on their actual research. Not adopting new technologies, on the other hand, will inevitably lead to biased research, since scientists will not know about all the possibilities and methods that modern technology offers. This dilemma can only be resolved by close collaboration and scientific partnership between researchers and IT experts from, e.g., a local computing centre. In contrast to traditional IT service provision, IT experts have to understand the scientists' problems and methods in order to help them select suitable services. If none are available, they can consider adapting existing services or developing new ones according to the scientists' actual needs. In addition, the partnership promotes good scientific practice, since the IT experts can ensure reproducibility of the research by professionalising the workflow and applying the FAIR data principles. We elaborate on this dilemma with examples from an IT centre's perspective, and sketch a path towards unbiased research and the development of new IT services tailored to the scientific community.
Citations: 0
How to Bring Value of Domain Specific Big Data in an Interdisciplinary Way? A Software Landscape
Pub Date: 2018-10-01 | DOI: 10.1109/eScience.2018.00111
B. Thage, L. K. Andersen
Digital competences, such as advanced scientific computing and data-mining tools, are often anchored in domain-specific research areas. There is substantial overlap in the data types generated by the different fields of science, and hence an opportunity to share knowledge and tools across disciplines. Mapping software programming tools and identifying use cases that involve High Performance Computing (HPC) can serve as inspiration for bringing new value to domain-specific data in an interdisciplinary way. This poster shares the experience from Denmark, based on a mapping of the software used in 290 publications that involved HPC.
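The core of such a software mapping is a simple tally: for each tool, count how many publications mention it at least once. A minimal sketch, with an assumed (hypothetical) tool list and whole-token matching to avoid false hits for short names like "R":

```python
import re
from collections import Counter

# Hypothetical tool list; the poster's actual catalogue is not given here.
TOOLS = ["Python", "R", "MATLAB", "Fortran", "C++"]

def software_landscape(publications):
    """Count, per tool, the number of publications mentioning it at least
    once (case-sensitive, whole-token match so 'R' != 'Research')."""
    counts = Counter()
    for text in publications:
        for tool in TOOLS:
            if re.search(rf"(?<!\w){re.escape(tool)}(?!\w)", text):
                counts[tool] += 1
    return counts

pubs = [
    "We used Python and Fortran on the national HPC system.",
    "All statistics were computed in R; plots were made in Python.",
]
print(software_landscape(pubs))
```

A real mapping would of course normalise tool aliases (e.g. "Numpy" vs "NumPy") before counting.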
Citations: 0
Modeling Impact of Execution Strategies on Resource Utilization
Pub Date: 2018-10-01 | DOI: 10.1109/eScience.2018.00085
A. Poyda, M. Titov, A. Klimentov, J. Wells, S. Oral, K. De, D. Oleynik, S. Jha
The analysis of the hundreds of petabytes of raw and derived HEP (High Energy Physics) data will necessitate exascale computing. In addition to their unprecedented volume, these data are distributed over hundreds of computing centers. In response to these application requirements, the performance demands of parallel processing, and broader technology trends, HEP projects have increasingly taken up supercomputers.
Citations: 0
Strategies for Modeling Extreme Luminosities in the CMS Simulation
Pub Date: 2018-10-01 | DOI: 10.1109/eScience.2018.00090
M. Hildreth, E. Sexton-Kennedy, K. Pedro, M. Kortelainen
The LHC simulation frameworks are already confronting the High-Luminosity LHC (HL-LHC) era. In order to design and evaluate the performance of the HL-LHC detector upgrades, realistic simulations of the future detectors and the extreme luminosity conditions they may encounter must be produced now. Using many individual minimum-bias interactions to model the pileup poses several challenges to the CMS simulation framework, including huge memory consumption, increased computation time, and the handling of large numbers of event files during Monte Carlo production. Simulating a single hard scatter at an instantaneous luminosity corresponding to 200 pileup interactions per crossing can involve the input of thousands of individual minimum-bias events. Brute-force Monte Carlo production requires overlaying these events for each simulated hard-scatter event.
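The brute-force overlay the abstract describes can be sketched in a few lines: for each hard-scatter event, draw a Poisson-distributed number of interactions (mean 200 per crossing) and sample that many minimum-bias events from a pool. This is only an illustrative toy, not the CMS framework; all names (`overlay_pileup`, the string "events") are invented for the sketch:

```python
import math
import random

rng = random.Random(20181001)

def poisson(mu):
    """Knuth's algorithm: count uniform draws until their running
    product falls below exp(-mu). Fine for mu ~ 200."""
    limit, p, k = math.exp(-mu), 1.0, 0
    while p > limit:
        p *= rng.random()
        k += 1
    return k - 1

def overlay_pileup(hard_scatter, minbias_pool, mu=200):
    """Overlay a Poisson(mu) number of minimum-bias events on one
    hard-scatter event, sampling the pool with replacement."""
    n_pu = poisson(mu)
    pileup = [rng.choice(minbias_pool) for _ in range(n_pu)]
    return {"hard_scatter": hard_scatter, "pileup": pileup, "n_pu": n_pu}

# Toy pool; in production this means reading thousands of event files,
# which is exactly the I/O and memory burden the paper discusses.
pool = [f"mb_{i}" for i in range(10_000)]
event = overlay_pileup("hs_0", pool)
```

The cost is visible even in the toy: every simulated hard scatter pulls in roughly 200 extra events, so the input volume is dominated by pileup, not by the physics of interest.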
Citations: 0
Resolving Clouds in a Global Atmosphere Model - A Multiscale Approach with Nested Models
Pub Date: 2018-10-01 | DOI: 10.1109/eScience.2018.00043
F. Jansson, G. Oord, P. Siebesma, D. Crommelin
n/a
Citations: 3
Bringing Data Science to Qualitative Analysis
Pub Date: 2018-10-01 | DOI: 10.1109/eScience.2018.00076
Y. Cheah, Drew Paine, D. Ghoshal, L. Ramakrishnan
Qualitative user research is a human-intensive approach that draws upon ethnographic methods from the social sciences to develop insights about work practices and thereby inform software design and development. Recent advances in data science, and in particular natural language processing (NLP), enable the derivation of machine-generated insights that augment existing techniques. We describe our prototype framework, based on Jupyter, a software tool that supports interactive data science and scientific computing, which leverages NLP techniques to make sense of transcribed texts from user interviews. This work also serves as a starting point for incorporating data science techniques into the qualitative analysis process.
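To make the idea concrete, here is a deliberately crude sketch of the kind of NLP step such a framework might run in a Jupyter cell: surfacing the most frequent content words in an interview transcript. The stopword list and function name are assumptions for illustration; the actual prototype presumably applies richer techniques (topic models, entity extraction):

```python
import re
from collections import Counter

STOPWORDS = {"the", "and", "our", "most", "between", "with", "that", "for"}

def salient_terms(transcript, k=5):
    """Return the k most frequent content words in a transcript,
    as a first machine-generated pointer at recurring themes."""
    tokens = re.findall(r"[a-z']+", transcript.lower())
    counts = Counter(t for t in tokens if len(t) > 2 and t not in STOPWORDS)
    return [term for term, _ in counts.most_common(k)]

interview = ("We spend most of our time moving data between the cluster "
             "and our laptops; the data transfer tooling keeps breaking.")
print(salient_terms(interview, k=3))
```

Even this trivial pass flags "data" as a recurring concern, illustrating how machine summaries can direct a researcher's close reading rather than replace it.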
Citations: 1
Weather Reanalysis on an Urban Scale using WRF
Pub Date: 2018-10-01 | DOI: 10.1109/eScience.2018.00049
R. V. Haren, S. Koopmans, G. Steeneveld, N. Theeuwes, R. Uijlenhoet, A. Holtslag
In this study, we improve the performance of the Weather Research and Forecasting (WRF) mesoscale model by incorporating observations from a variety of sources, using data assimilation and nudging techniques at a resolution of up to 100 meters for urban areas. Our final goal is to create a 15-year climatological urban re-analysis data archive of (hydro)meteorological variables for Amsterdam, named ERA-urban. This will enable us to trace trends in thermal comfort and extreme precipitation.
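The nudging mentioned here is, at its core, Newtonian relaxation: each model step, the state is pulled toward the observed value with a chosen relaxation timescale. A minimal sketch of that update (the variable names and the 1-hour timescale are illustrative assumptions, not WRF's actual configuration):

```python
def nudge(model_value, observation, dt, tau):
    """One Newtonian-relaxation step: pull the model state toward an
    observation with relaxation timescale tau (dt and tau in seconds)."""
    return model_value + dt * (observation - model_value) / tau

# Relax a 2 m temperature of 288 K toward an observed 290 K over one
# hour of 60 s model steps, with a 1 h relaxation timescale.
t2m = 288.0
for _ in range(60):
    t2m = nudge(t2m, 290.0, dt=60.0, tau=3600.0)
# After one relaxation timescale the model-observation gap has shrunk
# by roughly a factor of e.
```

Choosing tau is the key tuning knob: too short and the model merely echoes the (sparse, noisy) observations; too long and the observations barely constrain the simulation.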
Citations: 3
Reflections from a Decade of Running CCPForge
Pub Date: 2018-10-01 | DOI: 10.1109/eScience.2018.00012
C. Jones, A. Kyffin, G. Poulter
This short paper shares the experience of running a collaborative software development platform for a decade. It examines what an analysis of the platform's usage reveals about the software development practices of the community that used it, and the lessons learnt from supporting this community.
Citations: 0
Skluma: An Extensible Metadata Extraction Pipeline for Disorganized Data
Pub Date: 2018-10-01 | DOI: 10.1109/eScience.2018.00040
Tyler J. Skluzacek, Rohan Kumar, Ryan Chard, Galen Harrison, Paul Beckman, K. Chard, Ian T Foster
To mitigate the effects of high-velocity data expansion and to automate the organization of filesystems and data repositories, we have developed Skluma, a system that automatically processes a target filesystem or repository, extracts content- and context-based metadata, and organizes the extracted metadata for subsequent use. Skluma is able to extract diverse metadata, including aggregate values derived from embedded structured data; named entities and latent topics buried within free-text documents; and content encoded in images. Skluma implements an overarching probabilistic pipeline to extract increasingly specific metadata from files. It applies machine learning methods to determine file types, dynamically prioritizes and then executes a suite of metadata extractors, and explores contextual metadata based on relationships among files. The derived metadata, represented in JSON, describes probabilistic knowledge of each file that may subsequently be used for discovery or organization. Skluma's architecture enables it both to be deployed locally and to be used as an on-demand, cloud-hosted service to create and execute dynamic extraction workflows on massive numbers of files. It is modular and extensible, allowing users to contribute their own specialized metadata extractors. Thus far we have tested Skluma on local filesystems, remote FTP-accessible servers, and publicly accessible Globus endpoints. We have demonstrated its efficacy by applying it to a scientific environmental data repository of more than 500,000 files. We show that we can extract metadata from those files with modest cloud costs in a few hours.
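The extensible-extractor idea can be illustrated with a small registry pattern: extractors register themselves, each contributes fields to a per-file record, and the result is emitted as JSON with probabilistic type labels. This is a sketch of the architecture's shape, not Skluma's actual API; all function names and the extension-based "type model" are invented stand-ins:

```python
import json
import pathlib
import tempfile

EXTRACTORS = []

def extractor(fn):
    """Register a metadata extractor; each returns a dict of new fields."""
    EXTRACTORS.append(fn)
    return fn

@extractor
def basic_metadata(path):
    return {"name": path.name, "size": path.stat().st_size}

@extractor
def guess_type(path):
    # Toy stand-in for Skluma's learned file-type model: map the
    # extension to a label with an assumed probability.
    label, p = {".csv": ("tabular", 0.9), ".txt": ("free text", 0.8)}.get(
        path.suffix.lower(), ("unknown", 0.5))
    return {"type": {"label": label, "p": p}}

def process(path):
    """Run every registered extractor over one file; later extractors
    may refine fields contributed by earlier ones."""
    metadata = {}
    for ex in EXTRACTORS:
        metadata.update(ex(path))
    return json.dumps(metadata)

# Demo on a throwaway CSV file.
with tempfile.TemporaryDirectory() as d:
    f = pathlib.Path(d) / "obs.csv"
    f.write_text("station,temp\nA,281.4\n")
    record = process(f)
```

In the real system the type guess would gate which specialized extractors run next, so cheap evidence narrows the work done by expensive extractors.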
Citations: 15
Serving Scientists in Agri-Food Area by Virtual Research Environments
Pub Date: 2018-10-01 | DOI: 10.1109/eScience.2018.00124
A. Ballis, A. Boizet, Leonardo Candela, D. Castelli, E. Fernández, M. Filter, T. Günther, G. Kakaletris, P. Karampiperis, Dimitris Katris, R. Knapen, R. Lokers, L. Penev, G. Sipos, P. Zervas
Agri-food research calls for changes in its practices for data collection, collation, processing, analytics, and publishing, so that it can fully benefit from and contribute to the Open Science movement. One of the major issues agri-food researchers face is the fragmentation of the "assets" that can be exploited when performing research tasks: data of interest are heterogeneous and scattered across several repositories, the tools modellers use are diverse and often rely on local computing environments, and publishing practices vary and rarely aim at making available the "whole story" with datasets, processes, and workflows. This paper presents the AGINFRA+ endeavour to overcome these limitations by providing researchers in three designated communities with Virtual Research Environments that facilitate access to and use of the "assets" of interest and promote collaboration.
Citations: 2