
2018 IEEE 14th International Conference on e-Science (e-Science): Latest Papers

Implementation of the ATLAS Trigger Within the ATLAS Multi-threaded Software Framework AthenaMT
Pub Date : 2018-10-01 DOI: 10.1109/eScience.2018.00084
T. Martin
n/a
Pages: 339
Citations: 1
Estimating Subgraph Generation Models to Understand Large Network Formation
Pub Date : 2018-10-01 DOI: 10.1109/eScience.2018.00106
L. Bogaardt, Frank W. Takes
A new network formation model, the subgraph generation model (SUGM), was recently proposed. We investigate a method for estimating the parameters of this model from the subgraph census.
Pages: 375-376
Citations: 0
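The abstract above estimates model parameters from a subgraph census, that is, from counts of small subgraphs in the observed network. As a minimal sketch of what such a census can look like, the following counts links, wedges (paths of length two) and triangles in an undirected graph. The choice of these three subgraph types and the function name `subgraph_census` are assumptions made for illustration; the abstract does not say which subgraphs are counted or how the estimation itself proceeds.

```python
from itertools import combinations


def subgraph_census(edges):
    """Count links, wedges and triangles in an undirected simple graph.

    `edges` is an iterable of (u, v) pairs; self-loops and repeated
    edges are ignored. This is an illustrative census only, not the
    estimator described in the abstract.
    """
    adjacency = {}
    for u, v in edges:
        if u == v:
            continue  # skip self-loops
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    # Every link appears in two adjacency sets.
    links = sum(len(neighbours) for neighbours in adjacency.values()) // 2

    wedges = 0        # neighbour pairs around a centre node
    closed_pairs = 0  # neighbour pairs that are themselves linked
    for neighbours in adjacency.values():
        for a, b in combinations(neighbours, 2):
            wedges += 1
            if b in adjacency[a]:
                closed_pairs += 1

    # Each triangle is seen once from each of its three corners.
    return {"links": links, "wedges": wedges, "triangles": closed_pairs // 3}


census = subgraph_census([(1, 2), (2, 3), (1, 3), (3, 4)])
print(census)
```

Counts like these would be the input to a parameter-fitting step; how SUGM maps them to parameters is not covered here.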
Linking Natural History Collections
Pub Date : 2018-10-01 DOI: 10.1109/eScience.2018.00113
Lise Stork, Andreas Weber, E. Miracle, K. Wolstencroft
n/a
Pages: 388-389
Citations: 0
A Spark-Based Platform to Extract Phenological Information from Satellite Images
Pub Date : 2018-10-01 DOI: 10.1109/eScience.2018.00095
Viktor Bakayov, R. Goncalves, R. Zurita-Milla, E. Izquierdo-Verdiguier
Phenology is the study of periodic plant and animal life cycle events and how these are influenced by seasonal and inter-annual variations in weather and climate, as well as by other environmental factors. Time series of remote sensing (RS) images can be used to characterize land surface phenology at continental to global scales. For this, the RS images are typically transformed into vegetation indices (VIs) such as the normalized difference vegetation index (NDVI) or the enhanced vegetation index (EVI). These indices can then be used to extract various phenological metrics. In our previous work we used cloud computing to generate temperature-based phenological indices [1], [2], and to relate one phenological metric, namely the Start-of-Season (SOS), to those indices [3], [4]. Here we present an extension of that work in which we use a Spark-based platform to efficiently extract phenological metrics from time series of NDVI and EVI. The platform makes it possible to obtain and analyze high spatial resolution metrics (in this case 1 km) from 10-day composites. It uses the same architecture as in [3], i.e., it is organized into three layers: a storage layer, a processing layer, and JupyterHub services for user interaction. It is designed to store the data in well-known file formats such as GeoTIFF and Hierarchical Data Format (HDF). For the data analysis, the user expresses the operations in Jupyter notebooks as Python, R, or Scala code (Fig. 1). Hence, with a browser and a remote connection, the user can express a research question and/or collect insights from large data sets. All computations are pushed down to the computational platform, and the results are fetched back for data visualization. To extract the phenological metrics, we rely on TimeSat [5]. TimeSat is a software package that can be used to fit a function (e.g. double logistic) to time series of VIs. After that, it uses various approaches to extract vegetation seasonality metrics such as SOS. The program's numerical and graphical routines are coded in Matlab and Fortran. These routines are highly vectorized and efficient for use with large data sets. However, distributed processing is required to determine SOS at continental scales. Through an efficient partitioning of the data and Spark's scheduling policies, these single-core routines are scheduled for parallel execution over multiple machines. The study evaluates which VIs and fitting functions are most suitable for certain vegetation types by comparing the SOS metrics to volunteered phenological observations curated by the USA National Phenology Network [6]. Our preliminary results show that SOS can differ by up to 20-30 days depending on the fitting function, the VI, and the approach used to extract the SOS metric. In the South, SOS is around mid-February or March, whereas in mountainous regions and the North, SOS can be as late as June-July. We will further evaluate how our results compare to the ground volunteered observations. This work is a first stepping stone towards systematically analyzing and mapping the impact of climate change on plant seasonality. Our tests show that the platform is scalable and can be extended to higher-resolution VIs, such as those that can be derived from Sentinel-2 images (10 m resolution). Because of this, our work opens the door to continental-to-global studies that use high and very high spatial resolution data.
Pages: 354-355
Citations: 0
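To make the per-pixel nature of the processing in the abstract above more concrete, here is a minimal sketch that computes NDVI from two reflectance values and derives a Start-of-Season index from a VI time series with a simple amplitude-threshold rule. The threshold rule, the 0.5 fraction, and the names `ndvi` and `start_of_season` are assumptions for illustration; this is not TimeSat's code, and no curve fitting or Spark call is shown.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index for one pixel."""
    total = nir + red
    return 0.0 if total == 0 else (nir - red) / total


def start_of_season(series, fraction=0.5):
    """Index of the first composite, up to the seasonal peak, at which the
    series reaches min + fraction * (max - min). Returns None if flat.

    Illustrative rule only; `fraction` is an assumed parameter.
    """
    low, high = min(series), max(series)
    if high == low:
        return None  # no seasonal amplitude to work with
    threshold = low + fraction * (high - low)
    peak = series.index(high)
    # The peak itself always satisfies the condition, so next() finds a value.
    return next(i for i in range(peak + 1) if series[i] >= threshold)


# Hypothetical pixel time series, one value per 10-day composite.
pixels = {
    "pixel_a": [0.2, 0.2, 0.3, 0.6, 0.8, 0.7, 0.4, 0.2],
    "pixel_b": [0.1, 0.1, 0.1, 0.2, 0.5, 0.9, 0.6, 0.2],
}

# Each pixel is processed independently of the others.
sos_by_pixel = {name: start_of_season(ts) for name, ts in pixels.items()}
print(sos_by_pixel)
```

Because each pixel's series is handled on its own, a function of this shape is the kind of single-core routine that a scheduler can run in parallel over data partitions.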
Smart Data Scouting in Professional Soccer: Evaluating Passing Performance Based on Position Tracking Data
Pub Date : 2018-10-01 DOI: 10.1109/eScience.2018.00126
M. Kempe, F. Goes, K. Lemmink
Sports analytics in general, and soccer analytics in particular, have evolved in recent years due to the increased availability of large amounts of (tracking) data. Especially in terms of evaluating tactical behavior, data science could change the way we think about soccer. In this study, we evaluate passing performance in soccer to test the hypothesis that tactical behavior in team sports can be analyzed based exclusively on tracking data. To this end, we explore the relationship between pass-related changes in spatiotemporal variables and key performance indicators. Our results demonstrate that spatiotemporal variables can predict pass accuracy and key performance indicators at the individual level, which confirms our hypothesis. Furthermore, we calculated a simple composite performance indicator to evaluate passes and players based on tracking data. In conclusion, our results can be used as an approach for real-time evaluation of tactical behavior and as a new method to scout and evaluate players in soccer and in team sports in general.
Pages: 409-410
Citations: 8
De-duplicating the OpenAIRE Scholarly Communication Big Graph
Pub Date : 2018-10-01 DOI: 10.1109/eScience.2018.00104
Claudio Atzori, P. Manghi, A. Bardi
The OpenAIRE infrastructure populates a scholarly communication big graph interlinking metadata objects of publications, datasets, software, organizations, funders, and projects. In order to de-duplicate this graph, OpenAIRE has developed GDup, an integrated, scalable, general-purpose system for entity deduplication over big information graphs. GDup offers functionalities to realize a fully-fledged entity deduplication workflow over a generic input graph, inclusive of Ground Truth support, end-user feedback, and strategies for identifying and merging duplicates to obtain an output disambiguated graph.
Pages: 372-373
Citations: 2
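The abstract above describes identifying and merging duplicates to obtain a disambiguated graph. One common way to structure such a step is blocking, pairwise similarity, and then merging matched pairs into groups; the sketch below follows that pattern. It is not GDup's implementation: the blocking key (first three characters of the title), the use of `difflib.SequenceMatcher`, the 0.9 threshold, and the record layout are all assumptions chosen to keep the example small.

```python
from difflib import SequenceMatcher
from itertools import combinations


def deduplicate(records, threshold=0.9):
    """Group records with near-identical titles into duplicate sets.

    `records` is a list of dicts with "id" and "title" keys
    (a hypothetical layout). Returns a sorted list of sorted id groups.
    """
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Blocking: only compare records sharing a cheap key,
    # which avoids comparing every pair in the whole collection.
    blocks = {}
    for r in records:
        blocks.setdefault(r["title"].lower()[:3], []).append(r)

    for block in blocks.values():
        for a, b in combinations(block, 2):
            similarity = SequenceMatcher(
                None, a["title"].lower(), b["title"].lower()
            ).ratio()
            if similarity >= threshold:
                parent[find(a["id"])] = find(b["id"])  # merge the two groups

    groups = {}
    for r in records:
        groups.setdefault(find(r["id"]), []).append(r["id"])
    return sorted(sorted(g) for g in groups.values())


records = [
    {"id": "p1", "title": "De-duplicating the OpenAIRE Graph"},
    {"id": "p2", "title": "De-duplicating the OpenAIRE graph."},
    {"id": "p3", "title": "Linking Natural History Collections"},
]
duplicate_groups = deduplicate(records)
print(duplicate_groups)
```

Using union-find means that if A matches B and B matches C, all three end up in one group even when A and C fall just below the threshold.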
Lesson Development for Open Source Software Best Practices Adoption
Pub Date : 2018-10-01 DOI: 10.1109/eScience.2018.00011
Mateusz Kuzak, Jen Harrow, R. Jiménez, P. Martínez, Fotis Psomopoulos, R. Vareková, A. Via
The "ELIXIR Training Platform" is partnering with The Carpentries (Software and Data Carpentry) to train life science researchers in computing and data management skills. The "ELIXIR Software development best practices" group, which is part of the ELIXIR Tools Platform, has proposed "Four simple recommendations to encourage best practices in research software" aiming to help researchers and developers to adopt Open Source Software (OSS) practices and thus improve the quality and sustainability of research software. In order to encourage researchers and developers to adopt the four recommendations (4OSS) and build FAIR software, we are developing specific and practical training materials, taking advantage of the Carpentries approach and experience in training material development and maintenance.
Pages: 19-20
Citations: 1
Workflows Orchestrating Workflows: Thousands of Queries and Their Fault Tolerance Using APIs of Omics Web Resources
Pub Date : 2018-10-01 DOI: 10.1109/eScience.2018.00061
Yassene Mohammed
High-throughput omics such as proteomics and genomics allow detailed molecular studies of organisms. Such studies are inherently on the Big Data side in terms of volume and complexity. Following the FAIR principles and striving for transparency in publication, raw data and results are often shared in public repositories. However, despite the steadily increasing amount of shared omics data, it is still challenging to compare, correlate, and integrate it to answer new questions. Here we report on our experience in reusing and repurposing publicly available proteomics and genomics data to design new targeted proteomics experiments. We have developed a scientific workflow to retrieve and integrate information from various repositories and domain knowledge bases, including UniProtKB [1], GPMDB [2], PRIDE [3], PeptideAtlas [4], ProteomicsDB [5], MassIVE [6], ExPASy [7], NCBI's dbSNP [8], and PeptideTracker [9]. Following a "Map-Reduce" approach [10], the workflow selects the best proteotypic peptides for Multiple Reaction Monitoring (MRM) experiments. In an attempt to gain insights into the human proteome, we have designed a second workflow to orchestrate the selection workflow. Hundreds of thousands of queries were sent to online repositories to determine whether peptides had been seen in previous experiments. Fault tolerance ranged from dealing with no reply to handling wrong annotations. A three-month run of the workflow generated a comprehensive list of more than 165,000 suitable proteotypic peptides covering most human proteins. The main challenge has been the evolving APIs of the resources, which continuously affect the components of our integrative bioinformatic solutions.
Pages: 299-300
Citations: 0
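The abstract above mentions fault tolerance that ranges from handling missing replies to handling wrong annotations. A minimal sketch of one way to wrap a single repository query is shown below: it retries on timeouts or connection errors and also treats replies that fail a validity check as failures. The function name `query_with_retry`, its parameters, and the stub `flaky_fetch` are hypothetical; the actual workflow system and repository APIs used by the author are not shown in the abstract.

```python
import time


def query_with_retry(fetch, peptide, retries=3, base_delay=1.0,
                     is_valid=lambda reply: reply is not None):
    """Call fetch(peptide) until it returns a valid reply.

    Returns (reply, attempts_used), or (None, retries) if every
    attempt failed. Waits base_delay * 2**k seconds between attempts.
    """
    for attempt in range(1, retries + 1):
        try:
            reply = fetch(peptide)
            if is_valid(reply):
                return reply, attempt
        except (TimeoutError, ConnectionError):
            pass  # treat a missing reply like an invalid one
        if attempt < retries:
            time.sleep(base_delay * 2 ** (attempt - 1))
    return None, retries


# Stub standing in for a remote repository: fails twice, then answers.
calls = {"count": 0}


def flaky_fetch(peptide):
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("no reply")
    return {"peptide": peptide, "seen": True}


result, attempts = query_with_retry(flaky_fetch, "PEPTIDEK", base_delay=0)
print(result, attempts)
```

Passing a stricter `is_valid` callable is where a check for malformed or wrongly annotated replies could go; an orchestrating workflow can then log the queries that returned `None` and reschedule them later.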
A Web Service Architecture for Objective Station Classification Purposes
Pub Date : 2018-10-01 DOI: 10.1109/eScience.2018.00051
M. Schultz, Sander Apweiler, Jan Vogelsang, B. Hagemeier, F. Kleinert, Daniel Mallmann
The Tropospheric Ozone Assessment Report (TOAR) has recently pioneered the use of global Earth Observation data to derive a globally consistent scheme for characterizing the local environment of stations measuring weather and atmospheric composition. Here, we build on the TOAR concept and expand it into a set of web services, which will allow a flexible, automated characterization of any point location through standardized REST APIs. These services will be freely available to the community and thus pave the way for new concepts to analyze global monitoring data and evaluate numerical models.
Pages: 283-284
Citations: 3
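The abstract above describes characterizing any point location through standardized REST APIs. As a rough sketch of what a client-side request for such a service could look like, the following builds a query URL from a latitude and longitude. Everything specific here is hypothetical: the base URL `https://example.org/api/v1`, the service name `population`, and the `lat`, `lon`, and `radius` parameters are placeholders, since the abstract does not document the real endpoints.

```python
from urllib.parse import urlencode

BASE_URL = "https://example.org/api/v1"  # placeholder, not a real service


def characterization_url(service, lat, lon, radius_m=None):
    """Build a GET URL asking `service` to characterize a point location.

    Coordinates are validated and rounded to four decimals;
    parameter names are assumptions for illustration.
    """
    if not -90 <= lat <= 90:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180 <= lon <= 180:
        raise ValueError(f"longitude out of range: {lon}")
    params = {"lat": f"{lat:.4f}", "lon": f"{lon:.4f}"}
    if radius_m is not None:
        params["radius"] = str(int(radius_m))
    return f"{BASE_URL}/{service}/?{urlencode(params)}"


url = characterization_url("population", 50.9, 6.4, radius_m=5000)
print(url)
```

Keeping the request a plain function of coordinates is what makes automated characterization of many stations straightforward: the same call can be repeated for each station location.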
Educational Outreach & Stakeholder Role Evolution in a Cyberinfrastructure Project
Pub Date : 2018-10-01 DOI: 10.1109/eScience.2018.00035
David P. Randall, Drew Paine, Charlotte P. Lee
Over the last several years, a growing body of work has examined the nature of large-scale virtual organizations for data-intensive cooperative science. These projects, known as Cyberinfrastructures (CI) in the United States, are established realms of inquiry for the eScience and Computer Supported Cooperative Work (CSCW) communities. Scholarship in these communities extends technology-focused inquiries to investigate the sociotechnical concerns of such infrastructure creation and maintenance. In this paper we present findings from our qualitative study of a federated cyberinfrastructure organization known as GENI. We contribute to this body of scholarship by investigating how stakeholders in the GENI project position existing and newly created resources for use in educational settings. We examine how stakeholders acquaint new potential stakeholders with this CI in order to draw them into the community, and the ways in which stakeholders' roles evolve over time. Our findings illustrate several ways stakeholders leverage and align existing relationships and resources to expand the CI project's user base. Finally, this paper suggests avenues of further inquiry and implications for organizing future CI projects.
Pages: 201-211
Citations: 0