
2012 IEEE 8th International Conference on E-Science: Latest Articles

Scientific Workflow Interchanging through Patterns: Reversals and Lessons Learned
Pub Date : 2015-08-31 DOI: 10.1109/eScience.2015.26
Bruno F. Bastos, Regina M. M. Braga, A. A. Gomes
Scientific workflows are used to tackle complex problems in different e-science domains. These workflows are modeled and executed using Scientific Workflow Management Systems (SWfMSs). Generally, each SWfMS provides its own Workflow Specification Language (WfSL), which makes it challenging to interchange workflow specifications between different SWfMSs. Nevertheless, workflow reuse is gaining importance because it helps foster collaboration and cross-fertilization across research groups. This paper presents a research proposal, including its mishaps and assimilations, on the use of workflow patterns combined with software architecture concepts to capture the key semantics expressed in scientific workflows specified in different WfSLs and to allow these specifications to be interchanged between different SWfMSs. The paper also shows how findings based on real-world specifications led us to reformulate our initial proposal, and discusses the new results.
Citations: 5
Shape Analysis Using the Spectral Graph Wavelet Transform
Pub Date : 2013-10-22 DOI: 10.1109/eScience.2013.45
J. Leandro, R. M. C. Junior, R. Feris
The present work describes a framework for the morphological characterization of galaxies based on the Spectral Graph Wavelet Transform. A galaxy image is sampled at a number of randomly chosen points, whose Delaunay triangulation yields an arbitrary graph. Each graph vertex is assigned the average intensity in a 5 × 5 neighborhood of the pixel associated with that vertex. Each graph edge is assigned a weight inversely proportional to the photometric distance between its two vertices. The Spectral Graph Wavelet Transform is computed from this weighted graph with real-valued vertices, yielding a high-dimensional feature vector, which is reduced to a two-dimensional vector through Principal Component Analysis. The proposed framework has been assessed through two case studies: (i) an analysis of 2D binary shape images and (ii) preliminary results on 2D gray-tone galaxy images. The results suggest that the framework is suitable for characterizing galaxy images.
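The graph-construction step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the clipping of the 5 × 5 window at image borders, the small `eps` term, and the exact weight form `1 / (|difference| + eps)` are all assumptions, and the edge list is taken as given (the Delaunay triangulation itself is not computed here).

```python
from typing import Dict, List, Sequence, Tuple

Image = List[List[float]]
Pixel = Tuple[int, int]  # (row, col) of a sampled point


def mean_5x5(image: Image, row: int, col: int) -> float:
    """Average intensity in the 5x5 window centred on (row, col).

    Assumption: the window is clipped at the image borders.
    """
    rows, cols = len(image), len(image[0])
    values = [
        image[r][c]
        for r in range(max(0, row - 2), min(rows, row + 3))
        for c in range(max(0, col - 2), min(cols, col + 3))
    ]
    return sum(values) / len(values)


def edge_weights(
    image: Image,
    vertices: Sequence[Pixel],
    edges: Sequence[Tuple[int, int]],
    eps: float = 1e-6,
) -> Dict[Tuple[int, int], float]:
    """Weight each edge inversely to the intensity difference of its endpoints.

    `edges` holds index pairs into `vertices`, e.g. from a Delaunay triangulation.
    `eps` (hypothetical) avoids division by zero for equal intensities.
    """
    values = [mean_5x5(image, r, c) for r, c in vertices]
    return {(u, v): 1.0 / (abs(values[u] - values[v]) + eps) for u, v in edges}


if __name__ == "__main__":
    # Toy 7x7 image whose intensity equals the column index.
    toy = [[float(c) for c in range(7)] for _ in range(7)]
    print(edge_weights(toy, [(3, 2), (3, 4)], [(0, 1)]))
```

In practice the edge list would come from a triangulation routine, and the resulting weighted graph would feed the wavelet transform.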
Citations: 4
Generalized representation and mapping for social-ecological data: Freeing data from the database
Pub Date : 2012-10-08 DOI: 10.1109/eScience.2012.6404486
S. Jensen, Beth Plale, Xiaozhong Liu, Miao Chen, David B. Leake, Julie England
Scientific discovery increasingly requires collaboration between scientific sub-domains that often have different representations for their data. To bridge gaps between varying domain representations, researchers are developing metadata and semantic representations meaningful to broader communities. Through exploiting these representations we propose a logical model and architecture by which cross-domain researchers can more easily discover, use, and eventually archive, data. In this paper we present an architecture, intermediate data model, and methodology for mapping diverse social-ecological data sources stored in relational databases to a common representation, and for classifying textual data using machine learning. The results are visualized through client views that are built against the general logical model, and applied against a longitudinal database from social-ecological research.
Citations: 2
Scientific workflow rewriting while preserving provenance
Pub Date : 2012-10-08 DOI: 10.1109/eScience.2012.6404419
Sarah Cohen Boulakia, C. Froidevaux, Jiuqiang Chen
Scientific workflow systems are numerous and are equipped with provenance modules that collect the data produced and consumed during workflow runs to enhance reproducibility. An increasing number of approaches have been developed to help manage provenance information. Some of them can process data in polynomial time, but they require workflows to have series-parallel (SP) structures. Rewriting any workflow into an SP workflow is thus particularly important. In this paper, (i) we introduce the concept of a provenance-equivalent rewriting process, (ii) we review existing graph transformations, (iii) we design the provenance-equivalent SPFlow algorithm, and (iv) we evaluate our approach on over a thousand real workflows.
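To make the series-parallel requirement concrete, the sketch below checks whether a two-terminal workflow DAG is SP by applying the classical series and parallel reductions until nothing changes. It is not the SPFlow algorithm from the paper, which rewrites non-SP workflows; the function name and the edge-list representation are assumptions chosen for illustration.

```python
from collections import Counter
from typing import Hashable, Iterable, Tuple

Edge = Tuple[Hashable, Hashable]


def is_series_parallel(edges: Iterable[Edge], source: Hashable, sink: Hashable) -> bool:
    """Return True if the DAG reduces to the single edge (source, sink).

    Assumes an acyclic graph in which every node lies on a source-to-sink path.
    """
    multi = Counter(edges)  # multigraph: edge -> multiplicity
    changed = True
    while changed:
        changed = False
        # Parallel reduction: collapse duplicate edges into one.
        for edge in list(multi):
            if multi[edge] > 1:
                multi[edge] = 1
                changed = True
        # Series reduction: bypass one inner node with a single in- and out-edge.
        nodes = {n for edge in multi for n in edge}
        for node in nodes:
            if node in (source, sink):
                continue
            incoming = [e for e in multi if e[1] == node]
            outgoing = [e for e in multi if e[0] == node]
            if len(incoming) == 1 and len(outgoing) == 1:
                u, w = incoming[0][0], outgoing[0][1]
                del multi[incoming[0]]
                del multi[outgoing[0]]
                multi[(u, w)] += 1
                changed = True
                break  # recompute degrees after each reduction
    return set(multi) == {(source, sink)}


if __name__ == "__main__":
    diamond = [("s", "a"), ("s", "b"), ("a", "t"), ("b", "t")]
    bridge = diamond + [("a", "b")]  # the extra cross edge breaks the SP structure
    print(is_series_parallel(diamond, "s", "t"), is_series_parallel(bridge, "s", "t"))
```

A workflow for which this check fails is the kind of input that a rewriting step would need to transform before SP-only provenance techniques apply.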
Citations: 7
g-Social: Enhancing integrated e-science tools with Social Networking functionality
Pub Date : 2012-10-08 DOI: 10.1109/eScience.2012.6404454
Andriani Stylianou, N. Loulloudes, M. Dikaiakos
During the last decade, the scientific community has witnessed an unprecedented deployment of large-scale, federated e-Infrastructures such as Grid Computing, primarily for supporting data-intensive scientific exploration and coordinated problem solving. However, practical experience and user studies have indicated that the adoption of such e-Infrastructures is lagging behind original expectations, a fact which is mainly attributed to the limited support that available tools provide for user collaboration and information sharing. The goal of this paper is twofold, first to lay down the foundations for building a collaboration environment in the form of abstractions and second to show the effectiveness of these abstractions through g-Social, an Eclipse-based, open-source environment as an extension to g-Eclipse, that provides a powerful, user-friendly, platform-independent toolset for users, application developers and administrators of Grid infrastructures. g-Social enables user collaboration and resource sharing through Online Social Networking services, capitalizing on the success that these services have.
Citations: 0
Data-oriented research for bioresource utilization: A case study to investigate water uptake in cellulose using Principal Components
Pub Date : 2012-10-08 DOI: 10.1109/eScience.2012.6404485
L. Ling, C. Driemeier, R. M. C. Junior
Bioresource utilization is an important interdisciplinary research area that integrates academic and industrial expertise across diverse scientific domains, including physics, chemistry, biology, and engineering. The present paper describes a cyber-infrastructure being created at the Brazilian Bioethanol Science and Technology Laboratory (CTBE) to assist scientists working in the field. One key element of the infrastructure is the LignoCel Platform, a tailor-made database for the upload, curation, and sharing of lignocellulose data. In particular, LignoCel allows querying the data and exporting subsets that are analyzed for knowledge extraction. The paper describes a case study in which scientists want to investigate the dimensions that relate cellulose structure and water uptake. Data analysis and dimensionality reduction using Principal Component Analysis (PCA) are employed. Different PCA-based measurements are extracted and visualized through automatically generated HTML pages available to the domain scientists. In this case study, the workflow successfully provided dimensionality reduction for a data matrix originating from a heterogeneous set of materials. PCA scores and loadings are explored for data analysis and visualization. PCA reduced the 11 measured features (obtained from three different experimental techniques, 55 possible combinations of size 2) to a two-dimensional PC1-PC2 loadings plot representing 89% of the data variance. Examples of the output produced by the system are available at http://data.bioetanol.org.br/~liu.ling/pca-lignocel/.
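The two figures quoted in the abstract follow from simple arithmetic, sketched below. The pair count checks that 11 features give 55 combinations of size 2, and the second function shows how a share of variance such as the reported 89% is computed from PCA eigenvalues. The function names are hypothetical and the eigenvalues in the usage example are made up for illustration; they are not the paper's data.

```python
from itertools import combinations
from typing import Sequence


def pairwise_feature_count(n_features: int) -> int:
    """Number of distinct feature pairs, i.e. candidate 2D scatter plots."""
    return sum(1 for _ in combinations(range(n_features), 2))


def explained_variance(eigenvalues: Sequence[float], k: int) -> float:
    """Fraction of total variance captured by the k largest eigenvalues."""
    ordered = sorted(eigenvalues, reverse=True)
    return sum(ordered[:k]) / sum(ordered)


if __name__ == "__main__":
    print(pairwise_feature_count(11))  # 55, as stated in the abstract
    # Made-up eigenvalues, purely illustrative (not from the paper).
    print(explained_variance([6.0, 3.0, 1.0], k=2))
```

Replacing 55 pairwise plots with one PC1-PC2 plot is the practical benefit the abstract points to.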
Citations: 1
Automated data verification in a large-scale citizen science project: A case study
Pub Date : 2012-10-08 DOI: 10.1109/eScience.2012.6404472
Jun Yu, S. Kelling, Jeff Gerbracht, Weng-Keen Wong
Although citizen science projects can engage a very large number of volunteers to collect volumes of data, they are susceptible to issues with data quality. Our experience with eBird, a broad-scale citizen science project that collects bird observations, has shown that a massive effort by volunteer experts is needed to screen data, identify outliers, and flag them in the database. The increasing volume of data being collected by eBird places a huge burden on these volunteer experts, and other automated approaches to improving data quality are needed. In this work, we describe a case study in which we evaluate an automated data quality filter that improves data quality by identifying outliers and categorizing them as either unusual valid observations or misidentified (invalid) observations. This automated data filter involves a two-step process. First, a data-driven method detects outliers (i.e., observations that are unusual for a given region and date). Next, we use a data quality model based on an observer's predicted expertise to decide whether an outlier should be flagged for review. We applied this automated data filter retrospectively to eBird data from Tompkins Co., NY, and found that the automated process reduced the workload of reviewers by as much as 43% and identified 52% more potentially invalid observations.
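The two-step process above can be sketched as a small decision function. This is a rough illustration under stated assumptions, not the eBird filter: the `Observation` type, the reporting-frequency lookup, both thresholds, and the rule that outliers from observers with low predicted expertise are the ones flagged are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Observation:
    species: str
    observer: str


def is_outlier(obs: Observation, frequency: Mapping[str, float],
               min_frequency: float = 0.05) -> bool:
    """Step 1: a species rarely reported for this region and date is an outlier.

    `frequency` maps species to reporting frequency; the threshold is hypothetical.
    """
    return frequency.get(obs.species, 0.0) < min_frequency


def needs_review(obs: Observation, frequency: Mapping[str, float],
                 expertise: Mapping[str, float],
                 min_frequency: float = 0.05,
                 min_expertise: float = 0.5) -> bool:
    """Step 2: flag an outlier only if the observer's predicted expertise is low.

    Assumption: outliers from observers above `min_expertise` are treated as
    unusual but valid, so they are not sent to reviewers.
    """
    if not is_outlier(obs, frequency, min_frequency):
        return False
    return expertise.get(obs.observer, 0.0) < min_expertise


if __name__ == "__main__":
    freq = {"Northern Cardinal": 0.6, "Snowy Owl": 0.01}
    skill = {"alice": 0.9, "bob": 0.2}
    for who in ("alice", "bob"):
        print(who, needs_review(Observation("Snowy Owl", who), freq, skill))
```

Only the second step depends on the observer, which is how a filter of this shape could reduce reviewer workload.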
Citations: 20
Virtual Simulation Objects concept as a framework for system-level simulation
Pub Date : 2012-10-08 DOI: 10.1109/eScience.2012.6404413
S. Kovalchuk, Pavel A. Smirnov, Sergey S. Kosukhin, A. Boukhanovsky
This paper presents the Virtual Simulation Objects (VSO) concept, which forms the theoretical basis for building tools and a framework for system-level simulations using existing software modules available within a cyber-infrastructure. The concept is implemented by a software tool that builds composite solutions through a VSO-based GUI and runs them in the CLAVIRE simulation environment.
Citations: 16
A satellite data portal developed for crowdsourcing data analysis and interpretation
Pub Date : 2012-10-08 DOI: 10.1109/eScience.2012.6404453
Zhenghui Hu, Wenjun Wu
Satellite data products derived from remote sensing observations describe features of the land, ocean, and atmosphere. Through data processing, they can be used to study processes and trends at local and global scales for real-time environmental research and applications. However, advances in cutting-edge remote sensing technology bring the challenge of a data deluge for satellite data analysis and interpretation. By combining human intelligence and machine intelligence, we develop a satellite data portal for crowdsourcing data analysis and interpretation through teaching and learning, in order to cope with this data deluge. Among existing data portals and crowdsourcing systems, it is the first attempt to embed crowdsourcing into a data portal to provide integrated services for satellite data access and analysis.
Citations: 5
RightField: Semantic enrichment of Systems Biology data using spreadsheets
Pub Date : 2012-10-08 DOI: 10.1109/ESCIENCE.2012.6404412
K. Wolstencroft, S. Owen, C. Goble, Quyen Nguyen, Olga Krebs, Wolfgang Müller
The interpretation and integration of experimental data depend on consistent metadata and uniform annotation. However, there are many barriers to the acquisition of this rich semantic metadata, not least the overhead and complexity of its collection by scientists. We present RightField, a lightweight spreadsheet-based annotation tool for lowering the barrier of manual metadata acquisition, and a data integration application for extracting and querying RDF data from these enriched spreadsheets. By hiding the complexities of semantic annotation, we can improve the collection of rich metadata, at source, by scientists. We illustrate the approach with results from the SysMO program, showing that RightField supports the whole workflow of semantic data collection, submission, and RDF querying in Systems Biology. The RightField tool is freely available from http://www.rightfield.org.uk, and the code is open source under the BSD License.
Citations: 2