ERMrest: A Collaborative Data Catalog with Fine Grain Access Control
Karl Czajkowski, Carl Kesselman, Robert Schuler
Pub Date: 2017-10-01 | Epub Date: 2017-11-16 | DOI: 10.1109/eScience.2017.83
Creating and maintaining an accurate description of data assets and the relationships between assets is a critical aspect of making data findable, accessible, interoperable, and reusable (FAIR). Typically, such metadata are created and maintained in a data catalog by a curator as part of data publication. However, allowing metadata to be created and maintained by data producers as the data is generated, rather than waiting for publication, can have significant advantages in terms of productivity and repeatability. The responsibilities for metadata management need not fall on any one individual, but rather may be delegated to appropriate members of a collaboration, enabling participants to edit or maintain specific attributes, to describe relationships between data elements, or to correct errors. To support such collaborative data editing, we have created ERMrest, a relational data service for the Web that enables the creation, evolution, and navigation of complex models used to describe and structure diverse file or relational data objects. A key capability of ERMrest is its ability to control operations down to the level of individual data elements, i.e., fine-grained access control, so that many different modes of data-oriented collaboration can be supported. In this paper we introduce ERMrest and describe its fine-grained access control capabilities that support collaborative editing. ERMrest is in daily use in many data-driven collaborations, and we describe a sample policy based on a common biocuration pattern.
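Fine-grained access control of the kind the abstract describes can be pictured as per-row ACLs consulted on every operation. The sketch below is a generic illustration of that pattern, not ERMrest's actual policy language; the row layout, operation names, and `check` helper are hypothetical.

```python
# Generic sketch of row-level (fine-grained) access control: each row
# carries per-operation ACLs, and a request succeeds only if the user
# or one of their groups appears in the ACL for that operation.
# Hypothetical structure for illustration, not ERMrest's policy model.

def check(row, operation, user, groups):
    """Return True if `user` (or any group in `groups`) may perform
    `operation` on `row`; '*' marks an operation open to everyone."""
    acl = row["acls"].get(operation, set())
    return "*" in acl or user in acl or bool(acl & set(groups))

row = {
    "id": 17,
    "value": "specimen-0042",
    "acls": {
        "select": {"*"},                  # anyone may read this row
        "update": {"curators", "alice"},  # only curators or alice may edit
    },
}
```

A member of the `curators` group could then edit the row while a user outside it could only read it, which is the delegation pattern the abstract describes: different collaborators maintain different data elements.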
Experiences with Deriva: An Asset Management Platform for Accelerating eScience
Alejandro Bugacov, Karl Czajkowski, Carl Kesselman, Anoop Kumar, Robert E Schuler, Hongsuda Tangmunarunkit
Pub Date: 2017-10-01 | Epub Date: 2017-11-16 | DOI: 10.1109/eScience.2017.20
The pace of discovery in eScience is increasingly dependent on a scientist's ability to acquire, curate, integrate, analyze, and share large and diverse collections of data. It is all too common for investigators to spend inordinate amounts of time developing ad hoc procedures to manage their data. In previous work, we presented Deriva, a Scientific Asset Management System designed to accelerate data-driven discovery. In this paper, we report on the use of Deriva in a number of substantial and diverse eScience applications. We describe the lessons we have learned, both from the perspective of the Deriva technology and from the ability and willingness of scientists to incorporate Scientific Asset Management into their daily workflows.
Using Hidden Markov Models to Determine Changes in Subject Data over Time, Studying the Immunoregulatory Effect of Mesenchymal Stem Cells
Edgar F Black, Luigi Marini, Ashwini Vaidya, Dora Berman, Melissa Willman, Dan Salomon, Amelia Bartholomew, Norma Kenyon, Kenton McHenry
Pub Date: 2014-10-01 | DOI: 10.1109/eScience.2014.29
A novel application of Hidden Markov Models is used to support research testing the immunoregulatory effects of mesenchymal stem cells in a cynomolgus monkey model of islet transplantation. The Hidden Markov Model, an unsupervised learning data mining technique, is used to automatically determine the postoperative day (POD) corresponding to a decrease of graft function, a possible sign of transplant rejection, in nonhuman primates after isolated islet cell transplant. Currently, decrease of graft function is determined solely by expert judgment. Further, information gathered from the evaluation of the constructed Hidden Markov Models is used as part of a clustering method to aggregate the nonhuman subjects into groups or clusters, with the objective of finding similarities that could potentially help predict the health outcome of subjects undergoing postoperative care. Results on expert-labeled data show the HMM to be accurate 60% of the time. Clusters based on the HMMs further suggest a possible correspondence between donor haplotype matching and loss-of-function outcomes.
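The change-point task in the abstract, finding the postoperative day on which graft function drops, can be sketched as Viterbi decoding of a two-state Gaussian HMM over a daily measurement. This is a minimal sketch under assumed parameters (two emission means, a shared standard deviation, symmetric transitions) and synthetic data, not the authors' actual model.

```python
import math

def viterbi_two_state(obs, mu0, mu1, sd=1.0, p_stay=0.9):
    """Decode a two-state Gaussian HMM: state 0 = normal graft function,
    state 1 = reduced function. Returns the most likely state path."""
    def logpdf(x, mu):
        return -((x - mu) ** 2) / (2 * sd * sd) - math.log(sd * math.sqrt(2 * math.pi))

    log_t = [[math.log(p_stay), math.log(1 - p_stay)],
             [math.log(1 - p_stay), math.log(p_stay)]]
    mus = (mu0, mu1)
    # Uniform prior over the two starting states.
    v = [math.log(0.5) + logpdf(obs[0], mus[s]) for s in (0, 1)]
    back = []
    for x in obs[1:]:
        scores = [[v[p] + log_t[p][s] for p in (0, 1)] for s in (0, 1)]
        bp = [0 if scores[s][0] >= scores[s][1] else 1 for s in (0, 1)]
        v = [scores[s][bp[s]] + logpdf(x, mus[s]) for s in (0, 1)]
        back.append(bp)
    # Backtrack from the best final state through the stored pointers.
    s = 0 if v[0] >= v[1] else 1
    path = [s]
    for bp in reversed(back):
        s = bp[s]
        path.append(s)
    return path[::-1]

# Synthetic daily measurements: function drops after the fifth sample.
obs = [9.8, 10.1, 9.9, 10.2, 10.0, 4.1, 3.9, 4.2, 4.0, 3.8]
path = viterbi_two_state(obs, mu0=10.0, mu1=4.0)
pod = path.index(1)  # first day decoded in the reduced-function state
```

The `p_stay` transition probability plays the role of a smoothness prior: the larger it is, the more evidence is needed before the decoder commits to a state switch, which is what makes the HMM more robust than simple thresholding of noisy daily values.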
Folding Proteins at 500 ns/hour with Work Queue
Badi' Abdul-Wahid, Li Yu, Dinesh Rajan, Haoyun Feng, Eric Darve, Douglas Thain, Jesús A Izaguirre
Pub Date: 2012-10-01 | DOI: 10.1109/eScience.2012.6404429
Molecular modeling is a field that traditionally has large computational costs. Until recently, most simulation techniques relied on long trajectories, which inherently have poor scalability. A new class of methods has been proposed that requires only a large number of short calculations and minimal communication between compute nodes. We considered one of the more accurate variants, called Accelerated Weighted Ensemble Dynamics (AWE), for which distributed computing can be made efficient. We implemented AWE using the Work Queue framework for task management and applied it to an all-atom protein model (the Fip35 WW domain). We can run with excellent scalability by simultaneously utilizing heterogeneous resources from multiple computing platforms such as clouds (Amazon EC2, Microsoft Azure), dedicated clusters, and grids, on multiple architectures (CPU/GPU, 32/64-bit), and in a dynamic environment in which processes are regularly added to or removed from the pool. This has allowed us to achieve an aggregate sampling rate of over 500 ns/hour; by comparison, a single process typically achieves 0.1 ns/hour.
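The scaling argument in the abstract rests on the workload being many short, independent tasks fanned out by a master and aggregated as results return. The sketch below illustrates that master-worker pattern with Python's standard library only; the seeded random walk is a stand-in for a short simulation segment, and none of this is the actual Work Queue API.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def short_segment(seed, steps=100):
    """Stand-in for one short simulation segment: a seeded 1-D random
    walk whose endpoint plays the role of a sampled conformation."""
    rng = random.Random(seed)
    x = 0.0
    for _ in range(steps):
        x += rng.gauss(0.0, 1.0)
    return x

def run_round(n_walkers=50, max_workers=8):
    # Each walker is independent, so tasks need no communication with
    # each other -- only results flow back to the master for aggregation.
    # Because workers are stateless, they can live on any mix of
    # machines and can join or leave between rounds.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        endpoints = list(pool.map(short_segment, range(n_walkers)))
    return endpoints

endpoints = run_round()
```

In the real system the pool of workers spans clouds, clusters, and grids rather than local threads, but the contract is the same: the master only submits tasks and collects results, so throughput grows with whatever resources happen to be available.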
rCAD: A Novel Database Schema for the Comparative Analysis of RNA
Stuart Ozer, Kishore J Doshi, Weijia Xu, Robin R Gutell
Pub Date: 2011-12-31 | DOI: 10.1109/eScience.2011.11
Beyond its direct involvement in protein synthesis with mRNA, tRNA, and rRNA, RNA is now being appreciated for its significance in the overall metabolism and regulation of the cell. Comparative analysis has been very effective in the identification and characterization of RNA molecules, including the accurate prediction of their secondary structure. We are developing an integrative, scalable data management and analysis system, the RNA Comparative Analysis Database (rCAD), implemented with SQL Server to support RNA comparative analysis. The platform-agnostic database schema of rCAD captures the essential relationships between the different dimensions of information for RNA comparative analysis datasets. The rCAD implementation enables a variety of comparative analysis manipulations with multiple integrated data dimensions for advanced RNA comparative analysis workflows. In this paper, we describe details of the rCAD schema design and illustrate its usefulness with two usage scenarios.
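The kind of schema the abstract describes, relating sequences to alignment columns so that comparative queries become joins, can be sketched in miniature. The tables and toy "conservation" query below are an illustrative simplification in SQLite, not the actual rCAD SQL Server schema.

```python
import sqlite3

# Toy relational schema for comparative analysis: sequences, alignments,
# and a cell table mapping alignment columns to sequence positions.
# An illustrative simplification, not the actual rCAD schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sequence (
    seq_id   INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    residues TEXT NOT NULL
);
CREATE TABLE alignment (
    aln_id INTEGER PRIMARY KEY,
    name   TEXT NOT NULL
);
CREATE TABLE aln_cell (              -- one row per (alignment, sequence, column)
    aln_id  INTEGER REFERENCES alignment(aln_id),
    seq_id  INTEGER REFERENCES sequence(seq_id),
    col     INTEGER NOT NULL,        -- alignment column
    seq_pos INTEGER,                 -- position in the sequence (NULL = gap)
    base    TEXT,
    PRIMARY KEY (aln_id, seq_id, col)
);
""")

conn.execute("INSERT INTO alignment VALUES (1, 'toy')")
for seq_id, name, residues in [(1, 'seqA', 'ACGU'), (2, 'seqB', 'AGGU')]:
    conn.execute("INSERT INTO sequence VALUES (?, ?, ?)", (seq_id, name, residues))
    for col, base in enumerate(residues):
        conn.execute("INSERT INTO aln_cell VALUES (1, ?, ?, ?, ?)",
                     (seq_id, col, col, base))

# Columns where every sequence shows the same base (a toy conservation query).
conserved = [c for (c,) in conn.execute(
    "SELECT col FROM aln_cell WHERE aln_id = 1 "
    "GROUP BY col HAVING COUNT(DISTINCT base) = 1 ORDER BY col")]
```

Normalizing alignment membership into a cell table is what lets per-column statistics, structure annotations, or phylogenetic labels be layered on as further joins rather than bespoke file parsing, which is the general design point the abstract makes.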