
2012 IEEE 8th International Conference on E-Science: Latest Publications

Digitization and search: A non-traditional use of HPC
Pub Date : 2012-10-08 DOI: 10.1109/eScience.2012.6404445
Liana Diesendruck, Luigi Marini, R. Kooper, M. Kejriwal, Kenton McHenry
Automated search of handwritten content is a subject of great interest and practical value, especially important today given the public availability of large digitized document collections. We describe our efforts with the National Archives (NARA) to provide searchable access to the 1940 Census data and discuss the HPC resources needed to implement the suggested framework. Instead of trying to recognize the handwritten text, still a very difficult task, we use a content-based image retrieval technique known as Word Spotting. Under this paradigm, the system is queried with handwritten text images instead of ASCII text, and ranked groups of similar-looking images are presented to the user. A significant amount of computing power is needed to accomplish the pre-processing of the data so as to make this search capability available on an archive. The required preprocessing steps and the open-source framework developed are discussed, focusing specifically on the HPC considerations relevant when preparing to provide searchable access to sizeable collections such as the US Census. Having processed the state of North Carolina from the 1930 Census using 98,000 SUs, we estimate that processing the entire country for 1940 could require up to 2.5 million SUs. The proposed framework can be used to provide an alternative to costly manual transcription for a variety of digitized paper archives.
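The Word Spotting paradigm the abstract describes can be sketched in a few lines: the query is itself a word image, and candidate word images are ranked by descriptor similarity rather than by recognized text. The descriptor below (per-column ink profiles) is a deliberately simplified stand-in for the richer features a real system would extract; all function names and inputs are illustrative, not the authors' implementation.

```python
# Minimal sketch of query-by-image word spotting: rank candidate word images
# by similarity of a simple column-profile descriptor to the query's descriptor.
# The descriptor is illustrative; production systems use far richer features.

def column_profile(image):
    """Fraction of 'ink' pixels per column of a binary word image (list of rows)."""
    rows = len(image)
    cols = len(image[0])
    return [sum(image[r][c] for r in range(rows)) / rows for c in range(cols)]

def profile_distance(p, q):
    """Euclidean distance between two equal-length column profiles."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def word_spot(query_image, candidate_images, top_k=3):
    """Return indices of the top_k candidates most similar to the query image."""
    qp = column_profile(query_image)
    scored = [(profile_distance(qp, column_profile(img)), i)
              for i, img in enumerate(candidate_images)]
    return [i for _, i in sorted(scored)[:top_k]]
```

Because no text recognition happens, the heavy cost sits in the preprocessing that segments pages into word images and computes descriptors for millions of cells, which is where the HPC resources quoted in the abstract are spent.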
Citations: 2
eResearch environment for remote instrumentation: VBL, RLI, VisLabl & 2
Pub Date : 2012-10-08 DOI: 10.1109/eScience.2012.6404465
C. Myers, Michael D'Silva
This talk demonstrates the current remote experimentation capabilities deployed at the Australian Synchrotron and La Trobe University, as well as the remote data transfer services deployed at those locations and at Bragg (ANSTO), a metadata extraction tool, MyTardis nodes, remote analysis and visualisation environments for medical imaging and IR spectroscopy, and the use of high-resolution multi-screen displays.
Citations: 0
Partial replica selection for spatial datasets
Pub Date : 2012-10-08 DOI: 10.1109/eScience.2012.6404473
Yun Tian, P. J. Rhodes
The implementation of partial or incomplete replicas, which represent only a subset of a larger dataset, has been an active topic of research. Partial Spatial Replicas extend this functionality to spatial data, allowing us to distribute a spatial dataset in pieces over several locations. Accessing only a subset of a spatial replica usually results in a large number of relatively small read requests made to the underlying storage device. For this reason, an accurate model of disk access is important when working with spatial subsets. We make two primary contributions in this paper. First, we describe a model for disk access performance that takes filesystem prefetching into account and is sufficiently accurate for spatial replica selection. Second, making a few simplifying assumptions, we propose a fast replica selection algorithm for partial spatial replicas. The algorithm uses a greedy approach that attempts to maximize performance by choosing a collection of replica subsets that allows fast data retrieval by a client machine. Experiments show that the performance of the solution found by our algorithm is, on average, at least 91% and 93.4% of the optimal solution's performance in 4-node and 8-node tests respectively.
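The greedy selection idea in the abstract can be illustrated with a small sketch: repeatedly pick the replica fragment that recovers the most still-needed data per unit of estimated retrieval cost, until the request is covered. The cost model, block granularity, and names below are illustrative assumptions, not the authors' actual algorithm or performance model.

```python
# Hedged sketch of greedy partial-replica selection: at each step, take the
# replica fragment with the best (blocks still needed) / (retrieval cost) ratio.

def greedy_select(needed_blocks, replicas):
    """
    needed_blocks: set of block ids the client must retrieve.
    replicas: list of (name, blocks_held, cost_per_block) tuples.
    Returns the list of replica names chosen, in selection order.
    """
    remaining = set(needed_blocks)
    chosen = []
    while remaining:
        best = None
        for name, blocks, cost in replicas:
            gain = len(blocks & remaining)   # still-needed blocks this replica holds
            if gain == 0:
                continue
            score = gain / cost              # blocks recovered per unit cost
            if best is None or score > best[0]:
                best = (score, name, blocks)
        if best is None:                     # some blocks are unavailable anywhere
            break
        _, name, blocks = best
        chosen.append(name)
        remaining -= blocks
    return chosen
```

A real implementation would replace the scalar `cost` with the paper's prefetching-aware disk access model, since many small spatial reads behave very differently from one sequential scan.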
Citations: 6
A system for management of Computational Fluid Dynamics simulations for civil engineering
Pub Date : 2012-10-08 DOI: 10.1109/eScience.2012.6404433
Peter Sempolinski, D. Thain, Daniel Wei, A. Kareem
We introduce a web-based system for the management of Computational Fluid Dynamics (CFD) simulations. The system provides a browser-based interface that gives users an intuitive, user-friendly means of dispatching and controlling long-running simulations. CFD presents a challenge to its users due to the complexity of its internal mathematics, the high computational demands of its simulations, and the complexity of the inputs to its simulations and related tasks. We designed this system to be as extensible as possible in order to be suitable for many different civil engineering applications. The front-end of the system is a webserver, which provides the user interface. The back-end is responsible for starting and stopping jobs as requested. There are also numerous components specifically for facilitating CFD computation. We discuss our experience with presenting this system to real users and the future ambitions for this project.
Citations: 4
Temporal representation for scientific data provenance
Pub Date : 2012-10-01 DOI: 10.1109/eScience.2012.6404477
Peng Chen, Beth Plale, M. Aktaş
Provenance of digital scientific data is an important piece of the metadata of a data object. It can, however, grow voluminous quickly because the granularity level of capture can be high. It can also be quite feature-rich. We propose a representation of the provenance data based on logical time that reduces the feature space. Creating time- and frequency-domain representations of the provenance, we apply clustering, classification and association rule mining to the abstract representations to determine the usefulness of the temporal representation. We evaluate the temporal representation using an existing 10 GB database of provenance captured from a range of scientific workflows.
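The core move in the abstract, mapping provenance events onto logical time and then into the frequency domain so that standard mining algorithms can compare workflows, can be sketched as follows. The event encoding, vector length, and naive DFT are illustrative assumptions for exposition, not the authors' pipeline; the resulting magnitude spectra could feed any off-the-shelf clustering or classification step.

```python
import cmath

# Sketch: encode provenance events at logical timestamps as a time-domain
# vector, then take DFT magnitudes as a compact frequency-domain feature.

def time_domain(event_times, length):
    """Binary vector: v[t] = 1 if a provenance event occurred at logical time t."""
    v = [0.0] * length
    for t in event_times:
        v[t] = 1.0
    return v

def freq_domain(v):
    """Naive O(n^2) DFT magnitude spectrum (fine for short provenance vectors)."""
    n = len(v)
    return [abs(sum(v[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n)))
            for k in range(n)]

def feature_distance(a, b):
    """Euclidean distance between two feature vectors, e.g. for clustering."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
```

One appeal of the magnitude spectrum is that it is insensitive to a uniform shift of all event times, so two runs of the same workflow that merely started at different logical offsets yield the same feature vector.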
Citations: 29
A data-driven urban research environment for Australia
Pub Date : 2012-10-01 DOI: 10.1109/eScience.2012.6404481
R. Sinnott, Christopher Bayliss, G. Galang, Phillip Greenwood, George Koetsier, D. Mannix, L. Morandini, Marcos Nino-Ruiz, C. Pettit, Martin Tomko, M. Sarwar, R. Stimson, W. Voorsluys, I. Widjaja
The Australian Urban Research Infrastructure Network (AURIN) project (www.aurin.org.au) is tasked with developing an e-Infrastructure to support urban and built environment research across Australia. As identified in [1], this e-Infrastructure must provide seamless access to highly distributed and heterogeneous data sets from multiple organisations with accompanying analytical and visualization capabilities. The project is tasked with delivering a secure, web-based unifying environment offering a one-stop-shop for Australia-wide urban and built environment research. This paper describes the architectural design and implementation of the AURIN data-driven e-Infrastructure, where data is not just a passive entity that is accessed and used as a consequence of research demand, but is instead, directly shaping the computational access, processing and intelligent utilization possibilities. This is demonstrated in a situational context.
Citations: 19
High-performance computing without commitment: SC2IT: A cloud computing interface that makes computational science available to non-specialists
Pub Date : 2012-10-01 DOI: 10.1109/eScience.2012.6404441
K. Jorissen, W. Johnson, F. Vila, J. Rehr
Computational work is a vital part of many scientific studies. In materials science research in particular, theoretical models are often needed to understand measurements. There is currently a double barrier that keeps a broad class of researchers from using state-of-the-art materials science codes: the software typically lacks user-friendliness, and the hardware requirements can demand a significant investment, e.g. the purchase of a Beowulf cluster. Scientific Cloud Computing has the potential to remove this barrier and make computational science accessible to a wider class of scientists who are not computational specialists. We present a set of interface tools, SC2IT, that enables seamless control of virtual compute clusters in the Amazon EC2 cloud and is designed to be embedded in user-friendly Java GUIs. We present applications of our Scientific Cloud Computing method to the materials science codes FEFF9, WIEN2k, and MEEP-mpi. SC2IT and the paradigm described here are applicable to other fields of research outside materials science within current Cloud Computing capability.
Citations: 7
Discovering drug targets for neglected diseases using a pharmacophylogenomic cloud workflow
Pub Date : 2012-10-01 DOI: 10.1109/eScience.2012.6404431
Kary A. C. S. Ocaña, Daniel de Oliveira, Jonas Dias, Eduardo S. Ogasawara, M. Mattoso
Illnesses caused by parasitic protozoa are a research priority. A representative group of these illnesses is commonly known as the Neglected Tropical Diseases (NTDs). NTDs disproportionately affect low-socioeconomic populations around the world; new anti-protozoan inhibitors are needed, and several drug discovery projects focus on researching new drug targets. Pharmacophylogenomics is a novel bioinformatics field that aims at reducing the time and financial cost of the drug discovery process. Pharmacophylogenomic analyses are applied mainly in the early stages of the research phase of drug discovery. A pharmacophylogenomic analysis executes several bioinformatics programs in a coherent flow to identify homologous sequences, construct phylogenetic trees and perform evolutionary and structural experiments. This way, it can be modeled as a scientific workflow. Pharmacophylogenomic analysis workflows are complex, computing- and data-intensive, and may execute for weeks. Thus, they benefit from parallel execution. We propose SciPPGx, a scientific workflow that aims at providing thorough inference support for pharmacophylogenomic hypotheses. SciPPGx is executed in parallel in a cloud using the SciCumulus workflow engine. Experiments show that SciPPGx considerably reduces the total execution time, by up to 97.1%, when compared to a sequential execution. We also present representative biological results, taking advantage of inferences covering several related bioinformatics overviews.
Citations: 13
BIGS: A framework for large-scale image processing and analysis over distributed and heterogeneous computing resources
Pub Date : 2012-10-01 DOI: 10.1109/eScience.2012.6404424
R. Ramos-Pollán, F. González, Juan C. Caicedo, Angel Cruz-Roa, Jorge E. Camargo, Jorge A. Vanegas, Santiago A. Pérez-Rubiano, J. Bermeo, Juan Sebastian Otálora Montenegro, Paola K. Rozo, John Arevalo
This paper presents BIGS, the Big Image Data Analysis Toolkit, a software framework for large-scale image processing and analysis over heterogeneous computing resources, such as those available in clouds, grids, computer clusters, or scattered computer resources (desktops, labs) used in an opportunistic manner. Through BIGS, eScience for image processing and analysis is conceived to exploit coarse-grained parallelism based on data partitioning and parameter sweeps, avoiding the need for inter-process communication and therefore enabling loosely coupled computing nodes (BIGS workers). It adopts an uncommitted resource allocation model where (1) experimenters define their image processing pipelines in a simple configuration file, (2) a schedule of jobs is generated, and (3) workers, as they become available, take over pending jobs as long as their dependencies on other jobs are fulfilled. BIGS workers act autonomously, querying the job schedule to determine which job to take over. This removes the need for a central scheduling node, requiring only that all workers have access to a shared information source. Furthermore, BIGS workers are encapsulated within different technologies to enable their agile deployment over the available computing resources. Currently they can be launched through the Amazon EC2 service over its cloud resources, through Java Web Start from any desktop computer, and through regular scripting or SSH commands. This suits many kinds of research environments well, both when accessing dedicated computing clusters or clouds with committed computing capacity and when using opportunistic computing resources whose access seldom can be provisioned in advance. We also adopt a NoSQL storage model to ensure the scalability of the shared information sources required by all workers, with support within BIGS for HBase and Amazon's DynamoDB service.
Overall, BIGS now enables researchers to run large-scale image processing pipelines in an easy, affordable and unplanned manner, with the capability to take over computing resources as they become available at run time. We demonstrate this by using BIGS in different experimental setups in the Amazon cloud and in an opportunistic manner, demonstrating its configurability, adaptability and scalability.
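The uncommitted worker model described above, no central scheduler, workers autonomously polling a shared job table and claiming any pending job whose dependencies are done, can be sketched minimally. Here an in-memory dict stands in for the shared NoSQL store (HBase/DynamoDB in BIGS), and all job names are illustrative; a real worker would claim a job with an atomic conditional update, not a plain assignment.

```python
# Sketch of worker-side scheduling: each worker polls a shared job table and
# takes over any pending job whose dependencies are already done.

jobs = {
    "split":  {"deps": [],                   "state": "pending"},
    "feat-a": {"deps": ["split"],            "state": "pending"},
    "feat-b": {"deps": ["split"],            "state": "pending"},
    "merge":  {"deps": ["feat-a", "feat-b"], "state": "pending"},
}

def claim_next(table):
    """Return a runnable pending job id, or None. Marks the job as running."""
    for jid, job in table.items():
        runnable = all(table[d]["state"] == "done" for d in job["deps"])
        if job["state"] == "pending" and runnable:
            job["state"] = "running"   # in BIGS this claim is an atomic store update
            return jid
    return None

def worker(table):
    """One worker loop: keep claiming and finishing jobs until none remain."""
    completed = []
    while (jid := claim_next(table)) is not None:
        table[jid]["state"] = "done"   # a real worker runs the pipeline stage here
        completed.append(jid)
    return completed
```

Because every worker runs this same loop against the shared table, adding an opportunistic machine at run time requires no coordination beyond giving it access to the store, which is the property the abstract emphasizes.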
{"title":"BIGS: A framework for large-scale image processing and analysis over distributed and heterogeneous computing resources","authors":"R. Ramos-Pollán, F. González, Juan C. Caicedo, Angel Cruz-Roa, Jorge E. Camargo, Jorge A. Vanegas, Santiago A. Pérez-Rubiano, J. Bermeo, Juan Sebastian Otálora Montenegro, Paola K. Rozo, John Arevalo","doi":"10.1109/eScience.2012.6404424","DOIUrl":"https://doi.org/10.1109/eScience.2012.6404424","url":null,"abstract":"This paper presents BIGS the Big Image Data Analysis Toolkit, a software framework for large scale image processing and analysis over heterogeneous computing resources, such as those available in clouds, grids, computer clusters or throughout scattered computer resources (desktops, labs) in an opportunistic manner. Through BIGS, eScience for image processing and analysis is conceived to exploit coarse grained parallelism based on data partitioning and parameter sweeps, avoiding the need of inter-process communication and, therefore, enabling loosely coupled computing nodes (BIGS workers). It adopts an uncommitted resource allocation model where (1) experimenters define their image processing pipelines in a simple configuration file, (2) a schedule of jobs is generated and (3) workers, as they become available, take over pending jobs as long as their dependency on other jobs is fulfilled. BIGS workers act autonomously, querying the job schedule to determine which one to take over. This removes the need for a central scheduling node, requiring only access by all workers to a shared information source. Furthermore, BIGS workers are encapsulated within different technologies to enable their agile deployment over the available computing resources. Currently they can be launched through the Amazon EC2 service over their cloud resources, through Java Web Start from any desktop computer and through regular scripting or SSH commands. 
This suits well different kinds of research environments, both when accessing dedicated computing clusters or clouds with committed computing capacity or when using opportunistic computing resources whose access is seldom or cannot be provisioned in advance. We also adopt a NoSQL storage model to ensure the scalability of the shared information sources required by all workers, including within BIGS support for HBase and Amazon's DynamoDB service. Overall, BIGS now enables researchers to run large scale image processing pipelines in an easy, affordable and unplanned manner with the capability to take over computing resources as they become available at run time. This is shown in this paper by using BIGS in different experimental setups in the Amazon cloud and in an opportunistic manner, demonstrating its configurability, adaptability and scalability capabilities.","PeriodicalId":6364,"journal":{"name":"2012 IEEE 8th International Conference on E-Science","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2012-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75651632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
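The scheduler-less model the abstract describes (autonomous workers polling a shared job schedule and taking over pending jobs whose dependencies are fulfilled) can be sketched roughly as follows. This is a minimal illustration, not BIGS's actual API: the job-table layout, state names, and functions are assumptions, and a thread-safe dict stands in for the shared NoSQL store (HBase/DynamoDB) that BIGS uses.

```python
import threading
import time

# Illustrative shared job schedule: in BIGS this would live in a NoSQL
# store (HBase / DynamoDB); here a dict guarded by a lock stands in for it.
jobs = {
    "split":  {"deps": [],                   "state": "PENDING"},
    "feat-1": {"deps": ["split"],            "state": "PENDING"},
    "feat-2": {"deps": ["split"],            "state": "PENDING"},
    "merge":  {"deps": ["feat-1", "feat-2"], "state": "PENDING"},
}
lock = threading.Lock()

def claim_job():
    """Atomically take over one PENDING job whose dependencies are all DONE."""
    with lock:
        for name, job in jobs.items():
            if job["state"] == "PENDING" and all(
                jobs[d]["state"] == "DONE" for d in job["deps"]
            ):
                job["state"] = "RUNNING"
                return name
    return None

def worker(worker_id):
    # Workers act autonomously: no central scheduler hands out work.
    while True:
        name = claim_job()
        if name is None:
            with lock:
                if all(j["state"] == "DONE" for j in jobs.values()):
                    return  # schedule exhausted, worker exits
            time.sleep(0.01)  # nothing runnable yet; poll again
            continue
        # ... the image-processing stage for `name` would run here ...
        with lock:
            jobs[name]["state"] = "DONE"

# Three loosely coupled workers drain the schedule concurrently.
threads = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(all(j["state"] == "DONE" for j in jobs.values()))  # True
```

Because each worker only reads and compare-and-sets job states in the shared table, adding or removing workers at run time requires no coordination, which is what lets BIGS absorb opportunistic resources as they appear.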
Citations: 12
IRMIS: The care and feeding of a generalized relatively relational database for accelerator components with a connection to the real time EPICS Input output controllers IRMIS:与实时EPICS输入输出控制器连接的加速器组件的通用相对关系数据库的维护和馈送
Pub Date : 2012-10-01 DOI: 10.1109/eScience.2012.6404469
R. Farnsworth, S. Benes
This paper describes a relational database approach to documenting and maintaining accelerator components (the "care and feeding" of the title). It describes the automated process used to generate accelerator or synchrotron component data for the relational tables, and the role of devices and components. The data thus obtained may be used or presented in a variety of ways by the end user, either to optimize maintenance or to provide machine metadata for experimental performance purposes.
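The automated population of relational tables with accelerator component data can be illustrated with a minimal in-memory sketch. The schema, field names, and sample rows below are hypothetical, chosen only to show the pattern of loading crawled component records into a parent/child hierarchy; they are not IRMIS's actual tables.

```python
import sqlite3

# Hypothetical component schema; IRMIS's real tables differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE component (
           id        INTEGER PRIMARY KEY,
           name      TEXT NOT NULL,
           type      TEXT NOT NULL,
           parent_id INTEGER REFERENCES component(id)
       )"""
)

# An automated crawl of controller records might yield rows like these,
# which a loader inserts while preserving the parent/child hierarchy.
discovered = [
    (1, "sector-1", "sector", None),
    (2, "magnet-Q1", "quadrupole", 1),
    (3, "bpm-01", "beam-position-monitor", 1),
]
conn.executemany("INSERT INTO component VALUES (?, ?, ?, ?)", discovered)

# Maintenance tools can then query the hierarchy, e.g. all children of sector-1:
rows = conn.execute(
    "SELECT name, type FROM component WHERE parent_id = 1 ORDER BY name"
).fetchall()
print(rows)  # [('bpm-01', 'beam-position-monitor'), ('magnet-Q1', 'quadrupole')]
```

The point of the relational approach is exactly this kind of query: once component data is generated automatically into normalized tables, the same rows can serve maintenance views, inventory reports, or machine metadata exports without re-entry.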
Citations: 0
Journal
2012 IEEE 8th International Conference on E-Science