
Latest publications from the 2013 IEEE 9th International Conference on e-Science

Mining Common Spatial-Temporal Periodic Patterns of Animal Movement
Pub Date : 2013-10-22 DOI: 10.1109/eScience.2013.11
Yuwei Wang, Ze Luo, Gang Qin, Yuanchun Zhou, Danhuai Guo, Baoping Yan
Advanced satellite tracking technologies enable biologists to track animal movements at finer spatial and temporal scales. The resulting long-term movement data are very valuable for understanding animal activities. Periodic pattern analysis provides an insightful approach for revealing animal activity patterns. However, individual GPS data are usually incomplete and cover only a limited lifespan. In addition, individual periodic behaviors are inherently complicated and carry many uncertainties. In this paper, we address the problem of mining periodic patterns of animal movement by combining multiple individuals with similar periodicities. We formally define the problem of mining common periodicity and propose a novel periodicity measure, introducing information entropy into the measure to detect the common period. Data incompleteness, noise, and the ambiguity of individual periodicity are all considered in our method. Furthermore, we mine multiple common periodic patterns by grouping periodic segments with respect to the detected period, and we provide a visualization of common periodic patterns by designing a cyclical filled line chart. To assess the effectiveness of the proposed method, we present an experimental study using a real GPS dataset collected from 29 birds at Qinghai Lake, China.
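The abstract leaves the periodicity measure at a high level. As a rough illustration of how an entropy-based common-period score might work (not the authors' actual definition), the sketch below folds each individual's presence/absence sequence at a candidate period and prefers the period whose phase bins have the lowest average entropy across individuals; all data and function names are made up for the example.

```python
import numpy as np

def phase_entropy(sequence, period):
    """Fold a 0/1 activity sequence at `period` and return the mean Shannon
    entropy of the per-phase on/off distribution; low entropy means the
    individual is active in the same phases cycle after cycle."""
    n = (len(sequence) // period) * period
    folded = np.asarray(sequence[:n]).reshape(-1, period)   # rows = cycles
    p = np.clip(folded.mean(axis=0), 1e-12, 1 - 1e-12)      # P(active) per phase
    return float(np.mean(-(p * np.log2(p) + (1 - p) * np.log2(1 - p))))

def common_period(individuals, candidate_periods):
    """Score each candidate period by its entropy averaged over all
    individuals and return the period that scores lowest for the group."""
    scores = {T: np.mean([phase_entropy(seq, T) for seq in individuals])
              for T in candidate_periods}
    return min(scores, key=scores.get), scores

# Toy data: three individuals active during the first 6 of every 24 time steps,
# with 5% observation noise.
rng = np.random.default_rng(0)
individuals = [(np.arange(24 * 30) % 24 < 6).astype(int) ^ (rng.random(24 * 30) < 0.05)
               for _ in range(3)]
best, _ = common_period(individuals, candidate_periods=range(12, 37))
print("detected common period:", best)    # expected: 24
```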
Citations: 7
Sharing Australia's Nationally Significant Terrestrial Ecosystem Data: A Collaboration between TERN and ANDS
Pub Date : 2013-10-22 DOI: 10.1109/ESCIENCE.2013.28
S. Guru, Xiaobin Shen, C. Love, A. Treloar, S. Phinn, Ross Wilkinson, Cathrine Brady, P. Isaac, T. Clancy
Collection-based approaches are commonly used in libraries for collections of physical and electronic resources. However, nationally significant collections of research data are a newer development, and one of increasing importance to researchers. Bringing together datasets of national significance and making them openly accessible will enable us to address some of the critical questions facing our society; in the ecosystem domain, it will help us understand more about the causes and effects of changes in ecosystems. An implementation case study is presented, based on TERN's OzFlux data collection as a national collection program initiated by ANDS. This paper demonstrates how the Terrestrial Ecosystem Research Network (TERN) and the Australian National Data Service (ANDS) are working together to identify and publish nationally significant ecosystem data collections to enhance discoverability, accessibility and re-use.
Citations: 7
An Algorithm for Cost-Effectively Storing Scientific Datasets with Multiple Service Providers in the Cloud
Pub Date : 2013-10-22 DOI: 10.1109/eScience.2013.34
Dong Yuan, X. Liu, Li-zhen Cui, Tiantian Zhang, Wenhao Li, Dahai Cao, Yun Yang
The proliferation of cloud computing allows scientists to deploy computation- and data-intensive applications without infrastructure investment, and large generated datasets can be flexibly stored with multiple cloud service providers. Under the pay-as-you-go model, the total application cost largely depends on the usage of computation, storage and bandwidth resources, so cutting the cost of cloud-based data storage becomes a major concern when deploying scientific applications in the cloud. In this paper, we propose a novel algorithm that automatically decides whether a generated dataset should be 1) stored in the current cloud, 2) deleted and re-generated whenever reused, or 3) transferred to a cheaper cloud service for storage. The algorithm finds the trade-off among computation, storage and bandwidth costs in the cloud, the three key factors in the cost of storing generated application datasets with multiple cloud service providers. Simulations conducted with popular cloud service providers' pricing models show that the proposed algorithm is highly cost-effective in the cloud.
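The three-way decision the abstract describes can be pictured as a simple cost comparison. The sketch below is a deliberate simplification (the paper's algorithm also handles dataset dependencies), and all prices, rates and parameter names are illustrative assumptions rather than real provider tariffs.

```python
def cheapest_strategy(size_gb, regen_cpu_hours, accesses_per_month,
                      store_rate=0.023,       # $/GB-month on the current cloud (assumed)
                      cheap_store_rate=0.010, # $/GB-month at a cheaper provider (assumed)
                      transfer_rate=0.090,    # $/GB moved between providers (assumed)
                      cpu_rate=0.050,         # $/CPU-hour to re-generate (assumed)
                      horizon_months=12):
    """Compare the cost over the horizon of (1) keeping the dataset where it is,
    (2) deleting it and re-computing it on every access, and (3) moving it to a
    cheaper storage service, paying the transfer once plus retrieval per access."""
    keep      = store_rate * size_gb * horizon_months
    recompute = cpu_rate * regen_cpu_hours * accesses_per_month * horizon_months
    migrate   = (transfer_rate * size_gb                          # one-off move
                 + cheap_store_rate * size_gb * horizon_months
                 + transfer_rate * size_gb * accesses_per_month * horizon_months)
    costs = {"store": keep, "regenerate": recompute, "migrate": migrate}
    return min(costs, key=costs.get), costs

# A rarely reused 500 GB dataset that takes 40 CPU-hours to rebuild.
decision, costs = cheapest_strategy(size_gb=500, regen_cpu_hours=40,
                                    accesses_per_month=0.2)
print(decision, costs)
```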
Citations: 17
Decentralized Prioritization-Based Management Systems for Distributed Computing
Pub Date : 2013-10-22 DOI: 10.1109/eScience.2013.44
Per-Olov Östberg, E. Elmroth
Fairshare scheduling is an established technique for providing user-level differentiation in the management of capacity consumption in high-performance and grid computing scheduler systems. In this paper we extend a state-of-the-art approach to decentralized grid fairshare and propose a generalized model for the construction of decentralized prioritization-based management systems. The approach is based on the (re)formulation of control problems as prioritization problems, together with a proposed framework for computationally efficient decentralized priority calculation. The model is presented along with a discussion of the application of decentralized management systems in distributed computing environments, which outlines selected use cases and illustrates key trade-off behaviors of the proposed model.
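For readers unfamiliar with fairshare, a common way to turn target shares and decayed historical usage into per-user priorities looks roughly like the sketch below. This is generic background, not the paper's decentralized formulation, and the half-life and example numbers are arbitrary.

```python
import time

def fairshare_priorities(targets, usage, half_life=7 * 24 * 3600, now=None):
    """targets: {user: target fraction of the machine}
    usage:   {user: [(timestamp, cpu_seconds), ...]}
    Returns {user: priority}; priority above 1 means the user has consumed less
    than its allocated share (after exponential decay of old usage) and should
    be scheduled sooner."""
    now = now or time.time()
    decayed = {u: sum(s * 0.5 ** ((now - t) / half_life) for t, s in hist)
               for u, hist in usage.items()}
    total = sum(decayed.values()) or 1.0
    return {u: targets[u] / max(decayed[u] / total, 1e-9) for u in targets}

# Two groups with equal 50% targets; groupB consumed its hour much more recently.
now = time.time()
prio = fairshare_priorities(
    targets={"groupA": 0.5, "groupB": 0.5},
    usage={"groupA": [(now - 6 * 86400, 3600)],   # ran six days ago
           "groupB": [(now - 3600, 3600)]},        # ran an hour ago
    now=now)
print(prio)   # groupA gets the higher priority
```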
Citations: 4
A Geographical Approach for Metadata Quality Improvement in Biological Observation Databases
Pub Date : 2013-10-22 DOI: 10.1109/eScience.2013.14
D. C. Cugler, C. B. Medeiros, S. Shekhar, L. F. Toledo
This paper addresses the problem of improving the quality of metadata in biological observation databases, in particular metadata associated with observations of living beings, which are often used as a starting point for biodiversity analyses. Poor-quality metadata lead to incorrect scientific conclusions and can mislead experts. Thus, it is important to design and develop methods to detect and correct metadata quality problems. This is a challenging problem because of the variety of issues affecting such metadata, e.g., misnaming of species, location uncertainty, and imprecision concerning where observations were recorded. Related work is limited because it does not adequately model such issues. We propose a geographic approach based on expert-led classification of place and/or range mismatch anomalies detected by our algorithms. Our approach enables detection of anomalies both in species' reported geographic distributions and in species identification. Our main contribution is a geographic algorithm that deals with uncertain or imprecise locations. Our work is tested in a case study with the Fonoteca Neotropical Jacques Vielliard, one of the 10 largest animal sound collections in the world.
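The abstract does not spell the anomaly checks out; a minimal sketch of the kind of place/range mismatch test it implies is given below, where each observation is compared against the species' reported range with a tolerance that absorbs location uncertainty. The example range, the uncertainty radius and the helper names are hypothetical.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two latitude/longitude points, in km."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

def flag_range_mismatches(observations, known_ranges, gps_uncertainty_km=5.0):
    """observations: list of (species, lat, lon); known_ranges: {species: (lat, lon, radius_km)}.
    Flags observations that fall outside the species' reported range even after
    allowing for GPS uncertainty - candidates for expert review, not deletion."""
    anomalies = []
    for species, lat, lon in observations:
        if species not in known_ranges:
            anomalies.append((species, lat, lon, "unknown species name"))
            continue
        c_lat, c_lon, radius = known_ranges[species]
        dist = haversine_km(lat, lon, c_lat, c_lon)
        if dist > radius + gps_uncertainty_km:
            anomalies.append((species, lat, lon, f"{dist:.0f} km outside reported range"))
    return anomalies

# Illustrative range centred near Campinas, Brazil; the second record (New York)
# is flagged, the first is accepted.
ranges = {"Hypsiboas faber": (-22.9, -47.1, 800.0)}
obs = [("Hypsiboas faber", -23.5, -46.6), ("Hypsiboas faber", 40.7, -74.0)]
print(flag_range_mismatches(obs, ranges))
```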
Citations: 5
Biocharts: Unifying Biological Hypotheses with Models and Experiments
Pub Date : 2013-10-22 DOI: 10.1109/eScience.2013.41
H. Kugler
Understanding how biological systems develop and function remains one of the main open scientific challenges of our times. An improved quantitative understanding of biological systems, assisted by computational models, is also important for future bioengineering and biomedical applications. We present a computational approach aimed at unifying hypotheses with models and experiments, allowing one to formally represent what a biological system does (specification) and how it does it (mechanism), and to systematically compare against data characterizing system behavior (experiments). We describe our Biocharts framework, which is geared towards supporting this approach, and illustrate its application in several biological domains including bacterial colony growth, developmental biology, and stem cell population dynamics.
Citations: 2
Protein Structure Modeling in a Grid Computing Environment
Pub Date : 2013-10-22 DOI: 10.1109/eScience.2013.15
Daniel Li, B. Tsui, Charles Xue, J. Haga, Kohei Ichikawa, S. Date
Advances in sequencing technology have resulted in an exponential increase in the availability of protein sequence information. In order to fully utilize this information, it is important to translate the primary sequences into high-resolution tertiary protein structures. MODELLER is a leading homology modeling method that produces high-quality protein structures. In this study, the function of MODELLER was expanded by configuring and deploying it on a parallel grid computing platform using a custom four-step workflow. The workflow consisted of template selection through a protein BLAST algorithm, target-template protein sequence alignment, distribution of model generation jobs among the compute clusters, and final protein model optimization. To test the validity of this workflow, we used the Dual Specificity Phosphatase (DSP) protein family, whose members share high homology with one another. Comparison of the DSP member SSH-2 with its model counterpart revealed a minimal 1.3% difference in output energy scores. Furthermore, the Dali pairwise comparison program demonstrated a 98% match among amino acid features and a Z-score of 26.6, indicating very significant similarity between the model and the actual protein structure. After confirming the accuracy of our workflow, we generated 23 previously unknown DSP family protein structure models. Over 40,000 models were generated, 30 times faster than with conventional computing. Virtual receptor-ligand screening results for the modeled protein DSP21 were compared with two known structures that had either higher or lower structural homology to DSP21. There was a significant difference (p < 0.001) between the average ligand ranking discrepancy of the more homologous protein pair and that of the less homologous pair, suggesting that the generated protein models were sufficiently accurate for virtual screening. These results demonstrate the accuracy and usability of a grid-enabled MODELLER program and the increased efficiency of processing protein structure models. This workflow will help increase the speed of future drug development pipelines.
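As a generic illustration of the fan-out step in such a workflow (distributing many independent model-generation jobs over available workers), a minimal sketch is shown below. It does not reproduce the authors' grid middleware or the MODELLER interface; the job body is a placeholder and the worker count is arbitrary.

```python
from concurrent.futures import ProcessPoolExecutor, as_completed
import time

def run_modeling_job(target_id):
    """Stand-in for one model-generation job; in the real workflow this step
    would run the homology-modeling tool on one target-template alignment."""
    time.sleep(0.1)                                   # pretend to compute
    return target_id, "model built"

if __name__ == "__main__":
    targets = [f"DSP{i:02d}" for i in range(1, 24)]   # 23 targets, as in the study
    with ProcessPoolExecutor(max_workers=8) as pool:
        futures = [pool.submit(run_modeling_job, t) for t in targets]
        for fut in as_completed(futures):
            print(*fut.result())
```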
Citations: 0
Parallelizing Astronomical Source Extraction on the GPU
Pub Date : 2013-10-22 DOI: 10.1109/eScience.2013.10
B. Zhao, Qiong Luo, Chao Wu
In astronomical observatory projects, raw images are processed so that information about the celestial objects they contain is extracted into catalogs. This source extraction is thus the basis for the various analysis tasks subsequently performed on the catalog products. With the rapid progress of new, large astronomical projects, observational images will be produced every few seconds, and this high speed of image production requires correspondingly fast source extraction. Unfortunately, current source extraction tools cannot meet the speed requirement. To address this problem, we propose to use the GPU (Graphics Processing Unit) to accelerate source extraction. Specifically, we start from SExtractor, an astronomical source extraction tool widely used in astronomy projects, and study its parallelization on the GPU. We identify the object detection and deblending components as the most complex and time-consuming, and we design a parallel connected component labelling algorithm for detection and a parallel object tree pruning method for deblending on the GPU. We further parallelize the other components, including cleaning, background subtraction, and measurement, so that the entire source extraction runs on the GPU. We evaluated our GPU-SExtractor against the original SExtractor on a desktop with an Intel i7 CPU and an NVIDIA GTX670 GPU, using a set of real-world and synthetic astronomical images of different sizes. Our results show that the GPU-SExtractor outperforms the original SExtractor by a factor of 6, taking merely 1.9 seconds to process a typical 4K×4K image containing 167 thousand objects.
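The detection stage rests on connected-component labelling of above-threshold pixels. The paper implements a parallel GPU variant; the sketch below shows the same min-label-propagation idea on the CPU with NumPy, purely to illustrate why the algorithm maps well onto data-parallel hardware: every sweep updates each pixel independently from its neighbours' previous labels.

```python
import numpy as np

def label_components(mask):
    """Connected-component labelling by iterative min-label propagation over
    4-neighbours: every foreground pixel starts with a unique label and, sweep
    after sweep, adopts the smallest label among its foreground neighbours
    until nothing changes. Each sweep treats every pixel independently, which
    is what makes a GPU formulation natural."""
    h, w = mask.shape
    labels = np.where(mask, np.arange(h * w).reshape(h, w), -1)
    big = np.iinfo(labels.dtype).max
    while True:
        new = labels.copy()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            shifted = np.full_like(labels, big)
            ys, yd = slice(max(dy, 0), h + min(dy, 0)), slice(max(-dy, 0), h + min(-dy, 0))
            xs, xd = slice(max(dx, 0), w + min(dx, 0)), slice(max(-dx, 0), w + min(-dx, 0))
            shifted[yd, xd] = labels[ys, xs]          # shifted[y, x] = labels[y+dy, x+dx]
            better = mask & (shifted >= 0) & (shifted < new)
            new[better] = shifted[better]
        if np.array_equal(new, labels):
            return labels
        labels = new

# Two bright blobs on a dark background are detected as two objects.
img = np.zeros((8, 8))
img[1:3, 1:3] = 5.0
img[5:7, 4:7] = 7.0
labels = label_components(img > 3.0)
print(len(np.unique(labels[labels >= 0])), "objects detected")   # -> 2
```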
Citations: 2
Dependency Provenance in Agent Based Modeling
Pub Date : 2013-10-22 DOI: 10.1109/eScience.2013.39
Peng Chen, Beth Plale, Tom Evans
Researchers who use agent-based models (ABM) to model social patterns often focus on the model's aggregate phenomena. However, aggregation of individuals complicates the understanding of agent interactions and the uniqueness of individuals. We develop a method for tracing and capturing the provenance of individuals and their interactions in the NetLogo ABM, and from this create a "dependency provenance slice", which combines a data slice and a program slice to yield insights into the cause-effect relations among system behaviors. To cope with the large volume of fine-grained provenance traces, we propose use-inspired filters to reduce the amount of provenance, and a provenance slicing technique called "non-preprocessing provenance slicing" that directly queries over provenance traces without recovering all provenance entities and dependencies beforehand. We evaluate performance and utility using a well known ecological NetLogo model called "wolf-sheep-predation".
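Setting the NetLogo specifics aside, a dependency provenance slice can be viewed as a backward reachability query over recorded dependency edges, and querying the edge list directly (rather than materialising every provenance entity first) is the gist of the non-preprocessing idea as the abstract describes it. The edge format and entity names below are hypothetical.

```python
from collections import deque

def dependency_slice(target, depends_on):
    """depends_on maps an entity to the set of entities it was derived from or
    influenced by. Returns everything the target transitively depends on (its
    dependency provenance slice) by walking backwards over the recorded edges
    only, without materialising any other provenance structure first."""
    seen, queue = {target}, deque([target])
    while queue:
        node = queue.popleft()
        for parent in depends_on.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    seen.discard(target)
    return seen

# Toy trace: a sheep's energy at tick 3 depends on a grass patch and its own past state.
trace = {
    "sheep7.energy@t3": {"sheep7.energy@t2", "patch(3,4).grass@t2"},
    "sheep7.energy@t2": {"sheep7.energy@t1"},
    "patch(3,4).grass@t2": {"patch(3,4).grass@t1"},
}
print(sorted(dependency_slice("sheep7.energy@t3", trace)))
```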
Citations: 11
Balanced Task Clustering in Scientific Workflows
Pub Date : 2013-10-22 DOI: 10.1109/eScience.2013.40
Weiwei Chen, Rafael Ferreira da Silva, E. Deelman, R. Sakellariou
Scientific workflows can be composed of many tasks of fine computational granularity, whose runtimes may be shorter than the duration of system overheads, for example when using multiple resources of a cloud infrastructure. Task clustering is a runtime optimization technique that merges multiple short tasks into a single job so that scheduling overhead is reduced and overall runtime performance is improved. However, existing task clustering strategies only provide a coarse-grained approach that relies on an over-simplified workflow model. In our work, we examine the causes of Runtime Imbalance and Dependency Imbalance in task clustering. Next, we propose quantitative metrics to evaluate the severity of each of these imbalance problems. Furthermore, we propose a series of task balancing methods to address them. Finally, we analyze how the imbalance metrics relate to the performance of these task balancing methods. A trace-based simulation shows that our methods can significantly improve the runtime performance of two widely used workflows compared to the actual implementation of task clustering.
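For intuition about runtime balancing (this is a generic illustration, not the paper's metrics or algorithm), the sketch below greedily packs short tasks into a fixed number of clustered jobs, always giving the next-longest task to the currently lightest job, so per-job runtimes come out roughly even and scheduling overhead is paid once per job rather than once per task. The task runtimes and job count are made up.

```python
import heapq

def cluster_tasks(task_runtimes, num_jobs):
    """Longest-processing-time greedy packing: visit tasks from longest to
    shortest and always add the next task to the clustered job with the
    smallest total runtime so far. Returns (total_runtime, task_indices) per job."""
    heap = [(0.0, j, []) for j in range(num_jobs)]
    heapq.heapify(heap)
    order = sorted(range(len(task_runtimes)), key=lambda i: task_runtimes[i], reverse=True)
    for idx in order:
        total, j, members = heapq.heappop(heap)
        members.append(idx)
        heapq.heappush(heap, (total + task_runtimes[idx], j, members))
    return [(total, members) for total, _, members in sorted(heap, key=lambda x: x[1])]

# 20 short tasks (5-30 s) merged into 4 jobs of comparable length, so a per-job
# scheduling overhead is paid 4 times instead of 20.
runtimes = [5, 7, 30, 12, 9, 14, 6, 22, 11, 8, 19, 5, 16, 10, 25, 7, 13, 6, 9, 18]
for total, members in cluster_tasks(runtimes, num_jobs=4):
    print(f"job runtime {total:5.1f}s  tasks {members}")
```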
Citations: 53