Data Decomposition in Biomedical e-Science Applications

Yassene Mohammed, Shayan Shahand, V. Korkhov, Angela C. M. Luyf, B. V. Schaik, M. Caan, A. V. Kampen, Magnus Palmblad, S. Olabarriaga
Published in: 2011 IEEE Seventh International Conference on e-Science Workshops, 5 December 2011.
DOI: 10.1109/eScienceW.2011.7
Citations: 3

Abstract

As the focus of e-Science moves toward the fourth paradigm and data-intensive science, data access remains dependent on the architecture of the underlying e-Science infrastructure. Such architectures are in general job-driven, i.e., a (grid) job is a sequence of commands that run on the same worker node. Making use of the infrastructure requires a parallelized application, which is achieved foremost by data decomposition. In common parallel programming practice, data decomposition depends on the programmer's experience and knowledge of the data and the algorithm/application. Data mining scientists, on the other hand, have an established foundation for data decomposition: automatic decomposition methods are already in use, and methodologies and patterns are well defined. Our experience in porting biomedical applications to the Dutch e-Science infrastructure shows that the data decompositions used to gain parallelism fit, to some degree, a subgroup of the data mining decomposition patterns, namely object set decomposition. In this paper we discuss porting three biomedical packages to a grid computing environment: two for medical imaging and one for DNA sequencing. We show how the data access of the applications was reengineered around the executables to make use of the parallel capacity of the e-Science infrastructure.
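The object set decomposition pattern the abstract refers to can be illustrated with a minimal sketch: the input object set is split into independent subsets, each subset is processed by what would be a separate grid job, and the partial results are merged. All function names and the per-object operation below are illustrative assumptions, not taken from the paper or its packages.

```python
# Minimal sketch of object set decomposition for a job-driven
# infrastructure: split an input set into independent chunks,
# process each chunk as if it were a separate (grid) job running
# on its own worker node, then merge the partial results.
# All names and the per-object operation are illustrative.

def decompose(objects, n_jobs):
    """Split a list of objects into n_jobs roughly equal subsets."""
    return [objects[i::n_jobs] for i in range(n_jobs)]

def run_job(chunk):
    """Stand-in for an executable invoked once per data subset."""
    return [obj * 2 for obj in chunk]  # placeholder per-object work

def merge(partials):
    """Combine the partial results produced by all jobs."""
    return [result for part in partials for result in part]

if __name__ == "__main__":
    data = list(range(10))
    chunks = decompose(data, n_jobs=3)
    results = merge(run_job(chunk) for chunk in chunks)
    print(sorted(results))
```

Because the objects are independent, each chunk can be submitted as its own job; the decomposition and merge steps are the only parts that need to be reengineered around the existing executable.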