Data Decomposition in Biomedical e-Science Applications

Yassene Mohammed, Shayan Shahand, V. Korkhov, Angela C. M. Luyf, B. V. Schaik, M. Caan, A. V. Kampen, Magnus Palmblad, S. Olabarriaga
Published in: 2011 IEEE Seventh International Conference on e-Science Workshops, 5 December 2011.
DOI: 10.1109/eScienceW.2011.7
Citations: 3

Abstract

As the focus of e-Science moves toward the fourth paradigm and data-intensive science, data access remains dependent on the architecture of the underlying e-Science infrastructure. Such architectures are in general job-driven, i.e., a (grid) job is a sequence of commands that run on the same worker node. Making use of the infrastructure requires a parallelized application, which is achieved foremost by data decomposition. In common parallel programming practice, data decomposition depends on the programmer's experience and knowledge of the data and the algorithm/application. Data mining scientists, on the other hand, have an established foundation for data decomposition: automatic decomposition methods are already in use, and methodologies and patterns are well defined. Our experience in porting biomedical applications to the Dutch e-Science infrastructure shows that the data decompositions used to gain parallelism fit, to some degree, a subgroup of the data mining decomposition patterns, namely object set decomposition. In this paper we discuss porting three biomedical packages to a grid computing environment: two for medical imaging and one for DNA sequencing. We show how the data access of the applications was reengineered around the executables to make use of the parallel capacity of the e-Science infrastructure.
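The object set decomposition pattern the abstract refers to can be illustrated with a minimal sketch: the input object set is split into independent subsets, each subset is processed by what would be a separate grid job, and the partial results are merged. All function names and the per-object operation below are illustrative assumptions, not taken from the paper or its packages.

```python
# Minimal sketch of object set decomposition for a job-driven
# infrastructure: split an input set into independent chunks,
# process each chunk as if it were a separate (grid) job running
# on its own worker node, then merge the partial results.
# All names and the per-object operation are illustrative.

def decompose(objects, n_jobs):
    """Split a list of objects into n_jobs roughly equal subsets."""
    return [objects[i::n_jobs] for i in range(n_jobs)]

def run_job(chunk):
    """Stand-in for an executable invoked once per data subset."""
    return [obj * 2 for obj in chunk]  # placeholder per-object work

def merge(partials):
    """Combine the partial results produced by all jobs."""
    return [result for part in partials for result in part]

if __name__ == "__main__":
    data = list(range(10))
    chunks = decompose(data, n_jobs=3)
    results = merge(run_job(chunk) for chunk in chunks)
    print(sorted(results))
```

Because the objects are independent, each chunk can be submitted as its own job; the decomposition and merge steps are the only parts that need to be reengineered around the existing executable.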