Title: A method to find unique sequences on distributed genomic databases
Authors: K. Kurata, V. Breton, Hiroshi Nakamura
Venue: CCGrid 2003. 3rd IEEE/ACM International Symposium on Cluster Computing and the Grid, 2003. Proceedings.
Published: 2003-05-12
DOI: 10.1109/CCGRID.2003.1199353
URL: https://doi.org/10.1109/CCGRID.2003.1199353
Citations: 6
Abstract
Advances in genetic engineering are revealing a wide variety of genomic information, making it feasible to analyze entire genomes at once. At the same time, the quantity of genomic information stored in databases grows day by day. Processing all of it requires an effective method for handling large volumes of data, so analyzing biological information calls not only for efficient, fast algorithms but also for high-speed computing resources. Grid computing has recently emerged as one of the most promising computing environments for this purpose; the European Data Grid (EDG) is one such data-oriented grid computing environment [11]. In bioinformatics, finding unique sequences is important for the success of molecular biology experiments [6]. Once found, unique sequences are useful for designing target-specific probes and primers, comparing gene sequences, and so on. In this paper, we propose a method for discovering unique sequences among genomic databases located in a distributed environment. We then implement this method on the European Data Grid and present calculation results for E. coli genomes.
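The abstract does not spell out the algorithm, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes that a "unique sequence" means a substring of fixed length k that occurs exactly once across all databases, and it models each distributed database as an in-memory list of strings. The names `count_kmers`, `unique_kmers`, `site_a`, and `site_b` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every length-k substring of one sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def unique_kmers(shards: Dict[str, Iterable[str]], k: int) -> List[str]:
    """Return the k-mers that occur exactly once across all shards.

    Assumption: each shard stands in for one remote database. In a real
    grid deployment the per-shard counting would run where the data lives,
    and only the counts would be sent back to be merged.
    """
    total: Counter = Counter()
    for _site, sequences in shards.items():
        local: Counter = Counter()
        for seq in sequences:
            local.update(count_kmers(seq, k))  # local counting step
        total.update(local)  # merge step: add local counts to the global tally
    return sorted(kmer for kmer, n in total.items() if n == 1)


# Toy data, invented for illustration only.
shards = {
    "site_a": ["ACGTAC"],
    "site_b": ["GTACGG"],
}
print(unique_kmers(shards, 3))  # → ['CGG', 'CGT']
```

Counting locally and merging only the tallies keeps the sequences themselves at their home sites, which is the kind of data locality a data-oriented grid is meant to exploit. A production version would also need to bound memory, since the number of distinct k-mers grows quickly with k.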